[seL4] Re: shoehorn & fudge factor

3 Feb 2023


      On Thu, Feb 2, 2023 at 1:56 PM Kent Mcleod <kent.mcleod72@gmail.com> wrote:
...
On Fri, Feb 3, 2023 at 8:26 AM Sam Leffler via Devel <devel@sel4.systems>
wrote:
...
I have a target platform with only 4M of memory. When the system image is
generated and the shoehorn helper script is used to find a place in memory
to load the build artifacts, it tacks on an extra 4M of memory use (aka
fudge_factor). The comment in the code
<https://github.com/AmbiML/sparrow-seL4_tools/blame/master/cmake-tool/helpers...>
says this is to accommodate sel4test_driver. Needless to say, this breaks on
my 4M target platform. So I made the fudge factor settable from the command
line with a default of 0 and changed the sel4test build glue to set 4M when
building the elfloader. That works fine for my target platform, but it
breaks building a bootable image for rpi3 (AARCH64=1 bcm2837): shoehorn
places the elfloader such that it overlaps the image, e.g.:
ELF-loader started on CPU: ARM Ltd. Cortex-A53 r0p4
...
paddr=[335000..51a0ff]
No DTB passed in from boot loader.
Looking for DTB in CPIO archive...found at 378778.
Loaded DTB from 378778.
   paddr=[237000..23afff]
ELF-loading image 'kernel' to 0
  paddr=[0..236fff]
  vaddr=[ffffff8000000000..ffffff8000236fff]
  virt_entry=ffffff8000000000
ELF-loading image 'capdl-loader' to 23b000
  paddr=[23b000..33bfff]
  vaddr=[400000..500fff]
  virt_entry=4009a8
ERROR: image load address overlaps with ELF-loader!
ERROR: Physical address range invalid
ERROR: Could not load user image ELF
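The check that fails here is an interval-overlap test. A minimal Python sketch for illustration (not the elfloader's actual C code), assuming, as the log suggests, that the first paddr line, [335000..51a0ff], is the ELF-loader's own image and that both ranges are printed with inclusive end addresses:

```python
def ranges_overlap(a_start: int, a_last: int, b_start: int, b_last: int) -> bool:
    """True if the inclusive ranges [a_start..a_last] and [b_start..b_last] share a byte."""
    return a_start <= b_last and b_start <= a_last


# Ranges copied from the log above (inclusive end addresses).
ELFLOADER = (0x335000, 0x51A0FF)      # assumed to be the ELF-loader's own image
CAPDL_LOADER = (0x23B000, 0x33BFFF)   # where capdl-loader is actually unpacked

if __name__ == "__main__":
    if ranges_overlap(*CAPDL_LOADER, *ELFLOADER):
        print("image load address overlaps with ELF-loader")
```

Had capdl-loader ended where shoehorn predicted (last byte 0x334fff), the test would pass.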
Debug output of shoehorn for this case:
shoehorn: debug: found CPIO identifying sequence b'070701' at offset 0x40 in /usr/local/google/home/sleffler/shodan/out/cantrip/aarch64-unknown-elf/release/elfloader/archive.o
...
shoehorn: debug: encountered CPIO entry name: kernel.elf
shoehorn: debug: encountered CPIO entry name: kernel.dtb
shoehorn: debug: encountered CPIO entry name: capdl-loader
shoehorn: debug: setting marker to 0x0 (region 0 start)
shoehorn: debug: setting marker to 0x237000 (kernel_end)
shoehorn: debug: setting marker to 0x23b000 (dtb_end)
shoehorn: debug: setting marker to 0x335000 (end of rootserver)
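For reference, the first step in that debug output (locating the archive by its b'070701' magic and walking the entry names) can be sketched as below. This is a minimal illustration assuming the SVR4 "newc" CPIO layout (110-byte ASCII header, 4-byte padding); it is not shoehorn's actual code, and list_cpio_entries is a hypothetical name.

```python
from typing import List

CPIO_NEWC_MAGIC = b"070701"
CPIO_HEADER_LEN = 110          # 6-byte magic + 13 eight-digit hex fields
CPIO_TRAILER = "TRAILER!!!"    # name of the end-of-archive entry


def _align4(n: int) -> int:
    """Round n up to a multiple of 4 (newc pads header+name and file data)."""
    return (n + 3) & ~3


def list_cpio_entries(blob: bytes) -> List[str]:
    """Return entry names of the first newc CPIO archive found in blob."""
    start = blob.find(CPIO_NEWC_MAGIC)
    if start < 0:
        raise ValueError("no CPIO newc magic found")
    names = []
    pos = start
    while True:
        header = blob[pos:pos + CPIO_HEADER_LEN]
        if len(header) < CPIO_HEADER_LEN or header[:6] != CPIO_NEWC_MAGIC:
            raise ValueError(f"bad CPIO header at offset {pos:#x}")
        # Field i is header[6 + 8*i : 14 + 8*i]; index 6 is c_filesize, 11 is c_namesize.
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = pos + CPIO_HEADER_LEN
        # c_namesize counts the terminating NUL byte.
        name = blob[name_start:name_start + namesize - 1].decode("ascii")
        if name == CPIO_TRAILER:
            return names
        names.append(name)
        # Padding is relative to the start of the archive, not of the enclosing file.
        data_start = start + _align4(name_start + namesize - start)
        pos = start + _align4(data_start + filesize - start)
```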
So two questions:
1. Where is the 4M under-count of sel4test_driver? (The code indicates this
might be explained in JIRA SELFOUR-2335, but I couldn't locate it.)
Here is the referenced Jira issue, but it doesn't provide any
additional context: https://sel4.atlassian.net/browse/SELFOUR-2335
shoehorn is attempting to calculate how the kernel and root server
binaries will be unpacked into memory in order to place the
elfloader's start address above the unpacked region. shoehorn
calculates the region by iterating over the PT_LOAD segments from each
ELF file. The elfloader then unpacks each ELF file at runtime by
iterating over the PT_LOAD segments.
For some reason, the two implementations don't agree. In your case,
the offline calculation expects that the root server is loaded from
[0x23b000, 0x335000), whereas the online calculation attempts
[0x23b000, 0x33c000). Are you able to print the segment headers for
the root server image you are loading?
I'm guessing (from quickly looking at the code) the issue is that the
shoehorn calculation only sums the p_memsz amounts for each PT_LOAD
segment and isn't taking into account any gaps between segments in the
virtual address space.
Yes, that appears to be the issue. readelf of capdl-loader shows:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000a9130 0x00000000000a9130  RWE    0x1000
  LOAD           0x0000000000000000 0x00000000004b0000 0x00000000004b0000
                 0x0000000000000000 0x0000000000050168  RW     0x1000

so there's a 0x6ed0-byte gap between the end of the first load segment
(0x4a9130) and the start of the second (0x4b0000) that isn't accounted for.
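The two end markers can be reproduced from these headers. A minimal sketch, assuming 4 KiB pages and assuming the estimate rounds the summed p_memsz up to a page once (that choice is what yields the 0x335000 marker in the debug output; shoehorn's exact rounding may differ). Both start from the dtb_end marker of 0x23b000:

```python
PAGE_SIZE = 0x1000

# (p_vaddr, p_memsz) of the two PT_LOAD segments from the readelf output above.
SEGMENTS = [(0x400000, 0xA9130), (0x4B0000, 0x50168)]
ROOTSERVER_START = 0x23B000  # "setting marker to 0x23b000 (dtb_end)"


def page_up(n: int) -> int:
    """Round n up to the next page boundary."""
    return (n + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


def size_by_summing_memsz(segments) -> int:
    """Under-counting estimate: total p_memsz, ignoring gaps between segments."""
    return page_up(sum(memsz for _vaddr, memsz in segments))


def size_by_span(segments) -> int:
    """Gap-aware size: lowest page-aligned start to highest page-rounded end."""
    lo = min(vaddr for vaddr, _memsz in segments) & ~(PAGE_SIZE - 1)
    hi = page_up(max(vaddr + memsz for vaddr, memsz in segments))
    return hi - lo


if __name__ == "__main__":
    print(f"{ROOTSERVER_START + size_by_summing_memsz(SEGMENTS):#x}")  # 0x335000
    print(f"{ROOTSERVER_START + size_by_span(SEGMENTS):#x}")           # 0x33c000
```

The 0x7000-byte difference between the two markers is the 0x6ed0-byte gap after page rounding, and it is exactly the region where capdl-loader collides with the ELF-loader placed at 0x335000.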
Attached is a change that seems to DTRT. It also appears to eliminate the
need for fudge_factor (in quick testing). You'll probably want to write
your own fix as my python fu is basic.
...
A fudge-factor wouldn't be needed if these two calculations weren't out of
sync.
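A gap-aware calculation only needs the lowest start and highest end of the PT_LOAD segments rather than the sum of their sizes. The sketch below reads the program headers with Python's standard struct module; it is not Sam's attached change, it handles only little-endian ELF64, and load_region is a hypothetical helper name.

```python
import struct
from typing import Tuple

PT_LOAD = 1
PAGE_SIZE = 0x1000


def load_region(elf: bytes) -> Tuple[int, int]:
    """Return the page-aligned [start, end) virtual range covered by PT_LOAD segments."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    # ELF64 header: e_phoff at offset 32, e_phentsize and e_phnum at offsets 54 and 56.
    (e_phoff,) = struct.unpack_from("<Q", elf, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)
    start, end = None, None
    for i in range(e_phnum):
        p_type, _flags, _offset, p_vaddr, _paddr, _filesz, p_memsz, _align = (
            struct.unpack_from("<IIQQQQQQ", elf, e_phoff + i * e_phentsize))
        if p_type != PT_LOAD or p_memsz == 0:
            continue
        # Track the extremes rather than summing sizes, so gaps are included.
        seg_start = p_vaddr & ~(PAGE_SIZE - 1)
        seg_end = (p_vaddr + p_memsz + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
        start = seg_start if start is None else min(start, seg_start)
        end = seg_end if end is None else max(end, seg_end)
    if start is None:
        raise ValueError("no PT_LOAD segments")
    return start, end
```

Hypothetical usage: read the root server image with open(path, "rb").read(), call load_region on it, and advance the placement marker by end - start instead of by the summed p_memsz. For the capdl-loader headers above this gives [0x400000, 0x501000), i.e. 0x101000 bytes, which matches the ELF-loader's reported vaddr range of [400000..500fff].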
...
2. Should zeroing fudge_factor work? If yes, where should I look to remedy
the above?
I looked upstream for changes that might address this issue but didn't see
anything.
I suspect I can invert my logic and default fudge_factor to some value and
then override as needed (e.g. 0 for my sparrow platform & 4M for sel4test
builds).
This seems fine to me.
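The inverted default could look like this with argparse. A minimal sketch: --fudge-factor is a hypothetical option name (the actual flag in Sam's change isn't shown), and the 4 MiB default assumes the value sel4test needs stays the default while the sparrow build passes 0.

```python
import argparse

MIB = 1024 * 1024


def parse_size(text: str) -> int:
    """Accept decimal or 0x-prefixed hex byte counts."""
    return int(text, 0)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="hypothetical shoehorn-style options")
    parser.add_argument(
        "--fudge-factor",          # hypothetical flag name
        type=parse_size,
        default=4 * MIB,           # assumed default: what sel4test_driver needs
        help="extra bytes reserved above the unpacked images",
    )
    return parser


if __name__ == "__main__":
    # A sel4test build omits the flag; a 4M platform such as sparrow passes 0.
    print(build_parser().parse_args([]).fudge_factor)
    print(build_parser().parse_args(["--fudge-factor", "0"]).fudge_factor)
```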
...
-Sam
_______________________________________________
Devel mailing list -- devel@sel4.systems