I'm trying to build a low-memory-footprint RISC-V seL4 system. I'm using QEMU/spike for now.

In seL4 12.0 I can modify spike.dts to give 8.5 MB of memory (0x00880000):

    memory@80000000 {
        device_type = "memory";
        reg = <0x00000000 0x80000000 0x00000000 0x00880000>;
    };

This is the smallest value that builds. With anything lower I get this error:

    [0/1] Re-running CMake...
    -- Found GCC with prefix riscv64-unknown-linux-gnu-
    STATUS,KernelDTSList: tools/dts/spike.dts
    -- /mnt/data/femur_12.0/build/kernel/gen_headers/plat/machine/devices_gen.h is out of date. Regenerating...
    STATUS,platform_yaml: /mnt/data/femur_12.0/build/kernel/gen_headers/plat/machine/platform_gen.yaml
    -- Detecting cached version of: musllibc
    -- Found valid cache entry for musllibc
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /mnt/data/femur_12.0/build
    [18/36] Generating gen_headers/image_start_addr.h
    FAILED: elfloader-tool/gen_headers/image_start_addr.h
    cd /mnt/data/femur_12.0/build/elfloader-tool && sh -c "/mnt/data/femur_12.0/tools/elfloader-tool/../cmake-tool/helpers/shoehorn.py /mnt/data/femur_12.0/build/kernel/gen_headers/plat/machine/platform_gen.yaml /mnt/data/femur_12.0/build/elfloader-tool/archive.o > /mnt/data/femur_12.0/build/elfloader-tool/gen_headers//image_start_addr.h"
    shoehorn: fatal error: ELF-loader image "/mnt/data/femur_12.0/build/elfloader-tool/archive.o" of size 0x107548 does not fit within any memory region described in "{'devices': [{'end': 2147483648, 'start': 0}, {'end': 549755809792, 'start': 2155872256}], 'memory': [{'end': 2155872256, 'start': 2149580800}]}"
    ninja: build stopped: subcommand failed.

What do I need to modify to get to < 2 MB?

Thanks!
Hi there! At 2020-12-11T21:29:05-0000, porter.188@osu.edu wrote:
I'm trying to build a low memory footprint of RISC-V based seL4. I'm using QEMU/spike for now.
In seL4 12.0 I can modify spike.dts to 8.5 MB of memory 0x00880000.
    memory@80000000 {
        device_type = "memory";
        reg = <0x00000000 0x80000000 0x00000000 0x00880000>;
    };
This is the smallest value that builds, lower than this I get the error: [...] What do I need to modify to get to < 2 MB? [...] shoehorn: fatal error: ELF-loader image "/mnt/data/femur_12.0/build/elfloader-tool/archive.o" of size 0x107548 does not fit within any memory region described in "{'devices': [{'end': 2147483648, 'start': 0}, {'end': 549755809792, 'start': 2155872256}], 'memory': [{'end': 2155872256, 'start': 2149580800}]}" ninja: build stopped: subcommand failed.
With a bit of hex-to-decimal arithmetic and sorting, maybe the complaint will become more obvious.

The archive is 1078600 bytes in decimal, a little over 1 megabyte.

Your memory map looks like this:

    000000000000 - 002147483648: device memory (2GB)
    002149580800 - 002155872256: physical memory
    002155872256 - 549755809792: device memory

The device memory is not backed by real storage, or at least cannot be used that way by the boot loader, ELF-loader, or operating system.

Your physical memory window lies between 2149580800 and 2155872256. The difference is 6291456. More readably, 6 291 456. So, a bit over 6 megabytes (exactly 6 MiB).

What are the demands on physical memory at the time the ELF-loader is running? There is the ELF-loader itself, and the payload it is trying to unpack. That payload typically consists of the seL4 kernel itself, a device tree blob/binary (DTB) that the kernel will parse as part of its initialization, and the root server, analogous to the Unix init process.

Moreover, the contents of the archive will, once it is unpacked, briefly be in memory twice: first in the archive itself, and then as its unpacked contents, with the kernel, device tree, and root server mapped to their "final" locations. These objects have to be page aligned, which is not usually a big deal, but they also have to fit within the existing state of memory usage at the time the ELF-loader runs, and the ELF-loader was itself loaded into memory by the boot loader.

The ELF-loader strives to be simple and the shoehorn tool is pessimistic; that is, it assumes a worst-case unpacking scenario with maximal duplication. You might be able to get some space wins by studying the former, carefully planning memory usage, and adjusting the boot loader to put the ELF-loader in a convenient place (perhaps arranging things so that parts of it get overwritten by the unpacking payload after the last point at which an ELF-loader page will be used).
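The hex-to-decimal arithmetic and sorting described above can be sketched in a few lines of Python. This is only an illustration: the region text and image size are copied from the error message, the helper name `describe` is made up for this example, and the script does not reproduce shoehorn's actual placement logic.

```python
import ast

# Region description copied from the shoehorn error message above.
REGIONS = ("{'devices': [{'end': 2147483648, 'start': 0}, "
           "{'end': 549755809792, 'start': 2155872256}], "
           "'memory': [{'end': 2155872256, 'start': 2149580800}]}")
IMAGE_SIZE = 0x107548  # size of archive.o reported by shoehorn

MIB = 1024 * 1024


def describe(regions_text):
    """Return (kind, start, end, size) tuples sorted by start address."""
    regions = ast.literal_eval(regions_text)
    rows = []
    for kind, entries in regions.items():
        for entry in entries:
            rows.append((kind, entry['start'], entry['end'],
                         entry['end'] - entry['start']))
    return sorted(rows, key=lambda row: row[1])


rows = describe(REGIONS)
for kind, start, end, size in rows:
    print(f"{kind:8} {start:#014x} - {end:#014x}  {size / MIB:12.2f} MiB")
print(f"image    {IMAGE_SIZE} bytes = {IMAGE_SIZE / MIB:.2f} MiB")
```

Printed in hex, the single memory region runs from 0x80200000 to 0x80800000, which makes the 2 MiB gap after 0x80000000 easier to spot than in the decimal form.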
I think there is probably not an easy answer to your question, no knob you can fiddle to get things packed in tighter.

Remember that the size of the kernel plus DTB plus rootserver is an absolute lower bound on the amount of physical memory you have to configure. In practice the limit will be higher; any loader program will take up non-zero space, and any loader program that is clever enough to unpack the CPIO file archive.o in chunks so that it can unmap pages of it as it goes will be more complex and therefore larger.

Optimization is an iterative process that requires study of the running system's actual needs and aggressive culling of anything that isn't used. To truly get a minimal memory footprint at boot, you might need to write your own bootloader, and engineer your own solution for unpacking the kernel+DTB+rootserver in a way that has minimal overhead.

Let me know if this helps, or doesn't.

Regards,
Branden
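As a rough sketch of that lower bound, the snippet below sums the page-aligned sizes of the three payload objects. The 4 KiB page size is an assumption, the function names are invented for this example, and the byte counts are hypothetical placeholders rather than measurements from any build.

```python
PAGE_SIZE = 4096  # 4 KiB base page, assumed here


def round_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def payload_lower_bound(sizes, page_size=PAGE_SIZE):
    """Sum the page-aligned sizes of the unpacked payload objects."""
    return sum(round_up(size, page_size) for size in sizes.values())


# Hypothetical sizes in bytes, for illustration only; measure your own build.
example = {"kernel": 200_000, "dtb": 3_000, "rootserver": 1_200_000}
bound = payload_lower_bound(example)
print(f"unpacked payload needs at least {bound} bytes ({bound / 2**20:.2f} MiB)")
```

The real requirement will be higher, since this ignores the loader itself and the period during which the archive and its unpacked contents coexist.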
Thanks Branden! This is really helpful.

Based on this, I know that I need to get my system below my target size and account for the ELF-loader as well. I see all of these details in the shoehorn.py script as you described. I also see the unpacking process in common.c of elfloader-tool.

It seems like my biggest immediate gains will be in modifying the rootserver, which, if I've understood this correctly, is my project. In this case the rootserver is a simple "Hello, world!" application.

Thanks for this great response. I think I can calculate the memory map for any seL4 system built using these standard scripts now. That is very helpful to me.

Jeremy
Hi,

Here's what I'm doing to run seL4 on an Ariane processor (64 bit) with 16 MB RAM (and some internal BRAM from the FPGA). So far I've only tried the CAmkES adder example.

This is the memory layout:

    0x80000000: 256K BRAM for OpenSBI
    0x80200000: 256K BRAM for ELF-Loader
    0x84000000: 16M external RAM for seL4 & servers

The BRAM sizes could be reduced to 128kB and 64kB.

By default, the final image created by the build system includes the CPIO archive with the DTB and the root server. I modified it to not link the generated archive.o file. Instead, I read the CPIO archive from a memory-mapped flash. This allows me to place the ELF-Loader in a BRAM instead of the SRAM and gives more freedom over the memory layout.

https://github.com/seL4-drone-project/seL4_tools/tree/seL4-drone-project

There are 4 options in elfloader-tool/CMakeLists.txt:

* ElfloaderExistingArchive
* ElfloaderRelocateArchive
* ElfloaderArchiveStart
* ElfloaderArchiveEnd

Note that there's an option to copy the flash memory to the RAM. This is only there as a workaround for a bug in my FPGA design.

shoehorn.py will still complain, but you can disable it by setting IMAGE_START_ADDR. The value doesn't matter because it's not used for RISC-V builds. I set it to 0x80200000.

These are the memories in the DTS:

    memory@80000000 {
        device_type = "memory";
        reg = <0x0 0x80000000 0x0 0x00040000>;
    };
    memory@80200000 {
        device_type = "memory";
        reg = <0x0 0x80200000 0x0 0x00040000>;
    };
    memory@84000000 {
        device_type = "memory";
        reg = <0x0 0x84000000 0x0 0x01000000>;
    };

hardware_gen.py in the seL4 repository adds an offset for the reserved bootloader section to the first memory, which causes an assertion to fail (start < end) in seL4, so I don't include it in the generated header file:

https://github.com/seL4-drone-project/seL4/commit/3a7966ca919157be43e8e1d291...

capdl-loader reports 10MB of untyped memory. It probably won't get you to 2MB, but maybe it helps.
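To check that the `reg` values match the layout listed above, the cells can be decoded by hand or with a small script like the one below. It assumes `#address-cells = <2>` and `#size-cells = <2>`, which the four-cell entries suggest but the message does not show, and `decode_reg` is a name invented for this sketch rather than part of any seL4 tool.

```python
def decode_reg(cells, address_cells=2, size_cells=2):
    """Combine 32-bit device tree cells into (base, size) pairs."""
    stride = address_cells + size_cells
    if len(cells) % stride:
        raise ValueError("reg length is not a multiple of the cell stride")
    pairs = []
    for i in range(0, len(cells), stride):
        base = 0
        for cell in cells[i:i + address_cells]:
            base = (base << 32) | cell
        size = 0
        for cell in cells[i + address_cells:i + stride]:
            size = (size << 32) | cell
        pairs.append((base, size))
    return pairs


# reg values from the three memory nodes above.
nodes = {
    "memory@80000000": [0x0, 0x80000000, 0x0, 0x00040000],
    "memory@80200000": [0x0, 0x80200000, 0x0, 0x00040000],
    "memory@84000000": [0x0, 0x84000000, 0x0, 0x01000000],
}
for name, cells in nodes.items():
    for base, size in decode_reg(cells):
        print(f"{name}: {base:#x} - {base + size:#x} ({size // 1024} KiB)")
```

The two BRAM nodes decode to 256 KiB each and the external RAM node to 16 MiB, matching the layout in the message.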
I still have some questions regarding this project:

I'm still a little confused about the starting point: 0x80200000. I see that I need to add 0x80200000 to the image size/memory usage to get the end point. For one of the builds the end point is 0x80396F48. That's about 3.58 MB. Am I understanding correctly that there is 2 MB reserved for the BBL and then another ~1.6 MB for the image/application?

Is it possible to move this up in memory? That is, give the BBL less than 2 MB and put the payload after it, before 0x80200000? I did give this a try and ran into alignment issues, so I don't know if this is possible.

I've made some progress on my efforts to reduce the memory footprint (< 2 MB). Here are some items I've found:

1. In shoehorn.py, lines 207-209 (https://github.com/seL4/seL4_tools/blob/2f97243012afb42975d29a24752675bf68f0...) add 4 MB for seL4test. These lines can be commented out, and doing so immediately reduced the footprint from the ~7 MB default to ~3.5 MB.

2. The items in archive.o are kernel.elf, kernel.dtb, and hello.elf (the rootserver). The two ELF files can be stripped after the build, and this reduces the footprint further. Additionally, the elfloader can be stripped before putting it into the Berkeley Boot Loader for RISC-V. This again reduces the footprint a little.

3. I've noticed that the memory footprint of hello, the rootserver application, is about 1.18 MB. This lines up exactly with what is in the ELF file's PT_LOAD mem_sz fields.
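The PT_LOAD observation in item 3 can be checked with a short script. The sketch below reads the program headers of a little-endian ELF64 file and sums the `p_memsz` fields of its PT_LOAD segments. It assumes a 64-bit build (an rv32 image would need the ELF32 layout instead), and the function names are invented for this example.

```python
import struct

PT_LOAD = 1


def load_segments(data):
    """Return (p_paddr, p_filesz, p_memsz) for each PT_LOAD segment.

    Only little-endian ELF64 images are handled.
    """
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    e_phoff, = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    segments = []
    for i in range(e_phnum):
        (p_type, _flags, _offset, _vaddr, p_paddr,
         p_filesz, p_memsz, _align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_paddr, p_filesz, p_memsz))
    return segments


def memory_footprint(data):
    """Total bytes the loadable segments occupy once in memory."""
    return sum(memsz for _paddr, _filesz, memsz in load_segments(data))


# Example use (file name taken from the message above):
#   with open("hello.elf", "rb") as f:
#       print(memory_footprint(f.read()))
```

Note that `p_memsz` can exceed `p_filesz` (for example for zero-initialised data), so stripping symbols shrinks the file on disk without necessarily shrinking this number. The sum also ignores any alignment gaps between segments.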
The 2MiB large page used for BBL is protected by Physical Memory Protection, preventing any part of that region of physical memory from being used by supervisor or user modes.

From: porter.188@osu.edu <porter.188@osu.edu>
Sent: Tuesday, January 19, 2021 9:00:49 AM
To: devel@sel4.systems <devel@sel4.systems>
Subject: [seL4] Re: Reducing memory footprint - RISC-V

[...]
Is there any way around that? I can modify BBL to put the payload at a lower address than 0x80200000, still properly page aligned, just not aligned to the mega page size. Also, the elfloader in seL4 relies on this alignment. I suppose that would need to be rewritten as well.

Or is this not possible at all?

I need to run something in less than 2 MB if possible. If BBL doesn't do it, then what is the alternative? Do I write my own bootloader?
On Tue, 2021-01-19 at 18:32 +0000, porter.188@osu.edu wrote:
Is there any way around that? I can modify BBL to put the payload at a lower address than 0x80200000 still properly page aligned, just not on the mega page size. Also, the elfloader in seL4 relies on this alignment. I suppose that would need to be rewritten as well.
Or is this not possible at all?
A few contributing factors to what is going on here:

1) BBL will always align an integrated payload to a large page (2MiB on Sv39 and Sv48), with the next boundary after the start of BBL itself being 0x80200000.
   https://github.com/riscv/riscv-pk/blob/master/bbl/payload.S#L12

2) The elfloader has a hardcoded base to start at 0x80400000, but is actually loaded at 0x80200000. Due to RISC-V code being generally position independent, this doesn't appear to break anything.
   https://github.com/seL4/seL4_tools/blob/master/elfloader-tool/src/arch-riscv...

3) The generated memory regions for seL4 skip the first mega page entirely, assuming it is only used for BBL (although the hardware_gen.py script appears to assume that this size is always 2MiB when it is actually 4MiB for rv32...).
   https://github.com/seL4/seL4/blob/master/tools/hardware/config.py#L45-L51

4) The rv32 seL4 kernel requires that the kernel image falls entirely within a single 2MiB page; for rv64 it must fall within a single 1GiB region, given the way that the self-mapping is implemented.

5) The kernel defines a KERNEL_ELF_PADDR_BASE and a KERNEL_ELF_BASE, which indicate the start of the kernel image in physical and virtual memory respectively. This is placed 64MiB after the start of physical memory (and I am unsure as to why such a large region is left).
   https://github.com/seL4/seL4/blob/master/include/arch/riscv/arch/32/mode/har...
   https://github.com/seL4/seL4/blob/master/include/arch/riscv/arch/64/mode/har...

6) The elfloader requires that its own ELF is aligned to 2MiB (even for rv32).
   https://github.com/seL4/seL4_tools/blob/master/elfloader-tool/src/arch-riscv...

7) The elfloader also requires that it falls within the boundary of a single page table of large pages (i.e. in a single 4GiB region on 32bit or 1GiB region on 64bit).

8) BBL enables PMP protection of its own image, excluding its own payload.
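The addresses in points 1 and 5 follow from simple alignment arithmetic, sketched below. The constants come from the points above and from the 0x80000000 RAM base used throughout this thread. The helper names are invented for this example, and none of this is taken from the BBL or seL4 sources.

```python
MIB = 1 << 20
PHYS_BASE = 0x80000000       # start of RAM on the spike platform
LARGE_PAGE = 2 * MIB         # 2 MiB large page on 64-bit, per point 1
KERNEL_OFFSET = 64 * MIB     # offset described in point 5


def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def is_aligned(addr, alignment):
    """True if addr is a multiple of alignment (a power of two)."""
    return (addr & (alignment - 1)) == 0


# BBL starts at PHYS_BASE, so its payload lands on the next boundary after it.
payload_base = align_up(PHYS_BASE + 1, LARGE_PAGE)
kernel_paddr_base = PHYS_BASE + KERNEL_OFFSET

print(f"payload base:      {payload_base:#x}")
print(f"kernel paddr base: {kernel_paddr_base:#x}")
# A hypothetical 128 KiB-aligned load address fails the 2 MiB check in point 6.
print(f"128 KiB offset 2 MiB-aligned? {is_aligned(PHYS_BASE + 0x20000, LARGE_PAGE)}")
```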
Given that the BBL itself only uses and protects 76KiB at the start of memory, we could be using everything immediately after that. BBL could be modified to place its payload immediately after itself in physical memory, and if that payload is position-independent, as the elfloader actually appears to be, that would not be an issue.

The elfloader would still need to be modified such that when it maps *itself* into virtual memory, it accounts for any misalignment rather than simply throwing an error. This change would not be all that difficult.

The kernel ELF then either needs to be placed after the BBL payload *OR* the elfloader needs to be architected such that the kernel location overlapping with its own payload is not an issue. The latter option also removes an apparent upper limit on the size of the initial task image of about 64MiB. Another alternative may be to allow the kernel not to have a statically assigned physical base address.

Ultimately it seems that you would at least need to modify BBL to prevent it from putting any payload at the 2MiB boundary. You'd then need to at least modify the mapping routines in the elfloader to deal with finer-grained alignment when remapping itself. To then fit the kernel into that 2MiB region along with the boot loader and the elfloader, you'd need to modify the static physical address the kernel assumes for itself or enable it to be set dynamically. The former option would be straightforward but would also limit the size of the user image, as the entire elfloader payload needs to fit below the base kernel ELF address.
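A back-of-the-envelope budget for that arrangement might look like the sketch below. It places the images back to back after the 76KiB BBL at 4 KiB granularity and checks the result against a 2 MiB limit. The page size, the `plan` helper, and the sizes in the example call are all assumptions for illustration, and the model deliberately ignores the temporary duplication while the elfloader unpacks its archive.

```python
KIB = 1 << 10
MIB = 1 << 20
PAGE_SIZE = 4 * KIB            # assumed 4 KiB base page
PHYS_BASE = 0x80000000         # start of RAM on spike
BBL_SIZE = 76 * KIB            # BBL footprint quoted above
BUDGET = 2 * MIB               # target from the original question


def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def plan(elfloader_size, kernel_size, user_size):
    """Lay the images out back to back, page aligned, after BBL.

    Returns (layout, fits), where layout maps a name to (start, end).
    """
    layout = {}
    cursor = align_up(PHYS_BASE + BBL_SIZE, PAGE_SIZE)
    for name, size in (("elfloader", elfloader_size),
                       ("kernel", kernel_size),
                       ("user", user_size)):
        start = cursor
        cursor = align_up(start + size, PAGE_SIZE)
        layout[name] = (start, cursor)
    return layout, cursor <= PHYS_BASE + BUDGET


# Hypothetical image sizes; substitute the numbers from your own build.
layout, fits = plan(0x40000, 0x30000, 0x20000)
for name, (start, end) in layout.items():
    print(f"{name:10} {start:#x} - {end:#x}")
print("fits in 2 MiB:", fits)
```

With BBL occupying 76KiB, this leaves 1972 KiB of the 2 MiB for everything else.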
I need to run something in less than 2 MB if possible. If BBL doesn't do it, then what is the alternative? Do I write my own bootloader?
From what I can see, with minimal patching to BBL, some modifications to the elfloader, and adjusting the constant definition in seL4, you should be able to fit everything into 2MiB if your initial task is small enough. The most difficult part there would be the elfloader modifications.

I'm not certain how much of that would be accepted upstream for any of those repositories, however.

Hope all of this helps.

Cheers,
Curtis
I've been working through these changes in BBL and the elfloader. The BBL change, as mentioned, seems to be easy (though I'm not ruling out issues here). The elfloader is a little more problematic. I have made some modifications there, but they result in a lot of page table entries that seem erroneous. At the point where the virtual memory translations are committed (writing to the satp register) it hangs and cannot access memory at the next instruction's address.

Here is what I've done to map_kernel_window in boot.c of the elfloader:

    #define NEW_PT_LEVEL_2_BITS 17 /* instead of 21 */

    void map_kernel_window(struct image_info *kernel_info)
    {
        uint32_t index;
        unsigned long *lpt;

        /* Map the elfloader into the new address space */
        printf("_text: %x\n", (uintptr_t)_text);
        index = GET_PT_INDEX((uintptr_t)_text, PT_LEVEL_1);
        printf("73 index: %x\n", index);

    #if __riscv_xlen == 32
        lpt = l1pt;
    #else
        lpt = l2pt_elf;
        l1pt[index] = PTE_CREATE_NEXT((uintptr_t)l2pt_elf);
        index = GET_PT_INDEX((uintptr_t)_text, PT_LEVEL_2);
        printf("81 index: %x\n", index);
    #endif

        if (IS_ALIGNED((uintptr_t)_text, NEW_PT_LEVEL_2_BITS)) {
            for (int page = 0; index < PTES_PER_PT; index++, page++) {
                lpt[index] = PTE_CREATE_LEAF((uintptr_t)_text +
                                             (page << NEW_PT_LEVEL_2_BITS));
                printf("_text: %x\n", (uintptr_t)_text);
                printf("lpt[index]: %x\n", lpt[index]);
                printf("page: %x\n", page);
                printf("index: %x\n", index);
            }
        } else {
            printf("Elfloader not properly aligned\n");
            abort();
        }

        /* Map the kernel into the new address space */
        printf("kernel_info->virt_region_start: %x\n", kernel_info->virt_region_start);
        index = GET_PT_INDEX(kernel_info->virt_region_start, PT_LEVEL_1);
        printf("101 index: %x\n", index);

    #if __riscv_xlen == 64
        lpt = l2pt;
        l1pt[index] = PTE_CREATE_NEXT((uintptr_t)l2pt);
        index = GET_PT_INDEX(kernel_info->virt_region_start, PT_LEVEL_2);
        printf("107 index: %x\n", index);
    #endif
        if (VIRT_PHYS_ALIGNED(kernel_info->virt_region_start,
                              kernel_info->phys_region_start, NEW_PT_LEVEL_2_BITS)) {
            for (int page = 0; index < PTES_PER_PT; index++, page++) {
                lpt[index] = PTE_CREATE_LEAF(kernel_info->phys_region_start +
                                             (page << NEW_PT_LEVEL_2_BITS));
                printf("kernel_info->virt_region_start: %x\n", kernel_info->virt_region_start);
                printf("lpt[index]: %x\n", lpt[index]);
                printf("page: %x\n", page);
                printf("117 index: %x\n", index);
            }
        } else {
            printf("Kernel not properly aligned\n");
            abort();
        }
        printf("l1pt: %x\n", l1pt);
    }

Can anyone see what I've done wrong here?
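For reference when reading the index values those printf calls produce, here is a small Python model of how Sv39 splits a virtual address, following the RISC-V privileged specification (three levels, 9 index bits each, 4 KiB base pages). The depth numbering is chosen so that depth 1 is the root table, which appears to match what the code above calls PT_LEVEL_1, but that mapping is an assumption and the function names are invented for this sketch. In Sv39 a leaf PTE found at the second level always maps a 2 MiB region and its physical address must be 2 MiB aligned, otherwise the hardware raises a page fault. Changing a software shift constant to 17 does not change that, and this may be related to the hang, although nothing in the thread confirms it.

```python
PAGE_SHIFT = 12        # 4 KiB base pages
INDEX_BITS = 9         # 512 entries per table in Sv39
LEVELS = 3             # Sv39 has three levels


def pt_index(vaddr, depth):
    """Index into the table at `depth` (1 = root) for a virtual address."""
    shift = PAGE_SHIFT + INDEX_BITS * (LEVELS - depth)
    return (vaddr >> shift) & ((1 << INDEX_BITS) - 1)


def leaf_size(depth):
    """Bytes mapped by a leaf PTE found at `depth` (1 = root)."""
    return 1 << (PAGE_SHIFT + INDEX_BITS * (LEVELS - depth))


def megapage_aligned(paddr):
    """True if paddr is a legal target for a second-level (2 MiB) leaf."""
    return paddr % leaf_size(2) == 0


for depth in range(1, LEVELS + 1):
    print(f"depth {depth}: leaf maps {leaf_size(depth):#x} bytes")

# Stepping the physical address by 1 << 17 per entry, as in the loop above,
# yields targets that are not 2 MiB aligned after the first entry.
base = 0x80000000
for page in range(3):
    paddr = base + (page << 17)
    print(f"{paddr:#x} 2 MiB aligned: {megapage_aligned(paddr)}")
```

Mapping at a granularity finer than 2 MiB in Sv39 would need a third-level table of 4 KiB pages rather than a smaller shift at the second level.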
participants (4)
-
Alexander Fasching
-
G. Branden Robinson
-
Millar, Curtis (Data61, Kensington NSW)
-
porter.188@osu.edu