Some questions regarding seL4, U-Boot, and Raspberry Pi 3 patches
Hello *, I hope you're all doing well. I recently got seL4 running on my Raspberry Pi 3, and wanted to report some notes here and ask some questions regarding the port, and perhaps get some help working on a patch (or twelve, I'm sure.) Recently I followed the Data61 blog post here to get a recent copy of seL4 on my RPi, but hit a few errors along the way that burned some time. In particular I got burned by the U-Boot glitch and getting seL4 building due to a glitch in the 'repo' setup for sel4test: https://research.csiro.au/ tsblog/sel4-raspberry-pi-3/ I ended up writing my own blog post, with effectively "fully reproducible" instructions (e.g. one of the codepad.org links on the RPi3 wiki page was dead) for a recent build. This includes proper setup for uboot.env in order to `fatload` and `bootm` the seL4 ELF binary off disk, the right U-Boot version, and using Docker to build the seL4 image: https://inner-haven.net/posts/2017-09-15-sel4-rpi3.html That's all fine, but... I got curious about this U-Boot glitch, as even a recent (git) copy of U-Boot still suffered from the issue reported in the original Data61 blog post. I found it strange this regression would exist for nearly 9 months, unless it was possibly something intentional (e.g. an intended behavioral change) or it went unreported. The glitch is this: after booting the seL4 binary, the system immediately hangs with no output after trying to start the sel4test application. This glitch did not occur in versions of U-Boot up-to v2017.01. Versions v2017.03 and later (to now) suffer. I decided to spend some time bisecting this problem and found the issue. Versions of U-Boot after v2017.01 include the following patch, which causes the data cache to be enabled prior to U-Boot loading and executing ELF binaries: https://github.com/thoughtpolice/u-boot/commit/ 995eab8b5b580b67394312b1621c60a71042cd18 This behavior previously existed as a workaround for QNX, which required that the dcache be disabled before execution. U-Boot, previously, disabled the dcache when booting *any* ELF binary at all (the binary can of course enable it or check to see if it isn't enabled already), as a simplistic workaround. This patch changes it to only disable the dcache on U-Boot images (uImages) marked as QNX ELF binaries, so that QNX systems boot properly still. It would seem then that seL4 -- or at least sel4test -- requires the dcache to be disabled upon entry for the RPi3. If the dcache is enabled, as in the above patch, it otherwise hangs immediately. In fact, a recent copy of U-Boot *can* execute seL4 properly: you simply have to use `./tools/mkimage` inside U-Boot to mark the application as `mkimage -C none -A arm -T kernel -O qnx -d <input> <output>` first -- this creates a uImage with the appropriate header to mark the ELF as a QNX binary. U-Boot recognizes this and appropriately disables the dcache before application entry, once you use `bootm` or something. I took the time to write a patch to U-Boot in order to add a new OS definition -- "sel4" -- in order to address this problem more clearly. My change is here: https://github.com/thoughtpolice/u-boot/commit/ 720caf6c398584658abfa883d4d390f4b6200269 With this change, you can mark any seL4 binary as `-O sel4` to create an appropriately bootable uImage, with the dcache appropriately disabled. Assuming this is in place, the following U-Boot commands will execute seL4: U-Boot> setenv bootfile uImage U-Boot> fatload mmc 0 ${loadaddr} ${bootfile} U-Boot> bootm ${loadaddr} ## Booting kernel from Legacy Image at 00200000 ... Image Name: Image Type: ARM seL4 Kernel Image (uncompressed) Data Size: 3385548 Bytes = 3.2 MiB Load Address: 00000000 Entry Point: 00000000 Verifying Checksum ... OK Loading Kernel Image ... OK CACHE: Misaligned operation at range [20000000, 200078bc] CACHE: Misaligned operation at range [200078bc, 2032a720] CACHE: Misaligned operation at range [2032c000, 2033c820] ## Starting application at 0x20000000 ... ELF-loader started on CPU: ARM Ltd. Cortex-A53 r0p4 paddr=[20000000..2033c81f] ELF-loading image 'kernel' paddr=[0..32fff] vaddr=[e0000000..e0032fff] virt_entry=e0000000 ELF-loading image 'sel4test-driver' paddr=[33000..44dfff] vaddr=[10000..42afff] virt_entry=2a03c Enabling MMU and paging Jumping to kernel-image entry point... ---- So with that all done, it's clear there's a change needed *somewhere* but I'm not sure where or what to do: 1) Is the above the right approach? Clearly, seL4 on the RPi3 assumed a disabled dcache during its development, but this has now changed, so for this platform something will need to be fixed, such as the above. 2) Or perhaps this is not a U-Boot problem, but should be fixed in seL4. Or in sel4test. For example, is there an option to assume the dcache is already enabled in the configuration? Maybe that should be toggled instead. Or does seL4 *always* assume it is disabled at boot? What about non-RPi3 platforms like other ARM systems? These likely use U-Boot as well, but it's unclear if the cache should be disabled here, too. In this case, the above U-Boot patch is (in theory) wrong, but if that's true, it was always wrong to begin with (as it always disabled the cache prior to ELF loading). If something like this was the case, then the rpi3 configurations (or perhaps some bootinfo code) needs to change. 3) Alternatively, this is just an outright bug in seL4 that should be fixed and U-Boot was previously masking it. (I'm not sure if the proofs cover this one :) 4) Perhaps it should be a mix of both. For example, perhaps the sel4/sel4test code should be changed to assume an already-enabled dcache, or fixed outright. But perhaps a U-Boot patch would still be good to submit upstream: for example, I could submit the above patch upstream, but *remove* the dcache code. This would allow U-Boot users to mark seL4 binaries appropriately for clarity and future compatibility and make U-Boot "officially support" seL4. And in the future, should seL4 need more patches/workarounds/requirements from U-Boot, it would be easier to add and maintain them. --- I'd prefer not to run myself into dead ends trying to get the wrong solution upstream to either seL4 or U-Boot, and it would be great to fix this for users. I'd also like to look into more adventurous stuff in the future (like multicore on the RPi3 which seems unsupported), so any advice is appreciated to get started. Even then: bisection and the patch creation only took an hour, so if it's wrong, not much was lost. -- Regards, Austin - PGP: 4096R/0x91384671
Hi Austin, Thanks for the detective work! seL4 (or more specifically in this case the elfloader-tool) should be able to tolerate being started with either the cache enabled or disabled. In fact if you look in the mmu code (https://github.com/seL4/seL4_tools/blob/master/elfloader-tool/src/arch-arm/a...) you should notice some rough logic of if cache enabled clean cache disable cache invalidate cache That code only operates on the architected caches though, and not on any platform specific cache controllers, although to my knowledge the rpi3 only has ARM architected caches. I would not be surprised if there were any errors in the dcache macro that walks through the cache levels though, which would be my guess as to where the problem is. I will create an issue internally to track this down, but will probably be a little while before anyone has time so if you're interested in fixing the elfloader then you're both welcome and encouraged to. Either way appreciate your efforts thus far. Adrian On Sun 17-Sep-2017 4:21 AM, Austin Seipp wrote: Hello *, I hope you're all doing well. I recently got seL4 running on my Raspberry Pi 3, and wanted to report some notes here and ask some questions regarding the port, and perhaps get some help working on a patch (or twelve, I'm sure.) Recently I followed the Data61 blog post here to get a recent copy of seL4 on my RPi, but hit a few errors along the way that burned some time. In particular I got burned by the U-Boot glitch and getting seL4 building due to a glitch in the 'repo' setup for sel4test: https://research.csiro.au/tsblog/sel4-raspberry-pi-3/ I ended up writing my own blog post, with effectively "fully reproducible" instructions (e.g. one of the codepad.org<http://codepad.org> links on the RPi3 wiki page was dead) for a recent build. This includes proper setup for uboot.env in order to `fatload` and `bootm` the seL4 ELF binary off disk, the right U-Boot version, and using Docker to build the seL4 image: https://inner-haven.net/posts/2017-09-15-sel4-rpi3.html That's all fine, but... I got curious about this U-Boot glitch, as even a recent (git) copy of U-Boot still suffered from the issue reported in the original Data61 blog post. I found it strange this regression would exist for nearly 9 months, unless it was possibly something intentional (e.g. an intended behavioral change) or it went unreported. The glitch is this: after booting the seL4 binary, the system immediately hangs with no output after trying to start the sel4test application. This glitch did not occur in versions of U-Boot up-to v2017.01. Versions v2017.03 and later (to now) suffer. I decided to spend some time bisecting this problem and found the issue. Versions of U-Boot after v2017.01 include the following patch, which causes the data cache to be enabled prior to U-Boot loading and executing ELF binaries: https://github.com/thoughtpolice/u-boot/commit/995eab8b5b580b67394312b1621c6... This behavior previously existed as a workaround for QNX, which required that the dcache be disabled before execution. U-Boot, previously, disabled the dcache when booting *any* ELF binary at all (the binary can of course enable it or check to see if it isn't enabled already), as a simplistic workaround. This patch changes it to only disable the dcache on U-Boot images (uImages) marked as QNX ELF binaries, so that QNX systems boot properly still. It would seem then that seL4 -- or at least sel4test -- requires the dcache to be disabled upon entry for the RPi3. If the dcache is enabled, as in the above patch, it otherwise hangs immediately. In fact, a recent copy of U-Boot *can* execute seL4 properly: you simply have to use `./tools/mkimage` inside U-Boot to mark the application as `mkimage -C none -A arm -T kernel -O qnx -d <input> <output>` first -- this creates a uImage with the appropriate header to mark the ELF as a QNX binary. U-Boot recognizes this and appropriately disables the dcache before application entry, once you use `bootm` or something. I took the time to write a patch to U-Boot in order to add a new OS definition -- "sel4" -- in order to address this problem more clearly. My change is here: https://github.com/thoughtpolice/u-boot/commit/720caf6c398584658abfa883d4d39... With this change, you can mark any seL4 binary as `-O sel4` to create an appropriately bootable uImage, with the dcache appropriately disabled. Assuming this is in place, the following U-Boot commands will execute seL4: U-Boot> setenv bootfile uImage U-Boot> fatload mmc 0 ${loadaddr} ${bootfile} U-Boot> bootm ${loadaddr} ## Booting kernel from Legacy Image at 00200000 ... Image Name: Image Type: ARM seL4 Kernel Image (uncompressed) Data Size: 3385548 Bytes = 3.2 MiB Load Address: 00000000 Entry Point: 00000000 Verifying Checksum ... OK Loading Kernel Image ... OK CACHE: Misaligned operation at range [20000000, 200078bc] CACHE: Misaligned operation at range [200078bc, 2032a720] CACHE: Misaligned operation at range [2032c000, 2033c820] ## Starting application at 0x20000000 ... ELF-loader started on CPU: ARM Ltd. Cortex-A53 r0p4 paddr=[20000000..2033c81f] ELF-loading image 'kernel' paddr=[0..32fff] vaddr=[e0000000..e0032fff] virt_entry=e0000000 ELF-loading image 'sel4test-driver' paddr=[33000..44dfff] vaddr=[10000..42afff] virt_entry=2a03c Enabling MMU and paging Jumping to kernel-image entry point... ---- So with that all done, it's clear there's a change needed *somewhere* but I'm not sure where or what to do: 1) Is the above the right approach? Clearly, seL4 on the RPi3 assumed a disabled dcache during its development, but this has now changed, so for this platform something will need to be fixed, such as the above. 2) Or perhaps this is not a U-Boot problem, but should be fixed in seL4. Or in sel4test. For example, is there an option to assume the dcache is already enabled in the configuration? Maybe that should be toggled instead. Or does seL4 *always* assume it is disabled at boot? What about non-RPi3 platforms like other ARM systems? These likely use U-Boot as well, but it's unclear if the cache should be disabled here, too. In this case, the above U-Boot patch is (in theory) wrong, but if that's true, it was always wrong to begin with (as it always disabled the cache prior to ELF loading). If something like this was the case, then the rpi3 configurations (or perhaps some bootinfo code) needs to change. 3) Alternatively, this is just an outright bug in seL4 that should be fixed and U-Boot was previously masking it. (I'm not sure if the proofs cover this one :) 4) Perhaps it should be a mix of both. For example, perhaps the sel4/sel4test code should be changed to assume an already-enabled dcache, or fixed outright. But perhaps a U-Boot patch would still be good to submit upstream: for example, I could submit the above patch upstream, but *remove* the dcache code. This would allow U-Boot users to mark seL4 binaries appropriately for clarity and future compatibility and make U-Boot "officially support" seL4. And in the future, should seL4 need more patches/workarounds/requirements from U-Boot, it would be easier to add and maintain them. --- I'd prefer not to run myself into dead ends trying to get the wrong solution upstream to either seL4 or U-Boot, and it would be great to fix this for users. I'd also like to look into more adventurous stuff in the future (like multicore on the RPi3 which seems unsupported), so any advice is appreciated to get started. Even then: bisection and the patch creation only took an hour, so if it's wrong, not much was lost. -- Regards, Austin - PGP: 4096R/0x91384671 _______________________________________________ Devel mailing list Devel@sel4.systems<mailto:Devel@sel4.systems> https://sel4.systems/lists/listinfo/devel
Hi Adrian, Thanks for the reply. I also got a reply on IRC about this. The extent of my understanding is: probably the bcm2837 cache code needs to be debugged, and also maybe written to some extent. (There is no L2 cache initialization code at all, for example, and it's unclear to me whether seL4 and/or ARM has some kind of generic L2 initialization). So there's maybe more to this than meets the eye. Although cleaning up/improving the bcm plat code and addressing this will probably take longer, it seems like this is the right thing to do. I'll spend some time digging the source to understand. Thank you for the pointer! I'd be interested in working on this in my limited time, and it does seem like elfloader is the right place to do this. In the mean time, do you have any suggestion for a workaround for affected users (such as Steven)? - Should we simply tell users to mark ELF seL4 executables as `qnx` with `mkimage` before loading? As of now, you must use `mkimage` in order to restore the previous U-Boot behavior exactly. But I don't think mkimage has ever been required before. This would probably be best to address on the wiki somewhere, I suppose. - Should I submit the upstream patch to U-Boot to disable d-cache for seL4 executables (and later, after fixing this, another patch to revert it for a later U-Boot release)? It's unclear to me how long this might take anyone to fix, including me, perhaps several weeks, in which case a clear upstream U-Boot workaround seems nicer (U-Boot releases frequently enough). However, given that I would write *another* patch to revert the dcache changes for a later U-Boot release, it would mostly be an "empty change", in the end. I suppose my thinking is: part of the problem is, in any case, you must use `mkimage` to get the right functionality as of now. But I don't think seL4 ever required this before on U-Boot systems: `elfloader` seems to be responsible for platform/bsp configuration, vaguely similar to U-Boot's purpose (I think), so it has not been needed. But `mkimage` seems to be what U-Boot works around. So perhaps from now on, "to be safe", it should be suggested to use `mkimage` for U-Boot platforms to package seL4 images (if you don't suggest it already). This would make seL4-specific fixes and cooperation easier, and in this case, #2 from above I think makes more sense -- in case you ever needed *other* platform workarounds. But if you do not want to make mkimage "necessary" for users, then a temporary workaround like #1 seems to make more sense, and is less of a burden -- no major doc updates, dealing with U-Boot upstream, etc. I guess it's a question of how robust/smooth you want the integration to be over time, especially if users will probably be running mixes of U-Boot and seL4 together, as happened with this bug. On Sun, Sep 17, 2017 at 8:49 PM, <Adrian.Danis@data61.csiro.au> wrote:
Hi Austin,
Thanks for the detective work!
seL4 (or more specifically in this case the elfloader-tool) should be able to tolerate being started with either the cache enabled or disabled. In fact if you look in the mmu code (https://github.com/seL4/seL4_ tools/blob/master/elfloader-tool/src/arch-arm/armv/armv7-a/32/mmu.S) you should notice some rough logic of
if cache enabled clean cache disable cache invalidate cache
That code only operates on the architected caches though, and not on any platform specific cache controllers, although to my knowledge the rpi3 only has ARM architected caches. I would not be surprised if there were any errors in the dcache macro that walks through the cache levels though, which would be my guess as to where the problem is.
I will create an issue internally to track this down, but will probably be a little while before anyone has time so if you're interested in fixing the elfloader then you're both welcome and encouraged to. Either way appreciate your efforts thus far.
Adrian
On Sun 17-Sep-2017 4:21 AM, Austin Seipp wrote:
Hello *,
I hope you're all doing well. I recently got seL4 running on my Raspberry Pi 3, and wanted to report some notes here and ask some questions regarding the port, and perhaps get some help working on a patch (or twelve, I'm sure.)
Recently I followed the Data61 blog post here to get a recent copy of seL4 on my RPi, but hit a few errors along the way that burned some time. In particular I got burned by the U-Boot glitch and getting seL4 building due to a glitch in the 'repo' setup for sel4test: https://research.csiro.au/tsblog/sel4-raspberry-pi-3/
I ended up writing my own blog post, with effectively "fully reproducible" instructions (e.g. one of the codepad.org links on the RPi3 wiki page was dead) for a recent build. This includes proper setup for uboot.env in order to `fatload` and `bootm` the seL4 ELF binary off disk, the right U-Boot version, and using Docker to build the seL4 image: https://inner-haven.net/posts/2017-09-15-sel4-rpi3.html
That's all fine, but...
I got curious about this U-Boot glitch, as even a recent (git) copy of U-Boot still suffered from the issue reported in the original Data61 blog post. I found it strange this regression would exist for nearly 9 months, unless it was possibly something intentional (e.g. an intended behavioral change) or it went unreported.
The glitch is this: after booting the seL4 binary, the system immediately hangs with no output after trying to start the sel4test application. This glitch did not occur in versions of U-Boot up-to v2017.01. Versions v2017.03 and later (to now) suffer.
I decided to spend some time bisecting this problem and found the issue. Versions of U-Boot after v2017.01 include the following patch, which causes the data cache to be enabled prior to U-Boot loading and executing ELF binaries:
https://github.com/thoughtpolice/u-boot/commit/995eab8b5b580 b67394312b1621c60a71042cd18
This behavior previously existed as a workaround for QNX, which required that the dcache be disabled before execution. U-Boot, previously, disabled the dcache when booting *any* ELF binary at all (the binary can of course enable it or check to see if it isn't enabled already), as a simplistic workaround. This patch changes it to only disable the dcache on U-Boot images (uImages) marked as QNX ELF binaries, so that QNX systems boot properly still.
It would seem then that seL4 -- or at least sel4test -- requires the dcache to be disabled upon entry for the RPi3. If the dcache is enabled, as in the above patch, it otherwise hangs immediately.
In fact, a recent copy of U-Boot *can* execute seL4 properly: you simply have to use `./tools/mkimage` inside U-Boot to mark the application as `mkimage -C none -A arm -T kernel -O qnx -d <input> <output>` first -- this creates a uImage with the appropriate header to mark the ELF as a QNX binary. U-Boot recognizes this and appropriately disables the dcache before application entry, once you use `bootm` or something.
I took the time to write a patch to U-Boot in order to add a new OS definition -- "sel4" -- in order to address this problem more clearly. My change is here:
https://github.com/thoughtpolice/u-boot/commit/720caf6c39858 4658abfa883d4d390f4b6200269
With this change, you can mark any seL4 binary as `-O sel4` to create an appropriately bootable uImage, with the dcache appropriately disabled.
Assuming this is in place, the following U-Boot commands will execute seL4:
U-Boot> setenv bootfile uImage U-Boot> fatload mmc 0 ${loadaddr} ${bootfile} U-Boot> bootm ${loadaddr} ## Booting kernel from Legacy Image at 00200000 ... Image Name: Image Type: ARM seL4 Kernel Image (uncompressed) Data Size: 3385548 Bytes = 3.2 MiB Load Address: 00000000 Entry Point: 00000000 Verifying Checksum ... OK Loading Kernel Image ... OK CACHE: Misaligned operation at range [20000000, 200078bc] CACHE: Misaligned operation at range [200078bc, 2032a720] CACHE: Misaligned operation at range [2032c000, 2033c820] ## Starting application at 0x20000000 ...
ELF-loader started on CPU: ARM Ltd. Cortex-A53 r0p4 paddr=[20000000..2033c81f] ELF-loading image 'kernel' paddr=[0..32fff] vaddr=[e0000000..e0032fff] virt_entry=e0000000 ELF-loading image 'sel4test-driver' paddr=[33000..44dfff] vaddr=[10000..42afff] virt_entry=2a03c Enabling MMU and paging Jumping to kernel-image entry point...
----
So with that all done, it's clear there's a change needed *somewhere* but I'm not sure where or what to do:
1) Is the above the right approach? Clearly, seL4 on the RPi3 assumed a disabled dcache during its development, but this has now changed, so for this platform something will need to be fixed, such as the above.
2) Or perhaps this is not a U-Boot problem, but should be fixed in seL4. Or in sel4test. For example, is there an option to assume the dcache is already enabled in the configuration? Maybe that should be toggled instead. Or does seL4 *always* assume it is disabled at boot? What about non-RPi3 platforms like other ARM systems? These likely use U-Boot as well, but it's unclear if the cache should be disabled here, too. In this case, the above U-Boot patch is (in theory) wrong, but if that's true, it was always wrong to begin with (as it always disabled the cache prior to ELF loading).
If something like this was the case, then the rpi3 configurations (or perhaps some bootinfo code) needs to change.
3) Alternatively, this is just an outright bug in seL4 that should be fixed and U-Boot was previously masking it. (I'm not sure if the proofs cover this one :)
4) Perhaps it should be a mix of both. For example, perhaps the sel4/sel4test code should be changed to assume an already-enabled dcache, or fixed outright. But perhaps a U-Boot patch would still be good to submit upstream: for example, I could submit the above patch upstream, but *remove* the dcache code. This would allow U-Boot users to mark seL4 binaries appropriately for clarity and future compatibility and make U-Boot "officially support" seL4. And in the future, should seL4 need more patches/workarounds/requirements from U-Boot, it would be easier to add and maintain them.
---
I'd prefer not to run myself into dead ends trying to get the wrong solution upstream to either seL4 or U-Boot, and it would be great to fix this for users. I'd also like to look into more adventurous stuff in the future (like multicore on the RPi3 which seems unsupported), so any advice is appreciated to get started. Even then: bisection and the patch creation only took an hour, so if it's wrong, not much was lost.
-- Regards, Austin - PGP: 4096R/0x91384671
_______________________________________________ Devel mailing listDevel@sel4.systemshttps://sel4.systems/lists/listinfo/devel
-- Regards, Austin - PGP: 4096R/0x91384671
Hi Austin, In terms of workarounds right now you can obviously still use older U-boot, or use the `mkimage` as suggested by you. For the medium term I am in favor of telling people to use `mkimage`, as it's also available right now and there's no 'clean up' to be done once there is a proper fix. For the longer term fix the first question to be asked is whether or not the bcm2837 has a non standard cache controller and needs special platform specific code. If it does, then naturally this needs to be written, both in the elfloader and in the kernel. If it doesn't then the existing cache code needs to be debugged. Ideally the elfloader (and as a consequence the kernel) should function correctly regardless of the choice of bootloader (assuming the bootloader can fulfill the expectations of the elfloader), be it various versions of U-boot or even some other 'wacky' loading scheme such as kexec from Linux. Of course the contract between the elfloader and its loader is currently not explicit and has largely evolved as "the elfloader expects the environment that is given by u-boot", which is what leads to these problems. Adrian On Thu 21-Sep-2017 5:06 AM, Austin Seipp wrote:
Hi Adrian,
Thanks for the reply. I also got a reply on IRC about this. The extent of my understanding is: probably the bcm2837 cache code needs to be debugged, and also maybe written to some extent. (There is no L2 cache initialization code at all, for example, and it's unclear to me whether seL4 and/or ARM has some kind of generic L2 initialization). So there's maybe more to this than meets the eye. Although cleaning up/improving the bcm plat code and addressing this will probably take longer, it seems like this is the right thing to do. I'll spend some time digging the source to understand. Thank you for the pointer!
I'd be interested in working on this in my limited time, and it does seem like elfloader is the right place to do this. In the mean time, do you have any suggestion for a workaround for affected users (such as Steven)?
- Should we simply tell users to mark ELF seL4 executables as `qnx` with `mkimage` before loading? As of now, you must use `mkimage` in order to restore the previous U-Boot behavior exactly. But I don't think mkimage has ever been required before. This would probably be best to address on the wiki somewhere, I suppose.
- Should I submit the upstream patch to U-Boot to disable d-cache for seL4 executables (and later, after fixing this, another patch to revert it for a later U-Boot release)? It's unclear to me how long this might take anyone to fix, including me, perhaps several weeks, in which case a clear upstream U-Boot workaround seems nicer (U-Boot releases frequently enough). However, given that I would write *another* patch to revert the dcache changes for a later U-Boot release, it would mostly be an "empty change", in the end.
I suppose my thinking is: part of the problem is, in any case, you must use `mkimage` to get the right functionality as of now. But I don't think seL4 ever required this before on U-Boot systems: `elfloader` seems to be responsible for platform/bsp configuration, vaguely similar to U-Boot's purpose (I think), so it has not been needed. But `mkimage` seems to be what U-Boot works around. So perhaps from now on, "to be safe", it should be suggested to use `mkimage` for U-Boot platforms to package seL4 images (if you don't suggest it already). This would make seL4-specific fixes and cooperation easier, and in this case, #2 from above I think makes more sense -- in case you ever needed *other* platform workarounds. But if you do not want to make mkimage "necessary" for users, then a temporary workaround like #1 seems to make more sense, and is less of a burden -- no major doc updates, dealing with U-Boot upstream, etc.
I guess it's a question of how robust/smooth you want the integration to be over time, especially if users will probably be running mixes of U-Boot and seL4 together, as happened with this bug.
On Sun, Sep 17, 2017 at 8:49 PM, <Adrian.Danis@data61.csiro.au> wrote:
Hi Austin,
Thanks for the detective work!
seL4 (or more specifically in this case the elfloader-tool) should be able to tolerate being started with either the cache enabled or disabled. In fact if you look in the mmu code (https://github.com/seL4/seL4_ tools/blob/master/elfloader-tool/src/arch-arm/armv/armv7-a/32/mmu.S) you should notice some rough logic of
if cache enabled clean cache disable cache invalidate cache
That code only operates on the architected caches though, and not on any platform specific cache controllers, although to my knowledge the rpi3 only has ARM architected caches. I would not be surprised if there were any errors in the dcache macro that walks through the cache levels though, which would be my guess as to where the problem is.
I will create an issue internally to track this down, but will probably be a little while before anyone has time so if you're interested in fixing the elfloader then you're both welcome and encouraged to. Either way appreciate your efforts thus far.
Adrian
On Sun 17-Sep-2017 4:21 AM, Austin Seipp wrote:
Hello *,
I hope you're all doing well. I recently got seL4 running on my Raspberry Pi 3, and wanted to report some notes here and ask some questions regarding the port, and perhaps get some help working on a patch (or twelve, I'm sure.)
Recently I followed the Data61 blog post here to get a recent copy of seL4 on my RPi, but hit a few errors along the way that burned some time. In particular I got burned by the U-Boot glitch and getting seL4 building due to a glitch in the 'repo' setup for sel4test: https://research.csiro.au/tsblog/sel4-raspberry-pi-3/
I ended up writing my own blog post, with effectively "fully reproducible" instructions (e.g. one of the codepad.org links on the RPi3 wiki page was dead) for a recent build. This includes proper setup for uboot.env in order to `fatload` and `bootm` the seL4 ELF binary off disk, the right U-Boot version, and using Docker to build the seL4 image: https://inner-haven.net/posts/2017-09-15-sel4-rpi3.html
That's all fine, but...
I got curious about this U-Boot glitch, as even a recent (git) copy of U-Boot still suffered from the issue reported in the original Data61 blog post. I found it strange this regression would exist for nearly 9 months, unless it was possibly something intentional (e.g. an intended behavioral change) or it went unreported.
The glitch is this: after booting the seL4 binary, the system immediately hangs with no output after trying to start the sel4test application. This glitch did not occur in versions of U-Boot up-to v2017.01. Versions v2017.03 and later (to now) suffer.
I decided to spend some time bisecting this problem and found the issue. Versions of U-Boot after v2017.01 include the following patch, which causes the data cache to be enabled prior to U-Boot loading and executing ELF binaries:
https://github.com/thoughtpolice/u-boot/commit/995eab8b5b580 b67394312b1621c60a71042cd18
This behavior previously existed as a workaround for QNX, which required that the dcache be disabled before execution. U-Boot, previously, disabled the dcache when booting *any* ELF binary at all (the binary can of course enable it or check to see if it isn't enabled already), as a simplistic workaround. This patch changes it to only disable the dcache on U-Boot images (uImages) marked as QNX ELF binaries, so that QNX systems boot properly still.
It would seem then that seL4 -- or at least sel4test -- requires the dcache to be disabled upon entry for the RPi3. If the dcache is enabled, as in the above patch, it otherwise hangs immediately.
In fact, a recent copy of U-Boot *can* execute seL4 properly: you simply have to use `./tools/mkimage` inside U-Boot to mark the application as `mkimage -C none -A arm -T kernel -O qnx -d <input> <output>` first -- this creates a uImage with the appropriate header to mark the ELF as a QNX binary. U-Boot recognizes this and appropriately disables the dcache before application entry, once you use `bootm` or something.
I took the time to write a patch to U-Boot in order to add a new OS definition -- "sel4" -- in order to address this problem more clearly. My change is here:
https://github.com/thoughtpolice/u-boot/commit/720caf6c39858 4658abfa883d4d390f4b6200269
With this change, you can mark any seL4 binary as `-O sel4` to create an appropriately bootable uImage, with the dcache appropriately disabled.
Assuming this is in place, the following U-Boot commands will execute seL4:
U-Boot> setenv bootfile uImage U-Boot> fatload mmc 0 ${loadaddr} ${bootfile} U-Boot> bootm ${loadaddr} ## Booting kernel from Legacy Image at 00200000 ... Image Name: Image Type: ARM seL4 Kernel Image (uncompressed) Data Size: 3385548 Bytes = 3.2 MiB Load Address: 00000000 Entry Point: 00000000 Verifying Checksum ... OK Loading Kernel Image ... OK CACHE: Misaligned operation at range [20000000, 200078bc] CACHE: Misaligned operation at range [200078bc, 2032a720] CACHE: Misaligned operation at range [2032c000, 2033c820] ## Starting application at 0x20000000 ...
ELF-loader started on CPU: ARM Ltd. Cortex-A53 r0p4 paddr=[20000000..2033c81f] ELF-loading image 'kernel' paddr=[0..32fff] vaddr=[e0000000..e0032fff] virt_entry=e0000000 ELF-loading image 'sel4test-driver' paddr=[33000..44dfff] vaddr=[10000..42afff] virt_entry=2a03c Enabling MMU and paging Jumping to kernel-image entry point...
----
So with that all done, it's clear there's a change needed *somewhere* but I'm not sure where or what to do:
1) Is the above the right approach? Clearly, seL4 on the RPi3 assumed a disabled dcache during its development, but this has now changed, so for this platform something will need to be fixed, such as the above.
2) Or perhaps this is not a U-Boot problem, but should be fixed in seL4. Or in sel4test. For example, is there an option to assume the dcache is already enabled in the configuration? Maybe that should be toggled instead. Or does seL4 *always* assume it is disabled at boot? What about non-RPi3 platforms like other ARM systems? These likely use U-Boot as well, but it's unclear if the cache should be disabled here, too. In this case, the above U-Boot patch is (in theory) wrong, but if that's true, it was always wrong to begin with (as it always disabled the cache prior to ELF loading).
If something like this was the case, then the rpi3 configurations (or perhaps some bootinfo code) needs to change.
3) Alternatively, this is just an outright bug in seL4 that should be fixed and U-Boot was previously masking it. (I'm not sure if the proofs cover this one :)
4) Perhaps it should be a mix of both. For example, perhaps the sel4/sel4test code should be changed to assume an already-enabled dcache, or fixed outright. But perhaps a U-Boot patch would still be good to submit upstream: for example, I could submit the above patch upstream, but *remove* the dcache code. This would allow U-Boot users to mark seL4 binaries appropriately for clarity and future compatibility and make U-Boot "officially support" seL4. And in the future, should seL4 need more patches/workarounds/requirements from U-Boot, it would be easier to add and maintain them.
---
I'd prefer not to run myself into dead ends trying to get the wrong solution upstream to either seL4 or U-Boot, and it would be great to fix this for users. I'd also like to look into more adventurous stuff in the future (like multicore on the RPi3 which seems unsupported), so any advice is appreciated to get started. Even then: bisection and the patch creation only took an hour, so if it's wrong, not much was lost.
-- Regards, Austin - PGP: 4096R/0x91384671
_______________________________________________ Devel mailing listDevel@sel4.systemshttps://sel4.systems/lists/listinfo/devel
participants (2)
-
Adrian.Danis@data61.csiro.au
-
Austin Seipp