Hello Christian, On 2026-01-13 16:16, Christian Bruel via Devel wrote:
I would like to share an issue found with the port of seL4 and VMs to the STM32MP2 SoC, which features Armv8-A dual Cortex-A35 cores: When activating the secondary core, The RET instruction crashes while executing the arm_enable_mmu(). The exception stack and ESR register show no useful information.
Thanks for sharing this. We have a similar issue in 32-bit mode on the zynqmp, but there it only fails sometimes. Does it consistently fail for you or just sometimes? I suspect the latter, as Cortex-A35 is mostly in-order (with some out-of-order memory accesses). We didn't run into any of these issues with the tqma board, which also has a Cortex-A35, but is an imx8qxp SoC.
Simply replacing the RET instruction with BR X30 resolves the issue.
Interesting.
In “Requirements for branch predictor maintenance” [2], branch prediction maintenance operations must be performed after enabling or disabling the MMU. However, these operations translate as a NOP if the branch predictor hardware is not architecturally visible.
That's a link to an ARMv7 document. The 32-bit ARM Elfoader code does a BPIALL before enabling the MMU, but the 64-bit one doesn't. However, other than having unnecessarily wrong branch predictions, I don't see how clearing branch predictor state would be architecturally required. It seems more likely that the correct branch prediction causes issues that don't show up when mispredicting the branch. Do you have an aarch64 reference?
Therefore, the ‘BL/RET’ behavior from different translation regime seems to to be implementation-dependent or dependent on optional architectural features
That seems unlikely: The instructions are well-defined. I suspect we are seeing the effect of a missing synchronisation or other subtle problem, not the cause of the problem here.
is enough to start seL4 with SMP=on NUM_NODES=2 and run a dual core Linux on the CAmkES VM.
Do you see the same problem when booting with one core, with unmodified code? It would help to limit this to unicore to rule out any SMP problems. The hyp and non-hyp boot code is noticeably different, so it is a bit surprising that both fail the same way.
This is semantically equivalent to the BL X30 workaround, but this is more explicit regarding the MMU translation regime around BL/RET.
But like the BR work-around, it's a work-around and doesn't fix the source of the problem. Does adding BPIALL make it boot too? I guess it will because it has the same effect. It seems strange that any of this makes a difference though, considering that there is both a DSB as well as an ISB before the RET, at least in src/arch-arm/armv/armv8-a/64/mmu-hyp.S: enable_mmu sctlr_el2, x8 ic ialluis dsb ish isb tlbi alle2is dsb ish isb ldp x29, x30, [sp], #16 ret END_FUNC(arm_enable_hyp_mmu) So to double check, do your elfloader files match these: https://github.com/seL4/seL4_tools/blob/master/elfloader-tool/src/arch-arm/a... https://github.com/seL4/seL4_tools/blob/master/elfloader-tool/src/arch-arm/a... Or are you running some old version? Greetings, Indan