On Tue, Mar 22, 2022 at 9:20 AM WILLIAMS Stephen
Following further investigation this does not obviously appear to be a traditional case of memory corruption. With the following code in a single threaded component the problem is being exhibited:
#include
#include #include void* debug_memalign(size_t align, size_t size) { void* addr = memalign(align, size); printf("memalign: align = 0x%x, size = 0x%x. Returned address = %p\n",align, size, addr); return addr; }
int run_main(ps_io_ops_t *io_ops) { debug_memalign(0x40, 0x1000); debug_memalign(0x40, 0x1000);
// Loop forever while (1); } CAMKES_POST_INIT_MODULE_DEFINE(run_main_, run_main);
This code, performing two calls to memalign as the first processing within the component, results in the following output:
ELF-loader started on CPU: ARM Ltd. Cortex-A53 r0p4 paddr=[408b3000..40c21117] No DTB passed in from boot loader. Looking for DTB in CPIO archive...found at 409ff090. Loaded DTB from 409ff090. paddr=[4023f000..40248fff] ELF-loading image 'kernel' to 40000000 paddr=[40000000..4023efff] vaddr=[ffffff8040000000..ffffff804023efff] virt_entry=ffffff8040000000 ELF-loading image 'capdl-loader' to 40249000 paddr=[40249000..404c1fff] vaddr=[400000..678fff] virt_entry=408e78 Enabling MMU and paging Jumping to kernel-image entry point...
Warning: gpt_cntfrq 8333333, expected 8000000 Bootstrapping kernel available phys memory regions: 1 [40000000..c0000000] reserved virt address space regions: 3 [ffffff8040000000..ffffff804023f000] [ffffff804023f000..ffffff8040249000] [ffffff8040249000..ffffff80404c2000] Booting all finished, dropped to user space memalign: align = 0x40, size = 0x1000. Returned address = 0x57d1c0 memalign: align = 0x40, size = 0x1000. Returned address = 0x57d1c0
As can be seen the same address is returned for both calls and there is no earlier processing in this case to corrupt the bookkeeping of the memalign / malloc routines. I have a few observations that may be of interest:
1. As previously noted, if the two calls to ‘memalign’ are replaced with calls to ‘malloc’ then no problems are seen; two non-overlapping memory regions are allocated.
2. The behaviour appears to depend on the other code / data linked into the executable. If the calls to ‘memalign’ are the only functionality performed by the component then two non-overlapping memory regions are allocated as expected, however if the calls to ‘memalign’ are followed by calls to the U-Boot code we are porting to seL4 then the unexpected behaviour is observed.
The memalign definition provided by musl is a weak symbol and would be overridden by any stronger definition encountered in sources such as from U-Boot. Alternatively a header from U-Boot could introduce a macro that redefines memalign (but it doesn't look like you have any U-Boot includes). Are you able to confirm that the code being for memalign in the final binary is in fact the definition provided by the musl library? Are you able to provide a minimal reproducible example? Are you also only using the malloc provided by the musl fork? If memalign is calling a different malloc and they have different internal bookkeeping, then that would cause data corruption.
3. The behaviour is being observed on a board for which we are in the process of adding support (the Avnet MaaXBoard, and i.MX8 based board). I can confirm however that seL4test passes on the board as well tests of an ethernet driver (using the Ethdriver and PicoServer global components) and tests of a MMC/SDHC driver. As such there is a degree of confidence in the board’s level of support.
Any insights or suggestions on how to proceed with determining the cause of this issue would be greatly appreciated.
Thanks, Stephen