Following further investigation, this does not appear to be a traditional case of memory corruption. The following code, running in a single-threaded component, exhibits the problem:
#include <stdio.h>
#include <malloc.h>
#include <camkes.h>
void *debug_memalign(size_t align, size_t size)
{
    void *addr = memalign(align, size);
    /* %zx is the matching conversion for size_t arguments */
    printf("memalign: align = 0x%zx, size = 0x%zx. Returned address = %p\n", align, size, addr);
    return addr;
}

int run_main(ps_io_ops_t *io_ops)
{
    debug_memalign(0x40, 0x1000);
    debug_memalign(0x40, 0x1000);
    /* Loop forever */
    while (1);
}

CAMKES_POST_INIT_MODULE_DEFINE(run_main_, run_main);
This code, performing two calls to memalign as the first processing within the component, results in the following output:
ELF-loader started on CPU: ARM Ltd. Cortex-A53 r0p4
paddr=[408b3000..40c21117]
No DTB passed in from boot loader.
Looking for DTB in CPIO archive...found at 409ff090.
Loaded DTB from 409ff090.
paddr=[4023f000..40248fff]
ELF-loading image 'kernel' to 40000000
paddr=[40000000..4023efff]
vaddr=[ffffff8040000000..ffffff804023efff]
virt_entry=ffffff8040000000
ELF-loading image 'capdl-loader' to 40249000
paddr=[40249000..404c1fff]
vaddr=[400000..678fff]
virt_entry=408e78
Enabling MMU and paging
Jumping to kernel-image entry point...
Warning: gpt_cntfrq 8333333, expected 8000000
Bootstrapping kernel
available phys memory regions: 1
[40000000..c0000000]
reserved virt address space regions: 3
[ffffff8040000000..ffffff804023f000]
[ffffff804023f000..ffffff8040249000]
[ffffff8040249000..ffffff80404c2000]
Booting all finished, dropped to user space
memalign: align = 0x40, size = 0x1000. Returned address = 0x57d1c0
memalign: align = 0x40, size = 0x1000. Returned address = 0x57d1c0
As can be seen, the same address is returned for both calls, and in this case there is no earlier processing that could corrupt the bookkeeping of the memalign / malloc routines. I have a few observations that may be of interest:
1. As previously noted, if the two calls to ‘memalign’ are replaced with calls to ‘malloc’ then no problems are seen; two non-overlapping memory regions are allocated.
2. The behaviour appears to depend on the other code / data linked into the executable. If the calls to ‘memalign’ are the only functionality performed by the component then two non-overlapping memory regions are allocated as expected. However, if the calls to ‘memalign’ are followed by calls to the U-Boot code we are porting to seL4 then the unexpected behaviour is observed.
3. The behaviour is being observed on a board for which we are in the process of adding support (the Avnet MaaXBoard, an i.MX8-based board). I can confirm, however, that seL4test passes on the board, as do tests of an ethernet driver (using the Ethdriver and PicoServer global components) and tests of an MMC/SDHC driver. As such there is a degree of confidence in the board’s level of support.
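As a possible aid to narrowing this down, the check below is a minimal sketch of how each returned region could be tested for overlap and alignment at the point of allocation, rather than waiting for data to be trampled later. The helper names `regions_overlap` and `is_aligned` are hypothetical and are not part of muslc or CAmkES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if [a, a + a_len) and [b, b + b_len)
 * share at least one byte. */
static bool regions_overlap(const void *a, size_t a_len,
                            const void *b, size_t b_len)
{
    uintptr_t a_start = (uintptr_t)a;
    uintptr_t b_start = (uintptr_t)b;

    /* An empty region cannot overlap anything. */
    if (a_len == 0 || b_len == 0) {
        return false;
    }
    return a_start < b_start + b_len && b_start < a_start + a_len;
}

/* Hypothetical helper: true if ptr is a multiple of align.
 * align must be a non-zero power of two. */
static bool is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}
```

Calling `regions_overlap(first, 0x1000, second, 0x1000)` on the two pointers returned in the test case above would flag the duplicate address immediately, and the same check could be applied against a small table of earlier allocations.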
Any insights or suggestions on how to proceed with determining the cause of this issue would be greatly appreciated.
Thanks,
Stephen
On 17 Mar 2022, at 23:30, WILLIAMS Stephen <stephen.williams@capgemini.com> wrote:
Thanks for the information.
The issue is being observed within a single-threaded CAmkES component, so I’m assuming that the thread safety of the muslc functions is not relevant here.
In an attempt to investigate memory corruption I have run the software under valgrind on my host machine, which does not highlight any issues. Obviously the interaction with the actual devices had to be simulated for this run on the host; however, I believe that all of the same memory allocation and access routines were exercised. This suggests that memory corruption may not be the cause.
One potentially interesting observation is that if I replace all calls to memalign with the equivalent calls to malloc (noting that the alignment appears to be for performance rather than functional correctness reasons) I no longer have overlapping memory regions allocated and the code functions as expected.
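For reference, the substitution described above could be sketched as a small shim like the one below. The name `memalign_via_malloc` is hypothetical; the shim ignores the alignment request, so it is only suitable where alignment is a performance hint rather than a functional requirement.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for memalign(): falls back to malloc() and warns
 * when the returned address does not meet the requested alignment. */
static void *memalign_via_malloc(size_t align, size_t size)
{
    /* memalign() expects a power-of-two alignment; zero is tolerated here. */
    assert(align == 0 || (align & (align - 1)) == 0);

    void *addr = malloc(size);
    if (addr != NULL && align != 0 && ((uintptr_t)addr & (align - 1)) != 0) {
        fprintf(stderr, "warning: %p is not aligned to 0x%zx\n", addr, align);
    }
    return addr;
}
```

The warning makes it visible which call sites would actually have depended on the alignment, which may help when deciding whether the substitution is acceptable as a longer-term workaround.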
Thanks,
Stephen
On 16 Mar 2022, at 23:23, Kent Mcleod <kent.mcleod72@gmail.com> wrote:
On Thu, Mar 17, 2022 at 8:26 AM WILLIAMS Stephen via Devel <devel@sel4.systems> wrote:
Hi,
I’m currently working on a project porting drivers from U-Boot to seL4 and have run into an unexpected problem seemingly triggered by the use of memalign within the U-Boot drivers.
What I am seeing is that calls to memalign from within my CAmkES component can return pointers to regions which overlap with those previously returned by malloc. Obviously this leads to the two allocated regions trampling over each other, resulting in corruption of data.
I’m at a complete loss to explain this behaviour and would be very grateful to receive any suggestions or pointers.
Both malloc and memalign in camkes are provided by our fork of
libmuslc (https://github.com/sel4/musllibc/). Internally, memalign
calls malloc and so it seems like your issue can be reduced to
multiple calls to malloc returning overlapping regions. This could
be for a couple of reasons:
- Within the default camkes runtime, muslc functions such as malloc
aren't thread safe and so must be called from critical sections
guarded by a lock to avoid races. Many camkes components use a global
lock when performing operations that mutate state:
https://github.com/seL4/global-components/blob/master/components/TimeServer/...,
or they don't use dynamic memory allocation after initialization (as
initialization is single threaded). This lack of thread safety is a
bit nasty and the runtime should do more to protect developers from
this, but currently I don't think it does.
- You have memory corruption somewhere else that's causing malloc's
bookkeeping structures to be corrupted.
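To illustrate why a memalign problem can reduce to a malloc problem, here is a minimal sketch of the general technique of building an aligned allocator on top of malloc by over-allocating and rounding up. This is not the musllibc implementation, which adjusts its own chunk headers instead; the names `sketch_aligned_alloc` and `sketch_aligned_free` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: aligned allocation layered on malloc().
 * align must be a non-zero power of two. */
static void *sketch_aligned_alloc(size_t align, size_t size)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Extra room for worst-case padding plus the saved raw pointer. */
    size_t extra = align - 1 + sizeof(void *);
    if (extra < align - 1 || size > SIZE_MAX - extra) {
        return NULL; /* size computation would overflow */
    }

    void *raw = malloc(size + extra);
    if (raw == NULL) {
        return NULL;
    }

    /* Round up to the next aligned address past the pointer slot. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* Stash the raw pointer just below the aligned block so it can be freed. */
    memcpy((void *)(aligned - sizeof(void *)), &raw, sizeof(raw));
    return (void *)aligned;
}

/* Releases a block obtained from sketch_aligned_alloc(). */
static void sketch_aligned_free(void *ptr)
{
    if (ptr == NULL) {
        return;
    }
    void *raw;
    memcpy(&raw, (unsigned char *)ptr - sizeof(void *), sizeof(raw));
    free(raw);
}
```

In a scheme like this, any fault in the underlying malloc (or in its bookkeeping) shows up directly in the aligned allocations, since every aligned block lives inside a malloc'd one.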
Thanks for your help,
Stephen