[seL4] Re: Overlapping pointer returned by memalign

22 Mar 2022


      On Tue, Mar 22, 2022 at 9:20 AM WILLIAMS Stephen
<stephen.williams@capgemini.com> wrote:
...
Following further investigation this does not obviously appear to be a traditional case of memory corruption. With the following code in a single threaded component the problem is being exhibited:
#include <camkes.h>
#include <sel4platsupport/io.h>
#include <stdio.h>
void* debug_memalign(size_t align, size_t size)
{
    void* addr = memalign(align, size);
    printf("memalign: align = 0x%x, size = 0x%x. Returned address = %p\n",align, size, addr);
    return addr;
}
int run_main(ps_io_ops_t *io_ops)
{
    debug_memalign(0x40, 0x1000);
    debug_memalign(0x40, 0x1000);
<snip, continue processing>
// Loop forever
    while (1);
}
CAMKES_POST_INIT_MODULE_DEFINE(run_main_, run_main);
This code, performing two calls to memalign as the first processing within the component, results in the following output:
ELF-loader started on CPU: ARM Ltd. Cortex-A53 r0p4
  paddr=[408b3000..40c21117]
No DTB passed in from boot loader.
Looking for DTB in CPIO archive...found at 409ff090.
Loaded DTB from 409ff090.
   paddr=[4023f000..40248fff]
ELF-loading image 'kernel' to 40000000
  paddr=[40000000..4023efff]
  vaddr=[ffffff8040000000..ffffff804023efff]
  virt_entry=ffffff8040000000
ELF-loading image 'capdl-loader' to 40249000
  paddr=[40249000..404c1fff]
  vaddr=[400000..678fff]
  virt_entry=408e78
Enabling MMU and paging
Jumping to kernel-image entry point...
Warning:  gpt_cntfrq 8333333, expected 8000000
Bootstrapping kernel
available phys memory regions: 1
  [40000000..c0000000]
reserved virt address space regions: 3
  [ffffff8040000000..ffffff804023f000]
  [ffffff804023f000..ffffff8040249000]
  [ffffff8040249000..ffffff80404c2000]
Booting all finished, dropped to user space
memalign: align = 0x40, size = 0x1000. Returned address = 0x57d1c0
memalign: align = 0x40, size = 0x1000. Returned address = 0x57d1c0
As can be seen the same address is returned for both calls and there is no earlier processing in this case to corrupt the bookkeeping of the memalign / malloc routines. I have a few observations that may be of interest:
1. As previously noted, if the two calls to ‘memalign’ are replaced with calls to ‘malloc’ then no problems are seen; two non-overlapping memory regions are allocated.
2. The behaviour appears to depend on the other code / data linked into the executable. If the calls to ‘memalign’ are the only functionality performed by the component then two non-overlapping memory regions are allocated as expected, however if the calls to ‘memalign’ are followed by calls to the U-Boot code we are porting to seL4 then the unexpected behaviour is observed.
The memalign definition provided by musl is a weak symbol and would be
overridden by any stronger definition encountered in sources such as
from U-Boot. Alternatively a header from U-Boot could introduce a
macro that redefines memalign (but it doesn't look like you have any
U-Boot includes). Are you able to confirm that the code being for
memalign in the final binary is in fact the definition provided by the
musl library?  Are you able to provide a minimal reproducible example?
Are you also only using the malloc provided by the musl fork? If
memalign is calling a different malloc and they have different
internal bookkeeping, then that would cause data corruption.
...
3. The behaviour is being observed on a board for which we are in the process of adding support (the Avnet MaaXBoard, and i.MX8 based board). I can confirm however that seL4test passes on the board as well tests of an ethernet driver (using the Ethdriver and PicoServer global components) and tests of a MMC/SDHC driver. As such there is a degree of confidence in the board’s level of support.
Any insights or suggestions on how to proceed with determining the cause of this issue would be greatly appreciated.
Thanks,
Stephen

[seL4] Re: Overlapping pointer returned by memalign

Kent Mcleod