[seL4] Re: Some questions about group of camkes (adjust the format)

28 Dec 2021

...
On Dec 23, 2021, at 08:50, yadong.li <yadong.li@horizon.ai> wrote:
Hi
* We learned about CAmkES groups from https://github.com/seL4/camkes-tool/blob/master/docs/index.md
* A group can colocate two component instances in a single address space. We want to use this to share variables, for example by passing a string's virtual address through a direct call. But we are confused about global symbols (functions and variables) that have the same name in both instances once they share a single address space; their behavior seems undefined.
My information on how this works is out of date and I can’t answer all your questions, but I can tell you how this feature was originally implemented.

Single address space components were originally implemented using post-compilation symbol mangling. As you’ve noticed, naive linking of two component instances together presents two problems:
1. Unrelated homonyms between the instances now conflict. A symbol called `foo` in component instance A and a symbol called `foo` in component instance B now refer to the same thing when linked together.
2. References to glue code have the opposite problem: their names may differ in component instance A and component instance B, but need to refer to the same functions when the two are linked together.
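To make the two problems concrete, here is a toy Python model (not CAmkES or binutils code) of naive linking. Each partially linked component instance is reduced to the global symbols it defines and the ones it still references; the names `foo`, `main_a`, `conn_a_send` and `conn_b_send` are invented for illustration.

```python
from collections import Counter

# Hypothetical, simplified view of two partially linked component instances:
# the global symbols each defines, and the ones each still needs.
instance_a = {"defined": {"foo", "main_a", "conn_a_send"}, "undefined": set()}
instance_b = {"defined": {"foo", "main_b"}, "undefined": {"conn_b_send"}}

def naive_link(*objs):
    """Merge global symbol tables the way a plain link would.

    Returns (duplicates, unresolved): symbols defined by more than one
    instance (problem 1) and references nobody defines (problem 2)."""
    counts = Counter(sym for obj in objs for sym in obj["defined"])
    duplicates = {sym for sym, n in counts.items() if n > 1}
    unresolved = set().union(*(obj["undefined"] for obj in objs)) - set(counts)
    return duplicates, unresolved

dups, missing = naive_link(instance_a, instance_b)
print(dups)     # {'foo'}
print(missing)  # {'conn_b_send'}
```

`foo` collides even though the two definitions are unrelated, while B's glue reference `conn_b_send` resolves to nothing, even though A's `conn_a_send` is exactly the function it should reach.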

Both problems were solved with the same mechanism: GNU objcopy. The objcopy invocations were generated from templates so they could take advantage of knowledge about the system being compiled. A generated objcopy invocation would do the following:
1. Adjust all non-generated symbols to have internal visibility. Component instance A and component instance B have already been (partially) linked, so at this point the only unresolved symbols that need to remain externally visible are those related to the connection(s) between them. This solves problem 1 above.
2. Name-mangle all remaining symbols to something prefixed with the relevant connection instance’s name. IIRC the name mangling scheme was something like ‘<connection instance name><space><original symbol name>’. This guaranteed uniqueness because <space> cannot appear in a C or assembly identifier. Whether a space in a symbol name is actually legal, I don’t know, but all the binutils tools seemed fine with it.
3. Rename the second part of the mangled name above (the original symbol name) to a common name. I don’t recall exactly how this common name was chosen, but this solves problem 2 above.
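The three steps can be modelled in a few lines of Python. This is a toy sketch, not the original template output: it works on sets of symbol names rather than object files, and `conn1`, `send`, `conn_a_send` and `conn_b_send` are hypothetical names. On real objects, GNU objcopy options such as `--localize-symbol`, `--keep-global-symbol` and `--redefine-sym` cover this kind of rewriting, though the exact flags the generated invocations used are not recorded here.

```python
# Hypothetical inputs: global symbols (defined or referenced) left in each
# partially linked instance, and which of them are connection glue.
# `conn1` is an invented connection instance name; `send` an invented common name.
A_GLOBALS = {"foo", "main_a", "conn_a_send"}
B_GLOBALS = {"foo", "main_b", "conn_b_send"}
A_GLUE = {"conn_a_send": ("conn1", "send")}
B_GLUE = {"conn_b_send": ("conn1", "send")}

def rewrite_symbols(globals_, glue):
    """Model the three generated objcopy steps for one component instance.

    Returns (localized, renamed): symbols demoted to internal visibility,
    and a map from each remaining global symbol to its final name."""
    # Step 1: every non-generated symbol becomes local, so unrelated
    # homonyms such as `foo` can no longer collide (problem 1).
    localized = {s for s in globals_ if s not in glue}
    renamed = {}
    for sym in sorted(globals_ - localized):
        conn, common = glue[sym]
        # Step 2: prefix with the connection instance name and a space,
        # which cannot appear in a C or assembly identifier.
        mangled = f"{conn} {sym}"
        # Step 3: swap the second part (the original name) for a name shared
        # by both sides of the connection, so the glue converges (problem 2).
        prefix, _original = mangled.split(" ", 1)
        renamed[sym] = f"{prefix} {common}"
    return localized, renamed

a_local, a_renamed = rewrite_symbols(A_GLOBALS, A_GLUE)
b_local, b_renamed = rewrite_symbols(B_GLOBALS, B_GLUE)
print(sorted(a_local))  # ['foo', 'main_a']
print(a_renamed)        # {'conn_a_send': 'conn1 send'}
print(b_renamed)        # {'conn_b_send': 'conn1 send'}
```

After both rewrites, the only global name the two objects share is `conn1 send`, so the final link resolves B's reference against A's definition while each instance's `foo` stays private.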

This sounds pretty unorthodox and brittle, but it actually worked surprisingly well. All combinations of single address space component systems seemed to Just Work, with a few notable sharp edges:
* Anything involving MMIO was tricky. These symbols frequently needed to remain externally visible and the two component instances would often have differing names but the same addresses for them.
* GNU ld was more or less required. The multiple partial-linking steps were only supported by GNU ld and gold at the time. LLVM’s lld may have caught up in the interim years.
* The objcopy name mangling broke cross-section references used by GCC’s implementation of Link-Time Optimization. As a result, any LTO compilation degraded to LTO being disabled. This wouldn’t have been a big deal except that one of the primary reasons to put two components in a single address space is to enable cross-component inlining, usually facilitated by LTO. AFAICT this (playing objcopy tricks and expecting LTO to still work) was simply not a supported use case. We explored how to work around this and got some one-off efforts working for benchmarking, but proper support would have involved altering the way binutils work.

Matthew Fernandez