On Dec 23, 2021, at 08:50, yadong.li
<yadong.li(a)horizon.ai> wrote:
Hi
* We learned about CAmkES groups from
https://github.com/seL4/camkes-tool/blob/master/docs/index.md
* A group can colocate two component instances in a single address space. We want to use
this to share variables, for example passing a string's virtual address through a direct
call. But we are confused about global symbols (functions and variables) that have the
same name in a single address space. Their behavior seems undefined.
My information on how this works is out of date and I can’t answer all your questions, but
I can tell you how this feature was originally implemented.
Single address space components were originally implemented using post-compilation symbol
mangling. As you’ve noticed, naive linking of two component instances together presents
two problems:
1. Unrelated homonyms between the instances now conflict. A symbol called `foo` in
component instance A and a symbol called `foo` in component instance B now refer to the
same thing when linked together.
2. References to glue code have the opposite problem: their names may differ in component
instance A and component instance B, but need to refer to the same functions when the two
are linked together.
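To make the two problems concrete, here is a toy Python model of a naive link. It is only a sketch: the `Obj` type, the `naive_link` function, and every symbol name are made up for illustration, and a real linker does far more than this.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple


@dataclass
class Obj:
    """Hypothetical stand-in for one partially linked component instance."""
    name: str
    defined: Set[str] = field(default_factory=set)
    undefined: Set[str] = field(default_factory=set)


def naive_link(objs: Iterable[Obj]) -> Tuple[Set[str], Set[str]]:
    """Return (clashing definitions, unresolved references)."""
    objs = list(objs)
    seen: Set[str] = set()
    clashes: Set[str] = set()
    for obj in objs:
        # Problem 1: the same global name defined by more than one instance.
        clashes |= obj.defined & seen
        seen |= obj.defined
    unresolved: Set[str] = set()
    for obj in objs:
        # Problem 2: a reference whose name matches no definition anywhere.
        unresolved |= obj.undefined - seen
    return clashes, unresolved


a = Obj("A", defined={"foo", "run"}, undefined={"a_out_send"})
b = Obj("B", defined={"foo", "b_in_send"})
clashes, unresolved = naive_link([a, b])
```

In this model `foo` clashes, and A's reference to `a_out_send` stays unresolved even though B's `b_in_send` is the function it is meant to reach.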
Both problems were solved with the same mechanism: GNU objcopy. The invocations to objcopy
were template-generated so they could take advantage of knowledge about the system being
compiled. A generated objcopy invocation would do the following:
1. Adjust all non-generated symbols to have internal visibility. Component instance A and
component instance B have already been (partially) linked, so at this point the only
unresolved symbols that need to remain externally visible are those related to the
connection(s) between them. This solves problem 1 above.
2. Name-mangle all remaining symbols to something prefixed with the relevant connection
instance’s name. IIRC the name mangling scheme was something like ‘<connection instance
name><space><original symbol name>’. This guaranteed uniqueness because
<space> is not a character that existed in C or assembly symbols. Whether using a
space in a symbol name is legal or not, I don’t know, but all the binutils seemed fine
with this.
3. Rename the second component in the name mangling above to a common name. I don’t recall
exactly how this name was chosen, but this solves problem 2 above.
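As a rough illustration of the three steps above, here is a minimal Python sketch that builds objcopy command lines for one partially linked component instance. It is not the original CAmkES template output. The function name, the example symbol names, and the idea of mapping each instance's glue symbol to a shared name are assumptions on my part; `--keep-global-symbol` and `--redefine-sym` are real GNU objcopy options. Note that `--keep-global-symbol` gives every other symbol local binding, which is used here as a stand-in for the "internal visibility" adjustment described above. The sketch emits two separate invocations so it does not rely on how objcopy orders localizing and renaming within a single run.

```python
from typing import Dict, List


def mangle_commands(obj_path: str, connection: str,
                    glue_symbols: Dict[str, str],
                    sep: str = " ") -> List[List[str]]:
    """Build two objcopy invocations for one component instance.

    glue_symbols maps this instance's own name for a connection-related
    symbol to a name shared by both ends of the connection (hypothetical
    naming, not the scheme CAmkES actually used).
    """
    names = sorted(glue_symbols)

    # Step 1: keep only connection-related symbols global; objcopy makes
    # every other symbol local to the object file.
    localize = ["objcopy"]
    localize += [f"--keep-global-symbol={n}" for n in names]
    localize.append(obj_path)

    # Steps 2 and 3: rename each remaining symbol to
    # <connection><sep><shared name>, so both instances end up agreeing.
    rename = ["objcopy"]
    for n in names:
        rename += ["--redefine-sym", f"{n}={connection}{sep}{glue_symbols[n]}"]
    rename.append(obj_path)

    return [localize, rename]


cmds_a = mangle_commands("instance_a.o", "conn1", {"a_out_send": "send"})
cmds_b = mangle_commands("instance_b.o", "conn1", {"b_in_send": "send"})
```

Each inner list could be handed to `subprocess.run` as-is. Passing the arguments as a list rather than a shell string keeps the space inside the mangled name as part of a single argument.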
This sounds pretty unorthodox and brittle, but it actually worked surprisingly well. All
combinations of single address space component systems seemed to Just Work, with a few
notable sharp edges:
* Anything involving MMIO was tricky. These symbols frequently needed to remain externally
visible and the two component instances would often have differing names but the same
addresses for them.
* GNU ld was more or less required. The multiple steps involving partial linking were only
supported by GNU ld and Gold at the time. LLVM’s lld may have caught up in the interim
years.
* The objcopy name mangling broke cross-section references used by GCC’s implementation of
Link-Time Optimization. As a result, any LTO compilation degraded to LTO being disabled.
This wouldn’t have been a big deal except that one of the primary reasons to put two
components in a single address space is to enable cross-component inlining, usually
facilitated by LTO. AFAICT this (playing objcopy tricks and expecting LTO to still work)
was simply not a supported use case. We explored how to work around this and got some
one-off efforts working for benchmarking, but proper support would have involved altering
the way binutils work.