[seL4] Re: capdl-loader questions

8 Dec 2021

      [thanks for the extensive answers]

On Mon, Dec 6, 2021 at 10:13 PM Kent Mcleod <kent.mcleod72@gmail.com> wrote:
...
On Wed, Dec 1, 2021 at 9:59 AM Sam Leffler via Devel <devel@sel4.systems>
wrote:
...
Some updates inline. Especially interested in answers for #5 and #8.
On Wed, Nov 24, 2021 at 9:12 PM Sam Leffler <sleffler@google.com> wrote:
...
I've been investigating how capd-loader works and have some questions.
My
...
...
apologies if these were previously discussed.
1. CDL_Model is mutable. The capDL specification looks like a perfect
candidate for being in .rodata but it's stored+treated as mutable.
Why? I
see capdl-loader uses the mutability to store mapped page frame caps
and,
for risc-v, VSpace roots, but both can be easily handled w/o modifying
the
spec. Having this immutable has many benefits.
It's currently mutable because of the updates to it as it is
processed. If these updates were removed, then it would be a good idea
to make it immutable. How would you handle the mapped_frame_cap
bookkeeping without storing it in the spec as a single frame could
have multiple shared mappings which each require a cap?
I handled the 2 cases on risc-v by:

1. Instead of marking vspace roots in the model, keeping a separate set of
obj id's. I collect the set on the stack while creating objects and then
record it "elsewhere" for later use. I actually use a data structure that
has a fixed amount of storage & automatically spills to the heap on
overflow. 32 entries works for me w/o spilling but obviously it's
spec-dependent.
2. For the shared mappings I re-use the equivalent of
the capdl_to_sel4_copy array. Since that array is only used for TCB & CNode
dup'd caps there appear to be no conflicts/overlaps.

I think collecting vspace roots by checking for CDL_TCB_VTable_Slot in the
TCB works for all platforms. The benefit of tracking them separately is you
can avoid walking the object list in init_pd_asids & init_vspace.
...
...
...
2. All page frames are treated as shared. CAPDL_SHARED_FRAMES is
#define'd
in the code so every frame is treated as shared. This inflates the
size of
the orig_caps array and the kernel capabilities table but seems
unnecessary. Why? I've verified that instead of doing the Copy op on
each
page you can just check the return of the seL4_Page_Map call and if
necessary clone the cap. Better would be for camkes to identify shared
page
frames so this doesn't require 2x syscalls (which spam's the console w/
CONFIG_PRINTING enabled).
What you suggest would be possible.  For some object types (like an
endpoint) the original cap is special and can be used to revoke all
children caps that have been derived from it. All frame types don't
have this property which means that the original cap is no different
from any copies of it. Embedding whether the frame is shared into the
object as an attribute would make this easier to optimize as I agree
that it becomes pretty inefficient in typical systems with many more
private frames than shared.
Thanks for pointing out the issue with endpoints. I was only suggesting
doing this copy-on-write style handling for CDL_Frame objects.  This
approach lets me halve the size of capdl_to_sel4_copy & capdl_to_sel4_orig.
...
...
...
3. Why does copy_addr_with_pt exist? There are two memory regions
created
for mapping page frames into the rootserver address space for doing
fill
ops. One is 64KB and the other is 1 page (nominally 4KB). init_frame
tries
the 4KB region first then falls back to the 64K region if mapping into
the
4KB region fails. Why (i.e. why would the 1st mapping call fail)? On my
risc-v system I never use the 64K region and I'm not sure when it
might be
used? Even if it were used, why is 64K always allocated; seems like the
region should be sized according to the target platform. Having this
extra
region has a cost that can be significant on small systems.
I got this backwards; on risc-v at least the 64KB region is used but not
the page immediately after the ipc buffer. On my system I eliminated the
latter and shrank the bss region to 1 page. Looking for feedback on
whether
this will fail on other platforms.
Short answer: For copy_addr_with_pt, yes we could make the 64kiB
definition aarch32 specific and 4kiB on other platforms.
Long answer: It looks like copy_addr_with_pt is defined to be an
address range that is guaranteed to have mapped page tables down to
the last level of the page table hierarchy and is therefore suitable
for mapping 4kiB pages into (or 64k pages only on aarch32 which is the
only supported arch which allows larger pages to also be mapped in at
the lowest level page table entries.  copy_addr on the other hand is
guaranteed to not have a mapped page table at the lowest level of the
page table hierarchy which allows for 1MiB, 2MiB, 4MiB and 16MiB page
sizes to be mapped here and achieves this by picking a virtual address
with alignment of 16MiB above the end of the image but still expecting
this to be within the page table that is one level up from the lowest
level page table in the hierarchy.
Then init_frame tries either region when trying to initialize a frame
of varying size between 4kiB and 16MiB.  Frames the next size up,
1GiB, would not be able to be mapped by the loader without adding a
new copy_addr variable that has larger alignment to guarantee it
doesn't collide with a page table at the next level up.
Thank you. Still not sure whether this means it's possible to eliminate the
copy_addr support (sounds like it might be arch-dependent?). Is the
multiple page sizes for an x86 target? (so I can test my approaches).
...
...
...
4. console output. Why is sel4_DebugPutChar (or similar) not a standard
part of the kernel? This forces capdl-loader to depend on
libsel4platsupport for console output which in turn affects cap space.
I
...
can't imagine an seL4 system that cannot write to the console and I
think
it's entirely reasonable to have a critical system service like the
rootserver use that.
The verified kernel configurations don't have a uart driver in the
kernel.  This decision follows from an L4 design principle that the
kernel doesn't provide an implementation for anything that could be
provided with an implementation outside of the kernel.  In practice,
if you know that the loader is only going to be used on a specific
platform, then you could write a simple specialized uart driver that
you link in instead of the entirety of libplatsupport.
...
...
5. duplicate_caps.  The scheme for constructing a running system from a
spec is straightforward except for duplicating caps. Why are all TCB's
and
CNode caps dup'd? Why does init_tcbs call init_tcb on the original
object
but immediately configure_tcb on the dup? There are other places where
the
dup cap is used instead of the original cap (e.g. init_cnode_slot).
None of
this seems to matter on risc-v; maybe this matters on some other
architectures?
I don't know the reason for this and it's older than the git history I
have access to.  It looks like its unnecessary for TCBs, and is likely
still necessary for CNodes when loading specs that have CNodes that
are moved into slots in other CNodes when constructing a multi-cnode
CSpace.
Is there an example in the tree of a nested CNode spec? I'm still not clear
on why the dup is required so being able to repro this would be helpful.
...
...
...
6. On risc-v all pages appear to be marked executable--including ipc
buffers! This seems just wrong and a potential security hole.
This looks to be a bug. Marking the NX bit on risc-v when the grant right
is not set gives me what I want. arm already did this.
It's a bug because it doesn't do it the same way as Arm, but the way
arm does it is a bit misleading. The kernel itself only uses the read
and write bits from cap rights for frame mappings, and a cap with
write-only rights gets downgraded to kernel-only mapping, so really
the kernel only lets you describe read-write, read-only or none.  For
both read-write and read-only cap abilities, the frame could be mapped
exec or non-exec.  The read right implies executable rights from the
kernel's perspective.  The capdl loader does add a little bit of extra
security with using the grant right in the spec to imply whether the
frame has executable permissions and tries to create the mapping
accordingly (except on riscv which is a bug), but if the frame cap for
this mapping is also given to the component in order to do things like
cache maintenance invocations, then the component could use that
capability to convert it's non-exec mappings to executable mappings.
There's probably a good argument for overhauling how the kernel
handles frame rights so that you can express executeNever as a
capability right and associate it with the grant or grantreply bit in
the existing capRights.  I think this wasn't done originally because
either there were no spare bits in the cap structure at the time, or
armv6 didn't support executeNever mappings (but you'd have to
fact-check me on that).
...
...
7. Why does seL4_TCB_WriteRegisters take a _mutable_ seL4_UserContext?
I'm
...
guessing this is a mistake in the xml spec (but haven't looked).
This looks to be a shortcoming in the python script that generates the
syscall stubs.
Correct.
...
...
8. init_cspaces mystifies me. There are two loops over all CNodes; one
that does "COPY" operations and one that does "MOVE" operations. I see
nothing explaining what's happening. I've read this code many times and
still don't get it; please explain.
As I mentioned higher up, some caps are special as originals and allow
additional operations to be performed on them compared to their copies
even if the cap rights are the same for both.  Initializing all of the
cnodes in the system involves moving capabilities (seL4_CNode_Move) or
deriving capabilities from source capabilities either with the same
rights (seL4_CNode_Copy) or lesser rights (seL4_CNode_Mint).  Its
split into two passes because the derivation of capabilities need to
happen before the source capabilities are moved into the destination
CNode.
...
...
9. According to comments, the use of a global static to setup the
UserContext for a new TCB is done because one cannot take the address
of a
local variable. On what architecture is this true? On risc-v the
context is
copied into "registers" before calling the kernel so this doesn't
appear to
be an issue with crossing the user-kernel protection boundary.
This is true for the kernel with the proofs.  The very first
implementation of this loader was as part of a project that wanted to
verify it using the same tooling as the kernel proofs.  So that
comment is probably documenting a requirement that is no longer
present.
...
...
10. Memory reclamation. I'm unclear what happens to the memory used by
capdl-loader. The process does a TCB_Suspend call when it's done. Does
this
cause the CSpace & VSpace to be reclaimed? For CAmkES this doesn't
matter
as there's no support for allocating memory but I'm building a system
where
this isn't true. How do I release memory (in particular) back to the
untyped pool?
This was discussed on the developers zoom call so no need to belabor it.
Sounds like I've got my work cut out for me...
Yea, this one is a bit more involved.
It appears I need this so may be back w/ more questions :).
...
...
...
I have a bunch of niggly code complaints/questions but let's start with
the above.
-Sam
_______________________________________________
Devel mailing list -- devel@sel4.systems
To unsubscribe send an email to devel-leave@sel4.systems