On Wed, Dec 8, 2021 at 7:08 AM Sam Leffler wrote:
[thanks for the extensive answers]
On Mon, Dec 6, 2021 at 10:13 PM Kent Mcleod wrote:
On Wed, Dec 1, 2021 at 9:59 AM Sam Leffler via Devel wrote:
Some updates inline. Especially interested in answers for #5 and #8.
On Wed, Nov 24, 2021 at 9:12 PM Sam Leffler wrote:
I've been investigating how capdl-loader works and have some questions. My apologies if these were previously discussed.
1. CDL_Model is mutable. The capDL specification looks like a perfect candidate for being in .rodata but it's stored+treated as mutable. Why? I see capdl-loader uses the mutability to store mapped page frame caps and, for risc-v, VSpace roots, but both can be easily handled w/o modifying the spec. Having this immutable has many benefits.
It's currently mutable because of the updates to it as it is processed. If these updates were removed, then it would be a good idea to make it immutable. How would you handle the mapped_frame_cap bookkeeping without storing it in the spec, since a single frame can have multiple shared mappings, each of which requires a cap?
I handled the 2 cases on risc-v by:
1. Instead of marking vspace roots in the model, keeping a separate set of object IDs. I collect the set on the stack while creating objects and then record it "elsewhere" for later use. I actually use a data structure that has a fixed amount of storage and automatically spills to the heap on overflow; 32 entries works for me without spilling, but that's obviously spec-dependent.
2. For the shared mappings, re-using the equivalent of the capdl_to_sel4_copy array. Since that array is only used for TCB & CNode dup'd caps, there appear to be no conflicts/overlaps.
I think collecting vspace roots by checking for CDL_TCB_VTable_Slot in the TCB works for all platforms. The benefit of tracking them separately is you can avoid walking the object list in init_pd_asids & init_vspace.
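For reference, a rough sketch of how I collect them (not my exact code: the spill-to-heap container is simplified to a fixed array, and I'm assuming the stock capdl-loader helpers get_cap_at/CDL_Cap_ObjID/CDL_Obj_Type are available):

#include <assert.h>
#include <stdbool.h>
#include <capdl.h>

#define MAX_VSPACE_ROOTS 32  /* simplified; my real container spills to the heap */

static CDL_ObjID vspace_roots[MAX_VSPACE_ROOTS];
static unsigned num_vspace_roots;

static void collect_vspace_roots(CDL_Model *spec)
{
    for (CDL_ObjID id = 0; id < spec->num; id++) {
        CDL_Object *obj = &spec->objects[id];
        if (CDL_Obj_Type(obj) != CDL_TCB) {
            continue;
        }
        CDL_Cap *vtable = get_cap_at(obj, CDL_TCB_VTable_Slot);
        if (vtable == NULL) {
            continue;
        }
        CDL_ObjID root = CDL_Cap_ObjID(vtable);
        /* de-duplicate: several TCBs commonly share one VSpace root */
        bool seen = false;
        for (unsigned i = 0; i < num_vspace_roots && !seen; i++) {
            seen = (vspace_roots[i] == root);
        }
        if (!seen) {
            assert(num_vspace_roots < MAX_VSPACE_ROOTS);
            vspace_roots[num_vspace_roots++] = root;
        }
    }
}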
For 2: how would this work if a frame were shared between 3 or more vspaces, requiring multiple duplication operations?
2. All page frames are treated as shared. CAPDL_SHARED_FRAMES is #define'd in the code so every frame is treated as shared. This inflates the size of the orig_caps array and the kernel capability table but seems unnecessary. Why? I've verified that instead of doing the Copy op on each page you can just check the return of the seL4_Page_Map call and, if necessary, clone the cap. Better would be for camkes to identify shared page frames so this doesn't require 2x syscalls (which spams the console w/ CONFIG_PRINTING enabled).
What you suggest would be possible. For some object types (like an endpoint) the original cap is special and can be used to revoke all children caps that have been derived from it. Frame types don't have this property, which means the original cap is no different from any copy of it. Embedding whether the frame is shared into the object as an attribute would make this easier to optimize; I agree it becomes pretty inefficient in typical systems, which have many more private frames than shared ones.
Thanks for pointing out the issue with endpoints. I was only suggesting doing this copy-on-write style handling for CDL_Frame objects. This approach lets me halve the size of capdl_to_sel4_copy & capdl_to_sel4_orig.
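The shape of what I'm doing, as a sketch (alloc_free_slot is a stand-in for whatever free-slot bookkeeping you have; the exact error returned for an already-mapped frame should be checked against your kernel version; and seL4_ARCH_Page_Map stands for the arch-specific call, e.g. seL4_RISCV_Page_Map):

#include <sel4/sel4.h>

static int map_frame(seL4_CPtr orig_frame_cap, seL4_CPtr vspace_root,
                     seL4_Word vaddr, seL4_CapRights_t rights,
                     seL4_ARCH_VMAttributes attrs)
{
    /* try the original cap first */
    int error = seL4_ARCH_Page_Map(orig_frame_cap, vspace_root, vaddr, rights, attrs);
    if (error != seL4_NoError) {
        /* the frame must already be mapped in another VSpace;
         * only now pay for a duplicate cap and map that instead */
        seL4_CPtr dup = alloc_free_slot();  /* hypothetical helper */
        error = seL4_CNode_Copy(seL4_CapInitThreadCNode, dup, seL4_WordBits,
                                seL4_CapInitThreadCNode, orig_frame_cap, seL4_WordBits,
                                seL4_AllRights);
        if (error == seL4_NoError) {
            error = seL4_ARCH_Page_Map(dup, vspace_root, vaddr, rights, attrs);
        }
    }
    return error;
}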
3. Why does copy_addr_with_pt exist? There are two memory regions created for mapping page frames into the rootserver address space for doing fill ops. One is 64KB and the other is 1 page (nominally 4KB). init_frame tries the 4KB region first then falls back to the 64KB region if mapping into the 4KB region fails. Why (i.e. why would the first mapping call fail)? On my risc-v system I never use the 64KB region and I'm not sure when it would be used. Even if it were used, why is 64KB always allocated? It seems like the region should be sized according to the target platform. Having this extra region has a cost that can be significant on small systems.
I got this backwards; on risc-v at least the 64KB region is used but not the page immediately after the ipc buffer. On my system I eliminated the latter and shrank the bss region to 1 page. Looking for feedback on whether this will fail on other platforms.
Short answer: for copy_addr_with_pt, yes, we could make the definition 64kiB on aarch32 and 4kiB on other platforms.

Long answer: copy_addr_with_pt is defined to be an address range that is guaranteed to have page tables mapped down to the last level of the page-table hierarchy, and is therefore suitable for mapping 4kiB pages into (or 64kiB pages, but only on aarch32, which is the only supported arch that allows larger pages to be mapped in lowest-level page-table entries). copy_addr, on the other hand, is guaranteed to not have a page table mapped at the lowest level of the hierarchy, which allows 1MiB, 2MiB, 4MiB and 16MiB page sizes to be mapped there. It achieves this by picking a virtual address with 16MiB alignment above the end of the image, while still expecting it to be within the page table one level up from the lowest-level page table in the hierarchy. init_frame then tries either region when initializing a frame of varying size between 4kiB and 16MiB. Frames of the next size up, 1GiB, could not be mapped by the loader without adding a new copy_addr variable with larger alignment to guarantee it doesn't collide with a page table at the next level up.
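To sketch what the short answer would look like in the loader (untested):

#ifdef CONFIG_ARCH_AARCH32
/* aarch32 can map 64k pages at last-level entries, so the window must
 * be 64KiB-sized and 64KiB-aligned */
static char copy_addr_with_pt[1 << 16] __attribute__((aligned(1 << 16)));
#else
/* everywhere else only 4k pages are mapped at the last level */
static char copy_addr_with_pt[1 << 12] __attribute__((aligned(1 << 12)));
#endif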
Thank you. I'm still not sure whether this means it's possible to eliminate the copy_addr support (sounds like it might be arch-dependent?). Are the multiple page sizes for an x86 target? (So I can test my approaches.)
Multiple page sizes could be encountered on all architectures. If you look at the `levels` method defined for each seL4 architecture class here (https://github.com/seL4/capdl/blob/master/python-capdl-tool/capdl/util.py#L2...), it returns an ordered array of `Level`s which corresponds to the layout of the virtual memory system architecture. The last entry in the array is the lowest level; page objects that require mapping there use the `copy_addr_with_pt` mapping location in the loader. Page objects that are mapped at the second-last level in the levels array, which for RISCV32 would be `seL4_RISCV_Mega_Page`s (aliased as seL4_LargePageObject in a lot of places), would still need to use the `copy_addr` mapping, but this shouldn't consume physical memory as it's not allocated in any loadable segment in the final ELF for the loader. Except for aarch32, all architectures have 4k page objects in the last level and so only require copy_addr_with_pt to be 4kiB sized.

class RISCV32Arch(Arch):
    def levels(self):
        return [
            Level(2 ** 32, [ObjectType.seL4_LargePageObject], ObjectType.seL4_PageTableObject, PageTable, "pt"),
            Level(2 ** 22, [ObjectType.seL4_SmallPageObject], ObjectType.seL4_PageTableObject, PageTable, "pt"),
        ]
4. console output. Why is seL4_DebugPutChar (or similar) not a standard part of the kernel? This forces capdl-loader to depend on libsel4platsupport for console output, which in turn affects cap space. I can't imagine an seL4 system that cannot write to the console and I think it's entirely reasonable to have a critical system service like the rootserver use that.
The verified kernel configurations don't have a uart driver in the kernel. This decision follows from an L4 design principle that the kernel doesn't provide an implementation for anything that could be provided with an implementation outside of the kernel. In practice, if you know that the loader is only going to be used on a specific platform, then you could write a simple specialized uart driver that you link in instead of the entirety of libplatsupport.
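A whole driver can be a handful of lines. For example, for a 16550-style uart (the address and register offsets here are invented, substitute your platform's; the device frame still has to be mapped somewhere first):

#include <stdint.h>

#define UART_VADDR    0x10000000UL  /* wherever you map the device frame */
#define UART_THR      0x00          /* transmit holding register */
#define UART_LSR      0x05          /* line status register */
#define UART_LSR_THRE 0x20          /* transmitter empty bit */

static volatile uint8_t *const uart = (volatile uint8_t *)UART_VADDR;

void plat_putchar(int c)
{
    /* spin until the transmitter can take another byte */
    while (!(uart[UART_LSR] & UART_LSR_THRE));
    uart[UART_THR] = (uint8_t)c;
}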
5. duplicate_caps. The scheme for constructing a running system from a spec is straightforward except for duplicating caps. Why are all TCB and CNode caps dup'd? Why does init_tcbs call init_tcb on the original object but immediately configure_tcb on the dup? There are other places where the dup cap is used instead of the original cap (e.g. init_cnode_slot). None of this seems to matter on risc-v; maybe it matters on some other architectures?
I don't know the reason for this and it's older than the git history I have access to. It looks like it's unnecessary for TCBs, and is likely still necessary for CNodes when loading specs that have CNodes that are moved into slots in other CNodes when constructing a multi-cnode CSpace.
Is there an example in the tree of a nested CNode spec? I'm still not clear on why the dup is required so being able to repro this would be helpful.
It's not something that camkes supports, because it doesn't come up in static systems where the total number of cap slots needed is known in advance, and there haven't been any camkes connectors written that use a shared CNode as a way of sending/receiving objects. There is a small example in the capDL-Tool folder itself, but I don't think you'd find it useful as it can't be turned into a loaded system: https://github.com/seL4/capdl/blob/master/capDL-tool/cap-dist-elf-simpleserv... Looking some more, it seems none of the CNode caps are ever moved out of their source slots by the loader. So maybe in both cases the dup caps mechanism is unnecessary.
6. On risc-v all pages appear to be marked executable--including ipc buffers! This seems just wrong and a potential security hole.
This looks to be a bug. Marking the NX bit on risc-v when the grant right is not set gives me what I want; arm already does this.
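Concretely, what I'm doing is roughly this (a sketch; `rights` is the cap rights decoded from the spec, and frame_cap/vspace_root/vaddr are the usual mapping arguments):

seL4_RISCV_VMAttributes attrs = seL4_RISCV_Default_VMAttributes;
if (!seL4_CapRights_get_capAllowGrant(rights)) {
    /* no grant right in the spec => map the frame non-executable */
    attrs |= seL4_RISCV_ExecuteNever;
}
int error = seL4_RISCV_Page_Map(frame_cap, vspace_root, vaddr, rights, attrs);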
It's a bug because it doesn't do it the same way as Arm, but the way Arm does it is a bit misleading. The kernel itself only uses the read and write bits from cap rights for frame mappings, and a cap with write-only rights gets downgraded to a kernel-only mapping, so really the kernel only lets you describe read-write, read-only, or none. For both read-write and read-only caps, the frame can be mapped exec or non-exec; the read right implies executable rights from the kernel's perspective. The capdl loader does add a little bit of extra security by using the grant right in the spec to decide whether the frame has executable permissions and creating the mapping accordingly (except on riscv, which is the bug). But if the frame cap for this mapping is also given to the component in order to do things like cache maintenance invocations, then the component could use that capability to convert its non-exec mappings to executable mappings.
There's probably a good argument for overhauling how the kernel handles frame rights so that you can express executeNever as a capability right and associate it with the grant or grantreply bit in the existing capRights. I think this wasn't done originally because either there were no spare bits in the cap structure at the time, or armv6 didn't support executeNever mappings (but you'd have to fact-check me on that).
7. Why does seL4_TCB_WriteRegisters take a _mutable_ seL4_UserContext? I'm guessing this is a mistake in the xml spec (but haven't looked).
This looks to be a shortcoming in the python script that generates the syscall stubs.
Correct.
8. init_cspaces mystifies me. There are two loops over all CNodes; one that does "COPY" operations and one that does "MOVE" operations. I see nothing explaining what's happening. I've read this code many times and still don't get it; please explain.
As I mentioned higher up, some caps are special as originals and allow additional operations to be performed on them compared to their copies, even if the cap rights are the same for both. Initializing all of the cnodes in the system involves moving capabilities (seL4_CNode_Move) or deriving capabilities from source capabilities, either with the same rights (seL4_CNode_Copy) or lesser rights (seL4_CNode_Mint). It's split into two passes because the derivation of capabilities needs to happen before the source capabilities are moved into the destination CNode.
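Structurally it's essentially this (a sketch, not the loader's actual code; slot_init_t is a made-up flattened view of the spec's CNode contents):

#include <sel4/sel4.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool is_move;   /* original caps get moved, derived caps get minted/copied */
    seL4_CNode dest_root; seL4_Word dest_index;
    seL4_CNode src_root;  seL4_Word src_index;
    seL4_CapRights_t rights;
    seL4_Word badge;
} slot_init_t;

void init_cspaces_sketch(slot_init_t *slots, size_t n)
{
    /* pass 1: derive while every source cap is still in its original slot */
    for (size_t i = 0; i < n; i++) {
        if (!slots[i].is_move) {
            seL4_CNode_Mint(slots[i].dest_root, slots[i].dest_index, seL4_WordBits,
                            slots[i].src_root, slots[i].src_index, seL4_WordBits,
                            slots[i].rights, slots[i].badge);
        }
    }
    /* pass 2: moves last, because a move empties the source slot and
     * would break any derivation that still needs it */
    for (size_t i = 0; i < n; i++) {
        if (slots[i].is_move) {
            seL4_CNode_Move(slots[i].dest_root, slots[i].dest_index, seL4_WordBits,
                            slots[i].src_root, slots[i].src_index, seL4_WordBits);
        }
    }
}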
9. According to comments, the use of a global static to set up the UserContext for a new TCB is done because one cannot take the address of a local variable. On what architecture is this true? On risc-v the context is copied into "registers" before calling the kernel, so this doesn't appear to be an issue with crossing the user-kernel protection boundary.
This is true for the kernel with the proofs. The very first implementation of this loader was as part of a project that wanted to verify it using the same tooling as the kernel proofs. So that comment is probably documenting a requirement that is no longer present.
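i.e. on current mainline there should be nothing stopping a plain local (field names per risc-v's seL4_UserContext; entry and stack_top are placeholders):

seL4_UserContext regs = {0};
regs.pc = entry;        /* initial program counter */
regs.sp = stack_top;    /* initial stack pointer */
int error = seL4_TCB_WriteRegisters(tcb_cap, seL4_False, 0,
                                    sizeof(regs) / sizeof(seL4_Word), &regs);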
10. Memory reclamation. I'm unclear what happens to the memory used by capdl-loader. The process does a TCB_Suspend call when it's done. Does this cause the CSpace & VSpace to be reclaimed? For CAmkES this doesn't matter as there's no support for allocating memory but I'm building a system where this isn't true. How do I release memory (in particular) back to the untyped pool?
This was discussed on the developers zoom call so no need to belabor it. Sounds like I've got my work cut out for me...
Yea, this one is a bit more involved.
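For the archives, though, the core mechanism: revoking an untyped cap destroys everything that was retyped from it, after which the untyped can be retyped again. Assuming whoever does the reclaiming still holds the untyped cap, it's one call; the involved part is that the loader's own CSpace/VSpace were retyped from the same pool, so it can't simply revoke the ground it's standing on:

/* all objects retyped from this untyped are destroyed and its memory
 * returns to the untyped pool for re-retyping */
int error = seL4_CNode_Revoke(seL4_CapInitThreadCNode, untyped_cptr, seL4_WordBits);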
It appears I need this, so I may be back w/ more questions :).
I have a bunch of niggly code complaints/questions but let's start with the above.
-Sam