Thank you for the feedback, Thomas. I understand the utility of the double-lookup design in general -- it's an elegant solution for the problem at hand! The part I'm confused by is specifically why capability resolution depth mismatches are necessary.
To elaborate on what's going on with the example I showed, and why the shift is there:
seL4_TCB_Configure does not, itself, do any resolution on the fault endpoint cap it is provided. It simply stores the cap for later use. When a fault occurs in the configured thread, seL4 resolves the cap, at a depth of 32 (wordSize) bits, relative to the configured root CSpace for the thread. With the slot offset in the high bits of slot_for_tcb, resolution will find the copied endpoint and stop.
But seL4_CNode_Copy will fail with a capability resolution depth mismatch if we attempt to copy with depth 32, because resolution will stop with 24 extra bits left over. So I passed depth=8 for the copy call, because that's the radix of the CNode we're copying into, and it has no child slots to recurse into. That means that the slot offset should be in the low 8 bits of slot_for_copy.
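To make the two encodings concrete, here is a minimal sketch of how the same slot index ends up in different bit positions depending on the lookup depth. It assumes a 32-bit word, a single radix-8 root CNode, and no guard bits; the helper names are hypothetical and not part of libsel4.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS   32u  /* assumed word size (32-bit platform) */
#define CNODE_RADIX 8u   /* assumed radix of the 256-slot root CNode */

/* Hypothetical helper: address for a lookup at depth CNODE_RADIX,
 * as used for the copy. The slot index occupies the low 8 bits. */
static uint32_t cptr_for_copy(uint32_t slot_index)
{
    assert(slot_index < (1u << CNODE_RADIX));
    return slot_index;
}

/* Hypothetical helper: address for a lookup at depth WORD_BITS,
 * as used when the kernel resolves the fault endpoint. The slot index
 * occupies the high 8 bits; the low 24 bits are left as zero. */
static uint32_t cptr_for_tcb(uint32_t slot_index)
{
    assert(slot_index < (1u << CNODE_RADIX));
    return slot_index << (WORD_BITS - CNODE_RADIX);
}
```

For slot index 0xFF, cptr_for_copy gives 0x000000FF and cptr_for_tcb gives 0xFF000000, which is the shift in the example.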
It seems to me that CSpace management would be easier and more flexible if capability resolution depth mismatches weren't checked. Then a slot in a node could be transparently expanded (at least, transparently in the common case). So: start with an empty 256-slot CNode; the last slot has canonical address 0xFF000000. Allocate a child 256-slot CNode, and rotate it into the last slot; now the canonical address 0xFF000000 refers to the first slot of the child, and 0xFF01... through 0xFFFF... are available. Currently this sort of transparent upgrade doesn't work because the slots must be distinguished based on their depth.
Having thought about this scheme a little more, the primary downside I can see is that, depending on the starting node, an expansion can change the result of resolution. Example: with four radix-8 CNodes organized akin to a linked list, A->B->C->D, the address 0 resolves to the first slot of D regardless of whether resolution starts from A or B. But expand the 0th slot of D to point to E, and now resolution from B will return E[0], while resolution from A has used up all 32 bits and stops at D[0]. Even so, since depths remain available, a careful client can use them to ensure stable address resolution.
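Here is a toy model of the resolution I have in mind, to pin down the A/B/E example. It is only a sketch of the proposed behaviour, not of what seL4 does today: every CNode is assumed to be radix 8 with no guard, a slot either holds a CNode cap or nothing, and all names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RADIX 8u
#define SLOTS (1u << RADIX)

/* Toy CNode: a non-NULL entry means the slot holds a cap to a child CNode. */
struct cnode {
    struct cnode *child[SLOTS];
};

/* The slot where resolution ended. */
struct slot_ref {
    const struct cnode *node;
    uint32_t index;
};

/* Hypothetical resolution WITHOUT the depth-mismatch check: consume RADIX
 * bits at a time from the top of the depth-bit address, and stop either
 * when the bits run out or when the slot does not hold a CNode cap. */
static struct slot_ref resolve_no_check(const struct cnode *root,
                                        uint32_t addr, unsigned depth)
{
    assert(root != NULL);
    assert(depth >= RADIX && depth <= 32u && depth % RADIX == 0u);

    const struct cnode *node = root;
    unsigned remaining = depth;
    for (;;) {
        remaining -= RADIX;
        uint32_t index = (addr >> remaining) & (SLOTS - 1u);
        const struct cnode *next = node->child[index];
        if (remaining == 0u || next == NULL) {
            /* Leftover bits are ignored instead of reported as a mismatch. */
            return (struct slot_ref){ node, index };
        }
        node = next;
    }
}
```

With A[0]->B, B[0]->C, C[0]->D, resolving address 0 at depth 32 ends at D[0] from either A or B. Once D[0] holds E, resolution from A still ends at D[0] because all 32 bits are consumed, while resolution from B has 8 bits left and continues to E[0]. Passing depth 24 from B would pin the result back to D[0].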
So it seems to me, naively, that removing the depth check would make CSpace management simpler for clients that use CSpaces in simple/stylized ways, and would not impede sophisticated clients from using CSpaces in arbitrary ways. Am I missing some other fatal flaw?