
Thank you! This makes sense. However, it still seems that in some cases revocation is more of a nuisance than a help. For example, consider the classical Xen split device driver model, i.e., a shared device driver VM (or a process in seL4) multiplexes a single physical device across multiple client VMs. In Xen, each client connects to the driver VM and provides a set of pages to establish a region of shared memory for communication. The fact that these pages are provided by the client protects the driver from denial-of-service attacks. In Xen it's impossible to revoke the pages -- once they are granted they belong to the driver VM (at least if I remember correctly; there might be some nuances and I haven't looked at Xen recently). This scenario seems hard in seL4, as the shared pages can be revoked at any time.

I guess, in the seL4 world, one structures the system like this: there is a protocol which requires the client VM to work with the TCB along these lines: give up N pages from its memory allocation back to the TCB, which then "moves" them into the driver. The TCB has to be "below" the client in the capability chain to make sure that the client loses the ability to revoke the pages from the driver. Effectively, in seL4, if my understanding above is correct, one structures the system in such a manner that revocation is eliminated, i.e., the client cannot revoke its pages.

But then a logical question: why do we need revocation in the first place? In the end the client trusts the driver to release the pages back via the TCB when the connection is torn down. This seems natural -- there is a degree of trust and cooperation between clients and the driver. In other words, what is the killer use-case for fine-grained revocation in your opinion? I.e., what kind of system is impossible without it?
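To make the protocol above concrete, here is a toy model of seL4's capability derivation tree (CDT). This is illustrative only -- these are not seL4 API calls, and all names are made up. It shows the distinction I have in mind: if the driver's frame cap is derived from the client's, the client can revoke it at any time; if the client instead returns the pages to a trusted manager sitting below it in the derivation chain, the manager can re-grant them to the driver and the client retains no authority to revoke.

```python
# Toy model of a capability derivation tree (CDT); not the seL4 API.

class Cap:
    def __init__(self, obj, parent=None):
        self.obj = obj            # the kernel object (e.g. a frame)
        self.parent = parent      # derivation parent in the CDT
        self.children = []
        self.live = True
        if parent is not None:
            parent.children.append(self)

    def copy(self):
        """Derive a child cap; the holder of `self` can later revoke it."""
        return Cap(self.obj, parent=self)

    def revoke(self):
        """Delete every cap derived from this one (not this cap itself)."""
        for child in self.children:
            child.revoke()
            child.live = False
        self.children = []

# Scenario A: the client shares pages by deriving a cap for the driver.
a_client = Cap("frame")
a_driver = a_client.copy()
a_client.revoke()
print(a_driver.live)                  # False: the client can yank the pages

# Scenario B: the protocol sketched above. The manager originally granted
# the frame to the client; the client returns it, the manager revokes the
# client's cap, and re-grants the frame to the driver.
b_manager = Cap("frame")
b_client = b_manager.copy()
b_manager.revoke()                    # the client's cap is gone
b_driver = b_manager.copy()           # driver's cap derives only from the manager
print(b_client.live, b_driver.live)   # False True: client can no longer revoke
```

In scenario B the driver is protected from the client exactly because the derivation edge from the client was severed before the grant was re-issued.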
Or even: did you see scenarios on top of seL4 where people rely on fine-grained revocation (i.e., revoke pages, memory regions, access to devices) instead of a rather coarse-grained termination of entire processes?

Anton

On Tue, Mar 25, 2025 at 09:52:51PM +0000, Gerwin Klein wrote:
On 26 Mar 2025, at 04:51, Anton Burtsev via Devel <devel@sel4.systems> wrote:
I have a couple of questions on how user-level proofs are constructed in the seL4 ecosystem.
Just to start, can someone point me to an example of a user-level proof, just so I can take a look at how specifications are exported to user processes and how everything is stitched together.
There aren't any fully stitched-together user-level proofs yet. Mostly, because user-level verification requires a concurrency logic on top of the kernel. There is work underway in multiple groups to address that.
One exception to needing concurrency is the user-level system initialiser (because it is the only process that runs at that time). Proofs here: https://github.com/seL4/l4v/tree/master/sys-init
Other current approaches to user-level verification are to ignore the concurrency aspect and use sequential verification via SMT or other techniques for user-level programs, with a manually crafted adaptation of the seL4 spec. The UNSW folks have had some success with that, and I'll let them comment on it.
But also, I'm a bit confused on what kind of assumptions are needed to build verified user-level code in the face of revocable memory. I.e., it seems that one needs to assume (and trust) that the memory of any verified (or even rather mission critical) process is never revoked.
Yes, of course. In static systems that is simple and guaranteed by correct system initialisation. In a normal CAmkES or Microkit system, nobody in the system has the authority to revoke the memory of other components after initialisation, and seL4's integrity theorem gives you that this stays this way.
In dynamic systems, that is less easy and requires a proof about the components that do have authority to revoke memory of other components.
Maybe I'm wrong and missing something and you have a way in mind for how to bypass such trust. Or maybe one can argue that such trust is reasonable, which is probably true for a "flat" system in which the mission critical domain is created right from the trusted boot process or some trusted TCB above it.
Those systems don't need to be flat, they could easily be hierarchical. The important part is that they are static in the sense that authority to revoke no longer exists in the system after (verified) user-level initialisation is done. In that kind of system, the problem is not a problem. Most current high-assurance systems are in that class.
Dynamic systems are more interesting, but that is because resource re-use inherently has this problem. It doesn't really come from anything related to seL4. If you can destroy a resource, the chain that leads to that action needs to be trusted. If you cannot destroy resources, but can create them, you immediately have the potential for denial of service. seL4 provides the mechanisms to do all that, but it doesn't prescribe how you use them.
Yet it seems hard to argue that such trust is reasonable when the recursion is deeper, e.g., the boot process launches Linux, and then Linux tries to start some kind of a mission critical process like a microkernel-protected software TEE or something (out of its memory hence it can later revoke it). And one wants to build a proof about this TCB yet such proof needs to deal with memory that can be revoked at any time or assume that revocation is impossible.
You would never give Linux authority to directly create and revoke memory that others need to trust. In fact, something like Linux should never get much direct seL4 authority at all. It'll usually be managed by a virtual machine monitor and even that would be constrained in authority as much as you can.
If you want an untrusted Linux process to start something trusted, you'd add a separate management component that is small and can be trusted, which Linux can then signal to start something. In a dynamic system, you'd usually want to do that anyway, just for keeping dynamic seL4 resource and authority handling confined and verifiable (you could imagine such management components to be nested and hierarchical as well).
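The pattern could be sketched roughly as follows. This is an illustrative model, not seL4 code: the untrusted VM holds nothing but a signalling endpoint to a small trusted manager, while all real authority (memory, capability manipulation) stays with the manager, so the untrusted side can request but never revoke.

```python
# Hypothetical sketch of the management-component pattern described above.

class Manager:
    """Small trusted component; sole holder of the memory pool."""
    def __init__(self, untyped_pool):
        self.pool = untyped_pool      # memory only the manager controls
        self.running = {}

    def handle_signal(self, request):
        # Only a fixed, validated set of requests is honoured.
        if request == "start_tee" and "tee" not in self.running:
            # The trusted component is built from the manager's own
            # memory; the untrusted VM never holds caps to it.
            frames = [self.pool.pop() for _ in range(4)]
            self.running["tee"] = frames
            return "started"
        return "denied"

class UntrustedVM:
    """Holds nothing but an endpoint to the manager."""
    def __init__(self, endpoint):
        self.endpoint = endpoint

    def request_tee(self):
        return self.endpoint("start_tee")

mgr = Manager(untyped_pool=list(range(16)))
linux = UntrustedVM(endpoint=mgr.handle_signal)
print(linux.request_tee())   # "started": the TEE runs out of the manager's
                             # memory, which the untrusted VM cannot revoke
print(linux.request_tee())   # "denied": duplicate requests are rejected
```

The key design point is that the trusted component's resources are never derived from the untrusted VM's allocation, so there is no capability chain through which Linux could revoke them.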
I would still question the wisdom of letting a large untrusted code base like Linux trigger anything trusted. How do you know it does so when you want it to? It could just pretend to any client that it triggered the trusted component, but fake a result instead. Etc. There are ways around most of these problems, but it'd usually be better to architect the system in a way that doesn't require that trust inversion in the first place. The exception is when you build multiple system versions to migrate between architectures; then an intermediate state like that makes sense.
Just wondering what the take on this is. I apologize if this is all covered somewhere in the seL4 docs and I just didn't dig deep enough.
I don't think it is covered explicitly in the docs, it's too high level for that. It would be a good idea to write up some of the higher-level concepts of how things can be organised to minimise trust.
Cheers, Gerwin