seL4 for general purpose/desktop operating systems?

Hello! I saw Microkit 2.0 has just been released, and it's prompted a question I've been thinking about since before Microkit, when CAmkES was the main thing: both Microkit and CAmkES are designed for embedded and/or static systems, where there is no detection/configuration of hardware at runtime, and in Microkit no swapping of components such as drivers.

Is there any work ongoing to use seL4 for a general-purpose operating system? That is, a situation where you may have e.g. PC-style hardware that can undergo all sorts of reconfiguration between power-off and the next boot, or even while booted and running. Of course this introduces a lot of attack surface, but it makes much more sense for a general-purpose computer than for an embedded one.

Is there any recent work/prior art on using seL4 outside the context of deeply embedded systems?

Thanks,
Isaac

On 8 Mar 2025, at 06:34, Isaac Beckett via Devel <devel@sel4.systems> wrote:
Is there any work ongoing to use seL4 for a general purpose operating system? Like, a situation where you may have e.g. PC style hardware that can have all sorts of reconfiguration between power-off and next boot, or even while booted and running.
There is, but it’s presently unfunded and as such on the back burner: https://trustworthy.systems/projects/smos/
It’s not (at this time) meant to be a PC OS, but a proof-of-concept for a general-purpose OS.

Gernot

On Fri, Mar 7, 2025 at 8:37 PM Isaac Beckett via Devel <devel@sel4.systems> wrote:
Is there any recent work/prior art on using seL4 outside of the context of deeply embedded systems?
I was trying to base the QNX-like OS I'm writing <https://gitlab.com/uxrt/uxrt-toplevel> on seL4, but I gave up and decided to fork it when I couldn't efficiently implement QNX-like IPC (which involves copying arbitrary-length buffers directly between address spaces) on it. Eventually it will probably end up diverging from seL4 quite significantly. At some point I may write a companion OS based on seL4 proper with a compatible subset of the API for static, deeply embedded systems, but that isn't a priority for me.

I'm a bit skeptical of the benefits of kernel verification alone for what I'm writing, since most of the security is managed by the process server, which is unverified. Verification is going to be significantly more difficult for dynamic systems in general, since whatever user-level servers are responsible for handing out capabilities will themselves have to be verified if kernel verification is to actually be meaningful (and verifying such servers is probably going to be more difficult than verifying the kernel).

On 8 Mar 2025, at 10:29, Andrew Warkentin via Devel <devel@sel4.systems> wrote:
I was trying to base the QNX-like OS I'm writing <https://gitlab.com/uxrt/uxrt-toplevel> on seL4, but I gave up and decided to fork it when I couldn't efficiently implement QNX-like IPC (which involves copying arbitrary-length buffers directly between address spaces) on it. Eventually it will probably end up diverging from seL4 quite significantly.
I don’t see a reason for providing bulk data copying as a kernel service (despite Liedtke doing this in the original L4, in violation of his own minimality principle). It can (and therefore should) be provided as a user-level service, which is why we dropped it in seL4. And, done properly, it’ll easily outperform QNX’s in-kernel implementation.
I'm a bit skeptical of the benefits of kernel verification alone for what I'm writing, since most of the security is managed by the process server, which is unverified. Verification is going to be significantly more difficult for dynamic systems in general, since whatever user-level servers are responsible for handing out capabilities will themselves have to be verified if kernel verification is to actually be meaningful (and verifying such servers is probably going to be more difficult than verifying the kernel).
Our design (details of which are still very much in flux and thus too early to publish) reduces the TCB to something that should be verifiable.

Gernot

On Sat, Mar 8, 2025 at 3:12 AM Gernot Heiser via Devel <devel@sel4.systems> wrote:
I don’t see a reason for providing bulk data copying as a kernel service (despite Liedtke doing this in the original L4, in violation of his own minimality principle). It can (and therefore should) be provided as a user-level service, which is why we dropped it in seL4. And, done properly, it’ll easily outperform QNX’s in-kernel implementation.
I can't come up with a way for such a service to determine the server's address space short of tying endpoints to server address spaces, which is something I want to avoid, since I want the option of handing off connections to threads/processes with different address spaces.

The other issue is with mapping the source and destination address spaces. The most "straightforward" ways I can think of to do this would be either to map both source and destination buffers on every IPC, which would be extremely inefficient (especially with the requirement to have a separate capability for each VSpace a page or table is mapped into, meaning one system call could become potentially hundreds or more depending on the buffer sizes), or to have an identity mapping of all of physical memory with user-level shadow page tables to look up each page before copying it. The latter could also be somewhat slower than what could be done in the kernel, where you could have table-sized address-space windows reserved for the source and destination address spaces, with only two table updates required if both buffers lie within a single table's span.

I did actually end up using the kernel's physical address space window and software walking of page tables in my initial in-kernel implementation, since that seemed slightly easier to do initially, but at some point I want to replace it with a table-window-based setup.
Our design (details of which are still very much in flux and thus too early to publish) reduces the TCB to something that should be verifiable.
Presumably this has a novel API designed specifically for verification and relegates Unix programs to a "penalty box"? I'm specifically trying to write a better QNX than QNX and a better Linux than Linux, one that allows incremental porting and provides all of its advanced features to Unix software, rather than a completely new system with a separate, limited compatibility environment for existing software. So being able to verify a dynamic system with a native API that's incompatible with everything else doesn't help me at all.

I suspect that verifying a Unix-like OS (even something like what I'm writing, with read and write being its only real primitives) could end up being rather difficult, although I don't have a background in formal verification, so I'm not completely sure of that.

On 8 Mar 2025, at 22:02, Andrew Warkentin via Devel <devel@sel4.systems> wrote:
I can't come up with a way for such a service to determine the server's address space short of tying endpoints to server address spaces, which is something I want to avoid, since I want the option of handing off connections to threads/processes with different address spaces.

The other issue is with mapping the source and destination address spaces. The most "straightforward" ways I can think of to do this would be either to map both source and destination buffers on every IPC, which would be extremely inefficient (especially with the requirement to have a separate capability for each VSpace a page or table is mapped into, meaning one system call could become potentially hundreds or more depending on the buffer sizes), or to have an identity mapping of all of physical memory with user-level shadow page tables to look up each page before copying it. The latter could also be somewhat slower than what could be done in the kernel, where you could have table-sized address-space windows reserved for the source and destination address spaces, with only two table updates required if both buffers lie within a single table's span.

I did actually end up using the kernel's physical address space window and software walking of page tables in my initial in-kernel implementation, since that seemed slightly easier to do initially, but at some point I want to replace it with a table-window-based setup.
Why so complicated? The equivalent of putting the functionality in the kernel, where the kernel has access to all user state, is to have a trusted server which has access to all client state (especially CSpaces). The TCB is the same, but you don’t break verification and retain the option to verify the trusted server. This approach will work for all designs.

In a specific system design you’re likely to have opportunities to simplify/modularise/reduce the power of the server, making verification easier. But even the model of the all-powerful server is better (more assured) than the model of hacking the kernel and completely abandoning verification.

Gernot

On 3/8/25 4:49 PM, Gernot Heiser via Devel wrote:
Why so complicated?
The equivalent of putting the functionality in the kernel, where the kernel has access to all user state, is to have a trusted server which has access to all client state (especially CSpaces). The TCB is the same, but you don’t break verification and retain the option to verify the trusted server. This approach will work for all designs.
In a specific system design you’re likely to have opportunities to simplify/modularise/reduce the power of the server, making verification easier. But even the model of the all-powerful server is better (more assured) than the model of hacking the kernel and completely abandoning verification.
There are a couple of classes of designs I am not sure how to implement efficiently in this model:

1. Systems that run on many loosely coupled cores, where statically partitioning the kernel heap requires either reserving too much memory at startup, or playing silly tricks like adding and removing page table entries in response to page faults to keep the kernel heap size down.

2. Systems where the syscall overhead is too high because of speculative-execution mitigations. On x86, each context switch between userspace processes requires an IBPB, but entering the kernel often doesn’t. I suspect there is other hardware that tags various internal data structures with the current execution privilege as well, and requires costly flushes when switching between mutually distrusting code running at the same CPU privilege level. The trusted server requires 2 such switches, whereas kernel extensions only need 1.

--
Sincerely,
Demi Marie Obenour (she/her/hers)

On 9 Mar 2025, at 09:04, Demi Marie Obenour via Devel <devel@sel4.systems> wrote:
There are a couple classes of designs I am not sure how to implement efficiently in this model:
1. Systems that run on many loosely coupled cores where statically partitioning the kernel heap requires either reserving too much memory at startup, or playing silly tricks like adding and removing page table entries in response to page faults to keep the kernel heap size down.
How does Linux do this? As I keep saying, you’ll need to look at seL4 IPC (or PPC, as I prefer to call it) as the seL4 equivalent of a Linux system call: the mechanism to switch protection state. In Linux you don’t call into the kernel on a different core; you change protection state on your own core (and the kernel may choose to hand off to a worker thread somewhere else). In that model it’s clear that you invoke the server on the same core. Depending on the design, this may mean invoking a shared, passive server locally, which hands off to a worker. Or you invoke a per-core server (meaning there’s a server thread and endpoint per core, or per group of closely coupled cores). This is system-specific policy. Trying to do all this in the kernel just forces policy into the kernel, which is exactly what a microkernel should avoid.
2. Systems where the syscall overhead is too high because of speculative execution mitigations. On x86, each context switch between userspace processes requires an IBPB, but entering the kernel often doesn’t.
This is a non-sequitur. You need to take action for timing-channel prevention when you cross a security domain. Going into and out of the kernel doesn’t cross one, and as such doesn’t require such action. The same holds for going into and out of a trusted server. Preventing the server from leaking through any shared state (architected or not) is no different from preventing the kernel from leaking through shared state.
I suspect there is other hardware that tags various internal data structures with the current execution privilege as well, and requires costly flushes when switching between mutually distrusting code running at the same CPU privilege level. The trusted server requires 2 such switches, whereas kernel extensions only need 1.
The typical fallacy of trying to solve security problems “transparently” in hardware, which always forces doing things unnecessarily.

Having said that, the argument that syscall/IPC overhead limits performance doesn’t become more true by repeating it (though it’s seeing a renaissance in the scientific literature). I’m yet to see a realistic use case where seL4 IPC cost is performance-limiting, except with poor user-level design.

Gernot

On 9 Mar 2025, at 09:56, Gernot Heiser via Devel <devel@sel4.systems> wrote:
Having said that, the argument that syscalls/IPC overhead limits performance doesn’t become more true by repeating it (it’s seeing a renaissance in the scientific literature too). I’m yet to see a realistic use case where seL4 IPC cost is performance limiting, except with poor user-level design.
Note: this IPC-cost argument is most obviously irrelevant in the present context: a bulk-data-copying service. The copying cost will completely dominate the IPC cost.

Gernot

On 3/7/25 9:09 PM, Gernot Heiser via Devel wrote:
Our design (details of which are still very much in flux and thus too early to publish) reduces the TCB to something that should be verifiable.
What do you consider to be the TCB? To an end user, a computing system is defined by its inputs and outputs. This means that whatever the system’s identity is from the perspective of the outside world is part of the TCB. For a server, that identity can be a cryptographic secret key that only trusted code can use. For other systems, that identity is the ability to access sensor data and control whatever effectors the system has. This means that drivers for critical sensors and effectors are part of the TCB.

In the case of a headful system (desktop/laptop/mobile), the most critical effector is the display, and the most critical sensors are some combination of a keyboard, mouse, touchpad, and/or touchscreen. This means that the drivers for these are also in the TCB, because they can impersonate the user (to the system) or the system (to the user). Do you plan to verify these drivers?

--
Sincerely,
Demi Marie Obenour (she/her/hers)

On 8 Mar 2025, at 06:34, Isaac Beckett via Devel <devel@sel4.systems> wrote:
Is there any work ongoing to use seL4 for a general purpose operating system? Like, a situation where you may have e.g. PC style hardware that can have all sorts of reconfiguration between power-off and next boot, or even while booted and running.
Kry10 (www.kry10.com) has the Kry10 OS, which is a (commercial) seL4-based OS that allows components to be started, stopped, added, and removed at runtime. While it's not designed as a general-purpose OS (like a desktop or server OS), it is targeting industrial (and other) systems where upgrades and some dynamicity are desired. Have a look at our various seL4 Summit presentations for more details.

Ihor.
Participants (5): Andrew Warkentin, Demi Marie Obenour, Gernot Heiser, Ihor Kuz, Isaac Beckett