Operating Systems built on top of seL4
Hello sel4-devs,

I am a first-year graduate student at the University of British Columbia, in the Systopia Lab (https://systopia.cs.ubc.ca/), working on operating systems research. My advisor is Prof. Margo Seltzer (https://www.seltzer.com/margo). In a past life, I worked as a software engineer at Oracle and Arista Networks.

My research is on general-purpose isolation mechanisms in the OS. We ask whether we really need N different isolation mechanisms, or whether instead we could develop a framework in which all these different mechanisms represent points on a continuum. Here is a write-up (https://sid-agrawal.ca/#research-projects) with some more details.

I am using seL4 as the microkernel base to build my research idea. At this stage, I am not looking into the formal verification aspect of seL4, but that might change once I have a better idea about the type of mechanism I want to build. I would also like to keep changes to the kernel to a minimum and try to demonstrate my ideas using user-space processes.

I have tried out the tutorials on the seL4 website. Currently, I am wading through the different helper libraries in seL4/seL4_libs (https://github.com/seL4/seL4_libs). As someone who is just starting with the seL4 ecosystem, I had a few high-level questions:

- Are there any example OSes built with seL4 that would be a good place to look? Specifically, I am looking for components such as basic versions of a process server, VM server, pager, and file-system server.
- I came across refOS (https://github.com/seL4/refos), but wasn't sure if it is still being maintained.
- Another option was Genode (https://genode.org/documentation/general-overview/index). This seems to be maintained by a separate community and has many layers of abstraction, since it supports multiple kernel backends.

I am looking forward to engaging with this community, learning about seL4, and making meaningful contributions.

Best,
Sid
sid-agrawal.ca
On 7/30/21, Sidhartha Agrawal wrote:
I'm working on my own QNX-like OS based on seL4 (https://gitlab.com/uxrt/), although I haven't gotten very far and am still trying to finish up the memory allocators. UX/RT will have its own kind of general-purpose isolation mechanism based around file security. It will be the first OS that I'm aware of under which everything truly is a file (even things like process heaps/stacks and the connection to the VFS itself will be files). Each process will have a list of files it is allowed to access (entries will apply to either a particular file or to all files in a particular directory). Since everything will be a file, basically all security will be reduced to file permissions, unlike in OSes with non-file-based primitives. On top of the file permissions there will be a daemon that implements a flexible form of role-based access control with user-definable roles. There will also be mechanisms to securely grant access to files, avoiding confused-deputy vulnerabilities while only requiring relatively minor modifications to existing Unix applications rather than porting to a completely new system.

There's also the Robigalia project (https://gitlab.com/robigalia/), although they currently aren't as far along as I am (they abandoned their previous allocators, which I ended up forking, and are still only working on kernel bindings). Unlike what I am trying to do with UX/RT (write an OS that is superior to mainstream OSes while still being natively compatible with existing applications), they want to build something completely new (with only limited sandbox-based backwards compatibility) based on persistent processes and a capability-oriented architecture.

Otherwise, I'm not aware of any open seL4-based OSes besides the ones you mentioned. There are a few companies selling seL4-based embedded OSes, but they are proprietary AFAIK.
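For concreteness, here is a toy sketch of the kind of per-process file access list described above; it is just my illustration, not the actual UX/RT implementation, and the names are made up. Each entry covers either a single file or a whole directory subtree, and a path is allowed if any entry covers it.

#include <stdbool.h>
#include <string.h>

struct fs_acl_entry {
    const char *path;     /* a single file, or a directory for subtree entries */
    bool whole_directory; /* entry applies to everything under 'path' */
};

/* Returns true if 'path' is covered by some entry in the process's list. */
static bool path_allowed(const struct fs_acl_entry *acl, int n, const char *path)
{
    for (int i = 0; i < n; i++) {
        if (!acl[i].whole_directory) {
            if (strcmp(acl[i].path, path) == 0)
                return true;
        } else {
            size_t len = strlen(acl[i].path);
            /* prefix match, but only on a directory boundary */
            if (strncmp(acl[i].path, path, len) == 0 &&
                (path[len] == '/' || path[len] == '\0'))
                return true;
        }
    }
    return false;
}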
Hi Andrew,

Your project looked interesting to me too, so I decided to give it a look, and I could read here: https://gitlab.com/uxrt/core-supervisor/sel4-kernel

"The seL4 microkernel with patches for UX/RT"

But my question is: by modifying the seL4 kernel, don't you lose the security of the formal verification? I mean, the nice thing about formally verified code is that you can forget about its security. Don't you fear you could introduce vulnerabilities?

Cheers,
On 7/31/21, Hugo V.C. wrote:
The patches are only for the boot code at the moment, which isn't part of the verification. However, I may end up having to fork seL4 at some point because the seL4 project's priorities seem to be heavily focused on static deeply embedded systems rather than desktops, servers, and large embedded systems.

UX/RT will be extremely centralized in its capability management with CSpaces, VSpaces, and UTSpaces all managed exclusively by the process server and no raw seL4 APIs exposed to other processes (the base standard library will of course call them but the symbols won't be exported). Therefore the process server will be almost as important as the kernel when it comes to security. I'd be surprised if it doesn't end up being way too complex to verify since it will be highly dynamic and will directly implement (a subset of) the Unix API, which was never designed with verification in mind, so losing kernel verification probably won't make much of a difference. Even without verification, UX/RT should be far more secure than any mainstream OS.
On 31 Jul 2021, at 17:35, Andrew Warkentin wrote:
The patches are only for the boot code at the moment, which isn't part of the verification. However, I may end up having to fork seL4 at some point
While we cannot prevent forking (but strongly discourage it), please note that you cannot use the name seL4 for an incompatible fork (with a research exception for “experimental” versions, see https://sel4.systems/Foundation/Trademark/).
because the seL4 project's priorities seem to be heavily focused on static deeply embedded systems rather than desktops, servers, and large embedded systems.
seL4’s notion of priorities is microkernel-style: simple mechanisms that are as policy-free as possible. Actual policies (like the Linux CFS) can (and should) easily be implemented at user mode. See Lyons et al [2018] which implemented EDF scheduling at user level: https://trustworthy.systems/publications/csiroabstracts/Lyons_MAH_18.abstrac...

Gernot
On 7/31/21, Gernot Heiser wrote:
While we cannot prevent forking (but strongly discourage it), please note that you cannot use the name seL4 for an incompatible fork (with a research exception for “experimental” versions, see https://sel4.systems/Foundation/Trademark/).
I'm well aware of that. If I do end up forking I would rename my fork. Since there will be no raw API access or multi-personality support, seL4 will really just be an implementation detail as far as everything outside the process server and base system library will be concerned.
seL4’s notion of priorities is microkernel-style: simple mechanisms that are as policy-free as possible. Actual policies (like the Linux CFS) can (and should) easily be implemented at user mode. See Lyons et al [2018] which implemented EDF scheduling at user level: https://trustworthy.systems/publications/csiroabstracts/Lyons_MAH_18.abstrac...
I don't really want to add stuff to the kernel unless I absolutely have no other options. I can think of at least a few relatively simple features I want that probably wouldn't get accepted. One is a system call origin limit to allow trapping foreign system calls for things like Linux binary compatibility. I'd mentioned this earlier and someone said something like that could be done with virtualization, but it is completely unacceptable to require that hardware virtualization be enabled just to run Linux binaries, not to mention the requirement for a server to intercept every system call (foreign or native) in such an environment, whereas an origin limit would allow for a purely library-based compatibility layer that can replace traps with function calls as they occur.

Another I can think of is a hook for a per-core thread that automatically migrates a thread from another core when the original assigned core is running another thread but another core is free. I can't think of any way to do it well without a hook in the kernel to run a core's migration thread only when that core is idle, since the migration thread would always have to be runnable. Is there any way automatic migration can be done without such a hook and without the migration thread having to sleep?

Also, if cross-core IPC gets removed that would pose a problem since UX/RT will have a uniform Unix-superset IPC model (with extra read/write-type functions that expose registers and a shared buffer) that needs to work between arbitrary threads regardless of if they are on different cores. Support for passive threads will be exposed to user programs, but I definitely don't want to require that one thread in every IPC exchange must be passive.
On 31 Jul 2021, at 18:57, Andrew Warkentin wrote:
One is a system call origin limit to allow trapping foreign system calls for things like Linux binary compatibility. I'd mentioned this earlier and someone said something like that could be done with virtualization, but it is completely unacceptable to require that hardware virtualization be enabled just to run Linux binaries, not to mention the requirement for a server to intercept every system call (foreign or native) in such an environment, whereas an origin limit would allow for a purely library-based compatibility layer that can replace traps with function calls as they occur.
I’m not sure I understand what you’re after here. Obviously, a requirement to run (presumably unmodified) Linux libraries *is* a virtualisation case, by definition. Running in a hardware VM is one way to do this. Paravirtualising the (dynamically-linked) Linux libraries to IPC to a Linux server rather than trying to do a direct Linux syscall would be another one, and it shouldn’t be hard to do this form of paravirtualisation automatically. (We did it way back with the actual Linux kernel, which is a much harder thing to do.) But even if for some reason that isn’t clear to me you insist on no changes at all to the Linux program (not even dynamic libraries? why?) then that would be quite solvable as well: Any seL4 syscall must be authorised by a cap, and the Linux syscalls won’t present a cap, so trigger an exception, and get forwarded to an exception handler. That receives the fault address and can emulate the Linux functionality.
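For concreteness, a minimal sketch of what such a fault-handling loop could look like, assuming the x86_64 libsel4 C API without MCS. handle_linux_syscall() is a made-up placeholder for the emulation layer, and the seL4_UnknownSyscall_* fault message-register constants are quoted from memory and should be checked against the libsel4 headers. If I read the manual right, replying to an unknown-syscall fault with a zero label and the register set restarts the thread with those registers, so the emulated return value simply goes back via RAX.

#include <sel4/sel4.h>

/* Hypothetical entry point into the Linux emulation layer. */
extern seL4_Word handle_linux_syscall(seL4_Word nr, seL4_Word *regs);

void linux_fault_loop(seL4_CPtr fault_ep)
{
    for (;;) {
        seL4_Word badge;
        seL4_MessageInfo_t info = seL4_Recv(fault_ep, &badge);

        if (seL4_MessageInfo_get_label(info) != seL4_Fault_UnknownSyscall) {
            continue; /* other fault types are not handled in this sketch */
        }

        /* The fault message carries the faulting thread's registers. */
        seL4_Word regs[seL4_UnknownSyscall_Length];
        for (int i = 0; i < seL4_UnknownSyscall_Length; i++) {
            regs[i] = seL4_GetMR(i);
        }

        /* Linux/x86_64 keeps its syscall number and return value in RAX. */
        regs[seL4_UnknownSyscall_RAX] =
            handle_linux_syscall(regs[seL4_UnknownSyscall_RAX], regs);

        /* Write the registers back and resume the faulting thread. */
        for (int i = 0; i < seL4_UnknownSyscall_Length; i++) {
            seL4_SetMR(i, regs[i]);
        }
        seL4_Reply(seL4_MessageInfo_new(0, 0, 0, seL4_UnknownSyscall_Length));
    }
}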
Another I can think of is a hook for a per-core thread that automatically migrates a thread from another core when the original assigned core is running another thread but another core is free. I can't think of any way to do it well without a hook in the kernel to run a core's migration thread only when that core is idle, since the migration thread would always have to be runnable. Is there any way automatic migration can be done without such a hook and without the migration thread having to sleep?
Detecting an idle core is trivial: You have a lowest-prio thread on each core that executes a loop which does nothing but signal the load balancer, which is a high-prio thread waiting on that notification. That’s your “hook”. Rest is left as an exercise for the reader ;-)
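A minimal sketch of the detection part of that arrangement (my illustration; it assumes the libsel4 C API, a notification whose badge carries one bit per core, and migrate_something_to() standing in for whatever balancing policy is actually wanted):

#include <sel4/sel4.h>

/* Hypothetical placeholder for the actual migration/balancing policy. */
extern void migrate_something_to(seL4_Word core);

/* One of these per core, pinned to that core at the lowest priority:
 * it only ever runs when the core has nothing else to do. */
void idle_detector(seL4_CPtr my_core_badged_ntfn)
{
    for (;;) {
        seL4_Signal(my_core_badged_ntfn);
    }
}

/* High-priority load balancer: wakes whenever some core reports itself idle.
 * Each idle detector signals through a cap badged with its own core's bit. */
void load_balancer(seL4_CPtr idle_ntfn)
{
    for (;;) {
        seL4_Word badge = 0;
        seL4_Wait(idle_ntfn, &badge);
        for (seL4_Word core = 0; badge != 0; core++, badge >>= 1) {
            if (badge & 1) {
                migrate_something_to(core);
            }
        }
    }
}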
Also, if cross-core IPC gets removed that would pose a problem since UX/RT will have a uniform Unix-superset IPC model (with extra read/write-type functions that expose registers and a shared buffer) that needs to work between arbitrary threads regardless of if they are on different cores. Support for passive threads will be exposed to user programs, but I definitely don't want to require that one thread in every IPC exchange must be passive.
seL4 IPC isn’t like Linux IPC – it’s a user-controlled context switch, a mechanism for implementing protected procedure calls (https://microkerneldude.wordpress.com/2019/03/07/how-to-and-how-not-to-use-s...). Hence doing seL4 IPC across cores makes no sense. Layering a Linux-like IPC model on top (probably using Notifications rather than IPC) doesn’t require kernel changes.

Gernot
On 7/31/21, Gernot Heiser wrote:
Paravirtualising the (dynamically-linked) Linux libraries to IPC to a Linux server rather than trying to do a direct Linux syscall would be another one, and it shouldn’t be hard to do this form of paravirtualisation automatically. (We did it way back with the actual Linux kernel, which is a much harder thing to do.)
But even if for some reason that isn’t clear to me you insist on no changes at all to the Linux program (not even dynamic libraries? why?) then that would be quite solvable as well: Any seL4 syscall must be authorised by a cap, and the Linux syscalls won’t present a cap, so trigger an exception, and get forwarded to an exception handler. That receives the fault address and can emulate the Linux functionality.
I don't want to require Linux applications to use a specific patched libc since that would create all kinds of compatibility issues. I'm writing a general-purpose OS that is just as likely to be used on desktops and servers as on embedded systems, and I want to support as many Linux binaries as possible (ideally everything that doesn't depend on certain types of kernel modules and doesn't manage sessions/logins). Requiring a specific libc would significantly limit compatibility.

Wouldn't there be a risk that a Linux system call would present an argument that happens to look like a capability and not get intercepted if system calls were implemented by just catching the existing invalid-syscall exceptions?
seL4 IPC isn’t like Linux IPC – it’s a user-controlled context switch, a mechanism for implementing protected procedure calls (https://microkerneldude.wordpress.com/2019/03/07/how-to-and-how-not-to-use-s...). Hence doing seL4 IPC across cores makes no sense. Layering a Linux-like IPC model on top (probably using Notifications rather than IPC) doesn’t require kernel changes.
QNX has IPC that is similar in concept to seL4 (i.e. functioning as a fast context switch whenever possible) but has no caveats or restrictions on cross-core use that I could find.

And UX/RT's Unix-style filesystem API won't be one RPC service among many like on most academic and/or "Mach-like" microkernel OSes. It will be the sole form of user-visible IPC. This is sort of like on QNX but will go a bit further (QNX still provides user access to raw kernel IPC but it uses filesystem-based naming for basically all servers, at least in Neutrino). Like I said, there will be additional "rare mode" register- and shared-buffer-based versions of read/write (interoperable with the traditional versions) to allow for zero-copy IPC and eliminate the need for user-visible raw IPC.

No support for cross-core endpoints would mean that UX/RT wouldn't be able to use endpoints for IPC at all and would have to only use notifications, which would significantly impact performance since the fastpath would never be used.
On 1 Aug 2021, at 09:25, Andrew Warkentin wrote:
Wouldn't there be a risk that a Linux system call would present an argument that happens to look like a capability and not get intercepted if system calls were implemented by just catching the existing invalid-syscall exceptions?
I believe (but am not definite, things change over time) that seL4 syscall numbers don’t overlap Linux syscall numbers which would preclude that totally. But even if not, the chances of parameters to a Linux syscall magically matching arguments of a valid seL4 syscall are pretty remote. This is not a security issue, of course, all that app can do is mess with itself.
Note that on some architectures seL4 puts the system call number in a different register than Linux does. This includes x86_64 and aarch64.

- JB
On Sun, 1 Aug 2021 at 09:28, Andrew Warkentin wrote:
I don't want to require Linux applications to use a specific patched libc since that would create all kinds of compatibility issues. I'm writing a general-purpose OS that is just as likely to be used on desktops and servers as on embedded systems, and I want to support as many Linux binaries as possible (ideally everything that doesn't depend on certain types of kernel modules and doesn't manage sessions/logins). Requiring a specific libc would significantly limit compatibility.
+1. Using a custom libc won't work for statically linked programs.
Wouldn't there be a risk that a Linux system call would present an argument that happens to look like a capability and not get intercepted if system calls were implemented by just catching the existing invalid-syscall exceptions?
Not if you don't have any capabilities mapped into the address space.

-- William Leslie
On 1 Aug 2021, at 10:12, William ML Leslie wrote:
Wouldn't there be a risk that a Linux system call would present an argument that happens to look like a capability and not get intercepted if system calls were implemented by just catching the existing invalid-syscall exceptions?
Not if you don't have any capabilities mapped into the address space.
Correct, which would be a reasonable design for a legacy subsystem, it forces each syscall to raise an exception. Silly me for not pointing this out in the first place.

Gernot
On 7/31/21, Gernot Heiser wrote:
On 31 Jul 2021, at 20:20, Hugo V.C. wrote:
"Therefore the process server will be almost as important as the kernel when it comes to security"
I'm not a developer, so please excuse my ignorance in some topics (like OS dev), but I have experience in exploiting OS stuff and I'd always prefer a ring 0 vuln over a ring 3 vuln... If the process server is implemented in user space (ring 3), then I don't get how we can compare security across such different CPU execution modes. Am I missing something?
I agree. It seems that Andrew’s design looks pretty monolithic.
It will be comparable to QNX in its degree of modularization. It will still have a multi-server architecture, but vertical modularity will be avoided for most subsystems. Each server process will normally contain the full stack of a subsystem instance; e.g. the process server will contain the memory/process manager and VFS (which will be very tightly integrated), the disk server will contain the disk device driver for a particular disk or HBA, partition table drivers, and disk filesystem drivers, and the network stack will contain the driver for a particular device and all protocols up to and including the transport layer. Basically all servers other than the process server will support multiple instances at least to some extent (servers with device drivers will typically be run as one process per device). In situations where separating the layers of a subsystem instance actually makes sense, certain servers will optionally support it through stub drivers.

IMO vertical modularity usually isn't worth it because protection domains tend to correspond to subsystems more often than they do to layers (e.g. the disk device driver, partition table driver, and disk filesystem driver usually all deal with the same data, just at different levels, and separating them into multiple processes will usually do nothing but add IPC overhead without any increase in security).
On 7/31/21, Gernot Heiser wrote:
On 1 Aug 2021, at 10:12, William ML Leslie wrote:
Not if you don't have any capabilities mapped into the address space.
Correct, which would be a reasonable design for a legacy subsystem, it forces each syscall to raise an exception. Silly me for not pointing this out in the first place.
That won't be an option on UX/RT. Its native environment will already be quite Linux-like and I want to treat Linux processes as native processes that have been loaded by an alternate binary loader and run in an alternate filesystem namespace with Linux libraries. With a system call handler that dynamically replaces Linux kernel traps with equivalent library functions that call native UX/RT APIs, performance should be equivalent to native processes or nearly so.
On 8/1/21, Gernot Heiser wrote:
On 1 Aug 2021, at 10:12, William ML Leslie wrote:
Wouldn't there be a risk that a Linux system call would present an argument that happens to look like a capability and not get intercepted if system calls were implemented by just catching the existing invalid-syscall exceptions?
Not if you don't have any capabilities mapped into the address space.
Correct, which would be a reasonable design for a legacy subsystem, it forces each syscall to raise an exception. Silly me for not pointing this out in the first place.
I think there are some syscalls that a Linux application could issue that would not trigger a fault:
* The yield syscall takes no arguments
* It looks to me like the non-blocking send syscalls do not raise exceptions
I don't know how likely they would be.

- JB
On 1 Aug 2021, at 17:15, Gerwin Klein wrote:
I can confirm that "yield" is the only syscall in seL4 that does not need a cap[*]. That is probably something that should be changed.
Indeed, it’s been on my list for a long time. In a cap system there shouldn’t be any syscalls that aren’t authorised by a cap. Main issue is that we didn’t come up with a reasonable choice for the type of cap that would authorise yield. We’ll probably have to introduce an object just for that purpose.

Gernot
I've done some further analysis of the situation on aarch64 and x86_64.

On aarch64 the CPU register used for the seL4 system call number is not part of the Linux syscall ABI. Therefore, any Linux system call could trigger one of the seL4 syscalls that do not raise an exception, resulting in the system call not being emulated.

On x86_64 the CPU register used for the seL4 system call number is used as the third argument to a Linux system call. Therefore, if any Linux system call happens to use one of the seL4 system call number constants that do not raise an exception as its third argument, the system call will not be emulated.

As such, I do not think it is possible to reliably trap Linux system calls on these architectures. This should be fixed.

- JB

--
Analysis

aarch64:

reg  seL4[3]    linux[2]
---  --------   --------
x0   dest/src   arg1
x1   info       arg2
x2   mr0        arg3
x3   mr1        arg4
x4   mr2        arg5
x5   mr3        arg6
x6   reply      -
x7   sysno      -
x8   dest (*)   sysno

x86_64:

reg  seL4[1]    linux[2]
---  --------   --------
rax  -          sysno
rdx  sysno      arg3
rdi  dest/src   arg1
rsi  info       arg2
r8   mr1        arg5
r9   mr2        arg6
r10  mr0        arg4
r12  reply      -
r13  dest (*)   -
r15  mr3        -

(*) only used with NBSendRecv

The seL4 system call number constants of concern (on MCS)[4] include:

NBSend: -6
Yield: -11

--
References

[1] https://github.com/seL4/seL4/blob/master/libsel4/sel4_arch_include/x86_64/se...
[2] https://www.systutorials.com/docs/linux/man/2-syscall/
[3] https://github.com/seL4/seL4/blob/master/libsel4/sel4_arch_include/aarch64/s...
[4] https://github.com/seL4/seL4/blob/master/libsel4/include/api/syscall.xml
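To make the x86_64 case concrete, here is a hypothetical illustration (mine, not part of the analysis above) of how an ordinary Linux call could slip past an exception-based emulation layer:

#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    char buf[32];
    /* On Linux this is just a read() with a bogus length. Run unmodified on
     * seL4/x86_64, the third argument (-11) lands in %rdx, which per the
     * table above is where seL4 reads its syscall number; -11 is Yield under
     * MCS, which needs no cap, so the kernel yields instead of raising the
     * unknown-syscall fault the emulation layer is waiting for. */
    long r = syscall(SYS_read, 0, buf, (long)-11);
    (void)r;
    return 0;
}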
On Sun, Aug 1, 2021, at 3:15 AM, Gerwin Klein wrote:
I can confirm that "yield" is the only syscall in seL4 that does not need a cap[*]. That is probably something that should be changed.
An idea I had a long time ago (and for different reasons) was that all system calls should fault if the _CSpace root_ for the current thread is null. This allows running "foreign system" threads without changing dataflow properties and without touching the fastpaths at all.

However if I understand Andrew's use case correctly this does not help him; I believe he is asking to run single threads containing both microkernel-aware code and legacy code in the same thread, with seL4 recognizing syscalls from the address range containing microkernel-aware code, while syscalls from all other addresses fault. This is similar to a recently proposed Linux emulation feature (https://lkml.org/lkml/2020/11/18/815) and would require at least one additional load and test in the fastpaths.

An alternative which would allow my understanding of Andrew's use case would be to consistently use the same syscall register as Linux with non-overlapping syscall numbers. IIRC there is old seL4 or L4 documentation which implies this is already the case, although as detailed in several other replies it is not.

-s
On 8/2/21, Stefan O'Rear wrote:
An alternative which would allow my understanding of Andrew's use case would be to consistently use the same syscall register as Linux with non-overlapping syscall numbers. IIRC there is old seL4 or L4 documentation which implies this is already the case, although as detailed in several other replies it is not.
While this may work specifically for Linux, it would be nice to have a general solution that would allow trapping syscalls for any operating system.

- JB
Hi Andrew,

Thank you for your project. I will check out the root task in your project; that is what I was looking for. This might be a silly question: does UX/RT stand for *Unix Real Time*?

-Sid
On 8/2/21, Sidhartha Agrawal wrote:
Hi Andrew, Thank you for your project. I will check out the root task in your project, that is what I was looking for. This might be a silly question, does UX/RT stand for *Unix Real Time*?
Actually it stands for Universally eXtensible Real Time. Right now the root task doesn't do anything other than parsing the MBI and testing the allocators.
Interesting topic and potentially dangerous area...
"Therefore the process server will be almost as important as the kernel
when it comes to security"
I'm not developer, so please excuse my ignorance in some topics (like OS
dev), but, I've experience in exploiting OS stuff and I always prefer a
ring 0 vuln than a ring 3 vuln... If the process server is implemented at
user space (ring 3) then I don't get how we can compare security in so
different cpu execution modes. Am I missing something?
On 7/31/21, Hugo V.C. wrote:
"Therefore the process server will be almost as important as the kernel when it comes to security"
I'm not developer, so please excuse my ignorance in some topics (like OS dev), but, I've experience in exploiting OS stuff and I always prefer a ring 0 vuln than a ring 3 vuln... If the process server is implemented at user space (ring 3) then I don't get how we can compare security in so different cpu execution modes. Am I missing something?
(sorry, mistakenly sent this privately instead of to the list before)

Even though the process server will run in user mode as far as the hardware and microkernel are concerned, functionally it will be akin to something above user processes but below the kernel. It will have full access to all kernel objects in the system including all user pages, CNodes, and endpoints (it won't actually map user pages except if a process opens the associated file for read/write I/O as opposed to mapping the file), and it will control access to all user-visible resources in the system by managing process file permission lists. This is very similar in scope to QNX's process server except that it won't be colocated in the kernel.

Therefore, vulnerabilities in the process server will be almost equivalent to kernel vulnerabilities, so it will be written in Rust for better memory safety and will be kept as minimal as is reasonably possible.
"It will have full access to all kernel objects in the system including all
user pages, CNodes, and endpoints (it won't actually map user pages except if
a process opens the associated file for read/write I/O as opposed
to mapping the file), and it will control access to all user-visible resources
in the system by managing process file permission lists."
Ok, I still don't have the required skills (know-how of the seL4 inners) to
say if this is the right/secure way to do this job, but just in case it is
(others should put some light on this) as an attacker I would put all my
effort to find vulnerabilities in this piece of code as per your
description it looks to me like a "sexy" entry point to the system.
On Sat, 31 Jul 2021, 8:54 pm Andrew Warkentin wrote:
Even though the process server will run in user mode as far as the hardware and microkernel are concerned, functionally it will be akin to something above user processes but below the kernel. It will have full access to all kernel objects in the system including all user pages, CNodes, and endpoints (it won't actually map user pages except if a process opens the associated file for read/write I/O as opposed to mapping the file),
Eeep. You want untyped access to cappages? I do understand that urge (for performance reasons). I would prefer to get around that by implementing an io_uring-style async interface to kernel-implemented objects. The to-ing and fro-ing required to set up address mapping is eventually going to get onerous and will have to be solved with some rigour.
On Sat, 31 Jul 2021, 10:17 pm William ML Leslie, <william.leslie.ttg@gmail.com> wrote:
Eeep. You want untyped access to cappages?
Or maybe I read that wrong. It's reasonable to hold those sorts of caps close rather than letting the application tinker with its own mapping, since in the Unix domain it's a pain in the neck to maintain some invariants on fork or exec. Add CoW and demand paging to that and it's a real party.
On 7/31/21, William ML Leslie wrote:
Eeep. You want untyped access to cappages?
The process server won't have any kind of raw access to the underlying memory of kernel objects like CNodes, since the API doesn't allow it. The only reason I can think of for adding support for anything like that might be to somehow allow raw read access (but certainly not write access) to make CSpace allocators more space-efficient by eliminating the bitmap used to track which slots are used, but that certainly isn't any kind of priority.
I do understand that urge (for performance reasons). I would prefer to get around that by implementing an io_uring-style async interface to kernel-implemented objects. The to-ing and fro-ing required to set up address mapping is eventually going to get onerous and will have to be solved with some rigour.
I've thought about that as well. I was thinking it could possibly be solved by adding a batch system call interface where the caller writes multiple system calls into a buffer (with each entry in this buffer having a fixed length including all registers, even ones that aren't used by a particular invocation) and passes the number of system calls to the kernel. This might best be implemented as an invocation on a new type of "batch buffer" kernel object rather than adding a new system call trap.
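Roughly, an entry in such a batch buffer might look something like this (purely hypothetical; nothing like it exists in seL4 today, and the field names are made up):

#include <stdint.h>

#define BATCH_MAX_MRS 4   /* fixed number of message registers per entry */

struct batch_syscall_entry {
    uint64_t syscall;             /* which seL4 syscall to perform */
    uint64_t cptr;                /* capability being invoked */
    uint64_t msg_info;            /* label/length, as in seL4_MessageInfo_t */
    uint64_t mr[BATCH_MAX_MRS];   /* always present, even if unused */
    uint64_t result;              /* written back by the kernel */
};

/* Userspace would fill entries[0..n-1] in a buffer registered with a "batch
 * buffer" kernel object and then make a single invocation on that object,
 * passing n, instead of n separate syscall traps. */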
participants (8)
- Andrew Warkentin
- Gernot Heiser
- Gerwin Klein
- Hugo V.C.
- Jimmy Brush
- Sidhartha Agrawal
- Stefan O'Rear
- William ML Leslie