Possibility of adding a system call origin limit similar to that of OpenBSD?
I am wanting to implement a system call origin limit (akin to that of OpenBSD) in UX/RT. This could probably be accomplished by adding a "set system call range" TCB method that takes the start and end addresses of the range in which system calls will be permitted without interception. Any system call issued from code outside this range would incur a user exception (which would provide all the state required for user code to handle the system call and resume the thread if desired).
UX/RT will require all binaries to be dynamically linked with a minimal "libroot" library that will contain an IPC transport layer that will be the only part of the system outside the root server that makes direct system calls. This will make it easier for later versions to retain backwards compatibility. Limiting system call origin should also make certain kinds of attacks more difficult.
I am also planning to use the system call origin limit in the Linux compatibility environment in order to distinguish Linux syscalls from seL4 ones, since Linux syscalls will never come from libroot. Each process in the Linux compatibility environment will have a fault handler thread that looks up the syscall and calls the corresponding UX/RT APIs, and will also replace the trap with a call to a jump table if the function containing it is a known syscall wrapper in order to cut down on the number of trips through the kernel.
Would it be possible to add something like this to the mainline seL4 tree?
Just to clarify, UX/RT will only require a single contiguous range in which un-intercepted syscalls will be permitted (since libroot will also function as the dynamic linker), so there will be no need for the added complexity of a list of multiple ranges (unless someone else wants to use system call origin verification in an OS with a more complex standard library structure).
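For readers who want the proposal in concrete terms, here is a minimal sketch of the single-range check, modelled in Python rather than kernel C. Everything named here (`Tcb`, `set_syscall_range`, `classify_syscall`, the half-open range, the "permit everything" default) is an assumption made for illustration; none of it is an existing seL4 interface.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    HANDLE_IN_KERNEL = "handle_in_kernel"  # syscall proceeds as normal
    USER_EXCEPTION = "user_exception"      # syscall is reflected to a fault handler


@dataclass
class Tcb:
    """Hypothetical per-thread state; not the real seL4 TCB layout."""
    range_start: int = 0
    range_end: int = 2**64  # exclusive bound; the default permits every origin


def set_syscall_range(tcb: Tcb, start: int, end: int) -> None:
    """Model of the proposed "set system call range" TCB method."""
    if start >= end:
        raise ValueError("range must be non-empty")
    tcb.range_start = start
    tcb.range_end = end


def classify_syscall(tcb: Tcb, origin_pc: int) -> Disposition:
    """Decide what happens to a syscall issued from address origin_pc."""
    if tcb.range_start <= origin_pc < tcb.range_end:
        return Disposition.HANDLE_IN_KERNEL
    return Disposition.USER_EXCEPTION
```

With the range set to cover libroot only, a trap from anywhere else in the address space would be classified as `USER_EXCEPTION`, which is the hook the Linux compatibility environment would rely on.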
On 11/22/20 3:22 AM, Andrew Warkentin wrote:
I am wanting to implement a system call origin limit (akin to that of OpenBSD) in UX/RT. This could probably be accomplished by adding a "set system call range" TCB method that takes the start and end addresses of the range in which system calls will be permitted without interception. Any system call issued from code outside this range would incur a user exception (which would provide all the state required for user code to handle the system call and resume the thread if desired).
UX/RT will require all binaries to be dynamically linked with a minimal "libroot" library that will contain an IPC transport layer that will be the only part of the system outside the root server that makes direct system calls. This will make it easier for later versions to retain backwards compatibility. Limiting system call origin should also make certain kinds of attacks more difficult.
I am also planning to use the system call origin limit in the Linux compatibility environment in order to distinguish Linux syscalls from seL4 ones, since Linux syscalls will never come from libroot. Each process in the Linux compatibility environment will have a fault handler thread that looks up the syscall and calls the corresponding UX/RT APIs, and will also replace the trap with a call to a jump table if the function containing it is a known syscall wrapper in order to cut down on the number of trips through the kernel.
That is an absolutely awesome way of doing the compatibility layer! Good luck!
Would it be possible to add something like this to the mainline seL4 tree?
I am not part of the seL4 team, so I cannot give an authoritative answer, but intuitively it seems like a fairly simple feature to add. The question then becomes whether it is worth the additional maintenance burden.
Sincerely, Demi
On Sun, 2020-11-22 at 01:22 -0700, Andrew Warkentin wrote:
I am wanting to implement a system call origin limit (akin to that of OpenBSD) in UX/RT. This could probably be accomplished by adding a "set system call range" TCB method that takes the start and end addresses of the range in which system calls will be permitted without interception. Any system call issued from code outside this range would incur a user exception (which would provide all the state required for user code to handle the system call and resume the thread if desired).
I'm not convinced that adding this feature to the kernel itself is reasonable. Performing any system call (except for Yield) on seL4 requires a capability in the calling thread's CSpace. This should be sufficient to protect a thread from performing any unauthorised action.
If a thread attempts to perform a syscall directly rather than via the interface library, this shouldn't be a security issue although it may be a correctness one (if you intend to allow applications to make syscalls directly). You shouldn't rely on the shared library within the binary to enforce security requirements.
If you need to be able to emulate the Linux system call ABI for the level of compatibility you are attempting to achieve, this should be feasible with the VCPUs present in seL4. You don't necessarily need to provide an entire Linux VM within the VCPU, but a minimal guest supervisor mode would be sufficient to trap all calls made from the Linux user-level and redirect them to the host system using the seL4 API.
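The capability argument at the start of this reply can be illustrated with a toy model. It is only a sketch: the CSpace is reduced to a dictionary from slot number to object name, and `invoke` and `NoCapability` are hypothetical names, not seL4 API or error codes.

```python
from typing import Dict, Optional


class NoCapability(Exception):
    """Raised when the caller names a slot that holds no capability."""


def invoke(cspace: Dict[int, str], syscall: str, slot: Optional[int] = None) -> str:
    """Toy model: every call except Yield must name a populated CSpace slot."""
    # Yield is the one call the reply singles out as needing no capability.
    if syscall == "Yield":
        return "yielded"
    if slot is None or slot not in cspace:
        raise NoCapability(f"{syscall}: no capability in slot {slot}")
    return f"{syscall} on {cspace[slot]}"
```

In this model a thread with an empty CSpace can still trap into the kernel from any address, but nothing it asks for succeeds, which is why the reply treats the origin of the call as a correctness concern rather than a security one.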
On 11/22/20, Millar, Curtis (Data61, Kensington NSW) <Curtis.Millar@data61.csiro.au> wrote:
If a thread attempts to perform a syscall directly rather than via the interface library, this shouldn't be a security issue although it may be a correctness one (if you intend to allow applications to make syscalls directly). You shouldn't rely on the shared library within the binary to enforce security requirements.
Yes, I'm well aware that it's far from sufficient to enforce security requirements. UX/RT will manage capabilities through its VFS layer, which will be part of the root server rather than a library, and processes won't get any capability unless they are authorized. System call origin limits would just make certain attacks a bit more difficult. The other reason I am wanting them is because I want to eliminate any possibility of badly-written programs making direct syscalls, rather than using the transport layer (the seL4 API isn't stable, and I don't want to expose raw IPC to user programs at all, since the transport layer will already map onto it quite closely). UX/RT is meant for real-world general-purpose installations, so backwards compatibility is very important, and allowing user programs to make raw system calls would make that more difficult.
If you need to be able to emulate the Linux system call ABI for the level of compatibility you are attempting to achieve, this should be feasible with the VCPUs present in seL4. You don't necessarily need to provide an entire linux VM within the VCPU, but a minimal guest supervisor mode would be sufficient to trap all calls made from the linux user-level and redirect them to the host system using the seL4 API.
I would think that would incur significant overhead. I want to implement the syscall handling as a library rather than a server, with the only server-like thing involved with syscalls being the trap handler thread, which as I said should replace traps with calls to a jump table that makes normal function calls (for Linux syscalls that have direct equivalent functions in UX/RT, the equivalent will just be called directly, and those that don't will be implemented in the compatibility library). Am I right that there is no way for VCPUs to make seL4 syscalls directly without going through the associated VMM thread? I don't see any way to do that in the manual. I don't want every single Linux system call to have to trap into the system call handler, only to have the system call handler trap into the kernel again to communicate with the underlying server. Linux programs under UX/RT should be treated as closely to native UX/RT ones as possible and should have identical performance, since UX/RT's API functions will be quite close to Linux even without the compatibility layer. There needs to be some way to patch out the Linux syscall traps so that no traps occur other than those that a native UX/RT program would make.
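A rough Python model of the trap handler described above may make the intended flow clearer. The two syscall numbers are the Linux x86-64 values for `write` and `getpid`; everything else (`uxrt_write`, `uxrt_getpid`, the wrapper address range, the `FaultHandler` class) is a hypothetical stand-in, and "patching" is reduced to recording the trap address instead of rewriting an instruction.

```python
import errno
from typing import Callable, Dict, List, Set, Tuple

SYS_WRITE = 1    # Linux x86-64 syscall number for write
SYS_GETPID = 39  # Linux x86-64 syscall number for getpid


def uxrt_write(fd: int, data: bytes) -> int:
    """Hypothetical native equivalent; pretends every byte was written."""
    return len(data)


def uxrt_getpid() -> int:
    """Hypothetical native equivalent; returns a fixed PID for the model."""
    return 42


# Jump table: Linux syscall number -> function that implements it.
JUMP_TABLE: Dict[int, Callable[..., int]] = {
    SYS_WRITE: uxrt_write,
    SYS_GETPID: uxrt_getpid,
}

# Hypothetical [start, end) address ranges of known libc syscall wrappers.
KNOWN_WRAPPERS: List[Tuple[int, int]] = [(0x7000, 0x7040)]


class FaultHandler:
    """Models the per-process thread that receives out-of-range syscall traps."""

    def __init__(self) -> None:
        self.patched: Set[int] = set()

    def handle(self, trap_addr: int, number: int, args: tuple) -> int:
        target = JUMP_TABLE.get(number)
        if target is None:
            return -errno.ENOSYS  # no equivalent known to the model
        # Only traps inside known wrappers are safe to replace with a call.
        if any(start <= trap_addr < end for start, end in KNOWN_WRAPPERS):
            self.patched.add(trap_addr)
        return target(*args)
```

After the first trap from a known wrapper, later calls through that wrapper would go straight to the jump table without entering the kernel; traps from unrecognised code keep taking the slow path through the handler.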
I think the simple answer here is “extremely unlikely”. This proposed feature fundamentally violates the microkernel minimality principle, as the same can be achieved in other ways. It is also narrow in its application, and carries policy. It will also almost certainly lengthen the critical syscall path and thus penalise the performance of everything. If someone comes with a convincing performance case then we’ll think about whether there is a clean way of addressing it.
Gernot
On 22 Nov 2020, at 19:22, Andrew Warkentin <andreww591@gmail.com> wrote:
I am wanting to implement a system call origin limit (akin to that of OpenBSD) in UX/RT. This could probably be accomplished by adding a "set system call range" TCB method that takes the start and end addresses of the range in which system calls will be permitted without interception. Any system call issued from code outside this range would incur a user exception (which would provide all the state required for user code to handle the system call and resume the thread if desired).
UX/RT will require all binaries to be dynamically linked with a minimal "libroot" library that will contain an IPC transport layer that will be the only part of the system outside the root server that makes direct system calls. This will make it easier for later versions to retain backwards compatibility. Limiting system call origin should also make certain kinds of attacks more difficult.
I am also planning to use the system call origin limit in the Linux compatibility environment in order to distinguish Linux syscalls from seL4 ones, since Linux syscalls will never come from libroot. Each process in the Linux compatibility environment will have a fault handler thread that looks up the syscall and calls the corresponding UX/RT APIs, and will also replace the trap with a call to a jump table if the function containing it is a known syscall wrapper in order to cut down on the number of trips through the kernel.
Would it be possible to add something like this to the mainline seL4 tree?
On 11/22/20, Heiser, Gernot (Data61, Kensington NSW) <Gernot.Heiser@data61.csiro.au> wrote:
This proposed feature fundamentally violates the microkernel minimality principle, as the same can be achieved in other ways. It is also narrow in its application, and carries policy. It will also almost certainly lengthen the critical syscall path and thus penalise the performance of everything.
I would have thought it would just be a matter of comparing the syscall origin address against the start and end limit addresses. It's not like I was expecting to check it against any kind of array/list (which would definitely incur significant overhead).
The other serious issue with using virtualization to intercept Linux system calls is that there are many situations where hardware virtualization extensions are unavailable: running under a VMM without nested virtualization, extensions disabled in the BIOS (the default on some PCs), or machines that don't support them at all. Saying "you must have a system with hardware virtualization available to use the Linux compatibility environment" would be a serious hindrance to the adoption of UX/RT as a desktop/server OS. I want it to be as easy to switch to UX/RT as possible, and supporting Linux applications on all installations regardless of hardware features is a very important part of that.
I can't think of any way besides origin limits and virtualization to tell which kernel a system call made with the SYSCALL instruction is intended for (INT system calls could be distinguished by the vector, but they're mostly deprecated at this point and typically won't be used by modern Linux programs). A general-purpose OS for the real world doesn't have the luxury of imposing restrictions like that just for the sake of academic elegance. I do consider myself a bit of an architectural purist, but purism in an OS intended for the real world must serve practicality and not hinder it.
I'm beginning to think that my goal of making a full-featured general-purpose OS that doesn't sacrifice practicality and performance for academic purism is just too divergent from the goals of seL4, and I may have to make my own fork. A verified kernel would only be nice to have and not a critical feature, since UX/RT's root server is very likely to be essentially unverifiable, and it will be just as critical to security as the kernel (it will be the only part of the system that writes to CSpaces; UX/RT will treat kernel capabilities as just an implementation detail that is not exposed anywhere in the API).
That being said, I would only be expecting to add a few simple hooks that further the goal of making a practical OS rather than making any fundamental architectural changes or adding anything large or complex. The general architecture of seL4 is already more or less ideal as far as I'm concerned.
participants (4)
- Andrew Warkentin
- Demi M. Obenour
- Heiser, Gernot (Data61, Kensington NSW)
- Millar, Curtis (Data61, Kensington NSW)