[seL4] Re: sel4cp and device driver API's

14 Sep 2022

...
On 13 Sep 2022, at 7:01 pm, Demi Marie Obenour <demi@invisiblethingslab.com> wrote:
In seL4, a passive server is a program that never runs except when one
of its IPC endpoints is called.  Therefore, a passive driver would be
one that is called via IPC, and an active driver would be one that uses
shared memory and notifications.
Potentially not that important for this discussion, but to avoid confusion in the future: a passive server in seL4 terminology is a thread without a scheduling context, which is expecting to get a scheduling context via donation to be able to provide its service. Scheduling context donation can happen via IPC or via notification, including IRQ notifications.
...
...
...
2. Passive drivers only make sense if the drivers are trusted.  In the
    case of e.g. a USB or network drivers, this is a bad assumption: both
    are major attack vectors and drivers ported from e.g. Linux are
    unlikely to be hardened against malicious devices.
Hmm, this I am quite surprised by. Is this an expected outcome of the seL4
security model?
It is.  An PPC callee is allowed to block for as long as it desires, and
doing so will block the caller too.  There is no way (that I am aware
of) to interrupt this wait.  Therefore, the callee can perform a trivial
denial-of-service attack on the caller.
This is correct in the sense that an IPC call implies trusting the invoked party to return to you. There is a mechanism (bound notifications) that allows you to interrupt that wait, but you need to then trust that notification sender. You can probably implement a time-out mechanism with that, even If that would likely be a questionable choice.

The idea of the security model is more that if that trust cannot be established, notifications + shared memory are the intended more conservative mechanism. Think of IPC as a procedure call in a type safe language without global state — it should give you roughly the same trust model: the callee can’t change your memory (while it can change its own + call others), but you still have to rely on the callee performing its service and returning a value to you. (And as analogy to abusing bound notifications for this purpose: you could probably use a debugger to interrupt that wait and continue, but it would be a hack).
...
As mentioned above, the IPC fastpath is only useable when the caller
trusts the callee to at least not perform a denial-of-service attack.
Furthermore, since seL4 does not provide any support for copying
large messages between userspace processes, mutually distrusting PDs
will often need to perform defensive copies on both sides.  One can
implement asynchronous, single-copy message and cap transfer via a
trusted userspace message broker.
I think that is correct, yes. You probably can’t get 0-copy without at least some level of trust between the communication partners. That doesn’t mean you can’t program defensively, there is a lot of design space between no trust, trust in some properties, and trust in full correctness. I.e. if your main trust issue is termination, 0-copy might still be possible.

Note that a lower-level driver can always deny its own service just by not doing anything, there is no mitigation for that. The question is if that denial is amplified or not, i.e. does a low-level driver not returning to you make the system hang or just not produce any network traffic. That question could for instance be decided a few levels up the stack, depending on how things are plugged together.
...
1. seL4 has (so far) mostly been used in embedded systems.  I suspect
   many of these run on simple, low-power, in-order CPUs, which are
   inherently not vulnerable to Spectre.  As such, they very much can
   have fast context switches.  I, on the other hand, am mostly
   interested in using seL4 in Qubes OS, which runs on out-of-order
   hardware where Spectre is a major concern.  Therefore, context switch
   performance is limited by the need for Spectre mitigations.
The current spectre mitigations are not the only way to mitigate timing channels. One major problem of the current mitigations is that they have to be applied to all context switches. That is not really necessary. You only need to apply mitigations to context switches between mutually distrusting information domains. I.e. communication between a driver, the kernel, and the next level protocol stack usually would not need them, but communication between anything that holds a secret and the rest does. I.e. equating processes + context switches with information flow domains is not necessarily required or good for performance (at least not in hardware with design defects like Spectre).

Not that any of this is available in the current seL4 releases, but we’re working on it.

Cheers,
Gerwin

[seL4] Re: sel4cp and device driver API's

Gerwin Klein