On 8/8/23 15:32, Gernot Heiser wrote:
On 9 Aug 2023, at 04:02, Demi Marie Obenour <demiobenour@gmail.com> wrote:
One of the goals of the MCS scheduler is to allow untrusted parts of the system (such as device drivers) to still have low interrupt latency. However, this seems to interact badly with the domain scheduler, as interrupts can arrive when the domain that will serve them is not scheduled. Worse, it appears that interrupts will generally require an IBPB (or equivalent) on both entry and exit, since they may interrupt any code.
The domain scheduler is designed to support a strict separation-kernel configuration (strict time and space partitioning). These generally run with interrupts disabled (that’s the configuration for the seL4 confidentiality proofs). Typically you check for pending interrupts at the beginning of a time partition.
Obviously this means that your interrupt latency is the full time period. Note exactly fast interrupt response, but that’s how eg ARINC systems run.
We demonstrated [Ge et al, EuroSys’19] that you can allow interrupts without introducing timing channels by partitioning IRQs between partitions. But that doesn’t change the WCIL, which is still the partition period. (This isn’t verified yet – work in progress.)
We did brainstorm a while back ways of getting better IRQ latencies for some cases, eg if the IRQ handling domain itself has no secrets to leak. But that was never really thought through, and I’m waiting for a good PhD student to get back to this topic.
Is this accurate? If so, it seems that the “flush all μArch state” instruction coming to some RISC-V CPUs is insufficient,
I assume you’re referring to the fence.t proposal by Wistoff et al [DATE’21]?
Perhaps? It’s whatever was referred to at the last developer hangout.
and full speculative taint tracking is required.
I don’t follow. If you clean all µarch state you don’t have to worry about speculation traces, that’s (among others) the gist of Ge et al.
Does it prevent Spectre v1? A bounds check will almost always predict as in-bounds and that is potentially a problem. Taint tracking does prevent Spectre v1 because the speculatively read data is guaranteed to be unobservable. Strict temporal isolation also mitigates this, but IIUC it is also incompatible with load-balancing and therefore only practical in limited cases.
More generally, requiring mutually distrusting domains to be explicitly marked seems to be problematic for anything that is not a static system: in a dynamic system (one that can run third-party code), one must typically assume that different address spaces are mutually distrusting, with the result that IPC latency will be severely impacted.
You can’t have isolation without paying for it.
But endpoints between mutually-distrusting domains make no sense (our experimental time-protection implementation has it only for performance evaluation). The work on verified time protection has no cross-domain communication at present, which is, of course, overly restrictive. It can be extended to cross-domain shared memory and signals-like mechanisms, but we haven’t done that yet. (See above re looking for a good PhD student…)
As I keep saying, the seL4 mechanism that is (unfortunately, somewhat misleadingly and for purely historic reasons) called “IPC” shouldn’t be considered a general-purpose communication mechanism, but a protected procedure call – the microkernel equivalent to a Linux system call. As such, the trust relationship is not symmetric: you use it to invoke some more privileged operation (and definitely need to trust the callee to a degree).
I should have said “not-mutually-trusting”, then.
Am I missing something, or will a general-purpose OS need full speculative taint tracking in hardware if it is to have fast IPCs between mutually-distrusting code on out-of-order CPUs?
No, I don’t think so. To the contrary, I think that speculation tracking is an instance of the fallacy of trowing hardware at a problem that can be solved much simpler if you instead provide simple mechanisms that allow the OS to do its job. The OS, not the hardware, knows the system’s security policy. Pretending otherwise leads to complexity and waste. Which is fine if your business model is based on making hardware more complex (no names ;-) but it’s not fine if your objective is secure systems.
fence.t (or something similar) is the mechanism you need to let the OS do it’s job, and it is simple and cheap to implement, and costs you no more than the L1-D flush you need anyway, as Wistoff et al demonstrated.
I missed the “costs you no more than the L1-D flush you need anyway” part. On x86, instructions like IBPB can easily take thousands of cycles IIUC. Would fence.t have equally catastrophic overhead on an out-of-order RISC-V processor? https://riscv-europe.org/media/proceedings/posters/2023-06-08-Nils-WISTOFF-a... seems simple to implement in hardware, but does not seem efficient. https://carrv.github.io/2020/papers/CARRV2020_paper_10_Wistoff.pdf claims to be decently efficient, but is for an in-order CPU. Also, in the future, would you mind including the full URL of any articles? I don’t know what “Wistoff et al” and “Ge et al” refer to, and my mail client is configured to only display plain text (not HTML) because the attack surface of HTML rendering is absurdly high. -- Sincerely, Demi Marie Obenour (she/her/hers)