On 9 Aug 2023, at 04:02, Demi Marie Obenour <demiobenour@gmail.com> wrote:
One of the goals of the MCS scheduler is to allow untrusted parts of the system (such as device drivers) to still have low interrupt latency. However, this seems to interact badly with the domain scheduler, as interrupts can arrive when the domain that will serve them is not scheduled. Worse, it appears that interrupts will generally require an IBPB (or equivalent) on both entry and exit, since they may interrupt any code.
The domain scheduler is designed to support a strict separation-kernel configuration (strict time and space partitioning). These generally run with interrupts disabled (that’s the configuration for the seL4 confidentiality proofs). Typically you check for pending interrupts at the beginning of a time partition. Obviously this means that your interrupt latency is the full time period. Note exactly fast interrupt response, but that’s how eg ARINC systems run. We demonstrated [Ge et al, EuroSys’19] that you can allow interrupts without introducing timing channels by partitioning IRQs between partitions. But that doesn’t change the WCIL, which is still the partition period. (This isn’t verified yet – work in progress.) We did brainstorm a while back ways of getting better IRQ latencies for some cases, eg if the IRQ handling domain itself has no secrets to leak. But that was never really thought through, and I’m waiting for a good PhD student to get back to this topic.
Is this accurate? If so, it seems that the “flush all μArch state” instruction coming to some RISC-V CPUs is insufficient,
I assume you’re referring to the fence.t proposal by Wistoff et al [DATE’21]?
and full speculative taint tracking is required.
I don’t follow. If you clean all µarch state you don’t have to worry about speculation traces, that’s (among others) the gist of Ge et al.
More generally, requiring mutually distrusting domains to be explicitly marked seems to be problematic for anything that is not a static system: in a dynamic system (one that can run third-party code), one must typically assume that different address spaces are mutually distrusting, with the result that IPC latency will be severely impacted.
You can’t have isolation without paying for it. But endpoints between mutually-distrusting domains make no sense (our experimental time-protection implementation has it only for performance evaluation). The work on verified time protection has no cross-domain communication at present, which is, of course, overly restrictive. It can be extended to cross-domain shared memory and signals-like mechanisms, but we haven’t done that yet. (See above re looking for a good PhD student…) As I keep saying, the seL4 mechanism that is (unfortunately, somewhat misleadingly and for purely historic reasons) called “IPC” shouldn’t be considered a general-purpose communication mechanism, but a protected procedure call – the microkernel equivalent to a Linux system call. As such, the trust relationship is not symmetric: you use it to invoke some more privileged operation (and definitely need to trust the callee to a degree).
Am I missing something, or will a general-purpose OS need full speculative taint tracking in hardware if it is to have fast IPCs between mutually-distrusting code on out-of-order CPUs?
No, I don’t think so. To the contrary, I think that speculation tracking is an instance of the fallacy of trowing hardware at a problem that can be solved much simpler if you instead provide simple mechanisms that allow the OS to do its job. The OS, not the hardware, knows the system’s security policy. Pretending otherwise leads to complexity and waste. Which is fine if your business model is based on making hardware more complex (no names ;-) but it’s not fine if your objective is secure systems. fence.t (or something similar) is the mechanism you need to let the OS do it’s job, and it is simple and cheap to implement, and costs you no more than the L1-D flush you need anyway, as Wistoff et al demonstrated. Gernot