Demi Marie Obenour wrote:
While the basic framework is in place and performs well (it outperforms Linux without even trying too hard…), there are a number of questions that still need further research and are unlikely to be resolved by the time of the initial release. One of them is whether drivers should be active PDs (synchronising with notifications only) or passive PDs (using PPCs). There are a bunch of tradeoffs to consider, and we need a fair amount of experimental work to settle the question. The good news is that the effect of this choice on the driver as well as the client implementation is minimal (a few lines changed, possibly zero on the client side). I **strongly** recommend active drivers, for the following reasons:
1. While seL4 can perform context switches with incredible speed, lots of hardware requires very expensive flushing to prevent Spectre attacks. On x86, for instance, one must issue an Indirect Branch Predictor Barrier (IBPB), which costs thousands of cycles. Other processors, such as ARM, also require expensive flushing. In the general case, it is not possible to safely avoid these flushes on current hardware.

The issue of managing the number and timing of context switches is indeed critically important to any multi-process system. However, insofar as my original question was about comparing the mechanisms of "shared memory + IPC" and "shared memory + notifications" (which may or may not be captured by the terms "active driver" vs "passive driver", I'm not sure), I'm seeking to understand how these styles of communication operate independently of any policy that may be built on top of them.
Therefore, context switches must be kept to a minimum, no matter how good the kernel is at doing them.

Let us speak precisely here. The management of the cost of context switches, being a limited resource, is a policy-based determination which must be left up to the application to decide, not dictated by the system.

To put it another way, the act of making another, higher-priority thread runnable is what induces a context switch, whether that action is part of an IPC, a notification, a fault, or something else. If a piece of code decides to perform one of those actions, it is choosing to incur the cost of a context switch. It decides if and when to do so based on its own policy and the compromises that policy necessarily entails, and those decisions are made independently of the mechanisms provided by the kernel. What the system should provide is mechanisms that allow the correct trade-offs to be made, which is why I'm especially curious to see how we can support things like scatter-gather and segmentation offloading, which are critical on other platforms and, I expect, on this one too.
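For concreteness, the two mechanisms I'm comparing look roughly like this at the Microkit level. This is only a sketch: the channel number and `process_requests()` are placeholders of mine, and I'm assuming the standard `microkit.h` entry points (`notified()` for the notification-driven case, `protected()` for the PPC case).

```c
#include <microkit.h>

#define CLIENT_CH 0   /* hypothetical channel to the client PD */

/* Illustrative stand-in for draining a shared-memory request queue. */
static void process_requests(void)
{
    /* ... pop requests from the shared ring, program the device ... */
}

void init(void)
{
    /* device setup would go here */
}

/* Active driver: the client enqueues requests in shared memory and signals
 * this channel; the driver drains the queue and notifies back once
 * completions have been enqueued. */
void notified(microkit_channel ch)
{
    if (ch == CLIENT_CH) {
        process_requests();
        microkit_notify(CLIENT_CH);
    }
}

/* Passive driver: the client enqueues requests and then makes a protected
 * procedure call; the kernel switches straight into this handler and
 * returns to the client when it replies. */
microkit_msginfo protected(microkit_channel ch, microkit_msginfo msginfo)
{
    process_requests();
    return microkit_msginfo_new(0, 0);
}
```

The two entry points are shown side by side only for comparison; a real driver would use one style or the other, and the interesting part is what the choice costs in context switches and policy flexibility.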
2. Passive drivers only make sense if the drivers are trusted. In the case of e.g. USB or network drivers, this is a bad assumption: both are major attack vectors, and drivers ported from e.g. Linux are unlikely to be hardened against malicious devices.
Hmm, this I am quite surprised by. Is this an expected outcome of the seL4 security model? It implies that a rather large swath of kernel functionality (the IPC fastpath, cap transfer, the single-threaded event model) is simply not available to mutually suspicious PDs. I'm very concerned about the expansion of the trusted computing base (TCB) into userspace, for both performance and assurance reasons.
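Either way, the practical upshot of treating the driver as untrusted is that the client has to validate everything it reads back over shared memory. A minimal sketch of that kind of defensive check, with a descriptor layout and constants I have invented for illustration (not sDDF's actual format):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical completion descriptor written into shared memory by the
 * (untrusted) driver. */
struct completion {
    uint32_t buf_index;   /* which buffer in the client's pool was filled */
    uint32_t len;         /* how many bytes the driver claims to have written */
};

#define POOL_SLOTS 512u
#define SLOT_SIZE  2048u

/* The caller should first copy the descriptor out of shared memory into a
 * local struct (avoiding double-fetch races), then range-check every field
 * against values the client itself established before touching the buffer
 * the descriptor names. */
static bool completion_is_sane(const struct completion *c)
{
    if (c->buf_index >= POOL_SLOTS) {
        return false;     /* driver named a buffer the client never handed out */
    }
    if (c->len > SLOT_SIZE) {
        return false;     /* driver claims more data than the slot can hold */
    }
    return true;
}
```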
3. A passive driver must be on the same core as its caller, since cross-core PPC is deprecated. An active driver does not have this restriction. Active drivers can therefore run concurrently with the programs they serve, which is the same technique used by Linux's io_uring with SQPOLL.

For a high-performance driver, I would expect to have at least one TX and one RX queue per CPU (sketched below); a local call should therefore always be possible. The deprecation of cross-CPU IPC reflects the basic premise that spreading work across CPUs is generally not a good idea.
But yes, if the user wants to distribute the workload in that way, passing data between CPUs, obviously the IPC fastpath is off the table, and notifications seem like a pretty clear choice in that case. (However, the existence of this use case is _not_ a reason to sacrifice the performance of the same-core run-to-completion (RTC) execution model.)
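To spell out the per-core layout I have in mind (names, sizes, and the ring structure are invented for illustration): each core gets its own TX/RX pair, so submission and completion stay core-local on the fast path.

```c
#include <stdint.h>

#define NUM_CORES  4      /* illustrative */
#define RING_SLOTS 256

/* Minimal stand-in for a single-producer/single-consumer ring. */
struct ring {
    uint32_t head;
    uint32_t tail;
    uint64_t slots[RING_SLOTS];
};

/* One TX/RX ring pair per core: a client running on core N only ever
 * touches pairs[N], so no cross-core call (and no cross-core cache
 * traffic) is needed to submit work or reap completions. */
struct queue_pair {
    struct ring tx;   /* client -> driver */
    struct ring rx;   /* driver -> client */
};

static struct queue_pair pairs[NUM_CORES];

static struct queue_pair *local_pair(unsigned core)
{
    return &pairs[core % NUM_CORES];
}
```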
4. Linux's io_uring has the application submit a bunch of requests into a ring buffer and then tell the kernel to process them. The kernel processes the requests asynchronously and writes completions into another ring buffer. Userspace can use poll() or epoll() to wait for a completion to arrive. This is a fantastic fit for active drivers: the submission syscall is replaced by a notification, and another notification is used for wakeup on completion.
Agreed, it's a very attractive model. Indeed, that is basically how I got started on this line of thinking; it is quite apparent that these command-ring/pipe-like structures are very flexible and could be used as the building blocks of entire systems. So the question I wanted to answer was: what are we leaving on the table if we go with this approach, particularly given the emphasis on IPC, its optimizations, and the contention that fast IPC is the foundational element of a successful, performant microkernel system?

-Eric
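P.S. For concreteness, the kind of command-ring structure I have in mind is roughly the following single-producer/single-consumer ring in shared memory. This is only a sketch under my own assumptions (slot format, sizes, and memory-barrier choices are illustrative, not sDDF's actual queue implementation); the "doorbell" on submission and the wakeup on completion would be notifications, as described above.

```c
#include <stdint.h>
#include <stdbool.h>

#define RING_SLOTS 256u               /* power of two, so masking handles wrap-around */
#define RING_MASK  (RING_SLOTS - 1u)

/* Single-producer/single-consumer ring in shared memory.  head and tail are
 * free-running counters: the producer owns head, the consumer owns tail.
 * One ring carries submissions (client -> driver); a second carries
 * completions (driver -> client). */
struct ring {
    volatile uint32_t head;
    volatile uint32_t tail;
    uint64_t          entries[RING_SLOTS];   /* opaque request/completion descriptors */
};

static bool ring_push(struct ring *r, uint64_t entry)
{
    uint32_t head = r->head;
    if (head - r->tail == RING_SLOTS) {
        return false;                          /* ring is full */
    }
    r->entries[head & RING_MASK] = entry;
    __atomic_thread_fence(__ATOMIC_RELEASE);   /* publish the entry before the index */
    r->head = head + 1;
    return true;
}

static bool ring_pop(struct ring *r, uint64_t *entry)
{
    uint32_t tail = r->tail;
    if (tail == r->head) {
        return false;                          /* ring is empty */
    }
    __atomic_thread_fence(__ATOMIC_ACQUIRE);   /* read the index before the entry */
    *entry = r->entries[tail & RING_MASK];
    r->tail = tail + 1;
    return true;
}
```

A client would push a batch of requests, signal the driver once (the notification playing the role of io_uring's submission syscall), and then block until the driver signals that completions are waiting in the other ring.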