Hello, I just wanted to say that I agree with a lot of things that Wanja says in this mail and that I think he makes some very good points about common infrastructure requiring policy decisions. On 2024-11-28 09:26, Wanja.Zaeske--- via Devel wrote:
While for the seL4 design having priority preemption may be essential, I see this as an OS design tradeoff. In particular looking through the glasses of safety certification, non-preemptive yet priority driven scheduling certainly has some appeal compared to priority preemptive scheduling. Don't get me wrong, I just want to point out that priority preemption is neither the only nor necessarily the best design, especially for safety (where having no preemption greatly simplifies reasoning while downsides like reduced performance and increased latency may be completely acceptable). I won't try to coerce anyone into building non-preemptive seL4, so I stop the topic here to not derail this conversation :D.
In seL4 that is a matter of configuration: Just use a period equal to your watchdog timeout value and you will practically have non-preemtive scheduling. And instead of a global watchdog, with seL4 you can use timeout faults for more fine grained control. Or give all tasks a different priority.
One can create a system that implements the priorities as per the recommendations in the sDDF yet fails to consume its input fully, i.e. through setting the client PDs budget smaller than required to consume one chunk of input. Priorities (as you pointed out), but also scheduling parameters (budget/period of partitions), external factors (rate at which data is coming in), and internal factors (system load, i.e. contention on the memory bus) all come to play for whether input is fully consumed or not. For the serial-port this hopefully is not a concern on any HW from this century, but there is of course other, higher bandwidth comms that are desirable. It is also this kind of "What if?" and "Which PD can adversely effect which other PD, and how?" that are relevant for building safety critical systems.
Yes, system design is hard and you need to know a lot of scheduling and other seL4 details to make safety critical systems. But documenting all intricacies of flow control, fault detection, buffering considerations etc. is neither the task of the Microkit documentation, nor the seL4 manual. sDDF should document at least some of it though. It would be nice if there was more general awareness of safety critical system requirements by people making infrastructure like Microkit and sDDF, basic things like fault recovery are currently missing. Especially in a microkernel system, faults should not propagate through the system to other components, so all infrastructure should be designed with fault containment and recovery in mind.
Secondly, being completely policy-free is a double-sided sword. In general, I support the effort. However, if to use a more sophisticated hardware I would need to make dozens of non-trivial design decisions (as the upstream implementation is completely un-opinionated/policy free), this will hurt real-world adoption (especially if there is an alternative that just has "the vanilla way of doing things" work without any decision paralysis). I think being 100% policy-free can lead to an SW landscape that is universally capable, but for which there exist no ecosystem of reusable components, because each and every component is very individually designed for its specific use-case. I think Microkit is much more opinionated than seL4, and I think that is what makes it more useful to me. Being opinionated (albeit resulting in technical compromise more easily) is the way to grow an ecosystem of reusable components, and that is precisely the currency with which one can foster greater adoption. Takeaway: policy-free (and simple to operate!) drivers are great, but from time to time one should not defer _all_ decisions to the user, but set an opinionated precedence for _some_ of them. Also, with seL4 you already offer a product heavily leaning towards "no technical compromise, but all the complexity and tightest possible coupling".
Strongly agreed with this. One big problem is that having a lot of infrastructure that is not build with a certain use case in mind quickly becomes useless, especially with strong safety or security requirements. Except for CPU usage and priorities, seL4 has all the required mechanism to create modular, stand-alone reusable components. However, to let them work together and foster re-use, a common system API/ABI is needed and a way to tie everything together. If the infrastructure is good, people only need to replace some modules if they don't match requirements instead of writing all infrastructure themselves. I think it's necessary to split this into a system API and a system ABI: An API makes it possible to avoid making system design decisions too early. e.g. I'm not a fan of sDDF requiring multiple pointless PD transitions: It's fine to have separation of concern, but if both components have the same security and safety impact, then running them in different PDs is just pointless overhead. Same for time servers: Whether getting the time needs an IPC call or just a CPU instruction, with possibly getting extra info from shared memory, should not be decided beforehand. But having well defined common infrastructure is crucial. sDDF is a good building block for communication, not only for drivers, but for system services too. But it does not solve the API/ABI problem: That needs to be defined per driver framework and system service. The challenge is to create an API that is simple for simple systems, but has the flexibility to be expanded with more complex functionality without breaking all existing users. This is especially true for networking and file systems. Microkit should expose all untyped memory and other resources it doesn't use itself, so that it's easy to create a more dynamic system on top of Microkit and launch stand-alone subsystems with a well-defined interface. Practically such subsystem would look almost the same as a PD to Microkit, but with UT and other additional resources (such subsystem could be build using Microkit itself). If this is not supported, then using Microkit or not becomes an all-or-nothing choice, reducing adoption. If you know you can always get to the low-level seL4 mechanisms and add missing bits yourself on top of Microkit, the choice to use it becomes much safer. To some extent Lions OS (1) and SMOS (2) are a step in the right direction, but they have a very long way to go. Greetings, Indan 1) https://lionsos.org 2) https://trustworthy.systems/projects/smos