Hi Gernot,

thank you for the prompt and elaborate answer. A few quick comments:

I fully agree with your stance on being approachable/guiding through the Microkit Tutorial. That's why I suggested not making the solution to the issue (e.g. by buffering) part of the tutorial, but merely mentioning that there might be a lurking issue in the proposed communication pattern for some cases. Awareness is key; everything else explodes the scope of the tutorial IMO.

I do think one should be able to build and understand Microkit applications by consuming the Microkit Manual and the relevant sections of the sDDF docs. While it is acceptable to cover higher-complexity topics only in the seL4 docs, this risks undermining the promise of Microkit: if, in the end, one needs to consume the seL4 documentation to know about even basic behaviour of one's application built on Microkit, this greatly reduces Microkit's value proposition. Striking the right balance here is hard. In particular, I argue the Microkit Manual should be a bit more detailed on how things are scheduled, because this very information is highly relevant for building correct and robust systems based on Microkit. Scheduling assumptions _are_ foundational to many common IPC patterns in Microkit, more so than in other OSes.
It’s not only not problematic, it’s essential ;-)
While priority preemption may be essential to the seL4 design, I see this as an OS design tradeoff. In particular, looking through the lens of safety certification, non-preemptive yet priority-driven scheduling certainly has some appeal compared to priority-preemptive scheduling. Don't get me wrong, I just want to point out that priority preemption is neither the only nor necessarily the best design, especially for safety (where having no preemption greatly simplifies reasoning, while downsides like reduced performance and increased latency may be completely acceptable). I won't try to coerce anyone into building a non-preemptive seL4, so I'll stop here to avoid derailing this conversation :D.
Keep in mind that this only mirrors what happens in hardware: If software doesn't consume input from the serial port quickly enough, you’ll lose characters.
Yes, absolutely! The problem is not losing characters, the problem is _silently_ losing characters. Failure is omnipresent, and for safety-critical systems a huge chunk of effort goes into reasoning about failure paths and system degradation. Silent failure, however, is terrible: you cannot degrade gracefully if you are not aware of any degradation. Also, many serial interfaces that I'm aware of do buffer in hardware, so technically the software abstraction does less than the HW instead of matching the HW's behaviour. Nitpick, I know.
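To make the "silent vs. observable loss" distinction concrete, here is a minimal sketch in C of a receive ring buffer that counts overflow drops instead of discarding characters silently. All names (rx_ring_t, rx_push, rx_pop) are mine for illustration, not sDDF or Microkit API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative single-producer/single-consumer ring buffer.
 * The one addition over a plain ring: a drop counter, so a
 * monitoring component can observe degradation instead of
 * the loss being silent. */

#define RX_CAP 16u  /* capacity, power of two */

typedef struct {
    uint8_t  buf[RX_CAP];
    uint32_t head;     /* next write index (monotonic, wraps mod 2^32) */
    uint32_t tail;     /* next read index */
    uint32_t dropped;  /* characters lost to overflow */
} rx_ring_t;

static bool rx_push(rx_ring_t *r, uint8_t c)
{
    if (r->head - r->tail == RX_CAP) {
        r->dropped++;               /* loss is recorded, not silent */
        return false;
    }
    r->buf[r->head++ % RX_CAP] = c;
    return true;
}

static bool rx_pop(rx_ring_t *r, uint8_t *out)
{
    if (r->head == r->tail) {
        return false;               /* empty */
    }
    *out = r->buf[r->tail++ % RX_CAP];
    return true;
}
```

Whether such a counter lives in the driver, a virtualiser, or a dedicated monitor is exactly the policy question under discussion; the point is only that _some_ component records the drop.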
The other key thing to keep in mind (and maybe this needs to be emphasised more in the Microkit documentation) is that the core idea behind the seL4 device driver framework (sDDF) is that the driver has one and only one job: translating a hardware-specific *device* interface to a hardware-independent *device class* interface. Meaning an sDDF driver *only* provides a minimal hardware abstraction. This is a great enabler of simplicity of sDDF drivers, making them a fraction of the size of Linux drivers, and easy to write.
Importantly, sDDF drivers are, as far as possible, policy free.
A direct implication of this is that buffering isn’t the driver’s job, it’s policy that should be done in another component (separation of concerns). Meaning the driver just reflects the fact that input characters get lost if you don’t consume them quickly enough. And loss can be prevented by appropriate assignment of priorities.
The serial driver section in the sDDF document is right now mostly AWOL (as are a number of others; I’ve got a bunch of device class specs sitting on my desk waiting to be reviewed…)
There is discussion of priorities in the general part of the doc: §4.3.1 specifies that the driver should be higher prio than the virtualiser, and §4.3.2 similarly says that the virt should be higher prio than the client. But there really needs to be more info on serial drivers.
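For what it's worth, the priority ordering from §4.3.1/§4.3.2 (driver above virtualiser above client) is straightforward to express in a Microkit system description. A minimal sketch; the PD names, ELF paths, and concrete priority values here are my own illustration, only the relative ordering reflects the sDDF recommendation:

```xml
<system>
    <!-- driver: highest priority of the three -->
    <protection_domain name="uart_driver" priority="254">
        <program_image path="uart_driver.elf" />
    </protection_domain>
    <!-- virtualiser: below the driver, above the client -->
    <protection_domain name="serial_virt_rx" priority="200">
        <program_image path="serial_virt_rx.elf" />
    </protection_domain>
    <!-- client: lowest of the three -->
    <protection_domain name="client" priority="100">
        <program_image path="client.elf" />
    </protection_domain>
</system>
```

(Channels and shared memory regions are omitted for brevity.)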
I agree, mostly. I think I catch the drift, and for specific examples discussed, I concur with your argument. Two things worth noting:
And loss can be prevented by appropriate assignment of priorities.
First, this statement is wrong. One can create a system that implements the priorities as per the recommendations in the sDDF yet fails to fully consume its input, e.g. by setting the client PD's budget smaller than required to consume one chunk of input. Priorities (as you pointed out), but also scheduling parameters (budget/period of partitions), external factors (the rate at which data is coming in), and internal factors (system load, i.e. contention on the memory bus) all come into play in determining whether input is fully consumed or not. For the serial port this hopefully is not a concern on any HW from this century, but there are of course other, higher-bandwidth comms that are desirable. It is also this kind of "What if?" and "Which PD can adversely affect which other PD, and how?" that is relevant for building safety-critical systems.

Secondly, being completely policy-free is a double-edged sword. In general, I support the effort. However, if in order to use a more sophisticated piece of hardware I need to make dozens of non-trivial design decisions (because the upstream implementation is completely un-opinionated/policy-free), this will hurt real-world adoption (especially if there is an alternative where "the vanilla way of doing things" just works, without any decision paralysis). I think being 100% policy-free can lead to a SW landscape that is universally capable, but for which no ecosystem of reusable components exists, because each and every component is individually designed for its specific use case. I think Microkit is much more opinionated than seL4, and I think that is what makes it more useful to me. Being opinionated (albeit resulting in technical compromise more easily) is the way to grow an ecosystem of reusable components, and that is precisely the currency with which one can foster greater adoption.

Takeaway: policy-free (and simple to operate!) drivers are great, but from time to time one should not defer _all_ decisions to the user, but set an opinionated precedent for _some_ of them. Also, with seL4 you already offer a product heavily leaning towards "no technical compromise, but all the complexity and tightest possible coupling".
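The budget/period point above admits a simple back-of-the-envelope check: a PD with budget B per period P runs at most B/P of the time, which caps the input rate it can sustain regardless of its priority. A sketch in C with purely illustrative numbers (the function and its parameters are mine, not from the sDDF or Microkit docs):

```c
#include <stdint.h>

/* Maximum sustainable input rate (bytes/s) for a PD that needs
 * `us_per_byte` microseconds of CPU per input byte and is granted
 * `budget_us` microseconds of CPU every `period_us` microseconds.
 * Steady-state cap: (budget/period) * (1 / us_per_byte) bytes/us. */
static uint64_t max_rate_bytes_per_s(uint64_t budget_us,
                                     uint64_t period_us,
                                     uint64_t us_per_byte)
{
    return (budget_us * 1000000u) / (period_us * us_per_byte);
}
```

With e.g. a 100 us budget per 1000 us period and 10 us of processing per byte, the cap is 10 000 bytes/s; a 115200-baud line delivering roughly 11 520 bytes/s would then overflow any finite buffer eventually, no matter how the priorities are set.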
I repeat, more/better documentation is definitely a good idea (incl diagrams!).
Yes!
But it’s also important that this documentation is driven by (and reflective of) the principles behind the design, rather than being a collection of recipes.
Is that really a contradiction? I think from an engineering perspective, recipes are very useful. They help many independent engineers grow a more or less coherent body of work that, even if not intended in the first place, actually composes well together (software component reuse again...). It goes without saying that having the engineers understand the design principles behind their recipes is crucial too.

Thank you for the exchange! I certainly learned a bit more about your/the seL4 design philosophy, even if I don't unconditionally agree with every last detail.

Kind Regards,
Wanja

——————————————————————————
Deutsches Zentrum für Luft- und Raumfahrt (DLR)
Institut für Flugsystemtechnik | Sichere Systeme & Systems Engineering | Lilienthalplatz 7 | 38108 Braunschweig

Wanja Zaeske
Telefon 0531 295-1114 | Fax 0531 295-2931 | wanja.zaeske@dlr.de
DLR.de