Proper seL4 API/ABI design
Jumping off from the recent thread, I was curious if anyone had any thoughts about how to optimally design APIs for seL4. I noticed we had a bit of info about what not to do (don't do long IPC, don't implement POSIX, don't use messaging for synchronization, etc.) Are there any shining examples of what _to_ do with an API? Thanks, -Eric
I was looking into sel4cp (https://github.com/BreakawayConsulting/sel4cp) for some ideas about designing high-performance, high-security systems with seL4 and I had some questions about how it interfaces with devices.
I saw that there's an ethernet driver (example/tqma8xqp1gb/ethernet/eth.c) which is structured as a PD with two shared memory ring buffers (TX and RX) and associated notifications, without using IPC/ppcall at all in the API.
Insofar as IPC/ppcall is the cornerstone of performance in seL4, I wonder if there could be or should be a role for IPC/ppcall. Does using IPC in place of notifications here violate the "don't use IPC for synchronization" rule? I guess I'm not too clear on what the advantage of notifications is over IPC for things like shared memory buffers.
I think my ideal goal would be something like an IPC-based API where one can pass in a (limited) number of scatter-gather vectors, plus some metadata (offloading parameters, priority, etc.), and could benefit from the fastpath. This would enable a high-performance stack that could take advantage of things like zero-copy buffering where applicable.
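(For illustration, here is a rough sketch of the kind of call being described. The label, message-register layout, descriptor limit and status convention are invented; only seL4_Call, seL4_SetMR, seL4_GetMR and seL4_MessageInfo_new are actual libsel4 calls. Note that the fastpath only handles messages that fit entirely in the architecture's hardware message registers, typically four, so only the smallest of these calls would actually take it.)

    #include <sel4/sel4.h>

    #define SGVEC_MAX 3          /* arbitrary cap so the message stays small */
    #define LBL_TX    1          /* hypothetical protocol label */

    /* Descriptors carry offsets into an already-shared buffer region,
     * not pointers, since the two PDs have separate address spaces. */
    struct sgvec { seL4_Word offset; seL4_Word len; };

    int eth_tx(seL4_CPtr driver_ep, const struct sgvec *sg, unsigned n,
               seL4_Word meta /* e.g. offload flags, priority */)
    {
        if (n > SGVEC_MAX) {
            return -1;                           /* would need a slow path */
        }
        seL4_SetMR(0, meta);
        for (unsigned i = 0; i < n; i++) {
            seL4_SetMR(1 + 2 * i, sg[i].offset);
            seL4_SetMR(2 + 2 * i, sg[i].len);
        }
        /* label, no caps, message length = 1 + 2n message registers */
        seL4_Call(driver_ep, seL4_MessageInfo_new(LBL_TX, 0, 0, 1 + 2 * n));
        return (int)seL4_GetMR(0);               /* hypothetical status word */
    }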
More generally, I also wonder how IPC fits in with bidirectional devices if we follow the strict priority-based call restrictions (recommended in seL4; required in sel4cp). If the device can both send and receive data, then it seems it has to be both a high-priority PD (to be called) and a low-priority PD (to call in to the stack), assuming that we are avoiding the use of blocking-style turnaround API's (such as read, recv, etc. - I feel those are best left at the application layers.)
Thoughts?
Hi Eric,
On 1 Sep 2022, at 13:07, Eric Jacobs wrote:

I was looking into sel4cp (https://github.com/BreakawayConsulting/sel4cp) for some ideas about designing high-performance, high-security systems with seL4 and I had some questions about how it interfaces with devices.
I saw that there's an ethernet driver (example/tqma8xqp1gb/ethernet/eth.c) which is structured as a PD with two shared memory ring buffers (TX and RX) and associated notifications, without using IPC/ppcall at all in the API.
Insofar as IPC/ppcall is the cornerstone of performance in seL4, I wonder if there could be or should be a role for IPC/ppcall. Does using IPC in place of notifications here violate the "don't use IPC for synchronization" rule? I guess I'm not too clear on what the advantage of notifications is over IPC for things like shared memory buffers.
You certainly should not use IPC just for synchronisation, but if you have a two-way control flow then it becomes a question of one IPC vs two Signals.
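(For illustration, a minimal sketch of the two shapes from the client's side. The capability names are made up; the system calls are real libsel4 ones. The Call variant makes one kernel round trip and carries the payload in message registers; the Signal variant needs a shared buffer and two notifications, one per direction, and usually two context switches.)

    #include <sel4/sel4.h>

    extern seL4_CPtr SERVER_EP;   /* endpoint to the server (hypothetical) */
    extern seL4_CPtr REQ_NTFN;    /* "request is in shared memory" (hypothetical) */
    extern seL4_CPtr RESP_NTFN;   /* "response is in shared memory" (hypothetical) */

    /* Two-way control flow as one IPC: the request travels in a message
     * register and the reply comes back on the same Call. */
    seL4_Word call_style(seL4_Word req)
    {
        seL4_SetMR(0, req);
        seL4_Call(SERVER_EP, seL4_MessageInfo_new(0, 0, 0, 1));
        return seL4_GetMR(0);
    }

    /* The same exchange as two Signals plus shared memory: each direction
     * is a separate notification. */
    seL4_Word signal_style(volatile seL4_Word *shared, seL4_Word req)
    {
        *shared = req;
        seL4_Signal(REQ_NTFN);
        seL4_Wait(RESP_NTFN, NULL);   /* block until the server signals back */
        return *shared;
    }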
I think my ideal goal would be something like a IPC-based API where one can pass in a (limited) number of scatter-gather vectors, plus some metadata (offloading parameters, priority, etc.), and could benefit from the fastpath. This would enable a high-performance stack that could take advantage of things like zero-copy buffering where applicable.
More generally, I also wonder how IPC fits in with bidirectional devices if we follow the strict priority-based call restrictions (recommended in seL4; required in sel4cp). If the device can both send and receive data, then it seems it has to be both a high-priority PD (to be called) and a low-priority PD (to call in to the stack), assuming that we are avoiding the use of blocking-style turnaround API's (such as read, recv, etc. - I feel those are best left at the application layers.)
You identify some of the relevant issues ;-)

We have been looking at some of this in detail for our high-performance driver framework (see https://trustworthy.systems/projects/TS/drivers/, although that page is presently devoid of technical detail). We are preparing a release which is scheduled to happen at the time of the seL4 Summit in 5 weeks’ time. There will be an extensive report describing design and implementation.

While the basic framework is in place and performs well (it outperforms Linux without even trying too hard…) there are a number of questions that still need further research, and are unlikely to be resolved by the time of the initial release. One of them is whether drivers should be active (synchronising with notifications only) or passive PDs (using PPCs). There are a bunch of tradeoffs to consider, and we need a fair amount of experimental work to settle it. The good news is that the effect the choice has on driver as well as client implementation is minimal (a few lines changed, possibly zero on the client side).

Gernot
On Thu, Sep 01, 2022 at 03:32:15AM +0000, Gernot Heiser wrote:
Hi Eric,
On 1 Sep 2022, at 13:07, Eric Jacobs wrote:

I was looking into sel4cp (https://github.com/BreakawayConsulting/sel4cp) for some ideas about designing high-performance, high-security systems with seL4 and I had some questions about how it interfaces with devices.
I saw that there's an ethernet driver (example/tqma8xqp1gb/ethernet/eth.c) which is structured as a PD with two shared memory ring buffers (TX and RX) and associated notifications, without using IPC/ppcall at all in the API.
Insofar as IPC/ppcall is the cornerstone of performance in seL4, I wonder if there could be or should be a role for IPC/ppcall. Does using IPC in place of notifications here violate the "don't use IPC for synchronization" rule? I guess I'm not too clear on what the advantage of notifications is over IPC for things like shared memory buffers.
You certainly should not use IPC just for synchronisation, but if you have a two-way control flow then it becomes a question of one IPC vs two Signals.
I think my ideal goal would be something like a IPC-based API where one can pass in a (limited) number of scatter-gather vectors, plus some metadata (offloading parameters, priority, etc.), and could benefit from the fastpath. This would enable a high-performance stack that could take advantage of things like zero-copy buffering where applicable.
More generally, I also wonder how IPC fits in with bidirectional devices if we follow the strict priority-based call restrictions (recommended in seL4; required in sel4cp). If the device can both send and receive data, then it seems it has to be both a high-priority PD (to be called) and a low-priority PD (to call in to the stack), assuming that we are avoiding the use of blocking-style turnaround API's (such as read, recv, etc. - I feel those are best left at the application layers.)
You identify some of the relevant issues ;-)
We have been looking at some of this in detail for our high-performance driver framework (see https://trustworthy.systems/projects/TS/drivers/, although that page is presently devoid of technical detail). We are preparing a release which is scheduled to happen at the time of the seL4 Summit in 5 weeks’ time. There will be an extensive report describing design and implementation.
While the basic framework is in place and performs well (it outperforms Linux without even trying too hard…) there are a number of questions that still need further research, and are unlikely to be resolved by the time of the initial release. One of them is whether drivers should be active (synchronising with notifications only) or passive PDs (using PPCs). There are a bunch of tradeoffs to consider, and we need a fair amount of experimental work to settle it. The good news is that the effect the choice has on driver as well as client implementation is minimal (a few lines changed, possibly zero on the client side).
I **strongly** recommend active drivers, for the following reasons:

1. While seL4 can perform context switches with incredible speed, lots of hardware requires very expensive flushing to prevent Spectre attacks. On x86, for instance, one must issue an Indirect Branch Predictor Barrier (IBPB), which costs thousands of cycles. Other processors, such as ARM, also require expensive flushing. In the general case, it is not possible to safely avoid these flushes on current hardware. Therefore, context switches must be kept to a minimum, no matter how good the kernel is at doing them.

2. Passive drivers only make sense if the drivers are trusted. In the case of e.g. USB or network drivers, this is a bad assumption: both are major attack vectors and drivers ported from e.g. Linux are unlikely to be hardened against malicious devices.

3. A passive driver must be on the same core as its caller, since cross-core PPC is deprecated. An active driver does not have this restriction. Active drivers can therefore run concurrently with the programs they serve, which is the same technique used by Linux’s io_uring with SQPOLL.

4. Linux’s io_uring has the application submit a bunch of requests into a ring buffer and then tell the kernel to process these requests. The kernel processes the requests asynchronously and writes completions into another ring buffer. Userspace can use poll() or epoll() to wait for a completion to arrive. This is a fantastic fit for active drivers: the submission syscall is replaced by a notification, and another notification is used for wakeup on completion. (A rough sketch of this shape follows this message.)

Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
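(For illustration, a minimal single-producer/single-consumer ring in the spirit of point 4. The ring layout, sizes and capability names are invented; seL4_Signal and seL4_Wait are the real system calls. One Signal covers a whole batch of submissions, which is where the reduction in kernel crossings comes from.)

    #include <stdint.h>
    #include <sel4/sel4.h>

    #define RING_SIZE 512u                 /* power of two; arbitrary */

    extern seL4_CPtr DRIVER_NTFN;          /* hypothetical: signalled to kick the driver */
    extern seL4_CPtr CLIENT_NTFN;          /* hypothetical: driver signals completions */

    struct desc { uint64_t buf_offset; uint32_t len; uint32_t flags; };

    struct ring {
        volatile uint32_t head;            /* written by the producer */
        volatile uint32_t tail;            /* written by the consumer */
        struct desc slots[RING_SIZE];
    };

    extern struct ring *submit_ring;       /* client -> driver, in shared memory */
    extern struct ring *complete_ring;     /* driver -> client, in shared memory */

    static int ring_push(struct ring *r, struct desc d)
    {
        uint32_t head = r->head;
        if (head - r->tail == RING_SIZE) return -1;      /* full */
        r->slots[head & (RING_SIZE - 1)] = d;
        __atomic_thread_fence(__ATOMIC_RELEASE);         /* publish slot before head */
        r->head = head + 1;
        return 0;
    }

    static int ring_pop(struct ring *r, struct desc *d)
    {
        uint32_t tail = r->tail;
        if (tail == r->head) return -1;                  /* empty */
        __atomic_thread_fence(__ATOMIC_ACQUIRE);         /* read head before slot */
        *d = r->slots[tail & (RING_SIZE - 1)];
        r->tail = tail + 1;
        return 0;
    }

    /* Submit a batch of requests, then signal the driver once for the whole batch. */
    void submit_batch(const struct desc *batch, unsigned n)
    {
        for (unsigned i = 0; i < n; i++) {
            ring_push(submit_ring, batch[i]);            /* error handling elided */
        }
        seL4_Signal(DRIVER_NTFN);
    }

    /* Block until the driver signals, then drain whatever completions are there. */
    void wait_for_completions(void)
    {
        seL4_Wait(CLIENT_NTFN, NULL);
        struct desc d;
        while (ring_pop(complete_ring, &d) == 0) {
            /* recycle d.buf_offset back into the free-buffer pool */
        }
    }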
Demi Marie Obenour wrote:
While the basic framework is in place and performs well (it outperforms Linux without even trying too hard…) there are a number of questions that still need further research, and are unlikely to be resolved by the time of the initial release. One of them is whether drivers should be active (synchronising with notifications only) or passive PDs (using PPCs). There are a bunch of tradeoffs to consider, and we need a fair amount of experimental work to settle it. The good news is that the effect the choice has on driver as well as client implementation is minimal (a few lines changed, possibly zero on the client side).

I **strongly** recommend active drivers, for the following reasons:
1. While seL4 can perform context switches with incredible speed, lots of hardware requires very expensive flushing to prevent Spectre attacks. On x86, for instance, one must issue an Indirect Branch Predictor Barrier (IBPB), which costs thousands of cycles. Other processors, such as ARM, also require expensive flushing. In the general case, it is not possible to safely avoid these flushes on current hardware.

The issue of managing the number and timing of context switches is indeed critically important to any multi-process system. However, in as far as my original question was about comparing the mechanisms of "shared memory+IPC" and "shared memory+notifications" (which may or may not be captured by the terms "active driver" vs "passive driver", I'm not sure), I'm seeking to understand the operations of these styles of communications independently of any policy that may be built on top of them.
To put it another way, the act of making another, higher-priority thread runnable is the action which induces a context switch; whether that action is part of an IPC, notification, fault, etc. If a piece of code decides to perform one of those activities, it is choosing to incur the costs of a context-switch. It decides if and when to do this based on its own policy and the compromises it necessarily entails; and these are made independently of the mechanisms provided by the kernel.

Therefore, context switches must be kept to a minimum, no matter how good the kernel is at doing them.

Let us speak precisely here. The management of the cost of context switches, being a limited resource, is a policy-based determination which must be left up to the application to decide and not dictated by the system. What the system should provide is mechanisms which allow the correct trade-offs to be made, which is why I'm especially curious to see how we can support things like scatter-gather and Segmentation Offloading which are critical for other platforms and I expect this one too.
2. Passive drivers only make sense if the drivers are trusted. In the case of e.g. a USB or network drivers, this is a bad assumption: both are major attack vectors and drivers ported from e.g. Linux are unlikely to be hardened against malicious devices.
Hmm, this I am quite surprised by. Is this an expected outcome of the seL4 security model? This implies that a rather large swath of kernel functionality (the IPC fastpath, cap transfer, the single-threaded event model) are simply not available to mutually suspicious PD's. I'm very concerned about the expansion of the T(rusted)CB into userspace, for both performance and assurance reasons.
3. A passive driver must be on the same core as its caller, since cross-core PPC is deprecated. An active driver does not have this restriction. Active drivers can therefore run concurrently with the programs they serve, which is the same technique used by Linux’s io_uring with SQPOLL.

For a high-performance driver, I would expect to have at least one TX and RX queue per CPU. Therefore, a local call should always be possible. The deprecation of cross-CPU IPC is related to the basic concept that spreading work across CPU's is generally not a good idea.
But yes, if the user wants to distribute the workload in that way, passing data between CPU's, obviously IPC fastpath is off the table, and notifications seem like a pretty clear choice in that case. (However, the existence of this use case is _not_ a reason to sacrifice the performance of the same-core RTC execution model.)
4. Linux’s io_uring has the application submit a bunch of requests into a ring buffer and then tell the kernel to process these requests. The kernel processes the requests asynchronously and writes completions into another ring buffer. Userspace can use poll() or epoll() to wait for a completion to arrive. This is a fantastic fit for active drivers: the submission syscall is replaced by a notification, and another notification is used to for wakeup on completion.
Agreed, it's a very attractive model. Indeed, that is basically how I got started on this line of thinking; it is quite apparent that these command ring/pipe-like structures are very flexible and could be used as the building blocks of entire systems. So the question I wanted to answer was: what are we leaving on the table if we go with this approach? Particularly given the emphasis on IPC, its optimizations and the contention that fast IPC is the foundational element of a successful, performant microkernel system. -Eric
On Mon, Sep 12, 2022 at 10:18:48PM -0500, Eric Jacobs wrote:
Demi Marie Obenour wrote:
While the basic framework is in place and performs well (it outperforms Linux without even trying too hard…) there are a number of questions that still need further research, and are unlikely to be resolved by the time of the initial release. One of them is whether drivers should be active (synchronising with notifications only) or passive PDs (using PPCs). There are a bunch of tradeoffs to consider, and we need a fair amount of experimental work to settle it. The good news is that the effect the choice has on driver as well as client implementation is minimal (a few lines changed, possibly zero on the client side).

I **strongly** recommend active drivers, for the following reasons:
1. While seL4 can perform context switches with incredible speed, lots of hardware requires very expensive flushing to prevent Spectre attacks. On x86, for instance, one must issue an Indirect Branch Predictor Barrier (IBPB), which costs thousands of cycles. Other processors, such as ARM, also require expensive flushing. In the general case, it is not possible to safely avoid these flushes on current hardware.

The issue of managing the number and timing of context switches is indeed critically important to any multi-process system. However, in as far as my original question was about comparing the mechanisms of "shared memory+IPC" and "shared memory+notifications" (which may or may not be captured by the terms "active driver" vs "passive driver", I'm not sure), I'm seeking to understand the operations of these styles of communications independently of any policy that may be built on top of them.
In seL4, a passive server is a program that never runs except when one of its IPC endpoints is called. Therefore, a passive driver would be one that is called via IPC, and an active driver would be one that uses shared memory and notifications.
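(For illustration, a sketch of the two driver shapes in that terminology. Capability names are made up; seL4_Wait, seL4_Recv and seL4_ReplyRecv are real libsel4 calls, shown with their non-MCS signatures, while MCS adds an explicit reply object. The active loop owns its own scheduling context and wakes on notifications; the passive-style loop only runs when a client Calls its endpoint.)

    #include <sel4/sel4.h>

    extern seL4_CPtr NTFN;   /* badged notification: clients and the IRQ path signal it (hypothetical) */
    extern seL4_CPtr EP;     /* endpoint that clients seL4_Call into (hypothetical) */

    /* "Active" driver: synchronises purely via notifications and shared-memory
     * queues; the badge bits (an assumed convention here) say who signalled. */
    void active_driver(void)
    {
        for (;;) {
            seL4_Word badge = 0;
            seL4_Wait(NTFN, &badge);      /* block until a client or IRQ signals */
            if (badge & 1) { /* handle_irq(); */ }
            if (badge & 2) { /* process_tx_queue(); */ }
        }
    }

    /* "Passive"-style driver: runs only when a client Calls it; request and
     * reply travel as IPC messages. */
    void passive_driver(void)
    {
        seL4_Word badge = 0;
        seL4_MessageInfo_t msg = seL4_Recv(EP, &badge);
        for (;;) {
            /* badge identifies the client; a real driver would decode
             * seL4_MessageInfo_get_label(msg) and the MRs here */
            (void)msg;
            seL4_SetMR(0, 0 /* status */);
            msg = seL4_ReplyRecv(EP, seL4_MessageInfo_new(0, 0, 0, 1), &badge);
        }
    }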
To put it another way, the act of making another, higher-priority thread runnable is the action which induces a context switch; whether that action is part of an IPC, notification, fault, etc. If a piece of code decides to perform one of those activities, it is choosing to incur the costs of a context-switch. It decides if and when to do this based on its own policy and the compromises it necessarily entails; and these are made independently of the mechanisms provided by the kernel.
This is correct.
Therefore, context switches must be kept to a minimum, no matter how good the kernel is at doing them.

Let us speak precisely here. The management of the cost of context switches, being a limited resource, is a policy-based determination which must be left up to the application to decide and not dictated by the system. What the system should provide is mechanisms which allow the correct trade-offs to be made, which is why I'm especially curious to see how we can support things like scatter-gather and Segmentation Offloading which are critical for other platforms and I expect this one too.
I do not see how scatter-gather and segmentation offload depend on how drivers communicate with their clients.
2. Passive drivers only make sense if the drivers are trusted. In the case of e.g. a USB or network drivers, this is a bad assumption: both are major attack vectors and drivers ported from e.g. Linux are unlikely to be hardened against malicious devices.
Hmm, this I am quite surprised by. Is this an expected outcome of the seL4 security model?
It is. A PPC callee is allowed to block for as long as it desires, and doing so will block the caller too. There is no way (that I am aware of) to interrupt this wait. Therefore, the callee can perform a trivial denial-of-service attack on the caller.
This implies that a rather large swath of kernel functionality (the IPC fastpath, cap transfer, the single-threaded event model) are simply not available to mutually suspicious PD's. I'm very concerned about the expansion of the T(rusted)CB into userspace, for both performance and assurance reasons.
As mentioned above, the IPC fastpath is only useable when the caller trusts the callee to at least not perform a denial-of-service attack. Furthermore, since seL4 does not provide any support for copying large messages between userspace processes, mutually distrusting PDs will often need to perform defensive copies on both sides. One can implement asynchronous, single-copy message and cap transfer via a trusted userspace message broker.
3. A passive driver must be on the same core as its caller, since cross-core PPC is deprecated. An active driver does not have this restriction. Active drivers can therefore run concurrently with the programs they serve, which is the same technique used by Linux’s io_uring with SQPOLL.

For a high-performance driver, I would expect to have at least one TX and RX queue per CPU. Therefore, a local call should always be possible. The deprecation of cross-CPU IPC is related to the basic concept that spreading work across CPU's is generally not a good idea.
If you are referring to networking applications where the driver can already perform a denial of service (by not processing any packets), then using an IPC call to the driver would be justified. Batching is still critical for performance, though. See Vector Packet Processing for why.
But yes, if the user wants to distribute the workload in that way, passing data between CPU's, obviously IPC fastpath is off the table, and notifications seem like a pretty clear choice in that case. (However, the existence of this use case is _not_ a reason to sacrifice the performance of the same-core RTC execution model.)
If you are okay with the driver being able to perform a DoS, then IPC is fine.
4. Linux’s io_uring has the application submit a bunch of requests into a ring buffer and then tell the kernel to process these requests. The kernel processes the requests asynchronously and writes completions into another ring buffer. Userspace can use poll() or epoll() to wait for a completion to arrive. This is a fantastic fit for active drivers: the submission syscall is replaced by a notification, and another notification is used to for wakeup on completion.
Agreed, it's a very attractive model. Indeed, that is basically how I got started on this line of thinking; it is quite apparent that these command ring/pipe-like structures are very flexible and could be used as the building blocks of entire systems. So the question I wanted to answer was: what are we leaving on the table if we go with this approach? Particularly given the emphasis on IPC, its optimizations and the contention that fast IPC is the foundational element of a successful, performant microkernel system.
I think there are a few factors in play here:

1. seL4 has (so far) mostly been used in embedded systems. I suspect many of these run on simple, low-power, in-order CPUs, which are inherently not vulnerable to Spectre. As such, they very much can have fast context switches. I, on the other hand, am mostly interested in using seL4 in Qubes OS, which runs on out-of-order hardware where Spectre is a major concern. Therefore, context switch performance is limited by the need for Spectre mitigations.

2. seL4 predates speculative execution attacks by several years. Therefore, seL4’s design likely did not consider that hardware vulnerabilities would reduce IPC performance by over an order of magnitude. Monolithic kernels are also impacted by this, but recent x86 CPUs support eIBRS, which significantly reduces syscall overhead so long as no context switch is required. Except for the “IBRS everywhere” case on AMD systems mentioned above, context switches require an expensive IBPB command.

3. Even in monolithic kernels, the cost of a system call has turned out to be high enough to matter. Therefore, Linux and Windows have both adopted ring buffer-based I/O APIs, which dramatically reduce the number of system calls required. This makes the overhead of an individual system call much less important. Since a system call in a monolithic kernel typically corresponds to an IPC in a microkernel, this means that the cost of an IPC is also much less important.

Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
On 13 Sep 2022, at 7:01 pm, Demi Marie Obenour wrote:

In seL4, a passive server is a program that never runs except when one of its IPC endpoints is called. Therefore, a passive driver would be one that is called via IPC, and an active driver would be one that uses shared memory and notifications.
Potentially not that important for this discussion, but to avoid confusion in the future: a passive server in seL4 terminology is a thread without a scheduling context, which is expecting to get a scheduling context via donation to be able to provide its service. Scheduling context donation can happen via IPC or via notification, including IRQ notifications.
2. Passive drivers only make sense if the drivers are trusted. In the case of e.g. a USB or network drivers, this is a bad assumption: both are major attack vectors and drivers ported from e.g. Linux are unlikely to be hardened against malicious devices.
Hmm, this I am quite surprised by. Is this an expected outcome of the seL4 security model?
It is. A PPC callee is allowed to block for as long as it desires, and doing so will block the caller too. There is no way (that I am aware of) to interrupt this wait. Therefore, the callee can perform a trivial denial-of-service attack on the caller.
This is correct in the sense that an IPC call implies trusting the invoked party to return to you. There is a mechanism (bound notifications) that allows you to interrupt that wait, but you need to then trust that notification sender. You can probably implement a time-out mechanism with that, even if that would likely be a questionable choice.

The idea of the security model is more that if that trust cannot be established, notifications + shared memory are the intended, more conservative mechanism. Think of IPC as a procedure call in a type-safe language without global state — it should give you roughly the same trust model: the callee can’t change your memory (while it can change its own + call others), but you still have to rely on the callee performing its service and returning a value to you.

(And as analogy to abusing bound notifications for this purpose: you could probably use a debugger to interrupt that wait and continue, but it would be a hack).
As mentioned above, the IPC fastpath is only useable when the caller trusts the callee to at least not perform a denial-of-service attack. Furthermore, since seL4 does not provide any support for copying large messages between userspace processes, mutually distrusting PDs will often need to perform defensive copies on both sides. One can implement asynchronous, single-copy message and cap transfer via a trusted userspace message broker.
I think that is correct, yes. You probably can’t get 0-copy without at least some level of trust between the communication partners. That doesn’t mean you can’t program defensively, there is a lot of design space between no trust, trust in some properties, and trust in full correctness. I.e. if your main trust issue is termination, 0-copy might still be possible. Note that a lower-level driver can always deny its own service just by not doing anything, there is no mitigation for that. The question is if that denial is amplified or not, i.e. does a low-level driver not returning to you make the system hang or just not produce any network traffic. That question could for instance be decided a few levels up the stack, depending on how things are plugged together.
1. seL4 has (so far) mostly been used in embedded systems. I suspect many of these run on simple, low-power, in-order CPUs, which are inherently not vulnerable to Spectre. As such, they very much can have fast context switches. I, on the other hand, am mostly interested in using seL4 in Qubes OS, which runs on out-of-order hardware where Spectre is a major concern. Therefore, context switch performance is limited by the need for Spectre mitigations.
The current Spectre mitigations are not the only way to mitigate timing channels. One major problem of the current mitigations is that they have to be applied to all context switches. That is not really necessary. You only need to apply mitigations to context switches between mutually distrusting information domains. I.e. communication between a driver, the kernel, and the next level protocol stack usually would not need them, but communication between anything that holds a secret and the rest does.

I.e. equating processes + context switches with information flow domains is not necessarily required or good for performance (at least not in hardware with design defects like Spectre). Not that any of this is available in the current seL4 releases, but we’re working on it.

Cheers,
Gerwin
Hello,

Just replying to what would be, for me, unexpected behaviour:

On 2022-09-14 11:54, Gerwin Klein wrote:

On 13 Sep 2022, at 7:01 pm, Demi Marie Obenour wrote:

It is. A PPC callee is allowed to block for as long as it desires, and doing so will block the caller too. There is no way (that I am aware of) to interrupt this wait. Therefore, the callee can perform a trivial denial-of-service attack on the caller.
This is correct in the sense that an IPC call implies trusting the invoked party to return to you. There is a mechanism (bound notifications) that allows you to interrupt that wait,
I don't think IPC calls are interruptible by bound notifications. That would be very unexpected behaviour, and if true, should be made very clear in the manual. Now it says:

"An important variant is the Call operation, which consists of a standard Send operation atomically followed by a variant of Receive which waits for a Reply. A reply message is always delivered via a special resource instead of using the standard IPC mechanism;"

"A reply capability points directly to the caller thread and once the call has been performed is completely unrelated to the original Endpoint."

"When a Notification is bound to a TCB, signals to that notification object will be delivered even if the thread is receiving from an IPC endpoint."

All this implies to me that calls will not be interrupted by signals.

When looking at the source it also seems notifications don't interrupt calls:

Threads making a call are put into ThreadState_BlockedOnReply and sendSignal() handles that case with:

    } else {
        /* In particular, this path is taken when a thread
         * is waiting on a reply cap since BlockedOnReply
         * would also trigger this path. I.e, a thread
         * with a bound notification will not be awakened
         * by signals on that bound notification if it is
         * in the middle of an seL4_Call. */
        ntfn_set_active(ntfnPtr, badge);
    }

From a quick look at the source I think the only ways to interrupt a call in progress are replying, seL4_CNode_CancelBadgedSends(), destroying related resources or suspending the caller.

To be clear, I think not being able to interrupt calls with signals is the right behaviour and anything else would cause hard-to-find bugs.

Greetings,

Indan
You’re right, I might have been a bit too hasty in this and not properly looked at the distinction between call/reply and send/wait. Will have another look and confirm. Cheers, Gerwin
On 17 Sep 2022, at 10:49 am, Indan Zupancic wrote:

Hello,
Just replying to what would be, for me, unexpected behaviour:
On 2022-09-14 11:54, Gerwin Klein wrote:
On 13 Sep 2022, at 7:01 pm, Demi Marie Obenour wrote:

It is. A PPC callee is allowed to block for as long as it desires, and doing so will block the caller too. There is no way (that I am aware of) to interrupt this wait. Therefore, the callee can perform a trivial denial-of-service attack on the caller.

This is correct in the sense that an IPC call implies trusting the invoked party to return to you. There is a mechanism (bound notifications) that allows you to interrupt that wait,
I don't think IPC calls are interruptible by bound notifications. That would be very unexpected behaviour, and if true, should be made very clear in the manual. Now it says:
"An important variant is the Call operation, which consists of a standard Send operation atomically followed by a variant of Receive which waits for a Reply. A reply message is always delivered via a special resource instead of using the standard IPC mechanism;"
"A reply capability points directly to the caller thread and once the call has been performed is completely unrelated to the original Endpoint."
"When a Notification is bound to a TCB, signals to that notification object will be delivered even if the thread is receiving from an IPC endpoint."
All this implies to me that calls will not be interrupted by signals.
When looking at the source it also seems notifications don't interrupt calls:
Threads making a call are put into ThreadState_BlockedOnReply and sendSignal() handles that case with:
    } else {
        /* In particular, this path is taken when a thread
         * is waiting on a reply cap since BlockedOnReply
         * would also trigger this path. I.e, a thread
         * with a bound notification will not be awakened
         * by signals on that bound notification if it is
         * in the middle of an seL4_Call. */
        ntfn_set_active(ntfnPtr, badge);
    }
From a quick look at the source I think the only ways to interrupt a call in progress are replying, seL4_CNode_CancelBadgedSends(), destroying related resources or suspending the caller.
To be clear, I think not being able to interrupt calls with signals is the right behaviour and anything else would cause hard-to-find bugs.
Greetings,
Indan
Gernot Heiser wrote:
You certainly should not use IPC just for synchronisation, but if you have a two-way control flow then it becomes a question of one IPC vs two Signals.
Thanks a lot, that answers my question.
I think my ideal goal would be something like a IPC-based API where one can pass in a (limited) number of scatter-gather vectors, plus some metadata (offloading parameters, priority, etc.), and could benefit from the fastpath. This would enable a high-performance stack that could take advantage of things like zero-copy buffering where applicable.
More generally, I also wonder how IPC fits in with bidirectional devices if we follow the strict priority-based call restrictions (recommended in seL4; required in sel4cp). If the device can both send and receive data, then it seems it has to be both a high-priority PD (to be called) and a low-priority PD (to call in to the stack), assuming that we are avoiding the use of blocking-style turnaround API's (such as read, recv, etc. - I feel those are best left at the application layers.)

You identify some of the relevant issues ;-)
We have been looking at some of this in detail for our high-performance driver framework (see https://trustworthy.systems/projects/TS/drivers/, although that page is presently devoid of technical detail). We are preparing a release which is scheduled to happen at the time of the seL4 Summit in 5 weeks’ time. There will be an extensive report describing design and implementation.
While the basic framework is in place and performs well (it outperforms Linux without even trying too hard…) there are a number of questions that still need further research, and are unlikely to be resolved by the time of the initial release. One of them is whether drivers should be active (synchronising with notifications only) or passive PDs (using PPCs). There are a bunch of tradeoffs to consider, and we need a fair amount of experimental work to settle it. The good news is that the effect the choice has on driver as well as client implementation is minimal (a few lines changed, possibly zero on the client side).
That's really great, that is my area of interest as well. Is that the project at https://github.com/lucypa/sDDF ? I would appreciate being able to follow along at home. I'm not sure how much I can contribute as I am still pretty new to seL4, but if there's anything as far as benchmarking or testing out of the interfaces (including porting apps/drivers) I'd be glad to help out. Thanks, -Eric
On Wed, Aug 31, 2022 at 10:07:52PM -0500, Eric Jacobs wrote:
I was looking into sel4cp (https://github.com/BreakawayConsulting/sel4cp) for some ideas about designing high-performance, high-security systems with seL4 and I had some questions about how it interfaces with devices.
I saw that there's an ethernet driver (example/tqma8xqp1gb/ethernet/eth.c) which is structured as a PD with two shared memory ring buffers (TX and RX) and associated notifications, without using IPC/ppcall at all in the API.
Insofar as IPC/ppcall is the cornerstone of performance in seL4,
At least on high-performance processors, IPC performance is generally dominated by Spectre mitigations. On x86, those cost thousands of cycles for every context switch. You don’t want to disable them because if you do, you have an insecure system.
I wonder if there could be or should be a role for IPC/ppcall. Does using IPC in place of notifications here violate the "don't use IPC for synchronization" rule? I guess I'm not too clear on what the advantage of notifications is over IPC for things like shared memory buffers.
Notifications allow for untrusted drivers, reduce the rate of context switches (which are very expensive due to Spectre), and can operate between different CPU cores. They also allow batching of operations, which is a huge win from a CPU cache perspective.
I think my ideal goal would be something like a IPC-based API where one can pass in a (limited) number of scatter-gather vectors, plus some metadata (offloading parameters, priority, etc.), and could benefit from the fastpath. This would enable a high-performance stack that could take advantage of things like zero-copy buffering where applicable.
Sadly, it turns out that zero-copy is usually incompatible with security. The usual rule I have found is that one must make a copy whenever data crosses a security boundary and must be processed. Otherwise, TOCTOU vulnerabilities are almost inevitable. One can and should only copy data that will actually be accessed, but sometimes this will be all of it. A network stack might be able to get away with only copying the packet headers, but if it needs to perform encryption or authentication in software, it will need to copy the packet body as well. If applications and the NIC are both untrusted, two copies will be necessary.

Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
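(For illustration, the copy-before-parse rule in its simplest form. The header layout and function are invented; the point is that the receiver snapshots the bytes out of the shared buffer before validating them, so an untrusted peer cannot change them between the check and the use.)

    #include <stddef.h>
    #include <stdint.h>

    struct eth_hdr { uint8_t dst[6]; uint8_t src[6]; uint16_t ethertype; };

    int handle_rx(const volatile uint8_t *shared_pkt, size_t len)
    {
        struct eth_hdr hdr;
        uint8_t *dst = (uint8_t *)&hdr;

        if (len < sizeof hdr) {
            return -1;                     /* runt frame: reject before touching it */
        }

        /* Snapshot the header byte by byte out of the shared buffer, then only
         * ever look at the private copy. Validating the shared bytes in place
         * and re-reading them later is exactly the TOCTOU pattern to avoid. */
        for (size_t i = 0; i < sizeof hdr; i++) {
            dst[i] = shared_pkt[i];
        }

        if (hdr.ethertype == 0) {          /* placeholder for real validation */
            return -1;
        }
        /* ... continue, using only hdr (and a copied payload if it must be parsed) ... */
        return 0;
    }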
Participants (5): Demi Marie Obenour, Eric Jacobs, Gernot Heiser, Gerwin Klein, Indan Zupancic