Hi Stewart,
The ring buffers were previously implemented as a library on CAmkES. You can find the PR here: https://github.com/seL4/projects_libs/pull/15
The move to the Core Platform was made due to performance overheads in the CAmkES framework. That said, there is no reason the sDDF can't be used on other platforms, though the sample system will need porting.
Hi Harry,
The use of single producer, single consumer queues was an intentional simplification for the driver framework. It makes the queues much easier to reason about and verify, and eliminates many possible concurrency bugs. The framework does not require single-threaded address spaces, though: we could have multiple threads acting as a single component, but they would be servicing different queues (so each queue would remain single producer, single consumer).
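To make that concrete, here is a minimal sketch of a single-producer, single-consumer lockless ring buffer over shared memory, in the spirit of what the sDDF uses; the names and layout (spsc_queue_t, QUEUE_SIZE) are illustrative, not the actual sDDF API:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define QUEUE_SIZE 512 /* must be a power of two */

    typedef struct {
        _Atomic uint32_t head;        /* advanced only by the producer */
        _Atomic uint32_t tail;        /* advanced only by the consumer */
        uintptr_t buffer[QUEUE_SIZE]; /* e.g. offsets into a shared data region */
    } spsc_queue_t;

    /* Producer side: returns false if the queue is full. */
    static bool spsc_enqueue(spsc_queue_t *q, uintptr_t entry)
    {
        uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
        uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (head - tail == QUEUE_SIZE) {
            return false; /* full */
        }
        q->buffer[head & (QUEUE_SIZE - 1)] = entry;
        /* release: the entry is visible before head moves */
        atomic_store_explicit(&q->head, head + 1, memory_order_release);
        return true;
    }

    /* Consumer side: returns false if the queue is empty. */
    static bool spsc_dequeue(spsc_queue_t *q, uintptr_t *entry)
    {
        uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
        uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
        if (head == tail) {
            return false; /* empty */
        }
        *entry = q->buffer[tail & (QUEUE_SIZE - 1)];
        atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
        return true;
    }

Because only the producer ever writes head and only the consumer ever writes tail, a release store paired with an acquire load is all that's needed; no locks or read-modify-write atomics.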
Ideally, for larger systems with multiple clients, we would use a multiplexing component to service the queues between client applications, growing the stack laterally rather than adding extra producers or consumers to any one queue. This design aims to provide a strong separation of concerns, since each job is a separate component, and the simplicity of the queues means there is little performance overhead. On multicore hardware, each of these components would run on a separate core.
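As a rough sketch of that lateral structure (hypothetical names again, reusing the spsc_queue_t from above), a multiplexer's service loop might look something like:

    #define NUM_CLIENTS 4

    typedef struct {
        spsc_queue_t *req;  /* client -> mux: mux is the sole consumer */
        spsc_queue_t *resp; /* mux -> client: mux is the sole producer */
    } client_conn_t;

    static client_conn_t clients[NUM_CLIENTS];
    static spsc_queue_t *driver_q; /* mux -> driver */

    static void mux_service(void)
    {
        uintptr_t entry;
        for (int c = 0; c < NUM_CLIENTS; c++) {
            /* Every queue keeps exactly one producer and one consumer,
             * so adding clients never requires locking. */
            while (spsc_dequeue(clients[c].req, &entry)) {
                spsc_enqueue(driver_q, entry); /* forward towards the device */
            }
        }
    }

A matching loop would drain the driver's completion queue and demultiplex responses back onto each client's resp queue.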
> You can probably hide most of this for devices where a single hardware
> thread is sufficient to handle the device and the devices are independent
> but if you are doing anything involving the CPU in conjunction with a fast
> modern network device then one hardware thread won't be enough
We propose adding a second hardware thread in that case, one to service each direction, so every queue can still maintain a single producer and a single consumer. We still have some work to do to expand the sDDF to other device classes (and this will involve benchmarking the framework on high-throughput networking systems), so more on this to come! :)
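Purely as an illustration of that split (the queue-pair names here are made up, not the sDDF's), each direction would own its own pair of queues and be driven by its own thread:

    /* rx_thread is the sole consumer of rx_free and the sole producer
     * to rx_used; tx_thread owns the tx pair the same way, so every
     * queue stays single producer / single consumer. */
    static spsc_queue_t *rx_free, *rx_used; /* serviced only by rx_thread */
    static spsc_queue_t *tx_free, *tx_used; /* serviced only by tx_thread */

    void rx_thread_loop(void)
    {
        uintptr_t buf;
        for (;;) {
            /* Refill the device's RX descriptor ring from buffers the
             * client has returned on rx_free. */
            while (spsc_dequeue(rx_free, &buf)) {
                /* ... give buf to the NIC RX ring (device specific) ... */
            }
            /* ... on completion: spsc_enqueue(rx_used, done_buf); ... */
            /* block on the device IRQ / notification (platform specific) */
        }
    }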
In the meantime, happy to answer any questions and welcome any feedback/ideas!
Lucy
On 16 Oct 2022, at 4:25 pm, Harry Butterworth <heb1001@gmail.com> wrote:
The "single producer, single consumer, lockless bounded queues implemented
as ring buffers" also caught my attention.
In the past, to minimize jitter, I found it useful to have a pool of
threads consuming work from a single queue. This reduces the probability
that tasks will be significantly delayed behind other tasks that are taking
an unusually long time, because additional threads in the pool become free
and allow subsequent tasks to bypass the ones taking too long. This is
well known and often implemented in post offices where there is a single
queue for multiple counters. This implies MPMC (or perhaps SPMC and MPSC
in pairs).
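For illustration, the multi-consumer half of that can still be done with plain C atomics; the cost over SPSC is a compare-and-swap on the consumer index. A sketch of an SPMC dequeue (single producer, pool of consumers), assuming the same illustrative ring-buffer layout as above:

    /* Several workers pop from one queue, so the consumer index must be
     * claimed with a CAS rather than a plain store. */
    static bool spmc_dequeue(spsc_queue_t *q, uintptr_t *entry)
    {
        for (;;) {
            uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
            uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
            if (head == tail) {
                return false; /* empty */
            }
            uintptr_t e = q->buffer[tail & (QUEUE_SIZE - 1)];
            /* Exactly one consumer wins the claim on this slot. */
            if (atomic_compare_exchange_weak_explicit(
                    &q->tail, &tail, tail + 1,
                    memory_order_acq_rel, memory_order_relaxed)) {
                *entry = e;
                return true;
            }
            /* Lost the race (or spurious failure): retry. */
        }
    }

The single producer's full check guarantees the slot cannot be overwritten while tail is unchanged, which is what makes reading the entry before the CAS safe.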
It's also useful to be able to place tasks on dedicated hardware threads, so that tasks causing jitter can be moved away onto a separate hardware thread. I think you might already have sufficient thread affinity mechanisms.
Memory and threads for a pool have to be from the same NUMA node. Devices
are also local to a specific NUMA node.
You can probably hide most of this for devices where a single hardware
thread is sufficient to handle the device and the devices are independent
but if you are doing anything involving the CPU in conjunction with a fast
modern network device then one hardware thread won’t be enough, and
anything on NUMA hardware that involves multiple devices (multi-pathing for
example) is likely to benefit from NUMA awareness.
Harry
On Mon, 10 Oct 2022 at 14:24, Stewart Webb <sjwebb@student.unimelb.edu.au> wrote:
This looks really interesting. I don't feel qualified to offer any
feedback but I do have a question...
I'm particularly curious about the "single producer, single consumer,
lockless bounded queues implemented as ring buffers" mentioned for the
cross-address-space communications. Is this taken from any existing seL4
library/util (e.g. is it similar at all to shared memory communication in
CAmkES?) or is it something novel developed specifically for the driver
framework?
In my research into the Pony language on seL4, I have run into the problem that all of Pony's actor communication relies on data structures that assume a single address space. There are some interesting lock-free SPMC and MPSC queues under the hood that only use C atomics, which I've been able to port over to the seL4 environment, but they're ultimately implemented as single-address-space linked lists, so something like this could be a useful tool for looking at communication between actors in different address spaces. (Although Pony's message-passing is also built around the assumption that the queues are unbounded... which might just not be workable at all between confined components. This is a discussion point for my thesis though :) )
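(To illustrate the address-space problem with a made-up example: each component maps the shared region at its own virtual address, so raw pointers in a linked list are meaningless to the peer, whereas region-relative offsets survive the crossing:)

    #include <stdint.h>

    /* A queue entry carries an offset into the shared region rather
     * than a pointer, so it is valid in every mapping of that region. */
    typedef struct {
        uint64_t offset; /* position of the payload within the region */
        uint64_t len;
    } queue_entry_t;

    /* Each side resolves entries against its own base address. */
    static inline void *entry_to_ptr(void *region_base, queue_entry_t e)
    {
        return (char *)region_base + e.offset;
    }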
Cheers,
Stewart Webb
_______________________________________________
Devel mailing list -- devel@sel4.systems
To unsubscribe send an email to devel-leave@sel4.systems