CAmkES, Virtual Machines and Multicore Guarantees
On 18 Mar 2020, at 10:13, Demi Obenour wrote:
On Tue, Mar 17, 2020, 9:29 PM Klein, Gerwin (Data61, Kensington NSW) <Gerwin.Klein@data61.csiro.au> wrote:
On 18 Mar 2020, at 08:59, Demi Obenour wrote:
On Tue, Mar 17, 2020, 7:32 PM Mcleod, Kent (Data61, Kensington NSW) <Kent.Mcleod@data61.csiro.au> wrote:
The closest you could get to a multicore system that leverages verified kernel configurations would be to set up a multikernel implementation in which each core runs its own instance of a verified seL4 configuration and all hardware resources are strictly partitioned between them. I'm not aware of any examples of this setup, but there have been research projects in the past that used multikernel implementations of seL4. Even then, this still wouldn't give you an overall system with the same guarantees as a single seL4 instance: the different instances would need to trust each other not to overwrite each other's kernel data structures, which is outside the current proof assumptions.
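To make "strictly partitioned" concrete, here is a rough sketch (entirely made-up names and addresses, not from any real project) of the kind of static partition table such a setup implies: each kernel instance owns a disjoint RAM region and its own devices, fixed at build time, so no instance can reach another instance's kernel data.

    /* Hypothetical static partition table for a multikernel setup with one
     * independent seL4 instance per core.  All names and addresses are
     * invented for illustration; the point is only that RAM and devices
     * are disjoint per node. */
    #include <stdint.h>

    typedef struct {
        uint64_t ram_base;   /* RAM owned exclusively by this instance */
        uint64_t ram_size;
        uint64_t uart_base;  /* example device assigned to this instance only */
        int      irq;        /* IRQ line routed exclusively to this instance */
    } node_partition_t;

    static const node_partition_t partitions[] = {
        /* node 0 */ { 0x80000000u, 0x20000000u, 0x09000000u, 33 },
        /* node 1 */ { 0xa0000000u, 0x20000000u, 0x09010000u, 34 },
    };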
Assuming that the root tasks drop capabilities that could be problematic, would this still be correct? This seems to follow from the existing seL4 specification (no way to get rights to memory from nothing).
You can set up a system to be completely separated from the root task, and yes, that would increase confidence. There was a PhD thesis a while ago that had a semi-formal but fairly rigorous argument for why and how this will work, but you’d still not be running the verified configuration of the kernel, i.e. you’d have code in there that was not verified, even if that code doesn’t do much in this setup and you can probably convince yourself manually that it is fine.
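For concreteness, the kind of capability dropping you mention could look roughly like this in each root task (error handling omitted; which caps count as "problematic" is entirely a policy decision, and the device-memory policy below is just an example):

    #include <sel4/sel4.h>

    /* Sketch: delete the initial untyped caps this root task has decided it
     * should not keep.  `bi` is the seL4_BootInfo pointer the runtime hands
     * to the root task. */
    static void drop_unwanted_untypeds(const seL4_BootInfo *bi)
    {
        for (seL4_CPtr slot = bi->untyped.start; slot < bi->untyped.end; slot++) {
            const seL4_UntypedDesc *desc = &bi->untypedList[slot - bi->untyped.start];
            if (desc->isDevice) {   /* example policy: keep no device memory */
                seL4_CNode_Delete(seL4_CapInitThreadCNode, slot, seL4_WordBits);
            }
        }
    }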
Since the code is deterministic, and will only ever have one initial condition, is it possible to prove it correct by simply running the binary and looking at the state after it executes? That is, this code could be viewed as an extension of the kernel init code.
That would work per system, yes, although you’d have to re-do that certification every time you tweaked something in the setup of that system.
We've thought about this approach for kernel initialisation in general too, i.e. writing an automated check that the invariants are true after the system is initialised. There are a lot of expensive calculations in that check, though, so we haven't really gone with it; it still remains an option. (Expensive, because the invariants include things like "for every cap, if there exists another cap to the same object, this cap must have the following properties", i.e. lots of forall/exists combinations over all of memory, which are no problem in logic but hard on computation.)
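To illustrate why such a check is expensive, a toy checker for one invariant of that shape could look like this (the types and fields are invented for the illustration; they are not the real seL4 invariants):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
        const void *object;       /* which kernel object the cap refers to */
        bool        has_property; /* the property the invariant demands    */
    } cap_t;

    /* "For every cap, if there exists another cap to the same object, this
     * cap must have the property": one forall/exists pair already means a
     * nested scan over all caps, i.e. O(n^2) work at runtime for what is a
     * single line in the logic. */
    static bool check_invariant(const cap_t *caps, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            for (size_t j = 0; j < n; j++) {
                if (i != j && caps[i].object == caps[j].object &&
                        !caps[i].has_property) {
                    return false;
                }
            }
        }
        return true;
    }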
You'd also still have kernel code that runs later that is not verified, mostly very simple stuff, such as having an array of current threads (one per CPU) instead of a single current thread. The PhD thesis [1] did look at that and concluded it's all fine, but compared to a full code-level verification that's obviously lower assurance. The thesis is from 2013, so a lot has happened since then, but the general idea still holds.
[1]: https://ts.data61.csiro.au/publications/nictaabstracts/vonTessin:phd.abstrac...
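For illustration, the flavour of change is this (simplified names, not the actual seL4 sources):

    typedef struct tcb tcb_t;            /* opaque TCB type, just for the sketch      */
    #define MAX_CPUS 4                   /* stand-in for the configured CPU count     */

    tcb_t *ksCurThread;                  /* unicore: one current thread               */
    tcb_t *ksCurThreadOnCpu[MAX_CPUS];   /* SMP: one current thread per CPU           */

    static inline tcb_t *current_thread(unsigned int cpu)
    {
        return ksCurThreadOnCpu[cpu];
    }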
Also, why is verifying fine-grained locking so difficult? CertiKOS managed to do it, so I am probably missing something.
I don't see that much fine-grained locking in CertiKOS; it mostly uses separation. It's easier for it to do so because it has a significantly simpler API (in fact, it's probably quite similar to a multikernel setup in seL4 plus messaging between cores; the only setups I've seen have precisely one VM per core).
I was not aware of that, thank you.
It seems to be even worse, actually: according to the paper that Gernot posted, there is no communication between cores, which, thinking back to the projects I've seen, is actually true there as well; they used shared memory between VMs to communicate, not kernel primitives. It also has cooperative scheduling, which is fine for some systems but not others. It's notoriously hard to figure out what features CertiKOS actually has, because the papers of course concentrate on other aspects such as verification (as do ours; it's probably not that easy to figure out what is verified in which version of seL4 either without a lot of reading, but we do try to be explicit about it :-)).
Gernot is obviously not very happy with them from a systems perspective. From the verification side, they have done some pretty cool things, just not to the systems level that Gernot would find acceptable ;-)
Thank you so much. I view seL4 as universal, in that it should be possible to build virtually any desired OS on top of it, and I am glad that a high-performance multicore version with fine-grained locking at least appears feasible.
It's definitely part of the vision. There is a lot of thinking about fine-grained locking vs the big lock, because the seL4 big lock is only held for about as long as a Linux fine-grained lock (with some caveats), but the idea is that we want this to scale.
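To make the comparison concrete, here is a sketch of the two designs (pthread mutexes for illustration only, obviously not the actual kernel code): with a big lock, the whole short kernel entry is one critical section; with fine-grained locking, each object is protected separately and a call takes only the locks it needs.

    #include <pthread.h>
    #include <stdint.h>

    /* Big-lock design: one lock around the entire (short) kernel entry. */
    static pthread_mutex_t big_kernel_lock = PTHREAD_MUTEX_INITIALIZER;

    void syscall_big_lock(void)
    {
        pthread_mutex_lock(&big_kernel_lock);
        /* the whole system call runs here */
        pthread_mutex_unlock(&big_kernel_lock);
    }

    /* Fine-grained design: a lock per kernel object. */
    typedef struct {
        pthread_mutex_t lock;
        /* ... object state ... */
    } kobject_t;

    void syscall_fine_grained(kobject_t *a, kobject_t *b)
    {
        /* acquire in a global order (by address) to avoid deadlock */
        kobject_t *first  = (uintptr_t)a < (uintptr_t)b ? a : b;
        kobject_t *second = (uintptr_t)a < (uintptr_t)b ? b : a;
        pthread_mutex_lock(&first->lock);
        pthread_mutex_lock(&second->lock);
        /* operate on a and b */
        pthread_mutex_unlock(&second->lock);
        pthread_mutex_unlock(&first->lock);
    }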
Is it planned to scale beyond the regime where inter-core communication is cheap? Obviously, transferring large amounts of data between cores on hot paths won't scale.

However, I am still not convinced that a multikernel is the best way to go: it has always seemed like a hack to work around kernel limitations, and it seems to make slow-path control-plane userspace code far more complex. Such code often isn't performance-critical at all, so forcing it to be distributed seems like unnecessary complexity. Furthermore, replicating data structures (such as page tables) which are almost entirely read-only seems like it could create substantial overhead, although I am not sure whether that is an issue in practice.

Finally, as someone who writes Rust professionally, I find thread-affine APIs quite annoying to deal with. Resources that can't be safely accessed by more than one thread concurrently are easy to wrap in clean APIs, but resources that can't be sent across threads are much harder to wrap. Making kernel resources specifically per-core, and preventing them from being transferred between cores, seems likely to lead to a bunch of userspace frameworks of varying quality that try to work around this limitation.
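To illustrate what I mean (a hypothetical C analogue, nothing seL4-specific): a resource that merely needs serialised access can hide a lock behind its API, but a core-affine resource needs every use site to establish that it is still on the owning core, which in C can only be checked dynamically.

    #define _GNU_SOURCE
    #include <assert.h>
    #include <pthread.h>
    #include <sched.h>

    /* Shared resource: any thread may use it; the lock keeps the API clean. */
    typedef struct {
        pthread_mutex_t lock;
        int value;
    } shared_handle_t;

    void shared_increment(shared_handle_t *h)
    {
        pthread_mutex_lock(&h->lock);
        h->value++;
        pthread_mutex_unlock(&h->lock);
    }

    /* Core-affine resource: only the core that created it may use it.  The
     * handle itself can still be passed around freely, so the restriction
     * has to be re-checked at every use. */
    typedef struct {
        int owner_core;
        int value;
    } affine_handle_t;

    void affine_increment(affine_handle_t *h)
    {
        assert(sched_getcpu() == h->owner_core);
        h->value++;
    }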
Are there any papers that I could read to learn more?
We don’t have anything really published yet on the verification side, because it’s all still ongoing work. There is one about the big-lock idea, though:
https://ts.data61.csiro.au/publications/nictaabstracts/Peters_DEH_15.abstrac...
I read that paper and like the idea. My biggest worry, however, is that a big lock won't scale beyond a small number of cores. The idea that fine-grained locking might be possible to verify gives me hope, as I had previously given up on that.
Cheers, Gerwin
Cheers, Demi
To elaborate, the reason I am interested in scaling to loosely-coupled manycore systems is that it makes thread migration across cores much easier and allows for much cleaner user-level APIs. Cross-core IPC is not efficient, and doing it on hot paths is an antipattern, but allowing it on slow paths lets user programs migrate threads between cores where it is advantageous to do so, and avoids annoying thread-affine APIs. Sincerely, Demi