[seL4] Re: Conclusions regarding speculation

18 Nov 2023

At 2023-11-17T16:02:28-0500, Demi Marie Obenour wrote:
> ...
> On 11/12/23 00:44, Gernot Heiser via Devel wrote:
> > ...
> > If you want a theoretically sound solution – Welcome to Time
> > Protection!
> > In contrast to your implied claim that time protection is unsound:
> > It’s been formalised, and it’s in the process of being proved
> > correct and complete in an seL4 implementation. Its minimal hardware
> > support [has] also been implemented in RISC-V processors and demonstrated
> > cheap and complete (may even be in silicon by now).
> A key storage service _must_ be able to respond to signing and
> decryption requests to be usable at all, and that means that the
> requester _will_ know how long the operation took.  One can try to hide
> this information by padding the operation to its worst-case value,
> but this only works if there _is_ a worst-case value.  In Qubes OS,
> responding to a request will require heap allocation, fork(), disk
> I/O, and sometimes user interaction.  That means that the worst-case
> operation time is _infinite_, so time protection is simply not
> possible and will not be possible for the foreseeable future.
Can we reëstablish what we mean by "time protection", then?

A few things seem to me to be true:

1.  Even on statically configured systems, response times for requested
    services can easily exceed the quantum of a schedule slice or "time
    protection domain", for exactly the reasons you say.  A service may
    have highly variable _latency_: it may need to retrieve information
    from a machine register, a local cache, a remote cache, main memory,
    mass storage, or the network, or it may await human interaction, as
    when a user must fumble with a physical token or poke keystrokes
    into a user interface.  Roughly, each of these is at least one order
    of magnitude slower than the one before it.

2.  The bandwidth of a timing channel attack is strongly and inversely
    correlated with the channel's latency.  (This statement needs a
    proof.)

3.  Practical time protection is therefore less a matter of achieving
    zero information leakage, which application design may not permit,
    than of keeping leakage acceptably slow.  In the key storage service
    example, for instance, one _can_ draw statistical inferences from
    the response latency: if a signing or decryption request was
    fulfilled in less than a millisecond, then use of the key was
    probably not authorized through human interaction.

4.  The distinction between "static" and "dynamic" systems may blur
    here, since even a statically partitioned system may be operating in
    a dynamic environment, particularly once we account for which data
    populate the caches.

5.  A "static" system is not necessarily a deterministic one.  And
    without perfect knowledge of microarchitectural implementations,
    which seems unlikely ever to eventuate for x86-64 or AArch64 cores,
    determinism appears impossible.  (We have no way of knowing when a
    microarchitectural gadget will decide to flush an undocumented
    buffer.)

6.  Because multiple orders of magnitude separate fast accesses from
    slow ones, the bandwidth available for exfiltration decreases
    exponentially with each step down the latency hierarchy.  (This
    statement also needs a proof.)

7.  The art and science of time protection will progress on two fronts:
    (a) selection by system designers of a "good" time quantum for
    high-bandwidth channels, one that sacrifices little performance
    while strangling the bandwidth of information leakage to zero; and
    (b) the socialization of knowledge about timing channel attacks
    among application designers, so that they can estimate the bandwidth
    of the timing channels they create and respond appropriately.
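To make points 2 and 6 a little more concrete, here is a back-of-the-envelope sketch.  It is not the proof those points call for.  It assumes, hypothetically, that each observation of a response time carries at most one noise-free bit and costs one full service latency.  It also assumes an idealized hierarchy in which every tier from point 1 is exactly 10x slower than the one before it.  The function names and the 1 ns base latency are illustrative inventions, not measurements.

```python
"""Idealized timing-channel bandwidth across a latency hierarchy.

Hypothetical assumptions, for illustration only:
  * each observation carries at most `bits_per_symbol` bits (noise-free);
  * each observation costs one full service latency (no pipelining);
  * tier k has latency base_s * ratio**k (an idealized ladder).
"""

# Tier names follow point 1; the latencies assigned below are not measurements.
TIERS = ["register", "local cache", "remote cache", "main memory",
         "mass storage", "network", "human interaction"]


def channel_bandwidth(latency_s: float, bits_per_symbol: float = 1.0) -> float:
    """Upper bound on leak rate (bits/s) when each symbol costs one latency."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return bits_per_symbol / latency_s


def tier_latencies(base_s: float, ratio: float, n: int) -> list[float]:
    """Latency of tier k is base_s * ratio**k, for k = 0 .. n-1."""
    return [base_s * ratio**k for k in range(n)]


if __name__ == "__main__":
    # Hypothetical parameters: 1 ns at the fastest tier, exactly 10x per tier.
    for name, lat in zip(TIERS, tier_latencies(1e-9, 10.0, len(TIERS))):
        bw = channel_bandwidth(lat)
        print(f"{name:>17}: {lat:.0e} s/symbol -> at most {bw:.0e} bit/s")
```

Under these assumptions the leak rate is bits_per_symbol / latency, so each 10x step down the hierarchy cuts it tenfold.  The decline is exponential in the tier index, which is one way to read point 6.  Real channels are noisy, and an attacker may be able to issue requests concurrently, so this is only an idealization.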

An analogy that comes to mind is the quality of one-way hashes.  We know
that we'll never have a perfect hash function for inputs of arbitrary
length, but we know that the longer the hash, the better we can do.
CRC32 is hopelessly weak for security applications (but still okay for
others); MD5 is regarded as compromised; and BLAKE3 is state of the art.

For most applications, it is adequate for time protection to be strong
enough that your exfiltration countermeasures give your opponent a
problem that will outlive them.  (When it is not, there is always the
option of destroying,[1] relocating, or otherwise "losing" the data.[2])
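Whether a residual channel leaves the opponent "a problem that will outlive them" is simple arithmetic once a leak rate is assumed.  The sketch below takes a hypothetical 256-bit key and a few hypothetical steady leak rates, and computes how long a complete exfiltration would take.  The 100-year horizon and the helper names are illustrative assumptions.

```python
"""Idealized time to exfiltrate a secret through a slow timing channel."""

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def years_to_exfiltrate(secret_bits: int, leak_rate_bps: float) -> float:
    """Years to leak `secret_bits` at a steady `leak_rate_bps` (bits/s).

    Ignores noise and error-correction overhead, which would only slow
    the attacker further.
    """
    if leak_rate_bps <= 0:
        return float("inf")  # a closed channel never finishes
    return secret_bits / leak_rate_bps / SECONDS_PER_YEAR


def outlives_opponent(secret_bits: int, leak_rate_bps: float,
                      horizon_years: float = 100.0) -> bool:
    """True if the leak takes longer than `horizon_years` (a hypothetical threshold)."""
    return years_to_exfiltrate(secret_bits, leak_rate_bps) > horizon_years


if __name__ == "__main__":
    # Hypothetical leak rates for a 256-bit key.
    for rate in (1.0, 1e-3, 1e-6, 1e-9):
        yrs = years_to_exfiltrate(256, rate)
        print(f"{rate:g} bit/s: {yrs:.3g} years, "
              f"outlives a 100-year horizon: {outlives_opponent(256, rate)}")
```

At 1e-6 bit/s a 256-bit key takes roughly 8 years to leak; at 1e-9 bit/s, roughly 8,100 years.  The sketch assumes the attacker needs every bit.  Partial leakage that shrinks a brute-force search would shorten the attack.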

I'd very much welcome correction on any of the foregoing points.

Regards,
Branden

[1] https://academic.oup.com/hwj/article/93/1/95/6535581
[2] https://www.theguardian.com/media/2018/jan/31/cabinet-documents-abc-reveals-...

G. Branden Robinson