At 2023-11-17T16:02:28-0500, Demi Marie Obenour wrote:
On 11/12/23 00:44, Gernot Heiser via Devel wrote:
If you want a theoretically sound solution – Welcome to Time Protection!
In contrast to your implied claim that time protection is unsound: It’s been formalised, and it’s in the process of being proved correct and complete in an seL4 implementation. Its minimal hardware support has also been implemented in RISC-V processors and demonstrated to be cheap and complete (may even be in silicon by now).
A key storage service _must_ be able to respond to signing and decryption requests to be usable at all, and that means that the requester _will_ know how long the operation took. One can try to hide this information by padding the operation to its worst-case value, but this only works if there _is_ a worst-case value. In Qubes OS, responding to a request will require heap allocation, fork(), disk I/O, and sometimes user interaction. That means that the worst-case operation time is _infinite_, so time protection is simply not possible and will not be possible for the foreseeable future.
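A minimal sketch of the padding idea described above, in Python. The names `padded_call` and `budget_s` are hypothetical and chosen only for illustration; nothing here is taken from Qubes OS or seL4. It pads only the wall-clock response time seen by the requester and does nothing about microarchitectural state, so it is not time protection in the seL4 sense. It also makes the objection concrete: the padding is only meaningful if a finite budget exists, and an overrun is itself observable.

```python
import time


def padded_call(operation, budget_s):
    """Run operation() and return only after budget_s seconds have elapsed.

    budget_s stands in for an assumed worst-case execution time. If the
    operation overruns it, no amount of padding can hide that, so the
    sketch raises instead of pretending otherwise.
    """
    start = time.monotonic()
    result = operation()
    elapsed = time.monotonic() - start
    if elapsed > budget_s:
        # The worst-case assumption was wrong; the requester can see this.
        raise TimeoutError("operation exceeded its assumed worst-case time")
    # Sleep away the remainder so every successful call takes about budget_s.
    time.sleep(budget_s - elapsed)
    return result
```

With an operation that can block on disk I/O or a human, there is no value of `budget_s` that is both safe and usable, which is the point being made about unbounded worst-case times.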
Can we reëstablish what we mean by "time protection", then?

A few things seem to me to be true:

1. Even on statically configured systems, response times for requested services can easily exceed the quantum of a schedule slice or "time protection domain", for exactly the reasons you say. A service may have highly variable _latency_: it may require information retrieval from a machine register, or local cache, or remote cache, or main memory, or mass storage, or over the network, or it may await human interaction--as with a user needing to fumble with a physical token or poke keystrokes into a user interface. Approximately, each of these is at least one order of magnitude slower than the last.

2. The bandwidth of a timing channel attack is strongly and inversely correlated with the channel's latency. (This statement needs a proof.)

3. Practical time protection is therefore not so much a matter of achieving zero information leakage; application design may not permit that. For example, in the key storage service example, one _can_ draw statistical inferences based on the latency of response; if the signing or decryption request was fulfilled in less than a millisecond, then use of the key was probably not authorized consequent to human interaction.

4. The distinction between "static" and "dynamic" systems may become indistinct here, since even a statically partitioned system may be operating in a dynamic environment, particularly when taking into account what data populate caches.

5. A "static" system is not necessarily a deterministic one. And without perfect knowledge of microarchitectural implementations, something that seems unlikely to eventuate with x86-64 or AArch64 cores, determinism seems impossible. (We have no way of knowing when a microarchitectural gadget will decide to flush an undocumented buffer.)

6. Because multiple orders of magnitude separate fast accesses from slow ones, the available bandwidth for exfiltration decreases exponentially.
   (This statement also needs a proof.)

7. The art and science of time protection will progress on two fronts: (a) selection by system designers of a "good" time quantum for high-bandwidth channels that sacrifices little performance while strangling the bandwidth of information leakage to zero capacity; and (b) the socialization of knowledge about timing channel attacks to application designers so that they can estimate the bandwidth of the timing channels they create, and respond appropriately.

An analogy that comes to mind is the quality of one-way hashes. We know that we'll never have a perfect hash function for inputs of arbitrary length, but we know that the longer the hash, the better we can do. CRC32 is hopelessly weak for security applications (but still okay for others); MD5 is regarded as compromised; and BLAKE3 is state of the art.

Time protection strong enough that your exfiltration countermeasures give your opponent a problem that will outlive them is adequate for most applications. (When it is not, there is always the option of destroying,[1] relocating, or otherwise "losing" the data.[2])

I'd very much welcome correction on any of the foregoing points.

Regards,
Branden

[1] https://academic.oup.com/hwj/article/93/1/95/6535581
[2] https://www.theguardian.com/media/2018/jan/31/cabinet-documents-abc-reveals-...