On 16 Aug 2023, at 00:22, Demi Marie Obenour wrote:
I’d claim they don’t help at all. They might if “random” were truly random, which in hardware is rarely the case. “Random” cache replacement is more properly described as “unspecified”. In reality it is usually derived from some existing state (e.g. a counter) and as such is actually fully deterministic; you only have to reverse-engineer the mechanism.
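To make the point concrete, here is a minimal sketch (a hypothetical policy, not modelled on any specific CPU) of a “random” replacement that is really just a free-running counter — run the same access trace twice and you get identical evictions, so there is nothing random to defeat an attacker:

```python
class CounterRandomCache:
    """Fully-associative cache whose 'random' victim is a mod-N counter."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = [None] * ways
        self.counter = 0  # the hidden state an attacker can reverse-engineer

    def access(self, tag):
        """Return True on hit, False on miss (filling a 'random' victim way)."""
        if tag in self.lines:
            return True
        victim = self.counter % self.ways  # "random" victim selection
        self.counter += 1
        self.lines[victim] = tag
        return False

# Two fresh caches, same trace: behaviour is bit-for-bit identical.
trace = [10, 11, 12, 13, 14, 10]
cache1, cache2 = CounterRandomCache(4), CounterRandomCache(4)
evictions1 = [cache1.access(t) for t in trace]
evictions2 = [cache2.access(t) for t in trace]
assert evictions1 == evictions2  # fully deterministic, nothing random here
```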
Besides that, fully-associative caches (unless very small) are inherently slow (long critical path) and energy-hungry (many gates to traverse).
What about Chameleon Cache (https://arxiv.org/pdf/2209.14673.pdf)?
I had a look, and the paper pretty much confirms what I’ve said. (Besides being only a proposal…)
For one, they don’t actually implement an FA cache, nor true randomness; they approximate both in some fashion so that the effect is basically the same (so they claim; I didn’t read it carefully enough to confirm that). Note that the paper is published in a third-tier venue I hadn’t even heard of before: the IEEE International Symposium on Secure and Private Execution Environment Design.
It does all this by making the hardware more complicated (and thus more likely to be buggy/attackable). Specifically, they take the IMHO wrong approach of trying to solve the whole problem in hardware, discarding alternative approaches as “involv[ing] software to manage the partitions”. As I keep saying, security is inherently a HW-SW-codesign problem, you won’t get good solutions by trying to do it all in HW or in SW. The right approach is policy-mechanism separation: HW provides minimal (and ideally orthogonal) mechanisms, and SW uses them according to its security policy.
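As a sketch of what policy-mechanism separation could look like (all names here are invented for illustration, not taken from the paper or any real ISA): hardware exposes only minimal mechanisms — say, a way mask for partitioning and an on-demand flush — and the OS decides when and how to apply them, based on which domains actually distrust each other:

```python
class CacheHW:
    """Stand-in for hardware: minimal mechanisms only, no policy."""

    def __init__(self, ways=8):
        self.ways = ways
        self.way_mask = (1 << ways) - 1  # which ways the current domain may use
        self.flushed = False

    def set_way_mask(self, mask):
        """Mechanism: restrict allocation to the given ways (partitioning)."""
        self.way_mask = mask

    def flush_l1(self):
        """Mechanism: flush the L1 on demand."""
        self.flushed = True

class OSPolicy:
    """Stand-in for the OS: decides *when* to invoke the mechanisms."""

    def __init__(self, hw):
        self.hw = hw

    def switch_domain(self, domain_id, mutually_distrusting):
        if mutually_distrusting:
            # Policy choice: flush, and give alternating domains disjoint ways.
            self.hw.flush_l1()
            self.hw.set_way_mask(0x0F if domain_id % 2 == 0 else 0xF0)
        # Switches between trusting domains pay no cost at all.
```

The point of the split: only the OS knows where the real security boundaries are, so only switches that cross one pay the cost — a HW-only design must pessimistically pay it everywhere.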
The HW-only approach means that the HW must take a pessimistic approach and prevent information flow across anything that looks like it might be a security boundary, whether or not it actually is (which only the OS can know). This has a (generally measurable) cost.
Here this cost is that the cache will likely be slower (in terms of access latency), likely more energy-hungry, and have reduced hit rates compared to, say, a 4-way LRU cache. This is a constant overhead, whether it’s needed or not. In contrast, flushing the L1 is a cost that is only paid when an actual security-domain switch happens. It is small (a fraction of a µs) and negligible unless domain-switch rates are unusually high.
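A back-of-envelope comparison makes the trade-off visible (all numbers here are assumed for the sake of the example, not measurements of any real system):

```python
# Assumed numbers, purely illustrative:
flush_cost_us = 0.5        # L1 flush cost: a fraction of a µs
switch_rate_hz = 1_000     # security-domain switches per second
constant_overhead = 0.02   # a 2% constant slowdown from a complex HW-only cache

# Per-switch flushing as a fraction of CPU time:
flush_overhead = flush_cost_us * 1e-6 * switch_rate_hz

print(f"flush overhead:    {flush_overhead:.4%}")   # i.e. 0.05% at 1 kHz
print(f"constant overhead: {constant_overhead:.2%}")

# Break-even switch rate: where per-switch flushing matches the constant cost.
break_even_hz = constant_overhead / (flush_cost_us * 1e-6)
print(f"break-even at {break_even_hz:,.0f} switches/s")  # 40,000 with these numbers
```

With these (assumed) numbers, flushing only becomes more expensive than a 2% constant overhead above 40,000 domain switches per second — far beyond typical rates.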
There is a trade-off between such per-domain-switch costs vs constant overheads resulting from a complex design, but the paper doesn’t attempt to evaluate this. In fact, their evaluation commits the benchmarking crime of improper baseline (D1 of van der Kouwe et al [2020], see https://trustworthy.systems/publications/full_text/vanderKouwe_HABG_20.pdf or my list https://gernot-heiser.org/benchmarking-crimes.html#base). They only compare against other complex, HW-only schemes.
Gernot