Conclusions regarding speculation

Demi Marie Obenour

10 Nov 2023 10 Nov '23

6:03 a.m.

I talked with a Xen developer and came to these conclusions: - Speculative taint tracking provides complete protection against speculative attacks. This is sufficient to prevent leakage of cryptographic key material, even in fully dynamic systems. Furthermore, it is compatible with fast context switches between protection domains. - Full time partitioning eliminates all timing channels, but it is possible only in fully static systems, which severely limits its applicability. - Time protection without time partitioning does _not_ fully prevent Spectre v1 attacks, and still imposes a large penalty on protection domain switches. Additionally, I am almost certain that: - On properly designed hardware, both time protection and speculative taint tracking can be enabled and disabled by systems software. - Time protection and speculative taint tracking are not mutually exclusive. A cloud provider might use time partitioning to partition different customers from each other, while guest OSs use speculative taint tracking to protect different processes from each other. In short, time protection is excellent, but it is not a sufficient mechanism for general-purpose computing. Speculative taint tracking is a different mechanism that is applicable to many more workloads, and which provides complete protection against speculative attacks. Both mechanisms can be used together depending on system security policy. -- Sincerely, Demi Marie Obenour (she/her/hers)

Show replies by date

Gernot Heiser

10 Nov 10 Nov

9:47 a.m.

On 10 Nov 2023, at 06:03, Demi Marie Obenour <demiobenour@gmail.com> wrote: - Speculative taint tracking provides complete protection against speculative attacks. This is sufficient to prevent leakage of cryptographic key material, even in fully dynamic systems. Furthermore, it is compatible with fast context switches between protection domains. It’s also a point solution, that provides zero guarantees against unforeseen attacks. - Full time partitioning eliminates all timing channels, but it is possible only in fully static systems, which severely limits its applicability. I’m sorry, but this is simply false. What you need for time protection (I assume this is what you mean with “full time partitioning”) are fixed time slices – ”fixed” in that their length cannot depend on any events in the system that can be controlled by an untrusted domain. It doesn’t mean they cannot be changed as domains come and go. - Time protection without time partitioning does _not_ fully prevent Spectre v1 attacks, and still imposes a large penalty on protection domain switches. Time protection does *not* impose a large penalty. Its context-switching cost is completely hidden by the cost of an L1 D-cache flush – as has been demonstrated by published work. And if you don’t flush the L1 cache, you’ll have massive leakage, taint-tracking or not. Where time protection, *without further hardware support*, does have a cost is for partitioning the lower-level caches. This cost is two-fold: 1) Average cache utilisation is reduced due to the static partitioning (in contrast to the dynamic partitioning that happens as a side effect of the hardware’s cache replacement policy). This cost is generally in the single-digit percentage range (as per published work), but can also be negative – there’s plenty of work that uses static cache partitioning for performance *isolation/improvement*. 2) Memory utilisation is reduced, as colouring implicitly partitions memory. This is the most significant cost, and unavoidable without more hardware support. Intel CAT is one variant of such support (partitions the cache by ways rather than set, which has not effect on memory utilisation, but only works on the LLC, and has itself a performance cost due to reduced cache associativity). And, of course, without partitioning the lower-level caches you have leakage again, and taint tracking isn’t going to help there either. If people want to improve the hardware, focussing on generic mechanisms such as support for partitioning L2-LL caches would be far more beneficial than point-solutions that will be defeated by the next class of attacks. Gernot

Hugo V.C.

9:07 p.m.

"If people want to improve the hardware, focussing on generic mechanisms such as support for partitioning L2-LL caches would be far more beneficial than point-solutions that will be defeated by the next class of attacks." The path of partitioning some hardware resourse ends up in full partitioning of the computing platform including power supply. It is simpler (almost zero design effort) and the only "reasonably" secure solution. Whenever you share hardware resources, you open the path to side channels. On the other hand, PLUS full computing platform partitioning time protection is a must on each isolated computing platform. So here we have two problems that need to be addressed by different vendors: 1) Time protection, CPU/SoC vendors 2) Computing platform isolation (laptop/servers vendors). Figure out how wonderful would be to have a laptop with X full independent computing platforms inside (fun/work/banking...) and each one based on CPU/SoC solutions with Time Protection. On top of each of those platforms some verified hypervisor/kernel (seL4?). BTW, the step 2 is straight forward, just make laptops a bit bigger and add a screen swich to switch each isolated computing platform. Step 1 requires cpu/SoC vendors more effort and there will be some resilience to sell it. El jue., 9 nov. 2023 23:51, Gernot Heiser via Devel <devel@sel4.systems> escribió:

...

On 10 Nov 2023, at 06:03, Demi Marie Obenour <demiobenour@gmail.com> wrote:

- Speculative taint tracking provides complete protection against speculative attacks. This is sufficient to prevent leakage of cryptographic key material, even in fully dynamic systems. Furthermore, it is compatible with fast context switches between protection domains.

It’s also a point solution, that provides zero guarantees against unforeseen attacks.

- Full time partitioning eliminates all timing channels, but it is possible only in fully static systems, which severely limits its applicability.

I’m sorry, but this is simply false.

What you need for time protection (I assume this is what you mean with “full time partitioning”) are fixed time slices – ”fixed” in that their length cannot depend on any events in the system that can be controlled by an untrusted domain. It doesn’t mean they cannot be changed as domains come and go.

- Time protection without time partitioning does _not_ fully prevent Spectre v1 attacks, and still imposes a large penalty on protection domain switches.

Time protection does *not* impose a large penalty. Its context-switching cost is completely hidden by the cost of an L1 D-cache flush – as has been demonstrated by published work. And if you don’t flush the L1 cache, you’ll have massive leakage, taint-tracking or not.

Where time protection, *without further hardware support*, does have a cost is for partitioning the lower-level caches. This cost is two-fold:

1) Average cache utilisation is reduced due to the static partitioning (in contrast to the dynamic partitioning that happens as a side effect of the hardware’s cache replacement policy). This cost is generally in the single-digit percentage range (as per published work), but can also be negative – there’s plenty of work that uses static cache partitioning for performance *isolation/improvement*.

2) Memory utilisation is reduced, as colouring implicitly partitions memory. This is the most significant cost, and unavoidable without more hardware support. Intel CAT is one variant of such support (partitions the cache by ways rather than set, which has not effect on memory utilisation, but only works on the LLC, and has itself a performance cost due to reduced cache associativity).

And, of course, without partitioning the lower-level caches you have leakage again, and taint tracking isn’t going to help there either.

If people want to improve the hardware, focussing on generic mechanisms such as support for partitioning L2-LL caches would be far more beneficial than point-solutions that will be defeated by the next class of attacks.

Gernot _______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems

Demi Marie Obenour

13 Nov 13 Nov

12:46 p.m.

On 11/10/23 05:07, Hugo V.C. wrote:

...

"If people want to improve the hardware, focussing on generic mechanisms such as support for partitioning L2-LL caches would be far more beneficial than point-solutions that will be defeated by the next class of attacks."

The path of partitioning some hardware resourse ends up in full partitioning of the computing platform including power supply. It is simpler (almost zero design effort) and the only "reasonably" secure solution. Whenever you share hardware resources, you open the path to side channels.

At this point one just has multiple separate systems.

...

On the other hand, PLUS full computing platform partitioning time protection is a must on each isolated computing platform. So here we have two problems that need to be addressed by different vendors:

1) Time protection, CPU/SoC vendors 2) Computing platform isolation (laptop/servers vendors).

Figure out how wonderful would be to have a laptop with X full independent computing platforms inside (fun/work/banking...) and each one based on CPU/SoC solutions with Time Protection.

That works until one needs to use all of the cores on the system for a parallel VM kernel build or for non-accelerated video encoding.

...

On top of each of those platforms some verified hypervisor/kernel (seL4?).

BTW, the step 2 is straight forward, just make laptops a bit bigger and add a screen swich to switch each isolated computing platform.

One might as well just buy multiple laptops and be able to use them at the same time. -- Sincerely, Demi Marie Obenour (she/her/hers)

Hugo V.C.

6:55 p.m.

El lun, 13 nov 2023 a las 2:48, Demi Marie Obenour (<demiobenour@gmail.com>) escribió:

...

On 11/10/23 05:07, Hugo V.C. wrote:

...
"If people want to improve the hardware, focussing on generic mechanisms such as support for partitioning L2-LL caches would be far more beneficial than point-solutions that will be defeated by the next class of attacks."

The path of partitioning some hardware resourse ends up in full partitioning of the computing platform including power supply. It is simpler (almost zero design effort) and the only "reasonably" secure solution. Whenever you share hardware resources, you open the path to side channels.

At this point one just has multiple separate systems.

**** Yes, it is wonderful how simple it is... isn't it?

...

...
On the other hand, PLUS full computing platform partitioning time protection is a must on each isolated computing platform. So here we have two problems that need to be addressed by different vendors:

1) Time protection, CPU/SoC vendors 2) Computing platform isolation (laptop/servers vendors).

Figure out how wonderful would be to have a laptop with X full independent computing platforms inside (fun/work/banking...) and each one based on CPU/SoC solutions with Time Protection.

That works until one needs to use all of the cores on the system for a parallel VM kernel build or for non-accelerated video encoding.

...
On top of each of those platforms some verified hypervisor/kernel (seL4?).

BTW, the step 2 is straight forward, just make laptops a bit bigger and add a screen swich to switch each isolated computing platform.

One might as well just buy multiple laptops and be able to use them at the same time.

*** You can not easily travel with multiple laptops and it is not a comfortable solution, nor you can sell this idea to anyone. Instead you can travel with a single laptop with 3 different computing platforms inside, basically, the user will never know, it will be transparent.

...

-- Sincerely, Demi Marie Obenour (she/her/hers)

_______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems

Demi Marie Obenour

11 Nov 11 Nov

7:49 a.m.

On 11/9/23 17:47, Gernot Heiser via Devel wrote:

...

On 10 Nov 2023, at 06:03, Demi Marie Obenour <demiobenour@gmail.com> wrote:

- Speculative taint tracking provides complete protection against speculative attacks. This is sufficient to prevent leakage of cryptographic key material, even in fully dynamic systems. Furthermore, it is compatible with fast context switches between protection domains.

It’s also a point solution, that provides zero guarantees against unforeseen attacks.

Unless I am severely mistaken, it provides complete protection for code that has secret-independent timing, such as cryptographic software. It is also cheaper than some of the workarounds existing systems must use.

...

- Full time partitioning eliminates all timing channels, but it is possible only in fully static systems, which severely limits its applicability.

I’m sorry, but this is simply false.

What you need for time protection (I assume this is what you mean with “full time partitioning”) are fixed time slices – ”fixed” in that their length cannot depend on any events in the system that can be controlled by an untrusted domain. It doesn’t mean they cannot be changed as domains come and go.

Based on what information should I set these time slices? The system I work with has no idea what workload it is running. It can’t require the workload to provide it with hints because existing workloads don’t do that. It can’t ask the user because the user will have no idea either. The most it can do is adapt based on what the workload is doing at runtime. In Qubes OS, it is completely normal for one VM (effectively a protection domain) to start using almost the entire system’s CPU resources without any warning at all. This could happen because the user just started compiling a big project. The user might then start another CPU-intensive task (such as a video call) in another VM, and that might _also_ try to use 100% of the system CPU resources. And users will expect that the CPU usage of the first workload will decrease when this happens, as some (but not all!) of the CPU time is allotted to the second workload instead. Given this constraint, I see no way to implement time protection. It _is_ possible to quantize the amount of CPU time allotted to a given vCPU, and to have a process that uses small, random amounts of CPU to generate noise. However, ensuring that the CPU time used by one domain is not observable by another domain is not possible, and I do not believe this will ever change.

...

- Time protection without time partitioning does _not_ fully prevent Spectre v1 attacks, and still imposes a large penalty on protection domain switches.

Time protection does *not* impose a large penalty. Its context-switching cost is completely hidden by the cost of an L1 D-cache flush – as has been demonstrated by published work. And if you don’t flush the L1 cache, you’ll have massive leakage, taint-tracking or not.

Where time protection, *without further hardware support*, does have a cost is for partitioning the lower-level caches. This cost is two-fold:

1) Average cache utilisation is reduced due to the static partitioning (in contrast to the dynamic partitioning that happens as a side effect of the hardware’s cache replacement policy). This cost is generally in the single-digit percentage range (as per published work), but can also be negative – there’s plenty of work that uses static cache partitioning for performance *isolation/improvement*.

Static partitioning based on _what_? On a desktop system, the dynamic behavior of a workload is generally the _only_ information known about that workload, so any partitioning _must_ be dynamic.

...

2) Memory utilisation is reduced, as colouring implicitly partitions memory. This is the most significant cost, and unavoidable without more hardware support. Intel CAT is one variant of such support (partitions the cache by ways rather than set, which has not effect on memory utilisation, but only works on the LLC, and has itself a performance cost due to reduced cache associativity).

Static memory partitioning is completely nonviable for desktop workloads. A VM might go from needing 1/16 of system RAM to requesting over half of it without any warning, and the expected behavior is that unless some other part of the system is using that RAM, the VM will get it. And yes, that does mean that two VMs can communicate by competing for system RAM, just like they can communicate by competing for CPU resources. Covert channels (ones that require cooperation from both ends) are out of scope for Qubes OS and for every other desktop system I am aware of, precisely because the cost of eliminating them is so high.

...

And, of course, without partitioning the lower-level caches you have leakage again, and taint tracking isn’t going to help there either.

If people want to improve the hardware, focussing on generic mechanisms such as support for partitioning L2-LL caches would be far more beneficial than point-solutions that will be defeated by the next class of attacks.

I would much rather have a theoretically sound solution than an unsound one. However, it is even more important that my desktop actually be able to do the tasks I expect of it. To the best of my knowledge, time protection and a usable desktop are incompatible with each other. I do hope you can prove me wrong here. -- Sincerely, Demi Marie Obenour (she/her/hers)

Gernot Heiser

12 Nov 12 Nov

4:44 p.m.

On 11 Nov 2023, at 07:49, Demi Marie Obenour <demiobenour@gmail.com> wrote:

...

On 11/9/23 17:47, Gernot Heiser via Devel wrote:

...
On 10 Nov 2023, at 06:03, Demi Marie Obenour <demiobenour@gmail.com> wrote:

...
- Speculative taint tracking provides complete protection against speculative attacks. This is sufficient to prevent leakage of cryptographic key material, even in fully dynamic systems. Furthermore, it is compatible with fast context switches between protection domains.

It’s also a point solution, that provides zero guarantees against unforeseen attacks.

Unless I am severely mistaken, it provides complete protection for code that has secret-independent timing, such as cryptographic software. It is also cheaper than some of the workarounds existing systems must use.

Well, *if* speculative taint tracking is really completely and correctly implemented, *and* you have such magic hardware. That’s a strong statement (for which there’s no proof). But let’s assume it is true. Then you may a complete protection against *speculation* attacks. Remember, speculation attacks construct a Trojan in otherwise trustworthy code using speculative execution of gadgets. There are other ways, specifically control-flow attacks So order to be secure, you then “only” need: - the magic complete and performant implementation of taint tacking, AND - complete prevention of control-flow attacks, AND - all secret-handling code being free from algorithmic timing side channels (i.e. no branching on or indexing by secrets), AND - no untrusted code, because any untrusted code may contain a Trojan that actively leaks through caches etc If you’re comfortable with all those ifs, fine. I’m not.

...

...
- Full time partitioning eliminates all timing channels, but it is possible only in fully static systems, which severely limits its applicability.

...
I’m sorry, but this is simply false.

What you need for time protection (I assume this is what you mean with “full time partitioning”) are fixed time slices – ”fixed” in that their length cannot depend on any events in the system that can be controlled by an untrusted domain. It doesn’t mean they cannot be changed as domains come and go.

Based on what information should I set these time slices?

That’s OS/hypervisor policy. Every system I know assigns time slices, that’s normal. But note, the strictly fixed (in the sense of not influencable by user code) time slices are only needed if you want to prevent *all* timing channels, in this case leaking by controlling the timing of context switches. This is a relatively low-bandwidth channel, i.e. it will need a minute or tow to leak an SSL key. If you’re fine with that then there’s no need for fixed time slices.

...

...
...
- Time protection without time partitioning does _not_ fully prevent Spectre v1 attacks, and still imposes a large penalty on protection domain switches.

Time protection does *not* impose a large penalty. Its context-switching cost is completely hidden by the cost of an L1 D-cache flush – as has been demonstrated by published work. And if you don’t flush the L1 cache, you’ll have massive leakage, taint-tracking or not.

Where time protection, *without further hardware support*, does have a cost is for partitioning the lower-level caches. This cost is two-fold:

1) Average cache utilisation is reduced due to the static partitioning (in contrast to the dynamic partitioning that happens as a side effect of the hardware’s cache replacement policy). This cost is generally in the single-digit percentage range (as per published work), but can also be negative – there’s plenty of work that uses static cache partitioning for performance *isolation/improvement*.

Static partitioning based on _what_? On a desktop system, the dynamic behavior of a workload is generally the _only_ information known about that workload, so any partitioning _must_ be dynamic.

Again, if you trust all your code to not intentionally leak secrets, then you don’t have to do this. Cache channels are very high bandwidth. Even cache side-channels have high enough bandwidth to steal encryption keys in minutes. If your threat scenario doesn’t care about this, fine. But there’s no way of preventing cache channels other than flushing or partitioning. So, it all depends on your threat scenario. If your threat scenario is that: - your hypervisor/kernel is completely trusted - all secret-handling code is trusted to - not have algorithmic timing channels - not be susceptible to control-flow attacks - be free of Trojans … then speculation attacks may be your main worry and you can ignore all the other timing channels, and you *may* be covered by (complete and properly-implemented) speculation taint tracking. That’s a lot of ifs – too many for my comfort. Note, this tracking adds a fair amount of complexity to the processor, which means there’s a high likelihood the implementation is buggy. This is in contract to fence.t, which is extremely simple to implement.

...

...
And, of course, without partitioning the lower-level caches you have leakage again, and taint tracking isn’t going to help there either.

If people want to improve the hardware, focussing on generic mechanisms such as support for partitioning L2-LL caches would be far more beneficial than point-solutions that will be defeated by the next class of attacks.

I would much rather have a theoretically sound solution than an unsound one. However, it is even more important that my desktop actually be able to do the tasks I expect of it. To the best of my knowledge, time protection and a usable desktop are incompatible with each other. I do hope you can prove me wrong here.

If you want a theoretically sound solution – Welcome to Time Protection! In contrast to your implied claim that time protection is unsound: It’s been formalised, and it’s in the process of being proved correct and complete in an seL4 implementation. Its minimal hardware support also been implemented in RISC-V processors and demonstrated cheap and complete (may even be in silicon by now). Gernot

Demi Marie Obenour

13 Nov 13 Nov

12:24 p.m.

On 11/12/23 00:44, Gernot Heiser via Devel wrote:

...

On 11 Nov 2023, at 07:49, Demi Marie Obenour <demiobenour@gmail.com> wrote:

...
On 11/9/23 17:47, Gernot Heiser via Devel wrote:

...
On 10 Nov 2023, at 06:03, Demi Marie Obenour <demiobenour@gmail.com> wrote:

...
- Speculative taint tracking provides complete protection against speculative attacks. This is sufficient to prevent leakage of cryptographic key material, even in fully dynamic systems. Furthermore, it is compatible with fast context switches between protection domains.

It’s also a point solution, that provides zero guarantees against unforeseen attacks.

Unless I am severely mistaken, it provides complete protection for code that has secret-independent timing, such as cryptographic software. It is also cheaper than some of the workarounds existing systems must use.

Well, *if* speculative taint tracking is really completely and correctly implemented, *and* you have such magic hardware. That’s a strong statement (for which there’s no proof). But let’s assume it is true.

Then you may a complete protection against *speculation* attacks.

Remember, speculation attacks construct a Trojan in otherwise trustworthy code using speculative execution of gadgets. There are other ways, specifically control-flow attacks

So order to be secure, you then “only” need: - the magic complete and performant implementation of taint tacking, AND - complete prevention of control-flow attacks, AND - all secret-handling code being free from algorithmic timing side channels (i.e. no branching on or indexing by secrets), AND - no untrusted code, because any untrusted code may contain a Trojan that actively leaks through caches etc

The first is a reasonable assumption for hardware with taint tracking, and the rest are reasonable assumptions for cryptographic code.

...

If you’re comfortable with all those ifs, fine. I’m not.

...
...
- Full time partitioning eliminates all timing channels, but it is possible only in fully static systems, which severely limits its applicability.

...
I’m sorry, but this is simply false.

What you need for time protection (I assume this is what you mean with “full time partitioning”) are fixed time slices – ”fixed” in that their length cannot depend on any events in the system that can be controlled by an untrusted domain. It doesn’t mean they cannot be changed as domains come and go.

Based on what information should I set these time slices?

That’s OS/hypervisor policy. Every system I know assigns time slices, that’s normal.

But note, the strictly fixed (in the sense of not influencable by user code) time slices are only needed if you want to prevent *all* timing channels, in this case leaking by controlling the timing of context switches.

This is a relatively low-bandwidth channel, i.e. it will need a minute or tow to leak an SSL key. If you’re fine with that then there’s no need for fixed time slices.

Is this with both sides cooperating or not? Covert channel attacks are out of scope.

...

...
...
...
- Time protection without time partitioning does _not_ fully prevent Spectre v1 attacks, and still imposes a large penalty on protection domain switches.

Time protection does *not* impose a large penalty. Its context-switching cost is completely hidden by the cost of an L1 D-cache flush – as has been demonstrated by published work. And if you don’t flush the L1 cache, you’ll have massive leakage, taint-tracking or not.

Where time protection, *without further hardware support*, does have a cost is for partitioning the lower-level caches. This cost is two-fold:

1) Average cache utilisation is reduced due to the static partitioning (in contrast to the dynamic partitioning that happens as a side effect of the hardware’s cache replacement policy). This cost is generally in the single-digit percentage range (as per published work), but can also be negative – there’s plenty of work that uses static cache partitioning for performance *isolation/improvement*.

Static partitioning based on _what_? On a desktop system, the dynamic behavior of a workload is generally the _only_ information known about that workload, so any partitioning _must_ be dynamic.

Again, if you trust all your code to not intentionally leak secrets, then you don’t have to do this.

Cache channels are very high bandwidth. Even cache side-channels have high enough bandwidth to steal encryption keys in minutes.

How much would this be reduced by a cache that was fully associative, or which emulated full associativity?

...

If your threat scenario doesn’t care about this, fine. But there’s no way of preventing cache channels other than flushing or partitioning.

Environments that can use cache flushing or partitioning should. However, the cost of flushing may be more than one can bear, and static partitioning requires information that desktop (and mobile) OSs simply don’t have. The question then becomes how much one can minimize the channel before the performance (and power consumption) penalty exceeds what users are willing to tolerate.

...

So, it all depends on your threat scenario.

It doesn’t just depend on threat scenario, but also on what one can afford. Desktop and mobile OSs run workloads their developers may have never dreamed of. And users expect them to remain performant and responsive. I have yet to see anyone

...

If your threat scenario is that: - your hypervisor/kernel is completely trusted. - all secret-handling code is trusted to - not have algorithmic timing channels - not be susceptible to control-flow attacks - be free of Trojans … then speculation attacks may be your main worry and you can ignore all the other timing channels, and you *may* be covered by (complete and properly-implemented) speculation taint tracking.

Algorithmic timing channels are the only threat on that list I am concerned about for the purposes of this discussion.

...

That’s a lot of ifs – too many for my comfort.

Note, this tracking adds a fair amount of complexity to the processor, which means there’s a high likelihood the implementation is buggy. This is in contract to fence.t, which is extremely simple to implement.

If I can’t use it, it is of no use to me.

...

...
...
And, of course, without partitioning the lower-level caches you have leakage again, and taint tracking isn’t going to help there either.

If people want to improve the hardware, focussing on generic mechanisms such as support for partitioning L2-LL caches would be far more beneficial than point-solutions that will be defeated by the next class of attacks.

I would much rather have a theoretically sound solution than an unsound one. However, it is even more important that my desktop actually be able to do the tasks I expect of it. To the best of my knowledge, time protection and a usable desktop are incompatible with each other. I do hope you can prove me wrong here.

If you want a theoretically sound solution – Welcome to Time Protection!

I _want_ a theoretically sound solution. I _need_ a usable desktop. If I must use between a theoretically sound solution and a usable system, soundness loses. Every time. I _know_ that Qubes OS could use speculative taint tracking if the hardware supported it. I can say the same about fully-associative caches and other pure-hardware countermeasures. Are you equally certain that Qubes OS could take advantage of time protection, and do so in a way that is completely transparent to the user? If so, how? “Completely transparent to the user” also means that the performance penalty must not be more than 30% or so, even when one protection domain unexpectedly needs access to most of the system CPU and memory.

...

In contrast to your implied claim that time protection is unsound: It’s been formalised, and it’s in the process of being proved correct and complete in an seL4 implementation. Its minimal hardware support also been implemented in RISC-V processors and demonstrated cheap and complete (may even be in silicon by now).

I misunderstood time protection as referring to using fence.t on context switch, as opposed to also ensuring fixed-size timeslices. -- Sincerely, Demi Marie Obenour (she/her/hers)

Demi Marie Obenour

18 Nov 18 Nov

8:02 a.m.

On 11/12/23 00:44, Gernot Heiser via Devel wrote:

...

On 11 Nov 2023, at 07:49, Demi Marie Obenour <demiobenour@gmail.com> wrote:

...
On 11/9/23 17:47, Gernot Heiser via Devel wrote:

...
On 10 Nov 2023, at 06:03, Demi Marie Obenour <demiobenour@gmail.com> wrote:

...
- Speculative taint tracking provides complete protection against speculative attacks. This is sufficient to prevent leakage of cryptographic key material, even in fully dynamic systems. Furthermore, it is compatible with fast context switches between protection domains.

It’s also a point solution, that provides zero guarantees against unforeseen attacks.

Unless I am severely mistaken, it provides complete protection for code that has secret-independent timing, such as cryptographic software. It is also cheaper than some of the workarounds existing systems must use.

Well, *if* speculative taint tracking is really completely and correctly implemented, *and* you have such magic hardware. That’s a strong statement (for which there’s no proof). But let’s assume it is true.

Then you may a complete protection against *speculation* attacks.

Remember, speculation attacks construct a Trojan in otherwise trustworthy code using speculative execution of gadgets. There are other ways, specifically control-flow attacks

So order to be secure, you then “only” need: - the magic complete and performant implementation of taint tacking, AND - complete prevention of control-flow attacks, AND - all secret-handling code being free from algorithmic timing side channels (i.e. no branching on or indexing by secrets), AND - no untrusted code, because any untrusted code may contain a Trojan that actively leaks through caches etc

If you’re comfortable with all those ifs, fine. I’m not.

...
...
- Full time partitioning eliminates all timing channels, but it is possible only in fully static systems, which severely limits its applicability.

...
I’m sorry, but this is simply false.

What you need for time protection (I assume this is what you mean with “full time partitioning”) are fixed time slices – ”fixed” in that their length cannot depend on any events in the system that can be controlled by an untrusted domain. It doesn’t mean they cannot be changed as domains come and go.

Based on what information should I set these time slices?

That’s OS/hypervisor policy. Every system I know assigns time slices, that’s normal.

But note, the strictly fixed (in the sense of not influencable by user code) time slices are only needed if you want to prevent *all* timing channels, in this case leaking by controlling the timing of context switches.

This is a relatively low-bandwidth channel, i.e. it will need a minute or tow to leak an SSL key. If you’re fine with that then there’s no need for fixed time slices.

...
...
...
- Time protection without time partitioning does _not_ fully prevent Spectre v1 attacks, and still imposes a large penalty on protection domain switches.

Time protection does *not* impose a large penalty. Its context-switching cost is completely hidden by the cost of an L1 D-cache flush – as has been demonstrated by published work. And if you don’t flush the L1 cache, you’ll have massive leakage, taint-tracking or not.

Where time protection, *without further hardware support*, does have a cost is for partitioning the lower-level caches. This cost is two-fold:

1) Average cache utilisation is reduced due to the static partitioning (in contrast to the dynamic partitioning that happens as a side effect of the hardware’s cache replacement policy). This cost is generally in the single-digit percentage range (as per published work), but can also be negative – there’s plenty of work that uses static cache partitioning for performance *isolation/improvement*.

Static partitioning based on _what_? On a desktop system, the dynamic behavior of a workload is generally the _only_ information known about that workload, so any partitioning _must_ be dynamic.

Again, if you trust all your code to not intentionally leak secrets, then you don’t have to do this.

Cache channels are very high bandwidth. Even cache side-channels have high enough bandwidth to steal encryption keys in minutes.

If your threat scenario doesn’t care about this, fine. But there’s no way of preventing cache channels other than flushing or partitioning.

So, it all depends on your threat scenario.

If your threat scenario is that: - your hypervisor/kernel is completely trusted - all secret-handling code is trusted to - not have algorithmic timing channels - not be susceptible to control-flow attacks - be free of Trojans … then speculation attacks may be your main worry and you can ignore all the other timing channels, and you *may* be covered by (complete and properly-implemented) speculation taint tracking.

That’s a lot of ifs – too many for my comfort.

Note, this tracking adds a fair amount of complexity to the processor, which means there’s a high likelihood the implementation is buggy. This is in contract to fence.t, which is extremely simple to implement.

...
...
And, of course, without partitioning the lower-level caches you have leakage again, and taint tracking isn’t going to help there either.

If people want to improve the hardware, focussing on generic mechanisms such as support for partitioning L2-LL caches would be far more beneficial than point-solutions that will be defeated by the next class of attacks.

I would much rather have a theoretically sound solution than an unsound one. However, it is even more important that my desktop actually be able to do the tasks I expect of it. To the best of my knowledge, time protection and a usable desktop are incompatible with each other. I do hope you can prove me wrong here.

If you want a theoretically sound solution – Welcome to Time Protection!

In contrast to your implied claim that time protection is unsound: It’s been formalised, and it’s in the process of being proved correct and complete in an seL4 implementation. Its minimal hardware support also been implemented in RISC-V processors and demonstrated cheap and complete (may even be in silicon by now). A key storage service _must_ be able to respond to signing and decryption requests to be usable at all, and that means that the requester _will_ know how long the operation took. One can try to hide this information by padding the operation to its worst-case value, but this only works if there _is_ a worst-case value. In Qubes OS, responding to a request will require heap allocation, fork(), disk I/O, and sometimes user interaction. That means that the worst-case operation time is _infinite_, so time protection is simply not possible and will not be possible for the foreseeable future. -- Sincerely, Demi Marie Obenour (she/her/hers)

G. Branden Robinson

9:59 a.m.

At 2023-11-17T16:02:28-0500, Demi Marie Obenour wrote:

...

On 11/12/23 00:44, Gernot Heiser via Devel wrote:

...
If you want a theoretically sound solution – Welcome to Time Protection!

In contrast to your implied claim that time protection is unsound: It’s been formalised, and it’s in the process of being proved correct and complete in an seL4 implementation. Its minimal hardware support also been implemented in RISC-V processors and demonstrated cheap and complete (may even be in silicon by now).

A key storage service _must_ be able to respond to signing and decryption requests to be usable at all, and that means that the requester _will_ know how long the operation took. One can try to hide this information by padding the operation to its worst-case value, but this only works if there _is_ a worst-case value. In Qubes OS, responding to a request will require heap allocation, fork(), disk I/O, and sometimes user interaction. That means that the worst-case operation time is _infinite_, so time protection is simply not possible and will not be possible for the foreseeable future.

Can we reëstablish what we mean by "time protection", then? A few things seem to me to be true: 1. Even on statically configured systems, response times for requested services can easily exceed the quantum of a schedule slice or "time protection domain", for exactly the reasons you say. A service may have highly variable _latency_: to require information retrieval from a machine register, or local cache, or remote cache, or main memory, or mass storage, or over the network, or may await human interaction--as with a user needing to fumble with a physical token or poke keystrokes into a user interface. Approximately, each of these is at least one order of magnitude slower than the last. 2. The bandwidth of a timing channel attack is strongly and inversely correlated with the channel's latency. (This statement needs a proof.) 3. Practical time protection is therefore not so much a matter of achieving zero information leakage; application design may not permit that. For example, in the key storage service example, one _can_ draw statistical inferences based on the latency of response; if the signing or decryption request was fulfilled in less than a millisecond, then use of the key was probably not authorized consequent to human interaction. 4. The distinction between "static" and "dynamic" systems may become indistinct here, since even a statically partitioned system may be operating in a dynamic environment, particularly when taking into account what data populate caches. 5. A "static" system is not necessarily a deterministic one. And without perfect knowledge of microarchitectural implementations, something that seems unlikely to eventuate with x86-64 or AArch64 cores, determinism seems impossible. We have no way of knowing when a microarchitectural gadget will decide to flush an undocumented buffer.) 6. Because multiple orders of magnitude separate fast accesses from slow ones, the available bandwidth for exfiltration decreases exponentially. (This statement also needs a proof.) 7. The art and science of time protection will progress on two fronts: (a) selection by system designers of a "good" time quantum for high-bandwidth channels that sacrifice little performance while strangling the bandwidth of information leakage to zero capacity; and (b) the socialization of knowledge about timing channel attacks to application designers so that they can estimate the bandwidth of the timing channels they create, and respond appropriately. An analogy that comes to mind is the quality of one-way hashes. We know that we'll never have a perfect hash function for inputs of arbitrary length, but we know that the longer the hash, the better we can do. CRC32 is hopelessly weak for security applications (but still okay for others); MD5 is regarded as compromised; and BLAKE3 state-of-the-art. Time protection strong enough that your exfiltration countermeasures give your opponent a problem that will outlive them is adequate for most applications. (When it is not, there is always the option of destroying,[1] relocating, or otherwise "losing" the data.[2]) I'd very much welcome correction on any of the foregoing points. Regards, Branden [1] https://academic.oup.com/hwj/article/93/1/95/6535581 [2] https://www.theguardian.com/media/2018/jan/31/cabinet-documents-abc-reveals-...

Hugo V.C.

11:42 p.m.

That's it. Point "7" is summarizes the goal. In other words, (current) laws of physics do no allow completely isolated systems. Unless you jump into a black hole... were even info can't escape from it. So unless we compute stuff inside a black hole (and the problem would be how to get result info back...) I think we can consider solutions as "good" were the effort required to attack the solution it is not worth. I guess everything we rely on is based on that nowadays... El sáb., 18 nov. 2023 0:02, G. Branden Robinson < g.branden.robinson@gmail.com> escribió:

...

At 2023-11-17T16:02:28-0500, Demi Marie Obenour wrote:

...
On 11/12/23 00:44, Gernot Heiser via Devel wrote:

...
If you want a theoretically sound solution – Welcome to Time Protection!

In contrast to your implied claim that time protection is unsound: It’s been formalised, and it’s in the process of being proved correct and complete in an seL4 implementation. Its minimal hardware support also been implemented in RISC-V processors and demonstrated cheap and complete (may even be in silicon by now).

A key storage service _must_ be able to respond to signing and decryption requests to be usable at all, and that means that the requester _will_ know how long the operation took. One can try to hide this information by padding the operation to its worst-case value, but this only works if there _is_ a worst-case value. In Qubes OS, responding to a request will require heap allocation, fork(), disk I/O, and sometimes user interaction. That means that the worst-case operation time is _infinite_, so time protection is simply not possible and will not be possible for the foreseeable future.

Can we reëstablish what we mean by "time protection", then?

A few things seem to me to be true:

1. Even on statically configured systems, response times for requested services can easily exceed the quantum of a schedule slice or "time protection domain", for exactly the reasons you say. A service may have highly variable _latency_: to require information retrieval from a machine register, or local cache, or remote cache, or main memory, or mass storage, or over the network, or may await human interaction--as with a user needing to fumble with a physical token or poke keystrokes into a user interface. Approximately, each of these is at least one order of magnitude slower than the last.

2. The bandwidth of a timing channel attack is strongly and inversely correlated with the channel's latency. (This statement needs a proof.)

3. Practical time protection is therefore not so much a matter of achieving zero information leakage; application design may not permit that. For example, in the key storage service example, one _can_ draw statistical inferences based on the latency of response; if the signing or decryption request was fulfilled in less than a millisecond, then use of the key was probably not authorized consequent to human interaction.

4. The distinction between "static" and "dynamic" systems may become indistinct here, since even a statically partitioned system may be operating in a dynamic environment, particularly when taking into account what data populate caches.

5. A "static" system is not necessarily a deterministic one. And without perfect knowledge of microarchitectural implementations, something that seems unlikely to eventuate with x86-64 or AArch64 cores, determinism seems impossible. We have no way of knowing when a microarchitectural gadget will decide to flush an undocumented buffer.)

6. Because multiple orders of magnitude separate fast accesses from slow ones, the available bandwidth for exfiltration decreases exponentially. (This statement also needs a proof.)

7. The art and science of time protection will progress on two fronts: (a) selection by system designers of a "good" time quantum for high-bandwidth channels that sacrifice little performance while strangling the bandwidth of information leakage to zero capacity; and (b) the socialization of knowledge about timing channel attacks to application designers so that they can estimate the bandwidth of the timing channels they create, and respond appropriately.

An analogy that comes to mind is the quality of one-way hashes. We know that we'll never have a perfect hash function for inputs of arbitrary length, but we know that the longer the hash, the better we can do. CRC32 is hopelessly weak for security applications (but still okay for others); MD5 is regarded as compromised; and BLAKE3 state-of-the-art.

Time protection strong enough that your exfiltration countermeasures give your opponent a problem that will outlive them is adequate for most applications. (When it is not, there is always the option of destroying,[1] relocating, or otherwise "losing" the data.[2])

I'd very much welcome correction on any of the foregoing points.

Regards, Branden

[1] https://academic.oup.com/hwj/article/93/1/95/6535581 [2] https://www.theguardian.com/media/2018/jan/31/cabinet-documents-abc-reveals-... _______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems

Hugo V.C.

12 Nov 12 Nov

7:57 p.m.

"However, ensuring that the CPU time used by one domain is not observable by another domain is not possible, and I do not believe this will ever change." I usually see infosec people talking about CPU time and cache to cover "modern" hardware based attacks, which is a good starting point, but just a starting point. Share a beer with anyone with know-how on physics and will give you half dozen ways to attack a workload from another workload in a system that is sharing resources. And I'm not talking about covert channels, which are the last unicorns to protect, I talk about direct info leaks based on several measurable environmental variables of the medium were those workload are being executed. Even with air gaped systems... so sharing hardware you can figure it out... I guess I should release some paper about Marvel CPUs (ARM) and how to play with those naive hardware partitioning concepts we all are blindly trusting. One thing I absolutely agree with Gernot s to simplyfy hardware as much as possible and let the (formally verified) software do it's job when it comes to Time Protection. Even if far from perfect, this is an affordable and realistic approach. For any other hardware-based solution where the words "share/sharing" are wrote down somewhere, I would not even read the specs. El vie, 10 nov 2023 a las 22:02, Demi Marie Obenour (<demiobenour@gmail.com>) escribió:

...

On 11/9/23 17:47, Gernot Heiser via Devel wrote:

...
On 10 Nov 2023, at 06:03, Demi Marie Obenour <demiobenour@gmail.com> wrote:

- Speculative taint tracking provides complete protection against speculative attacks. This is sufficient to prevent leakage of cryptographic key material, even in fully dynamic systems. Furthermore, it is compatible with fast context switches between protection domains.

It’s also a point solution, that provides zero guarantees against unforeseen attacks.

Unless I am severely mistaken, it provides complete protection for code that has secret-independent timing, such as cryptographic software. It is also cheaper than some of the workarounds existing systems must use.

...
- Full time partitioning eliminates all timing channels, but it is possible only in fully static systems, which severely limits its applicability.

I’m sorry, but this is simply false.

What you need for time protection (I assume this is what you mean with “full time partitioning”) are fixed time slices – ”fixed” in that their length cannot depend on any events in the system that can be controlled by an untrusted domain. It doesn’t mean they cannot be changed as domains come and go.

Based on what information should I set these time slices?

The system I work with has no idea what workload it is running. It can’t require the workload to provide it with hints because existing workloads don’t do that. It can’t ask the user because the user will have no idea either. The most it can do is adapt based on what the workload is doing at runtime.

In Qubes OS, it is completely normal for one VM (effectively a protection domain) to start using almost the entire system’s CPU resources without any warning at all. This could happen because the user just started compiling a big project. The user might then start another CPU-intensive task (such as a video call) in another VM, and that might _also_ try to use 100% of the system CPU resources. And users will expect that the CPU usage of the first workload will decrease when this happens, as some (but not all!) of the CPU time is allotted to the second workload instead.

Given this constraint, I see no way to implement time protection. It _is_ possible to quantize the amount of CPU time allotted to a given vCPU, and to have a process that uses small, random amounts of CPU to generate noise. However, ensuring that the CPU time used by one domain is not observable by another domain is not possible, and I do not believe this will ever change.

...
- Time protection without time partitioning does _not_ fully prevent Spectre v1 attacks, and still imposes a large penalty on protection domain switches.

Time protection does *not* impose a large penalty. Its context-switching cost is completely hidden by the cost of an L1 D-cache flush – as has been demonstrated by published work. And if you don’t flush the L1 cache, you’ll have massive leakage, taint-tracking or not.

Where time protection, *without further hardware support*, does have a cost is for partitioning the lower-level caches. This cost is two-fold:

1) Average cache utilisation is reduced due to the static partitioning (in contrast to the dynamic partitioning that happens as a side effect of the hardware’s cache replacement policy). This cost is generally in the single-digit percentage range (as per published work), but can also be negative – there’s plenty of work that uses static cache partitioning for performance *isolation/improvement*.

Static partitioning based on _what_? On a desktop system, the dynamic behavior of a workload is generally the _only_ information known about that workload, so any partitioning _must_ be dynamic.

...
2) Memory utilisation is reduced, as colouring implicitly partitions memory. This is the most significant cost, and unavoidable without more hardware support. Intel CAT is one variant of such support (partitions the cache by ways rather than set, which has not effect on memory utilisation, but only works on the LLC, and has itself a performance cost due to reduced cache associativity).

Static memory partitioning is completely nonviable for desktop workloads. A VM might go from needing 1/16 of system RAM to requesting over half of it without any warning, and the expected behavior is that unless some other part of the system is using that RAM, the VM will get it. And yes, that does mean that two VMs can communicate by competing for system RAM, just like they can communicate by competing for CPU resources. Covert channels (ones that require cooperation from both ends) are out of scope for Qubes OS and for every other desktop system I am aware of, precisely because the cost of eliminating them is so high.

...
And, of course, without partitioning the lower-level caches you have leakage again, and taint tracking isn’t going to help there either.

If people want to improve the hardware, focussing on generic mechanisms such as support for partitioning L2-LL caches would be far more beneficial than point-solutions that will be defeated by the next class of attacks.

I would much rather have a theoretically sound solution than an unsound one. However, it is even more important that my desktop actually be able to do the tasks I expect of it. To the best of my knowledge, time protection and a usable desktop are incompatible with each other. I do hope you can prove me wrong here. -- Sincerely, Demi Marie Obenour (she/her/hers)

_______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems

Demi Marie Obenour

13 Nov 13 Nov

12:42 p.m.

On 11/12/23 03:57, Hugo V.C. wrote:

...

"However, ensuring that the CPU time used by one domain is not observable by another domain is not possible, and I do not believe this will ever change."

I usually see infosec people talking about CPU time and cache to cover "modern" hardware based attacks, which is a good starting point, but just a starting point. Share a beer with anyone with know-how on physics and will give you half dozen ways to attack a workload from another workload in a system that is sharing resources. And I'm not talking about covert channels, which are the last unicorns to protect, I talk about direct info leaks based on several measurable environmental variables of the medium were those workload are being executed. Even with air gaped systems... so sharing hardware you can figure it out...

I guess I should release some paper about Marvel CPUs (ARM) and how to play with those naive hardware partitioning concepts we all are blindly trusting.

One thing I absolutely agree with Gernot s to simplyfy hardware as much as possible and let the (formally verified) software do it's job when it comes to Time Protection. Even if far from perfect, this is an affordable and realistic approach. For any other hardware-based solution where the words "share/sharing" are wrote down somewhere, I would not even read the specs.

- Are you sure it is affordable? - Can that formally verified software run on desktop-class hardware that I can buy, either now or in the foreseeable future? - Does it have a VMM that can run multi-core Linux guests, and which can schedule those vCPUs across multiple physical CPUs? - Does it support low power states, so that battery life is decent? - Does it allow a privileged guest to create and destroy unprivileged guests? - Does it support PCI passthrough? If not, does it have drivers for every single piece of hardware on the aforementioned desktop-class machine — including the GPU? - Does it protect against speculative execution attacks _without_ requiring static partitioning of _any_ resource? Even with a work-conserving scheduler? - Does it support reassigning memory from VMs that have lots of spare memory to VMs that are short on memory? If the answer to any of these questions is “no”, then it is not realistic in my world. -- Sincerely, Demi Marie Obenour (she/her/hers)

Hugo V.C.

6:50 p.m.

- Are you sure it is affordable? *** Yes. Creating laptops/servers with independent computing platforms is like putting toghether different laptps/servers inside the same box. Few things in life are so simple.... and simple usually means that it will be affordable or will tend to be affordable (same happened with multi core computing and this is by far a more complex manufacturing problem) - Can that formally verified software run on desktop-class hardware that I can buy, either now or in the foreseeable future? *** Yes. seL4 is an example. Here the debate is to define what amount of software you want to be verified. To me, an hypervisor/OS like seL4 is more than enough and then, on top of this you need to build other layers of non-verified software. You will always have unverified software on a Desktop, what is important is to have the verified one on critical parts of the solution. What do you need to believe seL4 can run on a laptop? Do you need to see it booting up and running a VM with a standard OS inside? (remember that this is basically what QubesOS, which I use right now, is doing all the time...). The difference is QubesOS do not rely on a verified hypervisor while the hypothetical system I'm talking about it will have seL4 booting, which is a verified piece of code. Having said that, you can et we will have more and more pieces of verified (or semi-formally verified) software that we will put together like a puzzle to build a more reliable solution. An example is TCP/IP stack, which is a key piece of software I will always encourage people to verify as it will benefit all the Internet, not only Desktops.... but yes, the response is yes, in a foreseeable future we will have such verified software running on desktop-class hardware and the current debate will be part of the Internet History. - Does it have a VMM that can run multi-core Linux guests, and which can schedule those vCPUs across multiple physical CPUs? - Does it support low power states, so that battery life is decent? - Does it allow a privileged guest to create and destroy unprivileged guests? - Does it support PCI passthrough? If not, does it have drivers for every single piece of hardware on the aforementioned desktop-class machine — including the GPU? *** Those can be answered better by seL4 experts, don't dare to give you a wrong answers, anyway, you should understand there are steps in the process of migrating to seL4, do not expect a magical path with everything working like a charm from minute zero. But there's lot of amazing talented people like you that, if working together, can help boosting the process (it will happen after or before). I bet my business and my career on it. - Does it protect against speculative execution attacks _without_ requiring static partitioning of _any_ resource? Even with a work-conserving scheduler? *** Those can be answered better by seL4 experts, don't dare to give you a wrong answers, but I think the question is tricky in the way you made it... - Does it support reassigning memory from VMs that have lots of spare memory to VMs that are short on memory? *** Those can be answered better by seL4 experts, don't dare to give you a wrong answers, anyway, I guess you are referring to the infamous 🙏 (don't want to offend) "memory balancer" of QubesOS that locks you out and forbids you to start news guests when no more memory is available (even if lot of free memory is available on all guests). If this kind of "reassigning" memory is what you need, I bet this is already viable with seL4, anyway better response will be given by seL4 experts El lun, 13 nov 2023 a las 2:42, Demi Marie Obenour (<demiobenour@gmail.com>) escribió:

...

On 11/12/23 03:57, Hugo V.C. wrote:

...
"However, ensuring that the CPU time used by one domain is not observable by another domain is not possible, and I do not believe this will ever change."

I usually see infosec people talking about CPU time and cache to cover "modern" hardware based attacks, which is a good starting point, but just a starting point. Share a beer with anyone with know-how on physics and will give you half dozen ways to attack a workload from another workload in a system that is sharing resources. And I'm not talking about covert channels, which are the last unicorns to protect, I talk about direct info leaks based on several measurable environmental variables of the medium were those workload are being executed. Even with air gaped systems... so sharing hardware you can figure it out...

I guess I should release some paper about Marvel CPUs (ARM) and how to play with those naive hardware partitioning concepts we all are blindly trusting.

One thing I absolutely agree with Gernot s to simplyfy hardware as much as possible and let the (formally verified) software do it's job when it comes to Time Protection. Even if far from perfect, this is an affordable and realistic approach. For any other hardware-based solution where the words "share/sharing" are wrote down somewhere, I would not even read the specs.

- Are you sure it is affordable? - Can that formally verified software run on desktop-class hardware that I can buy, either now or in the foreseeable future? - Does it have a VMM that can run multi-core Linux guests, and which can schedule those vCPUs across multiple physical CPUs? - Does it support low power states, so that battery life is decent? - Does it allow a privileged guest to create and destroy unprivileged guests? - Does it support PCI passthrough? If not, does it have drivers for every single piece of hardware on the aforementioned desktop-class machine — including the GPU? - Does it protect against speculative execution attacks _without_ requiring static partitioning of _any_ resource? Even with a work-conserving scheduler? - Does it support reassigning memory from VMs that have lots of spare memory to VMs that are short on memory?

If the answer to any of these questions is “no”, then it is not realistic in my world. -- Sincerely, Demi Marie Obenour (she/her/hers)

Hugo V.C.

7:01 p.m.

Sorry Demi, I misunderstood you question about affordability, I guess you are referring to:

...

One thing I absolutely agree with Gernot s to simplyfy hardware as much as possible and let the (formally verified) software do it's job when it comes to Time Protection. Even if far from perfect, this is an affordable and realistic approach.

My answer is also YES, of course it is affordable seL4 approach. It is reusable solution forever and ever. Once you formally verify something then it is backed by maths and it can be reused on future solutions. It is not only affordable, it is sustainable! Stop building short-term solutions, let's create long-term solution with reusable code. Formally verified code is the most clever security approach we humans have had since we invented computers. El lun, 13 nov 2023 a las 8:50, Hugo V.C. (<skydivebcn@gmail.com>) escribió:

...

- Are you sure it is affordable? *** Yes. Creating laptops/servers with independent computing platforms is like putting toghether different laptps/servers inside the same box. Few things in life are so simple.... and simple usually means that it will be affordable or will tend to be affordable (same happened with multi core computing and this is by far a more complex manufacturing problem)

- Can that formally verified software run on desktop-class hardware that I can buy, either now or in the foreseeable future? *** Yes. seL4 is an example. Here the debate is to define what amount of software you want to be verified. To me, an hypervisor/OS like seL4 is more than enough and then, on top of this you need to build other layers of non-verified software. You will always have unverified software on a Desktop, what is important is to have the verified one on critical parts of the solution. What do you need to believe seL4 can run on a laptop? Do you need to see it booting up and running a VM with a standard OS inside? (remember that this is basically what QubesOS, which I use right now, is doing all the time...). The difference is QubesOS do not rely on a verified hypervisor while the hypothetical system I'm talking about it will have seL4 booting, which is a verified piece of code. Having said that, you can et we will have more and more pieces of verified (or semi-formally verified) software that we will put together like a puzzle to build a more reliable solution. An example is TCP/IP stack, which is a key piece of software I will always encourage people to verify as it will benefit all the Internet, not only Desktops.... but yes, the response is yes, in a foreseeable future we will have such verified software running on desktop-class hardware and the current debate will be part of the Internet History.

- Does it have a VMM that can run multi-core Linux guests, and which can schedule those vCPUs across multiple physical CPUs? - Does it support low power states, so that battery life is decent? - Does it allow a privileged guest to create and destroy unprivileged guests? - Does it support PCI passthrough? If not, does it have drivers for every single piece of hardware on the aforementioned desktop-class machine — including the GPU? *** Those can be answered better by seL4 experts, don't dare to give you a wrong answers, anyway, you should understand there are steps in the process of migrating to seL4, do not expect a magical path with everything working like a charm from minute zero. But there's lot of amazing talented people like you that, if working together, can help boosting the process (it will happen after or before). I bet my business and my career on it.

- Does it protect against speculative execution attacks _without_ requiring static partitioning of _any_ resource? Even with a work-conserving scheduler? *** Those can be answered better by seL4 experts, don't dare to give you a wrong answers, but I think the question is tricky in the way you made it...

- Does it support reassigning memory from VMs that have lots of spare memory to VMs that are short on memory? *** Those can be answered better by seL4 experts, don't dare to give you a wrong answers, anyway, I guess you are referring to the infamous 🙏 (don't want to offend) "memory balancer" of QubesOS that locks you out and forbids you to start news guests when no more memory is available (even if lot of free memory is available on all guests). If this kind of "reassigning" memory is what you need, I bet this is already viable with seL4, anyway better response will be given by seL4 experts

El lun, 13 nov 2023 a las 2:42, Demi Marie Obenour (<demiobenour@gmail.com>) escribió:

...
On 11/12/23 03:57, Hugo V.C. wrote:

...
"However, ensuring that the CPU time used by one domain is not observable by another domain is not possible, and I do not believe this will ever change."

I usually see infosec people talking about CPU time and cache to cover "modern" hardware based attacks, which is a good starting point, but just a starting point. Share a beer with anyone with know-how on physics and will give you half dozen ways to attack a workload from another workload in a system that is sharing resources. And I'm not talking about covert channels, which are the last unicorns to protect, I talk about direct info leaks based on several measurable environmental variables of the medium were those workload are being executed. Even with air gaped systems... so sharing hardware you can figure it out...

I guess I should release some paper about Marvel CPUs (ARM) and how to play with those naive hardware partitioning concepts we all are blindly trusting.

One thing I absolutely agree with Gernot s to simplyfy hardware as much as possible and let the (formally verified) software do it's job when it comes to Time Protection. Even if far from perfect, this is an affordable and realistic approach. For any other hardware-based solution where the words "share/sharing" are wrote down somewhere, I would not even read the specs.

- Are you sure it is affordable? - Can that formally verified software run on desktop-class hardware that I can buy, either now or in the foreseeable future? - Does it have a VMM that can run multi-core Linux guests, and which can schedule those vCPUs across multiple physical CPUs? - Does it support low power states, so that battery life is decent? - Does it allow a privileged guest to create and destroy unprivileged guests? - Does it support PCI passthrough? If not, does it have drivers for every single piece of hardware on the aforementioned desktop-class machine — including the GPU? - Does it protect against speculative execution attacks _without_ requiring static partitioning of _any_ resource? Even with a work-conserving scheduler? - Does it support reassigning memory from VMs that have lots of spare memory to VMs that are short on memory?

If the answer to any of these questions is “no”, then it is not realistic in my world. -- Sincerely, Demi Marie Obenour (she/her/hers)

Demi Marie Obenour

7:39 p.m.

On 11/13/23 02:50, Hugo V.C. wrote:

...

- Are you sure it is affordable? *** Yes. Creating laptops/servers with independent computing platforms is like putting toghether different laptps/servers inside the same box. Few things in life are so simple.... and simple usually means that it will be affordable or will tend to be affordable (same happened with multi core computing and this is by far a more complex manufacturing problem)

- Can that formally verified software run on desktop-class hardware that I can buy, either now or in the foreseeable future? *** Yes. seL4 is an example. Here the debate is to define what amount of software you want to be verified. To me, an hypervisor/OS like seL4 is more than enough and then, on top of this you need to build other layers of non-verified software. You will always have unverified software on a Desktop, what is important is to have the verified one on critical parts of the solution. What do you need to believe seL4 can run on a laptop? Do you need to see it booting up and running a VM with a standard OS inside?

Yes, and with the same speculation protections Xen provides.

...

(remember that this is basically what QubesOS, which I use right now, is doing all the time...).

I’m one of the developers of Qubes OS, though I am speaking in a personal capacity here.

...

The difference is QubesOS do not rely on a verified hypervisor while the hypothetical system I'm talking about it will have seL4 booting, which is a verified piece of code.

I would love that, but seL4 currently is missing at least five critical security features on x86: 1. Interrupt remapping in the IOMMU. 2. Protection against speculative execution attacks the moment the embargo breaks. 3. The ability to either choose the appropriate mitigations itself, or to let userspace choose them. 4. Properly implemented speculative execution mitigations. 5. Microcode update support. Without all five of these, seL4 is _less_ secure than Xen, which defeats the entire purpose of using it. As per the discussion in https://github.com/seL4/seL4/issues/1108 this will not be fixed any time soon. That’s why I asked about desktop-class hardware I can actually buy: the only non-x86 desktop-class hardware I know of are Macs with Apple silicon and POWER9 machines from Raptor, and seL4 supports neither. Apple silicon has lots of nonstandard IOMMUs that seL4 would need to support, and POWER is not supported at all.

...

Having said that, you can et we will have more and more pieces of verified (or semi-formally verified) software that we will put together like a puzzle to build a more reliable solution. An example is TCP/IP stack, which is a key piece of software I will always encourage people to verify as it will benefit all the Internet, not only Desktops.... but yes, the response is yes, in a foreseeable future we will have such verified software running on desktop-class hardware and the current debate will be part of the Internet History.

That would be amazing. (Fun fact: you probably have some formally verified cryptographic code installed on your system _right now_, since NSS uses fiat-crypto for some of its cryptographic primitives last I checked.)

...

- Does it have a VMM that can run multi-core Linux guests, and which can schedule those vCPUs across multiple physical CPUs? - Does it support low power states, so that battery life is decent? - Does it allow a privileged guest to create and destroy unprivileged guests? - Does it support PCI passthrough? If not, does it have drivers for every single piece of hardware on the aforementioned desktop-class machine — including the GPU? *** Those can be answered better by seL4 experts, don't dare to give you a wrong answers, anyway, you should understand there are steps in the process of migrating to seL4, do not expect a magical path with everything working like a charm from minute zero. But there's lot of amazing talented people like you that, if working together, can help boosting the process (it will happen after or before). I bet my business and my career on it.

My understanding is that Qubes OS would be happy to _use_ an seL4-based VMM, provided that it met Qubes OS’s requirements and offered a sufficiently compelling advantage over Xen. But Qubes OS simply doesn’t have the resources to _implement_ one. Therefore, the time to talk about integrating seL4 with Qubes OS would be _after_ the features I mentioned are implemented in either the kernel or the supporting userland. Xen provides all of them out of the box, with the sole exception of S0ix ("modern standby"). To be clear: I don’t fault seL4 for focusing on embedded right now. The reward/effort ratio is vastly more favorable there. Servers are significantly harder and I suspect desktops and laptops are the worst-case scenario, because _everything_ is dynamic and subject to the whim of a human who can and will break any simplifying assumptions one has made. And this is even before one starts talking about GPUs and other accelerators, which are a huge attack surface and yet can be the difference between a system being usable and unusable for the tasks people expect of it. There is a reason Qubes OS is going to be supporting GPU acceleration at some point. That’s why I am so pessimistic about time protection on desktop: all of the stuff I have read about it assumes at least some static partitioning, and in the desktop world, static _anything_ simply doesn’t exist.

...

- Does it protect against speculative execution attacks _without_ requiring static partitioning of _any_ resource? Even with a work-conserving scheduler? *** Those can be answered better by seL4 experts, don't dare to give you a wrong answers, but I think the question is tricky in the way you made it...

- Does it support reassigning memory from VMs that have lots of spare memory to VMs that are short on memory? *** Those can be answered better by seL4 experts, don't dare to give you a wrong answers, anyway, I guess you are referring to the infamous 🙏 (don't want to offend) "memory balancer" of QubesOS that locks you out and forbids you to start news guests when no more memory is available (even if lot of free memory is available on all guests).

No offense taken :). The memory balancer certainly leaves lots to be desired.

...

If this kind of "reassigning" memory is what you need, I bet this is already viable with seL4, anyway better response will be given by seL4 experts

El lun, 13 nov 2023 a las 2:42, Demi Marie Obenour (<demiobenour@gmail.com>) escribió:

...
On 11/12/23 03:57, Hugo V.C. wrote:

...
"However, ensuring that the CPU time used by one domain is not observable by another domain is not possible, and I do not believe this will ever change."

I usually see infosec people talking about CPU time and cache to cover "modern" hardware based attacks, which is a good starting point, but just a starting point. Share a beer with anyone with know-how on physics and will give you half dozen ways to attack a workload from another workload in a system that is sharing resources. And I'm not talking about covert channels, which are the last unicorns to protect, I talk about direct info leaks based on several measurable environmental variables of the medium were those workload are being executed. Even with air gaped systems... so sharing hardware you can figure it out...

I guess I should release some paper about Marvel CPUs (ARM) and how to play with those naive hardware partitioning concepts we all are blindly trusting.

One thing I absolutely agree with Gernot s to simplyfy hardware as much as possible and let the (formally verified) software do it's job when it comes to Time Protection. Even if far from perfect, this is an affordable and realistic approach. For any other hardware-based solution where the words "share/sharing" are wrote down somewhere, I would not even read the specs.

- Are you sure it is affordable? - Can that formally verified software run on desktop-class hardware that I can buy, either now or in the foreseeable future? - Does it have a VMM that can run multi-core Linux guests, and which can schedule those vCPUs across multiple physical CPUs? - Does it support low power states, so that battery life is decent? - Does it allow a privileged guest to create and destroy unprivileged guests? - Does it support PCI passthrough? If not, does it have drivers for every single piece of hardware on the aforementioned desktop-class machine — including the GPU? - Does it protect against speculative execution attacks _without_ requiring static partitioning of _any_ resource? Even with a work-conserving scheduler? - Does it support reassigning memory from VMs that have lots of spare memory to VMs that are short on memory?

If the answer to any of these questions is “no”, then it is not realistic in my world.-- Sincerely, Demi Marie Obenour (she/her/hers)

Hugo V.C.

10:08 p.m.

El lun, 13 nov 2023 a las 9:39, Demi Marie Obenour (<demiobenour@gmail.com>) escribió:

...

On 11/13/23 02:50, Hugo V.C. wrote:

...
- Are you sure it is affordable? *** Yes. Creating laptops/servers with independent computing platforms is like putting toghether different laptps/servers inside the same box. Few things in life are so simple.... and simple usually means that it will be affordable or will tend to be affordable (same happened with multi core computing and this is by far a more complex manufacturing problem)

- Can that formally verified software run on desktop-class hardware that I can buy, either now or in the foreseeable future? *** Yes. seL4 is an example. Here the debate is to define what amount of software you want to be verified. To me, an hypervisor/OS like seL4 is more than enough and then, on top of this you need to build other layers of non-verified software. You will always have unverified software on a Desktop, what is important is to have the verified one on critical parts of the solution. What do you need to believe seL4 can run on a laptop? Do you need to see it booting up and running a VM with a standard OS inside?

Yes, and with the same speculation protections Xen provides.

...
(remember that this is basically what QubesOS, which I use right now, is doing all the time...).

I’m one of the developers of Qubes OS, though I am speaking in a personal capacity here.

**** I know since long ago...! And that's why I think your contribution here is so valuable, you are right now a bridge between the academic (seL4) and the real World of Secure Desktop Computing (QubesOS). I've been working with hardened solutions (Grsecurity, Pitbull, etc) since 20 years ago trying to solve the "Desktop problem" and never get a viable solution. Then the virtualization started being more affordable/usable on Desktop systems... and then I knew about QubesOS... and here I saw a big opportunity to join efforts of both solutions. I firmly believe in that. And you are a key piece here, I have not seen anyone else from QubesOS being interested on seL4, but you. So yes, I know who you are and it is very nice you are here debating stuff about seL4 desktop implementation, as this is a path to follow IMHO :-)

...

...
The difference is QubesOS do not rely on a verified hypervisor while the hypothetical system I'm talking about it will have seL4 booting, which is a verified piece of code.

I would love that, but seL4 currently is missing at least five critical security features on x86:

1. Interrupt remapping in the IOMMU. 2. Protection against speculative execution attacks the moment the embargo breaks. 3. The ability to either choose the appropriate mitigations itself, or to let userspace choose them. 4. Properly implemented speculative execution mitigations. 5. Microcode update support.

Without all five of these, seL4 is _less_ secure than Xen, which defeats the entire purpose of using it. As per the discussion in https://github.com/seL4/seL4/issues/1108 this will not be fixed any time soon.

*** Wow... this would open a entire new debate out of the scope of this thread which I think it is no worth as I don't want one or the other to be better, but joining efforts to get a superb product that will make the security industry change forever.

...

That’s why I asked about desktop-class hardware I can actually buy: the only non-x86 desktop-class hardware I know of are Macs with Apple silicon and POWER9 machines from Raptor, and seL4 supports neither. Apple silicon has lots of nonstandard IOMMUs that seL4 would need to support, and POWER is not supported at all.

...
Having said that, you can et we will have more and more pieces of verified (or semi-formally verified) software that we will put together like a puzzle to build a more reliable solution. An example is TCP/IP stack, which is a key piece of software I will always encourage people to verify as it will benefit all the Internet, not only Desktops.... but yes, the response is yes, in a foreseeable future we will have such verified software running on desktop-class hardware and the current debate will be part of the Internet History.

That would be amazing. (Fun fact: you probably have some formally verified cryptographic code installed on your system _right now_, since NSS uses fiat-crypto for some of its cryptographic primitives last I checked.)

...
- Does it have a VMM that can run multi-core Linux guests, and which can schedule those vCPUs across multiple physical CPUs? - Does it support low power states, so that battery life is decent? - Does it allow a privileged guest to create and destroy unprivileged guests? - Does it support PCI passthrough? If not, does it have drivers for every single piece of hardware on the aforementioned desktop-class machine — including the GPU? *** Those can be answered better by seL4 experts, don't dare to give you a wrong answers, anyway, you should understand there are steps in the process of migrating to seL4, do not expect a magical path with everything working like a charm from minute zero. But there's lot of amazing talented people like you that, if working together, can help boosting the process (it will happen after or before). I bet my business and my career on it.

My understanding is that Qubes OS would be happy to _use_ an seL4-based VMM, provided that it met Qubes OS’s requirements and offered a sufficiently compelling advantage over Xen. But Qubes OS simply doesn’t have the resources to _implement_ one. Therefore, the time to talk about integrating seL4 with Qubes OS would be _after_ the features I mentioned are implemented in either the kernel or the supporting userland. Xen provides all of them out of the box, with the sole exception of S0ix ("modern standby").

To be clear: I don’t fault seL4 for focusing on embedded right now. The reward/effort ratio is vastly more favorable there. Servers are significantly harder and I suspect desktops and laptops are the worst-case scenario, because _everything_ is dynamic and subject to the whim of a human who can and will break any simplifying assumptions one has made. And this is even before one starts talking about GPUs and other accelerators, which are a huge attack surface and yet can be the difference between a system being usable and unusable for the tasks people expect of it. There is a reason Qubes OS is going to be supporting GPU acceleration at some point.

That’s why I am so pessimistic about time protection on desktop: all of the stuff I have read about it assumes at least some static partitioning, and in the desktop world, static _anything_ simply doesn’t exist.

...
- Does it protect against speculative execution attacks _without_ requiring static partitioning of _any_ resource? Even with a work-conserving scheduler? *** Those can be answered better by seL4 experts, don't dare to give you a wrong answers, but I think the question is tricky in the way you made it...

- Does it support reassigning memory from VMs that have lots of spare memory to VMs that are short on memory? *** Those can be answered better by seL4 experts, don't dare to give you a wrong answers, anyway, I guess you are referring to the infamous 🙏 (don't want to offend) "memory balancer" of QubesOS that locks you out and forbids you to start news guests when no more memory is available (even if lot of free memory is available on all guests).

No offense taken :). The memory balancer certainly leaves lots to be desired.

...

...
If this kind of "reassigning" memory is what you need, I bet this is already viable with seL4, anyway better response will be given by seL4 experts

El lun, 13 nov 2023 a las 2:42, Demi Marie Obenour (< demiobenour@gmail.com>) escribió:

...
On 11/12/23 03:57, Hugo V.C. wrote:

...
"However, ensuring that the CPU time used by one domain is not observable by another domain is not possible, and I do not believe this will ever change."

I usually see infosec people talking about CPU time and cache to cover "modern" hardware based attacks, which is a good starting point, but just a starting point. Share a beer with anyone with know-how on physics and will give you half dozen ways to attack a workload from another workload in a system that is sharing resources. And I'm not talking about covert channels, which are the last unicorns to protect, I talk about direct info leaks based on several measurable environmental variables of the medium were those workload are being executed. Even with air gaped systems... so sharing hardware you can figure it out...

I guess I should release some paper about Marvel CPUs (ARM) and how to play with those naive hardware partitioning concepts we all are blindly trusting.

One thing I absolutely agree with Gernot s to simplyfy hardware as much as possible and let the (formally verified) software do it's job when it comes to Time Protection. Even if far from perfect, this is an affordable and realistic approach. For any other hardware-based solution where the words "share/sharing" are wrote down somewhere, I would not even read the specs.

- Are you sure it is affordable? - Can that formally verified software run on desktop-class hardware that I can buy, either now or in the foreseeable future? - Does it have a VMM that can run multi-core Linux guests, and which can schedule those vCPUs across multiple physical CPUs? - Does it support low power states, so that battery life is decent? - Does it allow a privileged guest to create and destroy unprivileged guests? - Does it support PCI passthrough? If not, does it have drivers for every single piece of hardware on the aforementioned desktop-class machine — including the GPU? - Does it protect against speculative execution attacks _without_ requiring static partitioning of _any_ resource? Even with a work-conserving scheduler? - Does it support reassigning memory from VMs that have lots of spare memory to VMs that are short on memory?

If the answer to any of these questions is “no”, then it is not realistic in my world.-- Sincerely, Demi Marie Obenour (she/her/hers)

Peter Chubb

14 Nov 14 Nov

7:17 a.m.

...

...
...
...
...
"Demi" == Demi Marie Obenour <demiobenour@gmail.com> writes:

Demi> I would love that, but seL4 currently is missing at least five Demi> critical security features on x86: Demi> 1. Interrupt remapping in the IOMMU. Can you please take a look at https://github.com/seL4/seL4/pull/1098 which attempts to add this? It needs more eyes on it. -- Dr Peter Chubb https://trustworthy.systems/ Trustworthy Systems Group CSE, UNSW Core hours: Mon 8am-3pm; Wed: 8am-5pm; Fri 8am-12pm.

Branden Robinson

8:13 a.m.

I had one reply each to Demi and Hugo, but I think they got stuck in a moderation queue. Possibly my GPG signature angers the spam filter. Can someone look into it? Regards, Branden On Mon, Nov 13, 2023 at 2:19 PM Peter Chubb via Devel <devel@sel4.systems> wrote:

...

...
...
...
...
...
"Demi" == Demi Marie Obenour <demiobenour@gmail.com> writes:

Demi> I would love that, but seL4 currently is missing at least five Demi> critical security features on x86:

Demi> 1. Interrupt remapping in the IOMMU.

Can you please take a look at https://github.com/seL4/seL4/pull/1098 which attempts to add this? It needs more eyes on it.

-- Dr Peter Chubb https://trustworthy.systems/ Trustworthy Systems Group CSE, UNSW Core hours: Mon 8am-3pm; Wed: 8am-5pm; Fri 8am-12pm. _______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems

640

Age (days ago)

649

Last active (days ago)

List overview

Download

18 comments

6 participants

participants (6)

Branden Robinson
Demi Marie Obenour
G. Branden Robinson
Gernot Heiser
Hugo V.C.
Peter Chubb

Conclusions regarding speculation

tags

participants (6)