On Tue, 30 Aug 2016 05:44:52 +0000, Gernot.Heiser wrote:
On 30 Aug 2016, at 15:22 , Alex Elsayed
wrote: On Mon, 29 Aug 2016 23:01:45 +0000, Gernot.Heiser wrote:
[…]
The only way around this is high-assurance hardware, and this doesn’t come at an affordable cost.
Sure, and I don't disagree. It's part of why I'm so excited by the RISC-V work, especially MIT's progress on formal verification - that, at least, pushes the space problems can occur in down by one more level.
But to a certain extent, I think that there's value in reducing the amount of new API exposed. Furthermore, there's a large body of code exercising the "more conventional" bits in the wild, and a considerably smaller body of code exercising the newer bits. Bugs in the more conventional bits are thus more likely to break existing code - and thus be caught early.
It's not protection in the same sence as high-assurance hardware, no. But at least statistically, from looking at where severe hardware-based security vulnerabilities have cropped up, it seems to be better than the alternative of trusting the hardware virtualization to actually isolate.
The problem is that you don’t really know how the hardware is internally structured, and how modularised what you see as a separate feature really is.
Sure, and that would be an argument against it providing yes/no hard assurance. But I'm talking about "empirically, this has been less likely to be a problem."
In the example you gave, if there’s a hardware bug that gives you privilege escalation from non-root Ring 3 to root Ring 0, what makes you think that the same bug doesn’t let you go from root Ring 3 to root Ring 0, which seems like a smaller step (switching only one mode instead of two)?
Mainly? That this bug was pretty-well characterized, and that despite people prodding at it pretty hard, no one managed to reproduce it without virtualization being active.
Root Rings 3+0 is exactly what you’re using when using para-virtualisation. I claim that you cannot have more than zero confidence that hardware that allows the first type of escalation won’t allow the second type.
I suppose this is where we disagree - I know full well that I can't _show_ that the first kind of escalation does not allow the second, but when a severe bug like this is revealed, the ensuing scrutiny has high chances of revealing bugs that are "nearby" in terms of causation. And the vast majority of the virtualization-related bugs have not had such "nearby" discoveries in virtualization-free operation. If anything, that seems to imply that there _is_ some structural separation - otherwise, the odds of such "nearby" bugs being present (and found) would almost certainly be higher than demonstrated.