Some small details in kernel code and docs
Hi All, Here are a few minor details/questions about the kernel code & documentation. (I've confirmed that these are present in the new seL4-12.1.0 release.) 1) kernel/include/arch/x86/arch/32/mode/machine.h has the following definition (some code elided): static inline word_t FORCE_INLINE x86_read_fs_base_impl(void) { word_t base = 0; base &= ...; base &= ...; base &= ...; return base; } I'm pretty sure that the three &= operators here should be |= or else the result is always going to be zero. The same issue occurs in x86_read_gs_base_impl in the same file. Some follow-up questions: Can anyone say if this would have been detected if the PC99 (32-bit) platform had been verified? More generally, where can I find information about how ia32 segment registers are handled by the kernel? Why aren't they just stored/restored like other user-level registers on kernel entry/exit? Is there something that prevents user-level changes in the ES register, for example, from being observed in other threads? 2) A CNodeRotate will generate a FailedLookup error if the pivot slot either doesn't exist or contains a null capability. But, in the first case the "lookup failed for a source capability" field in the error message is 1, while in the second it is 0. Hardly a serious problem, but this does seem inconsistent. Is this intentional, or a mistake in the specification that has been carried over to the implementation? 3) Section "3.4 Lookup Failure Description" of the manual describes how capability lookup errors are represented in an IPC buffer. But it seems the layout information is incorrect. For example, Section 3.4.2 says that "Offset + 0" will contain seL4_MissingCapability while the number of bits left will be at "Offset + seL4_CapFault_BitsLeft", which is probably supposed to be equivalent to "Offset + 1". But, based on kernel/libsel4/include/sel4/shared_types.h, it would seem that this gives "Offset + 4". Sections 3.4.3 and 3.4.4 have similar problems. 4) On https://docs.sel4.systems/GettingStarted, in the section labeled "Fetching, Configuring and Building seL4test", there are missing line breaks between the commands in the text boxes for Steps 1 and 2, and there are some tags in the text box for Step 4 that probably should not be displayed. Best wishes, and thanks for any feedback! Mark
Hi Mark, Just replying to 1 and 4 for now: On Mon, Jun 21, 2021 at 3:32 PM Mark Jones <mpj@pdx.edu> wrote:
Hi All,
Here are a few minor details/questions about the kernel code & documentation. (I've confirmed that these are present in the new seL4-12.1.0 release.)
1) kernel/include/arch/x86/arch/32/mode/machine.h has the following definition (some code elided):
static inline word_t FORCE_INLINE x86_read_fs_base_impl(void) { word_t base = 0; base &= ...; base &= ...; base &= ...; return base; }
I'm pretty sure that the three &= operators here should be |= or else the result is always going to be zero. The same issue occurs in x86_read_gs_base_impl in the same file.
Some follow-up questions: Can anyone say if this would have been detected if the PC99 (32-bit) platform had been verified? More generally, where can I find information about how ia32 segment registers are handled by the kernel? Why aren't they just stored/restored like other user-level registers on kernel entry/exit? Is there something that prevents user-level changes in the ES register, for example, from being observed in other threads?
The x86_read_fs_base_impl function does appear to have the defect that you observe. It also doesn't seem that the code can ever be called by the kernel. The kernel supports 2 ways for managing the FS/GS base address registers on ia32, CONFIG_FSGSBASE_GDT where they are written into the GDT table directly and CONFIG_FSGSBASE_MSR where msg registers are used. For both of these strategies, the kernel considers reading to be too expensive and so caches every write in a global value. Then when a read is performed it only looks up the cached value and doesn't call x86_read_fs_base_impl or x86_read_gs_base_impl. The issue that you identify was clearly missed during code review and we don't test that every line of source code is executed or even ends up in the final binary. As for whether verification would have caught it: if it was part of a verification configuration, and didn't have `/** DONT_TRANSLATE */` over the function, then it would need to be shown that it's implementation correctly refined whatever was specified in the higher spec. (Maybe someone from the proof engineering team could elaborate more). If it had DONT_TRANSLATE then it would instead need to be axiomatized and someone would have looked closely enough to hopefully see an issue. The segment selectors are cleared on entry to the kernel in src/arch/x86/32/traps.S with the RESET_SELECTORS macro. So the ES selector will be reset during any context switch between threads.
2) A CNodeRotate will generate a FailedLookup error if the pivot slot either doesn't exist or contains a null capability. But, in the first case the "lookup failed for a source capability" field in the error message is 1, while in the second it is 0. Hardly a serious problem, but this does seem inconsistent. Is this intentional, or a mistake in the specification that has been carried over to the implementation?
3) Section "3.4 Lookup Failure Description" of the manual describes how capability lookup errors are represented in an IPC buffer. But it seems the layout information is incorrect. For example, Section 3.4.2 says that "Offset + 0" will contain seL4_MissingCapability while the number of bits left will be at "Offset + seL4_CapFault_BitsLeft", which is probably supposed to be equivalent to "Offset + 1". But, based on kernel/libsel4/include/sel4/shared_types.h, it would seem that this gives "Offset + 4". Sections 3.4.3 and 3.4.4 have similar problems.
4) On https://docs.sel4.systems/GettingStarted, in the section labeled "Fetching, Configuring and Building seL4test", there are missing line breaks between the commands in the text boxes for Steps 1 and 2, and there are some tags in the text box for Step 4 that probably should not be displayed.
Thanks for pointing this out. The source files have newlines in the code fragments, but they seem to be stripped by the markdown processor that is generating the site. For now the raw markdown page on GitHub may be easier to read: https://github.com/seL4/docs/blob/master/GettingStarted.md How did you find the issue with x86_read_fs_base_impl? Manual inspection or some sort of static analysis? Kind regards, Kent.
Best wishes, and thanks for any feedback! Mark
_______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems
Hi Mark,
On Mon, Jun 21, 2021 at 3:32 PM Mark Jones <mpj@pdx.edu> wrote:
Here are a few minor details/questions about the kernel code & documentation. (I've confirmed that these are present in the new seL4-12.1.0 release.)
1) kernel/include/arch/x86/arch/32/mode/machine.h has the following definition (some code elided):
static inline word_t FORCE_INLINE x86_read_fs_base_impl(void) { word_t base = 0; base &= ...; base &= ...; base &= ...; return base; }
I'm pretty sure that the three &= operators here should be |= or else the result is always going to be zero. The same issue occurs in x86_read_gs_base_impl in the same file.
Some follow-up questions: Can anyone say if this would have been detected if the PC99 (32-bit) platform had been verified?
This would not have been caught by the proof itself, because this particular function is hidden behind the machine interface, i.e. the proofs don't see it (if they did see it, it would have come up that the result is always zero). It should be found during the validation part where we check that the machine interface makes sense. That part is manual, though, and has the usual problems of eyes sometimes seeing what they want to see instead of what is there. Nice catch! Cheers, Gerwin
Thanks Kent and Gerwin! To Kent's question about how I found the issue with x86_read_fs_base_impl: it wasn't the result of any static analysis, just something that I stumbled on while browsing to figure out how segment registers are handled ... and Kent's description answered that nicely. Thanks specifically for pointing out RESET_SELECTORS; I missed that previously because it uses an unadorned "es" whereas I had searched for "%es" to reduce false positives; lesson learned ... On the verification side, I've long been interested in the question of how we protect against covert channels resulting from hardware state components that are undocumented, or that don't work as intended, or that are not captured in a formal model. [There was a news item in the last few weeks about a (seemingly harmless) flaw of this kind in the Apple M1 processor, which is probably why this was on my mind again.] Best wishes, and thanks again, Mark
On 24 Jun 2021, at 04:23, Mark Jones <mpj@pdx.edu<mailto:mpj@pdx.edu>> wrote: On the verification side, I've long been interested in the question of how we protect against covert channels resulting from hardware state components that are undocumented, or that don't work as intended, or that are not captured in a formal model. [There was a news item in the last few weeks about a (seemingly harmless) flaw of this kind in the Apple M1 processor, which is probably why this was on my mind again.] You’ll need a full reset of microarchitectural state. I’m working on a paper with ETH colleagues (follow-up from https://ts.data61.csiro.au/publications/csiro_full_text/Wistoff_SGBH_21.pdf) which shows that a comprehensive reset is feasible and cheap. Gernot
Hi Mark,
On 24 Jun 2021, at 04:23, Mark Jones <mpj@pdx.edu> wrote:
On the verification side, I've long been interested in the question of how we protect against covert channels resulting from hardware state components that are undocumented, or that don't work as intended, or that are not captured in a formal model. [There was a news item in the last few weeks about a (seemingly harmless) flaw of this kind in the Apple M1 processor, which is probably why this was on my mind again.]
That is indeed an interesting question and it is hard to achieve. One way is doing vulnerability research to figure out where things go wrong on the hardware side and what that space even really looks like. Yuval has found many interesting side channels that way, e.g. Spectre and Meltdown are among those. Essentially this is validating the model and investigating where it breaks. The pure formal verification answer is then always a more detailed model that does have that hidden state. This works for some things (e.g. we found a missing register in one of our models that way back in the day), but it's not really feasible if there is no documentation from the hardware side or the hardware misbehaves. The other way is figuring out what we can do without full knowledge of the hardware. We have started work on that and have a few ideas that we're formalising. The end result of that if everything works out should be closing large classes of timing side channels *without* knowing the exact hardware details. The idea is that we need enough documentation to determine if an instruction *might* change some unspecified deeper hardware state (which might affect timing), and we need enough instructions/mechanisms to either partition that hardware state (as with cache colouring) or reset that hardware state (as with cache flushing). If you have both, and have a precise enough measure of time, you can build OS mechanisms that close the channels that result from that hardware state. (More in that link that Gernot sent) This would solve a good chunk of the problem -- there are more channels, such as power etc, but one thing at a time ;-) What that will still not solve is completely undocumented registers and hardware state, which I'm not sure you can do much about if you're working purely in software. It's not like these don't exist, there is a reason complex chips like modern x86 implementations have a hard time for security evaluations. You can systematically search for them as with for instance https://github.com/xoreaxeaxeax/sandsifter, but you can't know that it is complete. It's a fact of live that you must guard against some hardware-brokenness, but you do need some grounding in reality to achieve anything. Cheers, Gerwin
What’s really required is a HW-SW contract that guarantees that any hardware features can be spatially or temporally partitioned (https://ts.data61.csiro.au/publications/csiro_full_text/Ge_YH_18.pdf). Then it becomes a question of verifying the HW implementation against that. For stateful resources I actually don’t think that this would be all that hard: you’d have to show on an RTL-level description of the circuit that every register is either part of architected state or reset by the operation that removes all non-architected state. Stateless shared resources are trickier (at least these cannot be exploited as side channels, but they can be exploited for intentional leakage by a trojan). Gernot
On 24 Jun 2021, at 09:02, Gerwin Klein <kleing@unsw.edu.au> wrote:
Hi Mark,
On 24 Jun 2021, at 04:23, Mark Jones <mpj@pdx.edu> wrote:
On the verification side, I've long been interested in the question of how we protect against covert channels resulting from hardware state components that are undocumented, or that don't work as intended, or that are not captured in a formal model. [There was a news item in the last few weeks about a (seemingly harmless) flaw of this kind in the Apple M1 processor, which is probably why this was on my mind again.]
That is indeed an interesting question and it is hard to achieve.
One way is doing vulnerability research to figure out where things go wrong on the hardware side and what that space even really looks like. Yuval has found many interesting side channels that way, e.g. Spectre and Meltdown are among those. Essentially this is validating the model and investigating where it breaks.
The pure formal verification answer is then always a more detailed model that does have that hidden state. This works for some things (e.g. we found a missing register in one of our models that way back in the day), but it's not really feasible if there is no documentation from the hardware side or the hardware misbehaves.
The other way is figuring out what we can do without full knowledge of the hardware. We have started work on that and have a few ideas that we're formalising. The end result of that if everything works out should be closing large classes of timing side channels *without* knowing the exact hardware details. The idea is that we need enough documentation to determine if an instruction *might* change some unspecified deeper hardware state (which might affect timing), and we need enough instructions/mechanisms to either partition that hardware state (as with cache colouring) or reset that hardware state (as with cache flushing). If you have both, and have a precise enough measure of time, you can build OS mechanisms that close the channels that result from that hardware state. (More in that link that Gernot sent)
This would solve a good chunk of the problem -- there are more channels, such as power etc, but one thing at a time ;-)
What that will still not solve is completely undocumented registers and hardware state, which I'm not sure you can do much about if you're working purely in software. It's not like these don't exist, there is a reason complex chips like modern x86 implementations have a hard time for security evaluations. You can systematically search for them as with for instance https://github.com/xoreaxeaxeax/sandsifter, but you can't know that it is complete.
It's a fact of live that you must guard against some hardware-brokenness, but you do need some grounding in reality to achieve anything.
Cheers, Gerwin
_______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems
"Stateless shared resources are trickier (at least these cannot be exploited as side channels, but they can be exploited for intentional leakage by a trojan)." Of course. I played with that long ago... If you can observe and measure the system that contains the information and you can interact with it then you can leak information. Here we have two different attack scenarios: "remote" and "local" attacks, which are not always isolated one each other clear as some times you may found a way to start an attack vector where you are remote but the measurement is local, in example changes in the environment (radiation). "What’s really required is a HW-SW contract that guarantees that any hardware features can be spatially or temporally partitioned ( https://ts.data61.csiro.au/publications/csiro_full_text/Ge_YH_18.pdf). Then it becomes a question of verifying the HW implementation against that."" Yes, that is a very first step. And this "contract" means more and more makes sense to design both at the same time and not following complete different paths as we have been doing till nowadays. "The other way is figuring out what we can do without full knowledge of the hardware." That's a very interesting question :-). In fact, IMHO, it is also a good starting point for designing robust solutions: if your starting point is a hostile environment (unknown full hardware specs) then chances of having a more resilient solution are big. I always use that for designing my solutions: I build different layers, and assume compromise of all of them but the last one (where the protected resource is) then try to calculate what are the chances of a chained exploitation in real World. Not having full knowledge of the hardware is more than a challenge and even if I don't like to use the expression "impossible mission" I really can't figure out how to protect against hostile hardware. Just think that hostile hardware equals to have the information system roots (the physical layer) against you. I really can't figure out how anyone can protect from this, in example, from a trojaned hardware that has hidden built in mechanisms to measure environment variables. Fascinating topic. El jue, 24 jun 2021 a las 1:57, Heiser, Gernot (Data61, Eveleigh) (< Gernot.Heiser@data61.csiro.au>) escribió:
What’s really required is a HW-SW contract that guarantees that any hardware features can be spatially or temporally partitioned ( https://ts.data61.csiro.au/publications/csiro_full_text/Ge_YH_18.pdf). Then it becomes a question of verifying the HW implementation against that.
For stateful resources I actually don’t think that this would be all that hard: you’d have to show on an RTL-level description of the circuit that every register is either part of architected state or reset by the operation that removes all non-architected state.
Stateless shared resources are trickier (at least these cannot be exploited as side channels, but they can be exploited for intentional leakage by a trojan).
Gernot
On 24 Jun 2021, at 09:02, Gerwin Klein <kleing@unsw.edu.au> wrote:
Hi Mark,
On 24 Jun 2021, at 04:23, Mark Jones <mpj@pdx.edu> wrote:
On the verification side, I've long been interested in the question of how we protect against covert channels resulting from hardware state components that are undocumented, or that don't work as intended, or that are not captured in a formal model. [There was a news item in the last few weeks about a (seemingly harmless) flaw of this kind in the Apple M1 processor, which is probably why this was on my mind again.]
That is indeed an interesting question and it is hard to achieve.
One way is doing vulnerability research to figure out where things go wrong on the hardware side and what that space even really looks like. Yuval has found many interesting side channels that way, e.g. Spectre and Meltdown are among those. Essentially this is validating the model and investigating where it breaks.
The pure formal verification answer is then always a more detailed model that does have that hidden state. This works for some things (e.g. we found a missing register in one of our models that way back in the day), but it's not really feasible if there is no documentation from the hardware side or the hardware misbehaves.
The other way is figuring out what we can do without full knowledge of the hardware. We have started work on that and have a few ideas that we're formalising. The end result of that if everything works out should be closing large classes of timing side channels *without* knowing the exact hardware details. The idea is that we need enough documentation to determine if an instruction *might* change some unspecified deeper hardware state (which might affect timing), and we need enough instructions/mechanisms to either partition that hardware state (as with cache colouring) or reset that hardware state (as with cache flushing). If you have both, and have a precise enough measure of time, you can build OS mechanisms that close the channels that result from that hardware state. (More in that link that Gernot sent)
This would solve a good chunk of the problem -- there are more channels, such as power etc, but one thing at a time ;-)
What that will still not solve is completely undocumented registers and hardware state, which I'm not sure you can do much about if you're working purely in software. It's not like these don't exist, there is a reason complex chips like modern x86 implementations have a hard time for security evaluations. You can systematically search for them as with for instance https://github.com/xoreaxeaxeax/sandsifter, but you can't know that it is complete.
It's a fact of live that you must guard against some hardware-brokenness, but you do need some grounding in reality to achieve anything.
Cheers, Gerwin
_______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems
_______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems
participants (6)
-
Gernot Heiser
-
Gerwin Klein
-
Heiser, Gernot (Data61, Eveleigh)
-
Hugo V.C.
-
Kent Mcleod
-
Mark Jones