Potential vulnerabilities
I'm forwarding this question here (tried on Mattermost Trustworthy Systems group first) hoping someone can put some light on this? --- Hi, I'm having a look to the vulns (in areas of the kernel that have not been formally verified) patched in seL4 13.0.0. We have: 1) "NULL pointer dereference when injecting an IRQ for a non-associated VCPU on SMP configurations." 2) "On AArch64, when seL4 runs in EL1 the kernel would fault with a data abort in seL4_ARM_Page_Invalidate_Data and seL4_ARM_VSpace_Invalidate_Data when the user requested a dc ivac cache maintenance operation on a page that is not mapped writeable." Extremely simple question: running version < 13.0.0 on top of Qemu (in example like https://docs.sel4.systems/projects/sel4webserver/) would it be vulnerable to any of those? --- Best,
HI Hugo,
The CHANGES.md file lists the vulnerable versions of seL4 for each of these (https://github.com/seL4/seL4/blob/master/CHANGES.md)
- for VCPU/SMP: seL4 versions 12.0.0 and 12.1.0.
- for cache maintenance ops on AArch64: all versions before 13.0.0 from 5.0.0
Cheers,
Gerwin
On 5 Nov 2024, at 10:02, Hugo V.C.
HI Hugo, The CHANGES.md file lists the vulnerable versions of seL4 for each of
Hi Gerwin,
thanks for the fast response! Yes, this.I already know. But I'm interested
in this specific case:
https://docs.sel4.systems/projects/sel4webserver/
as I guess it may not vulnerable as this example (by default) does not run
SMP seL4. For "for cache maintenance ops" I'm not sure as this qemu
environment. Any hints?
On Tuesday, November 5, 2024, Gerwin Klein
- for VCPU/SMP: seL4 versions 12.0.0 and 12.1.0. - for cache maintenance ops on AArch64: all versions before 13.0.0 from 5.0.0 Cheers, Gerwin
On 5 Nov 2024, at 10:02, Hugo V.C.
wrote: I'm forwarding this question here (tried on Mattermost Trustworthy Systems group first) hoping someone can put some light on this? ---
Hi, I'm having a look to the vulns (in areas of the kernel that have not been formally verified) patched in seL4 13.0.0.
We have:
1) "NULL pointer dereference when injecting an IRQ for a non-associated VCPU on SMP configurations." 2) "On AArch64, when seL4 runs in EL1 the kernel would fault with a data abort in seL4_ARM_Page_Invalidate_Data and seL4_ARM_VSpace_Invalidate_Data when the user requested a dc ivac cache maintenance operation on a page that is not mapped writeable."
Extremely simple question: running version < 13.0.0 on top of Qemu (in example like https://docs.sel4.systems/projects/sel4webserver/) would it be vulnerable to any of those?
---
Best, _______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems
Hello Hugo, The bugs you listed are only a problem for untrusted code. That is, it gives threads with vcpu control or vspace caps the ability to cause a system halt, effectively giving them more rights than they should have, hence it being a vulnerability. But those operations, if done in good faith, would be user space bugs. In the case of the webserver example, if you can trigger either of the two bugs in any way, with a current kernel it will avoid the kernel crashing, but it will still mean your user space will most likely be broken, resulting in the same end user experience of a non-working system. Because as far as I know, the webserver example has no fault detection, recovery or restart mechanism implemented. Greetings, Indan On 2024-11-05 10:34, Hugo V.C. wrote:
Hi Gerwin,
thanks for the fast response! Yes, this.I already know. But I'm interested in this specific case:
https://docs.sel4.systems/projects/sel4webserver/
as I guess it may not vulnerable as this example (by default) does not run SMP seL4. For "for cache maintenance ops" I'm not sure as this qemu environment. Any hints?
HI Hugo, The CHANGES.md file lists the vulnerable versions of seL4 for each of
On Tuesday, November 5, 2024, Gerwin Klein
wrote: these (https://github.com/seL4/seL4/blob/master/CHANGES.md) - for VCPU/SMP: seL4 versions 12.0.0 and 12.1.0. - for cache maintenance ops on AArch64: all versions before 13.0.0 from 5.0.0 Cheers, Gerwin
On 5 Nov 2024, at 10:02, Hugo V.C.
wrote: I'm forwarding this question here (tried on Mattermost Trustworthy Systems group first) hoping someone can put some light on this? ---
Hi, I'm having a look to the vulns (in areas of the kernel that have not been formally verified) patched in seL4 13.0.0.
We have:
1) "NULL pointer dereference when injecting an IRQ for a non-associated VCPU on SMP configurations." 2) "On AArch64, when seL4 runs in EL1 the kernel would fault with a data abort in seL4_ARM_Page_Invalidate_Data and seL4_ARM_VSpace_Invalidate_Data when the user requested a dc ivac cache maintenance operation on a page that is not mapped writeable."
Extremely simple question: running version < 13.0.0 on top of Qemu (in example like https://docs.sel4.systems/projects/sel4webserver/) would it be vulnerable to any of those?
---
Best,
Hi Indan,
thanks for the explanation. Anyway, still not clear to me 🙏.
Let me clarify my risk model.
1) I don't care abour user space (Linux/web server) crashes
2) Suppose I have a webserver example with a "vulnerable" seL4 kernel
version
I'm a bit concerned about "But those operations, if done in good faith".
So to be even more precise:
In a "webserver example" of year... 2022? (which will be running an old
seL4 version), as is, with no modifications from what was published at that
time:
Can or can not the kernel be crashed in a exploitable way? An seL4 system
halt is not a problem to me. An execution manipulation at kernel level it
would be. A halt is a halt. No problem. But seL4 is formally verified, so,
my concern is if an attacker can do more than halting the seL4.
This kind of questions are not just for fun guys, this is what customers
will ask you every single time a non verified code bug arises. And the
infosec community needs that kind of analysis to increase trust.
Thank you in advance 🙏
On Tuesday, November 5, 2024, Indan Zupancic
Hello Hugo,
The bugs you listed are only a problem for untrusted code. That is, it gives threads with vcpu control or vspace caps the ability to cause a system halt, effectively giving them more rights than they should have, hence it being a vulnerability. But those operations, if done in good faith, would be user space bugs.
In the case of the webserver example, if you can trigger either of the two bugs in any way, with a current kernel it will avoid the kernel crashing, but it will still mean your user space will most likely be broken, resulting in the same end user experience of a non-working system. Because as far as I know, the webserver example has no fault detection, recovery or restart mechanism implemented.
Greetings,
Indan
On 2024-11-05 10:34, Hugo V.C. wrote:
Hi Gerwin,
thanks for the fast response! Yes, this.I already know. But I'm
interested
in this specific case:
https://docs.sel4.systems/projects/sel4webserver/
as I guess it may not vulnerable as this example (by default) does not run SMP seL4. For "for cache maintenance ops" I'm not sure as this qemu environment. Any hints?
On Tuesday, November 5, 2024, Gerwin Klein
wrote: HI Hugo, The CHANGES.md file lists the vulnerable versions of seL4 for each of
these (https://github.com/seL4/seL4/blob/master/CHANGES.md)
- for VCPU/SMP: seL4 versions 12.0.0 and 12.1.0. - for cache maintenance ops on AArch64: all versions before 13.0.0 from
5.0.0
Cheers, Gerwin
On 5 Nov 2024, at 10:02, Hugo V.C.
wrote: I'm forwarding this question here (tried on Mattermost Trustworthy Systems
group first) hoping someone can put some light on this?
---
Hi, I'm having a look to the vulns (in areas of the kernel that have not been formally verified) patched in seL4 13.0.0.
We have:
1) "NULL pointer dereference when injecting an IRQ for a non-associated VCPU on SMP configurations." 2) "On AArch64, when seL4 runs in EL1 the kernel would fault with a data abort in seL4_ARM_Page_Invalidate_Data and seL4_ARM_VSpace_Invalidate_Data when the user requested a dc ivac cache maintenance operation on a page that is not mapped writeable."
Extremely simple question: running version < 13.0.0 on top of Qemu (in example like https://docs.sel4.systems/projects/sel4webserver/) would it
be
vulnerable to any of those?
---
Best,
Hi Hugo,
Sorry I misread you first email. Looking at the sel4webserver setup, it does not seem to use SMP, so this application is not affected by the SMP/VCPU problem.
The documentation is only for AArch32, but the application does work for AArch64 and is tested for AArch64.
To answer whether the application can be affected by the cache operation vulnerability, the main question is what is considered to be trusted and under what threat model. A reasonable threat model would be to treat the outside network as malicious, and the VM(s) and VMM as known, initially trusted code, where the VM(s) may be compromised by attack, because they run Linux.
There are two configurations of the app, one with a single VM that serves directly to the network and one with a separate VM for network interaction. For this vulnerability I think both can be treated the same (although the one with separate network VM has a smaller attack surface): for a network attack to be able to actually perform the vulnerable syscall, it must first subvert Linux with full root-level code execution, then the VM must subvert the VMM with full code execution, then find an untyped capability or existing frame capability that the VMM possesses (it does possess some), from which to then create a non-writeable mapping, and then make the syscall.
Because the VMM is not formally verified, we usually treat the VMM and the VM as being in the same security domain for analysis, but in this case it makes sense to look more closely. Without formal verification there is no guarantee that the full code execution step from VM to VMM cannot happen, but I can’t see any scenario or currently known kind of attack where that would actually be possible. So I would rate the risk as very low.
Even if all of this happened, the only consequence would be a system halt.
Indan’s threat model was that if the VM is compromised, the main purpose of the application (serving web pages) is already compromised, so the additional vulnerability does not really matter. It would only lead to an additional opportunity for a system halt.
The whole thing comes down to that this web server demo is more a demo on how to run a useful VM on top of seL4 than on how to do a secure web server on top of seL4. A more secure design would be the one that currently serves the se4L website (https://sel4.systems/Info/Website/), but it is still a bit early for general production use.
Cheers,
Gerwin
On 7 Nov 2024, at 09:09, Hugo V.C.
Hi Hugo, Sorry I misread you first email. Looking at the sel4webserver setup, it does not seem to use SMP, so this application is not affected by the SMP/VCPU problem. The documentation is only for AArch32, but the application does work for AArch64 and is tested for AArch64. To answer whether the application can be affected by the cache operation vulnerability, the main question is what is considered to be trusted and under what threat model. A reasonable threat model would be to treat the outside network as malicious, and the VM(s) and VMM as known, initially
There are two configurations of the app, one with a single VM that serves
Because the VMM is not formally verified, we usually treat the VMM and
Even if all of this happened, the only consequence would be a system halt. Indan’s threat model was that if the VM is compromised, the main purpose of the application (serving web pages) is already compromised, so the additional vulnerability does not really matter. It would only lead to an additional opportunity for a system halt. The whole thing comes down to that this web server demo is more a demo on how to run a useful VM on top of seL4 than on how to do a secure web server on top of seL4. A more secure design would be the one that currently serves
Cheers, Gerwin
On 7 Nov 2024, at 09:09, Hugo V.C.
wrote: Hi Indan, thanks for the explanation. Anyway, still not clear to me 🙏.
Let me clarify my risk model.
1) I don't care abour user space (Linux/web server) crashes 2) Suppose I have a webserver example with a "vulnerable" seL4 kernel version
I'm a bit concerned about "But those operations, if done in good faith".
So to be even more precise:
In a "webserver example" of year... 2022? (which will be running an old seL4 version), as is, with no modifications from what was published at
Nice Gerwin. Thanks.
Just to make a more spicy debate,
"A reasonable threat model"
well, "reasonable" is what defender wants to achieve. In example, you talk
about "it must first subvert Linux with full root-level code execution"...
ok, this is too easy. I don't trust Linux since Grsecurity patches went
closed source... Vanilla Linux security is a toy. So, to me, Linux is just
a "big boy" application. There are lot of kernel and user space apps 0days
for getting root. So once clarified this, I assume Linux is already
compromised or, a more "real world" use case: I want to give Linux root
access to users. An seL4 must stop them from reaching the host.
I know is not that easy...
All VMM code there...
so, if I give you a root shell in a seL4 webserver example ("Testing the
web server (qemu-arm-virt)", https://github.com/seL4/sel4webserver) with a
compilation evironment (to build an exploit):
you can only crash seL4? Or you think you (with your deep know how of seL4)
would be able to do more nasty things against the underlying host?
Be careful, you are a step away to write the first ever exploit for (demo)
seL4 system... ;-)
On Thursday, November 7, 2024, Gerwin Klein
time:
Can or can not the kernel be crashed in a exploitable way? An seL4 system halt is not a problem to me. An execution manipulation at kernel level it would be. A halt is a halt. No problem. But seL4 is formally verified, so, my concern is if an attacker can do more than halting the seL4.
This kind of questions are not just for fun guys, this is what customers will ask you every single time a non verified code bug arises. And the infosec community needs that kind of analysis to increase trust.
Thank you in advance 🙏
On Tuesday, November 5, 2024, Indan Zupancic
wrote: Hello Hugo,
The bugs you listed are only a problem for untrusted code. That is, it
gives
threads with vcpu control or vspace caps the ability to cause a system
halt,
effectively giving them more rights than they should have, hence it being
a
vulnerability. But those operations, if done in good faith, would be user space bugs.
In the case of the webserver example, if you can trigger either of the
two bugs
in any way, with a current kernel it will avoid the kernel crashing, but
it will
still mean your user space will most likely be broken, resulting in the
same end
user experience of a non-working system. Because as far as I know, the
webserver
example has no fault detection, recovery or restart mechanism implemented.
Greetings,
Indan
On 2024-11-05 10:34, Hugo V.C. wrote:
Hi Gerwin,
thanks for the fast response! Yes, this.I already know. But I'm
interested
in this specific case:
https://docs.sel4.systems/projects/sel4webserver/
as I guess it may not vulnerable as this example (by default) does not
run
SMP seL4. For "for cache maintenance ops" I'm not sure as this qemu environment. Any hints?
On Tuesday, November 5, 2024, Gerwin Klein
wrote: HI Hugo, The CHANGES.md file lists the vulnerable versions of seL4 for each of
these (https://github.com/seL4/seL4/blob/master/CHANGES.md)
- for VCPU/SMP: seL4 versions 12.0.0 and 12.1.0. - for cache maintenance ops on AArch64: all versions before 13.0.0 from
5.0.0
Cheers, Gerwin
On 5 Nov 2024, at 10:02, Hugo V.C.
wrote: I'm forwarding this question here (tried on Mattermost Trustworthy Systems
group first) hoping someone can put some light on this?
---
Hi, I'm having a look to the vulns (in areas of the kernel that have not been formally verified) patched in seL4 13.0.0.
We have:
1) "NULL pointer dereference when injecting an IRQ for a non-associated VCPU on SMP configurations." 2) "On AArch64, when seL4 runs in EL1 the kernel would fault with a data abort in seL4_ARM_Page_Invalidate_Data
and
seL4_ARM_VSpace_Invalidate_Data when the user requested a dc ivac cache maintenance operation on a page that is not mapped writeable."
Extremely simple question: running version < 13.0.0 on top of Qemu (in example like https://docs.sel4.systems/projects/sel4webserver/) would it
be
vulnerable to any of those?
---
Best,
_______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems
Hi Hugo,
Yes, even given full root access to the Linux VM with a compiler, I think that the possibility of exploiting the vulnerability is still extremely low. The VM would need to subvert the VMM into full arbitrary code execution (e.g. executing existing bits of code of the VMM is not enough). The CAmkES VMM doesn’t do any sophisticated instruction emulation or things like that, so the attack surface is very low. That step from VM to VMM is the one that is not verified, but that I think is very unlikely to be possible. Even if it does turn out to be possible, the maximum damage is system halt for the AArch64 cache problem.
For VCPU+SMP, I don’t think address 0 is mapped in any relevant config, so it would also hang as Indan describes.
Cheers,
Gerwin
On 7 Nov 2024, at 13:43, Hugo V.C.
Hi Hugo, Sorry I misread you first email. Looking at the sel4webserver setup, it does not seem to use SMP, so this application is not affected by the SMP/VCPU problem. The documentation is only for AArch32, but the application does work for AArch64 and is tested for AArch64. To answer whether the application can be affected by the cache operation vulnerability, the main question is what is considered to be trusted and under what threat model. A reasonable threat model would be to treat the outside network as malicious, and the VM(s) and VMM as known, initially trusted code, where the VM(s) may be compromised by attack, because they run Linux. There are two configurations of the app, one with a single VM that serves directly to the network and one with a separate VM for network interaction. For this vulnerability I think both can be treated the same (although the one with separate network VM has a smaller attack surface): for a network attack to be able to actually perform the vulnerable syscall, it must first subvert Linux with full root-level code execution, then the VM must subvert the VMM with full code execution, then find an untyped capability or existing frame capability that the VMM possesses (it does possess some), from which to then create a non-writeable mapping, and then make the syscall. Because the VMM is not formally verified, we usually treat the VMM and the VM as being in the same security domain for analysis, but in this case it makes sense to look more closely. Without formal verification there is no guarantee that the full code execution step from VM to VMM cannot happen, but I can’t see any scenario or currently known kind of attack where that would actually be possible. So I would rate the risk as very low. Even if all of this happened, the only consequence would be a system halt. Indan’s threat model was that if the VM is compromised, the main purpose of the application (serving web pages) is already compromised, so the additional vulnerability does not really matter. It would only lead to an additional opportunity for a system halt. The whole thing comes down to that this web server demo is more a demo on how to run a useful VM on top of seL4 than on how to do a secure web server on top of seL4. A more secure design would be the one that currently serves the se4L website (https://sel4.systems/Info/Website/), but it is still a bit early for general production use. Cheers, Gerwin
On 7 Nov 2024, at 09:09, Hugo V.C.
mailto:skydivebcn@gmail.com> wrote: Hi Indan, thanks for the explanation. Anyway, still not clear to me 🙏.
Let me clarify my risk model.
1) I don't care abour user space (Linux/web server) crashes 2) Suppose I have a webserver example with a "vulnerable" seL4 kernel version
I'm a bit concerned about "But those operations, if done in good faith".
So to be even more precise:
In a "webserver example" of year... 2022? (which will be running an old seL4 version), as is, with no modifications from what was published at that time:
Can or can not the kernel be crashed in a exploitable way? An seL4 system halt is not a problem to me. An execution manipulation at kernel level it would be. A halt is a halt. No problem. But seL4 is formally verified, so, my concern is if an attacker can do more than halting the seL4.
This kind of questions are not just for fun guys, this is what customers will ask you every single time a non verified code bug arises. And the infosec community needs that kind of analysis to increase trust.
Thank you in advance 🙏
On Tuesday, November 5, 2024, Indan Zupancic
mailto:indan@nul.nu> wrote: Hello Hugo,
The bugs you listed are only a problem for untrusted code. That is, it
gives
threads with vcpu control or vspace caps the ability to cause a system
halt,
effectively giving them more rights than they should have, hence it being
a
vulnerability. But those operations, if done in good faith, would be user space bugs.
In the case of the webserver example, if you can trigger either of the
two bugs
in any way, with a current kernel it will avoid the kernel crashing, but
it will
still mean your user space will most likely be broken, resulting in the
same end
user experience of a non-working system. Because as far as I know, the
webserver
example has no fault detection, recovery or restart mechanism implemented.
Greetings,
Indan
On 2024-11-05 10:34, Hugo V.C. wrote:
Hi Gerwin,
thanks for the fast response! Yes, this.I already know. But I'm
interested
in this specific case:
https://docs.sel4.systems/projects/sel4webserver/
as I guess it may not vulnerable as this example (by default) does not
run
SMP seL4. For "for cache maintenance ops" I'm not sure as this qemu environment. Any hints?
On Tuesday, November 5, 2024, Gerwin Klein
mailto:kleing@unsw.edu.au> wrote: HI Hugo, The CHANGES.md file lists the vulnerable versions of seL4 for each of
these (https://github.com/seL4/seL4/blob/master/CHANGES.md)
- for VCPU/SMP: seL4 versions 12.0.0 and 12.1.0. - for cache maintenance ops on AArch64: all versions before 13.0.0 from
5.0.0
Cheers, Gerwin
On 5 Nov 2024, at 10:02, Hugo V.C.
mailto:skydivebcn@gmail.com> wrote: I'm forwarding this question here (tried on Mattermost Trustworthy Systems
group first) hoping someone can put some light on this?
---
Hi, I'm having a look to the vulns (in areas of the kernel that have not been formally verified) patched in seL4 13.0.0.
We have:
1) "NULL pointer dereference when injecting an IRQ for a non-associated VCPU on SMP configurations." 2) "On AArch64, when seL4 runs in EL1 the kernel would fault with a data abort in seL4_ARM_Page_Invalidate_Data
and
seL4_ARM_VSpace_Invalidate_Data when the user requested a dc ivac cache maintenance operation on a page that is not mapped writeable."
Extremely simple question: running version < 13.0.0 on top of Qemu (in example like https://docs.sel4.systems/projects/sel4webserver/) would it
be
vulnerable to any of those?
---
Best,
_______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems
Hi Gerwin, On 2024-11-07 14:00, Gerwin Klein wrote:
For VCPU+SMP, I don’t think address 0 is mapped in any relevant config
I think the kernel shares the page tables with user space and maps its kernel memory window in every address space, it doesn't do a full address space switch on each entry. That means that if user space has mappings at 0, the kernel will have it too. I don't think there is a way to make user space mappings invalid for kernel accesses. Any sane user space won't have a zero mapping, but the init task may be an exception. I also don't know if the kernel disallows user space mappings at 0. Greetings, Indan
On 7 Nov 2024, at 15:38, Indan Zupancic
wrote: Hi Gerwin,
On 2024-11-07 14:00, Gerwin Klein wrote:
For VCPU+SMP, I don’t think address 0 is mapped in any relevant config
I think the kernel shares the page tables with user space and maps its kernel memory window in every address space, it doesn't do a full address space switch on each entry.
On Arm, when seL4 runs in EL2, it has its own separate page tables. Cheers, Gerwin
Hello Hugo, On 2024-11-07 08:09, Hugo V.C. wrote:
Can or can not the kernel be crashed in a exploitable way? An seL4 system halt is not a problem to me. An execution manipulation at kernel level it would be. A halt is a halt. No problem. But seL4 is formally verified, so, my concern is if an attacker can do more than halting the seL4.
The bugs are normally limited to halting the kernel in both cases for all user space setups: - The cache invalidate causes a halt because there is no write access. This is a user space app calling cache maintenance syscalls on a page cap that is mapped read-only. Here seL4 was more permissive than the actual hardware. The only problem is the kernel halt, there are no other side effects. - The VCPU bug only affects ARM SMP configurations (otherwise it would have been found by verification). Normally the kernel has no memory mapped at address 0, you can check the memory map to verify if this is the case for your platform. If there is no mapping for address 0, the kernel will halt and there are no other side effects. The rest of the analysis is for the highly unlikely case that you do run an unverified kernel build on SMP ARM with HYP enabled, where an untrusted or compromised task has a cap to a disassociated VCPU, with a kernel that has memory mapped starting at address 0: If for some reason it has, the NULL access itself is limited to a read of the NULL TCB's affinity, which will be used to decide whether to send an IPI and to which core (offset depends on the size of arch_tcb_t, but will be less than 4kb). If you have device memory at that address, that read may have some side effect (although I think ARM hardware usually doesn't). Otherwise that invalid read itself will have no side-effects and is harmless. If it reads a valid core affinity, then things end up harmless: handleVCPUInjectInterruptIPI() will handle the IPI and do what was requested, it doesn't use or touch the NULL TCB in any way, it just does some internal bookkeeping. If it reads an invalid core affinity, things get more interesting. It will try to send an IPI to core >= NUM_CORES. This results in an out-of-bound write past big_kernel_lock.node_owners[] in generic_ipi_send_mask(). The write is limited to writing the value '1' to the third word in one 64 or 32 bytes aligned chunk past the big kernel lock, limited in range to about 64 or 32 * 64 bytes, for 64-bit and 32-bit platforms respectively (less if L1 cache lines are 32 bytes). You can check your kernel.elf binary to see what could be affected this way if you're interested. If you're unlucky it's within range of the kernel stack and the attacker can modify one stack word to the value of 1. Assuming the attacker has no control over memory content at address 0-4kb, this is most likely useless, as they can't choose the address, nor the value written. This one write is all they get, because after that the kernel will hang forever in ipi_wait(), without releasing the big kernel lock, so any write they did to kernel memory can't affect other cores either. If the attacker has control over address range 0-4kb and all other requirements are met, and the attacker gets very lucky, then the VCPU bug might be enough to take over the system. But it's still very unlikely, as all they can write is 1 to limited memory targets. That one write has to both escalate privileges and avoid the hang in ipi_wait() to be useful. They can't unlock the kernel lock because that requires a write of 0 to the wrong offset.
This kind of questions are not just for fun guys, this is what customers will ask you every single time a non verified code bug arises. And the infosec community needs that kind of analysis to increase trust.
Usually customers understand their own system enough to know whether a bug affects their system or not. An analysis like I did above is required for your specific system before you can be assured that known bugs don't affect your systems. Greetings, Indan
Thank you in advance 🙏
Hello Hugo,
The bugs you listed are only a problem for untrusted code. That is, it gives threads with vcpu control or vspace caps the ability to cause a system halt, effectively giving them more rights than they should have, hence it being a vulnerability. But those operations, if done in good faith, would be user space bugs.
In the case of the webserver example, if you can trigger either of
in any way, with a current kernel it will avoid the kernel crashing, but it will still mean your user space will most likely be broken, resulting in
user experience of a non-working system. Because as far as I know,
example has no fault detection, recovery or restart mechanism implemented.
Greetings,
Indan
On 2024-11-05 10:34, Hugo V.C. wrote:
Hi Gerwin,
thanks for the fast response! Yes, this.I already know. But I'm
interested
in this specific case:
https://docs.sel4.systems/projects/sel4webserver/
as I guess it may not vulnerable as this example (by default) does not run SMP seL4. For "for cache maintenance ops" I'm not sure as this qemu environment. Any hints?
On Tuesday, November 5, 2024, Gerwin Klein
wrote: HI Hugo, The CHANGES.md file lists the vulnerable versions of seL4 for each
of
these (https://github.com/seL4/seL4/blob/master/CHANGES.md)
- for VCPU/SMP: seL4 versions 12.0.0 and 12.1.0. - for cache maintenance ops on AArch64: all versions before 13.0.0
from
5.0.0
Cheers, Gerwin
On 5 Nov 2024, at 10:02, Hugo V.C.
wrote: I'm forwarding this question here (tried on Mattermost Trustworthy Systems
group first) hoping someone can put some light on this?
---
Hi, I'm having a look to the vulns (in areas of the kernel that have not been formally verified) patched in seL4 13.0.0.
We have:
1) "NULL pointer dereference when injecting an IRQ for a non-associated VCPU on SMP configurations." 2) "On AArch64, when seL4 runs in EL1
On Tuesday, November 5, 2024, Indan Zupancic
wrote: the two bugs the same end the webserver the kernel would fault with a data abort in seL4_ARM_Page_Invalidate_Data and seL4_ARM_VSpace_Invalidate_Data when the user requested a dc ivac cache maintenance operation on a page that is not mapped writeable."
Extremely simple question: running version < 13.0.0 on top of Qemu (in example like https://docs.sel4.systems/projects/sel4webserver/) would it
be
vulnerable to any of those?
---
Best,
Wow... this requires coffee.... Thanks Indan. I'll come back when I digest
and more or less understand your very interesting explanation.
On Thursday, November 7, 2024, Indan Zupancic
Hello Hugo,
On 2024-11-07 08:09, Hugo V.C. wrote:
Can or can not the kernel be crashed in a exploitable way? An seL4 system halt is not a problem to me. An execution manipulation at kernel level it would be. A halt is a halt. No problem. But seL4 is formally verified, so, my concern is if an attacker can do more than halting the seL4.
The bugs are normally limited to halting the kernel in both cases for all user space setups:
- The cache invalidate causes a halt because there is no write access. This is a user space app calling cache maintenance syscalls on a page cap that is mapped read-only. Here seL4 was more permissive than the actual hardware. The only problem is the kernel halt, there are no other side effects.
- The VCPU bug only affects ARM SMP configurations (otherwise it would have been found by verification). Normally the kernel has no memory mapped at address 0, you can check the memory map to verify if this is the case for your platform. If there is no mapping for address 0, the kernel will halt and there are no other side effects.
The rest of the analysis is for the highly unlikely case that you do run an unverified kernel build on SMP ARM with HYP enabled, where an untrusted or compromised task has a cap to a disassociated VCPU, with a kernel that has memory mapped starting at address 0:
If for some reason it has, the NULL access itself is limited to a read of the NULL TCB's affinity, which will be used to decide whether to send an IPI and to which core (offset depends on the size of arch_tcb_t, but will be less than 4kb). If you have device memory at that address, that read may have some side effect (although I think ARM hardware usually doesn't). Otherwise that invalid read itself will have no side-effects and is harmless.
If it reads a valid core affinity, then things end up harmless: handleVCPUInjectInterruptIPI() will handle the IPI and do what was requested, it doesn't use or touch the NULL TCB in any way, it just does some internal bookkeeping.
If it reads an invalid core affinity, things get more interesting. It will try to send an IPI to core >= NUM_CORES. This results in an out-of-bound write past big_kernel_lock.node_owners[] in generic_ipi_send_mask(). The write is limited to writing the value '1' to the third word in one 64 or 32 bytes aligned chunk past the big kernel lock, limited in range to about 64 or 32 * 64 bytes, for 64-bit and 32-bit platforms respectively (less if L1 cache lines are 32 bytes).
You can check your kernel.elf binary to see what could be affected this way if you're interested. If you're unlucky it's within range of the kernel stack and the attacker can modify one stack word to the value of 1.
Assuming the attacker has no control over memory content at address 0-4kb, this is most likely useless, as they can't choose the address, nor the value written. This one write is all they get, because after that the kernel will hang forever in ipi_wait(), without releasing the big kernel lock, so any write they did to kernel memory can't affect other cores either.
If the attacker has control over address range 0-4kb and all other requirements are met, and the attacker gets very lucky, then the VCPU bug might be enough to take over the system. But it's still very unlikely, as all they can write is 1 to limited memory targets. That one write has to both escalate privileges and avoid the hang in ipi_wait() to be useful. They can't unlock the kernel lock because that requires a write of 0 to the wrong offset.
This kind of questions are not just for fun guys, this is what customers will ask you every single time a non verified code bug arises. And the infosec community needs that kind of analysis to increase trust.
Usually customers understand their own system enough to know whether a bug affects their system or not. An analysis like I did above is required for your specific system before you can be assured that known bugs don't affect your systems.
Greetings,
Indan
Thank you in advance 🙏
On Tuesday, November 5, 2024, Indan Zupancic
wrote: Hello Hugo,
The bugs you listed are only a problem for untrusted code. That is,
it gives
threads with vcpu control or vspace caps the ability to cause a
system halt,
effectively giving them more rights than they should have, hence it
being a
vulnerability. But those operations, if done in good faith, would be
user
space bugs.
In the case of the webserver example, if you can trigger either of
the two bugs
in any way, with a current kernel it will avoid the kernel crashing,
but it will
still mean your user space will most likely be broken, resulting in
the same end
user experience of a non-working system. Because as far as I know,
the webserver
example has no fault detection, recovery or restart mechanism
implemented.
Greetings,
Indan
On 2024-11-05 10:34, Hugo V.C. wrote:
Hi Gerwin,
thanks for the fast response! Yes, this.I already know. But I'm
interested
in this specific case:
https://docs.sel4.systems/projects/sel4webserver/
as I guess it may not vulnerable as this example (by default) does
not run
SMP seL4. For "for cache maintenance ops" I'm not sure as this qemu environment. Any hints?
On Tuesday, November 5, 2024, Gerwin Klein
wrote:
HI Hugo, The CHANGES.md file lists the vulnerable versions of seL4 for each
of
these (https://github.com/seL4/seL4/blob/master/CHANGES.md)
- for VCPU/SMP: seL4 versions 12.0.0 and 12.1.0. - for cache maintenance ops on AArch64: all versions before 13.0.0
from
5.0.0
Cheers, Gerwin
On 5 Nov 2024, at 10:02, Hugo V.C.
wrote: I'm forwarding this question here (tried on Mattermost Trustworthy Systems
group first) hoping someone can put some light on this?
---
Hi, I'm having a look to the vulns (in areas of the kernel that
have not
been formally verified) patched in seL4 13.0.0.
We have:
1) "NULL pointer dereference when injecting an IRQ for a
non-associated
VCPU on SMP configurations." 2) "On AArch64, when seL4 runs in EL1
the
kernel would fault with a data abort in
seL4_ARM_Page_Invalidate_Data and
seL4_ARM_VSpace_Invalidate_Data when the user requested a dc ivac
cache
maintenance operation on a page that is not mapped writeable."
Extremely simple question: running version < 13.0.0 on top of Qemu
(in
example like https://docs.sel4.systems/projects/sel4webserver/)
would it
be
vulnerable to any of those?
---
Best,
"But it's still very unlikely, as all they can write is 1 to limited memory targets." Interesting. Moreover as we are talking about a system with static memory definitions (no need to bypass any KASLR...) so, if the attacker is very, very lucky and the overwrited memory region is profitable, we would have an ultrareliable exploit. But as per your explanations, look very difficult. BTW: writing "1" in the right place can be fatal. Would be nice to test what ca be achieved. "Usually customers understand their own system enough to know whether
a bug affects their system or not."
Unfortunately here I strongly disagree. My experience is many times even
the software developers don't get the full impact of a bug (that's why we
have those patches for vulns in well known OSs that don't really patch and
things get exploited again).
When it comes to memory bugs, things can become very weird in terms of
knowing the impact. Fortunately in seL4 environment bugs impact are
"easier" to be defined as things in the kernel are "static". The bad news
is this also helps attackers.
If I were you guys I would create an exploitable environment and would try
to exploit it. Just for fun. Just to learn. Maybe you get a surprise.
On page 210 of this document (https://www.pentest.es/checkpoint_hack.pdf)
there's (at that time, year 2007, a 0day) for a memory vulnerability that
was supposed to be "almost impossible to exploit" as there were many
protections in place. At the end an exploit was possible. It took me few
months, but I did it. At that time ROP technique still didn't existed (in
fact this document is the genesys) so exploitation was painful. Nowadays
everything is automated.
If there's a way to exploit the bug you described, I bet it will be
exploited.
Attackers like to learn.
Probably someone is reading this and considering seL4 targets military
environments, yes, looks interesting.
And almost never you will have 100% verified code, so to me looks very
interesting to research what is the impact of bugs in unverified code when
they interact with verified code.
Maybe we learn something. And for sure we will have fun.
On Thursday, November 7, 2024, Indan Zupancic
Hello Hugo,
On 2024-11-07 08:09, Hugo V.C. wrote:
Can or can not the kernel be crashed in a exploitable way? An seL4 system halt is not a problem to me. An execution manipulation at kernel level it would be. A halt is a halt. No problem. But seL4 is formally verified, so, my concern is if an attacker can do more than halting the seL4.
The bugs are normally limited to halting the kernel in both cases for all user space setups:
- The cache invalidate causes a halt because there is no write access. This is a user space app calling cache maintenance syscalls on a page cap that is mapped read-only. Here seL4 was more permissive than the actual hardware. The only problem is the kernel halt, there are no other side effects.
- The VCPU bug only affects ARM SMP configurations (otherwise it would have been found by verification). Normally the kernel has no memory mapped at address 0, you can check the memory map to verify if this is the case for your platform. If there is no mapping for address 0, the kernel will halt and there are no other side effects.
The rest of the analysis is for the highly unlikely case that you do run an unverified kernel build on SMP ARM with HYP enabled, where an untrusted or compromised task has a cap to a disassociated VCPU, with a kernel that has memory mapped starting at address 0:
If for some reason it has, the NULL access itself is limited to a read of the NULL TCB's affinity, which will be used to decide whether to send an IPI and to which core (offset depends on the size of arch_tcb_t, but will be less than 4kb). If you have device memory at that address, that read may have some side effect (although I think ARM hardware usually doesn't). Otherwise that invalid read itself will have no side-effects and is harmless.
If it reads a valid core affinity, then things end up harmless: handleVCPUInjectInterruptIPI() will handle the IPI and do what was requested, it doesn't use or touch the NULL TCB in any way, it just does some internal bookkeeping.
If it reads an invalid core affinity, things get more interesting. It will try to send an IPI to core >= NUM_CORES. This results in an out-of-bound write past big_kernel_lock.node_owners[] in generic_ipi_send_mask(). The write is limited to writing the value '1' to the third word in one 64 or 32 bytes aligned chunk past the big kernel lock, limited in range to about 64 or 32 * 64 bytes, for 64-bit and 32-bit platforms respectively (less if L1 cache lines are 32 bytes).
You can check your kernel.elf binary to see what could be affected this way if you're interested. If you're unlucky it's within range of the kernel stack and the attacker can modify one stack word to the value of 1.
Assuming the attacker has no control over memory content at address 0-4kb, this is most likely useless, as they can't choose the address, nor the value written. This one write is all they get, because after that the kernel will hang forever in ipi_wait(), without releasing the big kernel lock, so any write they did to kernel memory can't affect other cores either.
If the attacker has control over address range 0-4kb and all other requirements are met, and the attacker gets very lucky, then the VCPU bug might be enough to take over the system. But it's still very unlikely, as all they can write is 1 to limited memory targets. That one write has to both escalate privileges and avoid the hang in ipi_wait() to be useful. They can't unlock the kernel lock because that requires a write of 0 to the wrong offset.
This kind of questions are not just for fun guys, this is what customers will ask you every single time a non verified code bug arises. And the infosec community needs that kind of analysis to increase trust.
Usually customers understand their own system enough to know whether a bug affects their system or not. An analysis like I did above is required for your specific system before you can be assured that known bugs don't affect your systems.
Greetings,
Indan
Thank you in advance 🙏
On Tuesday, November 5, 2024, Indan Zupancic
wrote: Hello Hugo,
The bugs you listed are only a problem for untrusted code. That is,
it gives
threads with vcpu control or vspace caps the ability to cause a
system halt,
effectively giving them more rights than they should have, hence it
being a
vulnerability. But those operations, if done in good faith, would be
user
space bugs.
In the case of the webserver example, if you can trigger either of
the two bugs
in any way, with a current kernel it will avoid the kernel crashing,
but it will
still mean your user space will most likely be broken, resulting in
the same end
user experience of a non-working system. Because as far as I know,
the webserver
example has no fault detection, recovery or restart mechanism
implemented.
Greetings,
Indan
On 2024-11-05 10:34, Hugo V.C. wrote:
Hi Gerwin,
thanks for the fast response! Yes, this.I already know. But I'm
interested
in this specific case:
https://docs.sel4.systems/projects/sel4webserver/
as I guess it may not vulnerable as this example (by default) does
not run
SMP seL4. For "for cache maintenance ops" I'm not sure as this qemu environment. Any hints?
On Tuesday, November 5, 2024, Gerwin Klein
wrote:
HI Hugo, The CHANGES.md file lists the vulnerable versions of seL4 for each
of
these (https://github.com/seL4/seL4/blob/master/CHANGES.md)
- for VCPU/SMP: seL4 versions 12.0.0 and 12.1.0. - for cache maintenance ops on AArch64: all versions before 13.0.0
from
5.0.0
Cheers, Gerwin
On 5 Nov 2024, at 10:02, Hugo V.C.
wrote: I'm forwarding this question here (tried on Mattermost Trustworthy Systems
group first) hoping someone can put some light on this?
---
Hi, I'm having a look to the vulns (in areas of the kernel that
have not
been formally verified) patched in seL4 13.0.0.
We have:
1) "NULL pointer dereference when injecting an IRQ for a
non-associated
VCPU on SMP configurations." 2) "On AArch64, when seL4 runs in EL1
the
kernel would fault with a data abort in
seL4_ARM_Page_Invalidate_Data and
seL4_ARM_VSpace_Invalidate_Data when the user requested a dc ivac
cache
maintenance operation on a page that is not mapped writeable."
Extremely simple question: running version < 13.0.0 on top of Qemu
(in
example like https://docs.sel4.systems/projects/sel4webserver/)
would it
be
vulnerable to any of those?
---
Best,
Sorry if this is unwelcome, but this discussion is a bit alarming. If there
is a conceivable exploit anywhere, it is a vulnerability that should be
contained. I wrote a post about Murphy's Law because the entry on it in
Wikipedia is woefully incorrect and it appears that somehow the zeitgeist
now has Murphy's Law as just a humorous observation and nothing more. It's
a real thing that systems people should understand and respect.
https://blog.bobtrower.com/2024/06/murphys-law-is-not-simply-adage.html
Note that you can comfortably say that a given random selection from, say,
2^1024 is improbable to the point of impossibility, but you *cannot* be
certain that the selection is random.
Cheers!
On Thu, Nov 7, 2024 at 3:31 PM Hugo V.C.
"But it's still very unlikely, as all they can write is 1 to limited memory targets."
Interesting. Moreover as we are talking about a system with static memory definitions (no need to bypass any KASLR...) so, if the attacker is very, very lucky and the overwrited memory region is profitable, we would have an ultrareliable exploit. But as per your explanations, look very difficult.
BTW: writing "1" in the right place can be fatal. Would be nice to test what ca be achieved.
"Usually customers understand their own system enough to know whether
a bug affects their system or not."
Unfortunately here I strongly disagree. My experience is many times even the software developers don't get the full impact of a bug (that's why we have those patches for vulns in well known OSs that don't really patch and things get exploited again).
When it comes to memory bugs, things can become very weird in terms of knowing the impact. Fortunately in seL4 environment bugs impact are "easier" to be defined as things in the kernel are "static". The bad news is this also helps attackers.
If I were you guys I would create an exploitable environment and would try to exploit it. Just for fun. Just to learn. Maybe you get a surprise.
On page 210 of this document (https://www.pentest.es/checkpoint_hack.pdf) there's (at that time, year 2007, a 0day) for a memory vulnerability that was supposed to be "almost impossible to exploit" as there were many protections in place. At the end an exploit was possible. It took me few months, but I did it. At that time ROP technique still didn't existed (in fact this document is the genesys) so exploitation was painful. Nowadays everything is automated.
If there's a way to exploit the bug you described, I bet it will be exploited. Attackers like to learn. Probably someone is reading this and considering seL4 targets military environments, yes, looks interesting.
And almost never you will have 100% verified code, so to me looks very interesting to research what is the impact of bugs in unverified code when they interact with verified code.
Maybe we learn something. And for sure we will have fun.
On Thursday, November 7, 2024, Indan Zupancic
wrote: Hello Hugo,
On 2024-11-07 08:09, Hugo V.C. wrote:
Can or can not the kernel be crashed in a exploitable way? An seL4 system halt is not a problem to me. An execution manipulation at kernel level it would be. A halt is a halt. No problem. But seL4 is formally verified, so, my concern is if an attacker can do more than halting the seL4.
The bugs are normally limited to halting the kernel in both cases for all user space setups:
- The cache invalidate causes a halt because there is no write access. This is a user space app calling cache maintenance syscalls on a page cap that is mapped read-only. Here seL4 was more permissive than the actual hardware. The only problem is the kernel halt, there are no other side effects.
- The VCPU bug only affects ARM SMP configurations (otherwise it would have been found by verification). Normally the kernel has no memory mapped at address 0, you can check the memory map to verify if this is the case for your platform. If there is no mapping for address 0, the kernel will halt and there are no other side effects.
The rest of the analysis is for the highly unlikely case that you do run an unverified kernel build on SMP ARM with HYP enabled, where an untrusted or compromised task has a cap to a disassociated VCPU, with a kernel that has memory mapped starting at address 0:
If for some reason it has, the NULL access itself is limited to a read of the NULL TCB's affinity, which will be used to decide whether to send an IPI and to which core (offset depends on the size of arch_tcb_t, but will be less than 4kb). If you have device memory at that address, that read may have some side effect (although I think ARM hardware usually doesn't). Otherwise that invalid read itself will have no side-effects and is harmless.
If it reads a valid core affinity, then things end up harmless: handleVCPUInjectInterruptIPI() will handle the IPI and do what was requested, it doesn't use or touch the NULL TCB in any way, it just does some internal bookkeeping.
If it reads an invalid core affinity, things get more interesting. It will try to send an IPI to core >= NUM_CORES. This results in an out-of-bound write past big_kernel_lock.node_owners[] in generic_ipi_send_mask(). The write is limited to writing the value '1' to the third word in one 64 or 32 bytes aligned chunk past the big kernel lock, limited in range to about 64 or 32 * 64 bytes, for 64-bit and 32-bit platforms respectively (less if L1 cache lines are 32 bytes).
You can check your kernel.elf binary to see what could be affected this way if you're interested. If you're unlucky it's within range of the kernel stack and the attacker can modify one stack word to the value of 1.
Assuming the attacker has no control over memory content at address 0-4kb, this is most likely useless, as they can't choose the address, nor the value written. This one write is all they get, because after that the kernel will hang forever in ipi_wait(), without releasing the big kernel lock, so any write they did to kernel memory can't affect other cores either.
If the attacker has control over address range 0-4kb and all other requirements are met, and the attacker gets very lucky, then the VCPU bug might be enough to take over the system. But it's still very unlikely, as all they can write is 1 to limited memory targets. That one write has to both escalate privileges and avoid the hang in ipi_wait() to be useful. They can't unlock the kernel lock because that requires a write of 0 to the wrong offset.
This kind of questions are not just for fun guys, this is what customers will ask you every single time a non verified code bug arises. And the infosec community needs that kind of analysis to increase trust.
Usually customers understand their own system enough to know whether a bug affects their system or not. An analysis like I did above is required for your specific system before you can be assured that known bugs don't affect your systems.
Greetings,
Indan
Thank you in advance 🙏
On Tuesday, November 5, 2024, Indan Zupancic
wrote: Hello Hugo,
The bugs you listed are only a problem for untrusted code. That is,
it gives
threads with vcpu control or vspace caps the ability to cause a
system halt,
effectively giving them more rights than they should have, hence it
being a
vulnerability. But those operations, if done in good faith, would be
user
space bugs.
In the case of the webserver example, if you can trigger either of
the two bugs
in any way, with a current kernel it will avoid the kernel crashing,
but it will
still mean your user space will most likely be broken, resulting in
the same end
user experience of a non-working system. Because as far as I know,
the webserver
example has no fault detection, recovery or restart mechanism
implemented.
Greetings,
Indan
On 2024-11-05 10:34, Hugo V.C. wrote:
Hi Gerwin,
thanks for the fast response! Yes, this.I already know. But I'm
interested
in this specific case:
https://docs.sel4.systems/projects/sel4webserver/
as I guess it may not vulnerable as this example (by default) does
not run
SMP seL4. For "for cache maintenance ops" I'm not sure as this qemu environment. Any hints?
On Tuesday, November 5, 2024, Gerwin Klein
wrote:
HI Hugo, The CHANGES.md file lists the vulnerable versions of seL4 for each
of
these (https://github.com/seL4/seL4/blob/master/CHANGES.md)
- for VCPU/SMP: seL4 versions 12.0.0 and 12.1.0. - for cache maintenance ops on AArch64: all versions before 13.0.0
from
5.0.0
Cheers, Gerwin
On 5 Nov 2024, at 10:02, Hugo V.C.
wrote: I'm forwarding this question here (tried on Mattermost Trustworthy Systems
group first) hoping someone can put some light on this?
---
Hi, I'm having a look to the vulns (in areas of the kernel that
have not
been formally verified) patched in seL4 13.0.0.
We have:
1) "NULL pointer dereference when injecting an IRQ for a
non-associated
VCPU on SMP configurations." 2) "On AArch64, when seL4 runs in EL1
the
kernel would fault with a data abort in
seL4_ARM_Page_Invalidate_Data and
seL4_ARM_VSpace_Invalidate_Data when the user requested a dc ivac
cache
maintenance operation on a page that is not mapped writeable."
Extremely simple question: running version < 13.0.0 on top of Qemu
(in
example like https://docs.sel4.systems/projects/sel4webserver/)
would it
be
vulnerable to any of those?
---
Best,
_______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems
-- Bob Trower --- From Gmail webmail account. ---
participants (4)
-
Bob Trower
-
Gerwin Klein
-
Hugo V.C.
-
Indan Zupancic