ARM64: Handle SError in user-space instead of halting the kernel
Hello,

By default any SError interrupt will halt the kernel. SErrors may be caused by e.g. writes to read-only device registers or ECC errors. With this patch SErrors can instead be reported as a VM data fault for the currently running task. Because SError is asynchronous, it is not assured that the faulting task caused the error.

ARMv8.2 adds a RAS feature which solves the SError problem for common cases. This patch does not use it: our hardware is v8.0, so we need another solution for the SError problem. We would rather log the SError and continue than halt the whole system. As SError is easily triggered by device drivers, halting the whole system would make user-space device drivers not much more reliable than in-kernel ones.

I do not know if VMs can also cause SErrors in seL4. However, that seems likely, in which case guest OSes can also halt the whole system.

One thing I'm not sure about: SError is edge-triggered, while handleVMFaultEvent() does a MCS_DO_IF_BUDGET check. Does this mean that if the current task has no budget, the SError IRQ will be lost? Or are pending faults saved by the scheduling code? For us missing the SError would be better than halting the kernel, so it doesn't really matter, but I would like to know.

I have an Atlassian account, but it seems you can't attach patches there, so I'm not sure how else to send this patch, other than creating a Github account. What is the preferred way to submit patches as an outsider?

Best regards,

Indan
Indan,
I have an Atlassian account, but it seems you can't attach patches there, so I'm not sure how else to send this patch, other than creating a Github account. What is the preferred way to submit patches as an outsider?
Please see https://docs.sel4.systems/processes/contributing.html for details. The preferred way is creating a pull request on GitHub, as the platform allows a public review process. Furthermore, there is an automatic CI run that gives immediate feedback on whether the patch quality is OK.

Note that the actual commits do not have to be linked to your GitHub account; only the DCO sign-off note must match the commit author. Thus, somebody could technically pick your commits from another public repo (or even a patch sent by mail) and create a pull request from them. However, besides having to motivate someone to do this, it complicates further communication with you as the author in case there are change requests. GitHub offers a convenient platform here.

So I wonder, what is preventing you from creating an account there and doing pull requests? Such an account is useful for contributing to many open source projects. Even within seL4, there is more than one repository that might be affected by your changes.

Axel
Hello,

On 2021-05-26 23:24, Axel Heider wrote:
GitHub offers a convenient platform here. So I wonder, what is preventing you from creating an account there and doing pull requests?
In the past I had DNS-based ad/tracking filtering and that utterly broke the Github website. Since then I have always avoided Github and that's where my reluctance comes from. Nowadays there are decent in-browser plugins and it seems Github can stand on its own legs now, as it doesn't depend on external websites as much any more. Other than that, I find the Github website not very pleasant as far as usability goes.

On 2021-05-27 03:23, Gerwin Klein wrote:
It sounds like you might have already attached a patch to your message, but it didn't come through on the mailing list. This happened to me before as well -- I think the list is set up to strip attachments for security. If the GitHub option doesn't work for you, feel free to send me the patch directly and I'll raise a GitHub PR for you.
Indeed it got dropped silently. I'll just create a Github account, that is the easiest solution long-term.
For the Jira tracker, there should be an "Attach" button on the top left of the issue screen once an issue is created. That should allow you to attach a file. Does that button appear for you? (If not, there might be some configuration option we might be able to tweak)
I do not see such a button. Or do only creators of an issue see it?
In terms of content: yes, halting the machine based on errors the user can trigger from user space is definitely not ideal, and we should do something about it. This case is at least somewhat constrained to users that have direct access to device memory, but if we can lock it down more, that would be good.
Further testing seems to indicate this only applies to writes to read-only registers on our platform. Reads of write-only registers cause a (synchronous) Data Abort exception.

As the patch is small I'll attach it at the bottom here as well.

With this patch, if users enable this feature, all user space fault handlers should check whether the fault is an SError or not (EC 0x2f) before deciding what to do, keeping in mind the address is a virtual address possibly for another task and that the PC is of the current task, not the instruction causing the fault.

People who care deeply about this issue should make sure they have ARMv8.2 hardware with FEAT_RAS and use that to implement proper SError handling in seL4 for such systems (see e.g. Linux kernel). But that would be another patch, unrelated to this one.

Greetings,

Indan

---
diff --git a/include/arch/arm/arch/64/mode/machine/registerset.h b/include/arch/arm/arch/64/mode/machine/registerset.h
index ca040770..2bf6e674 100644
--- a/include/arch/arm/arch/64/mode/machine/registerset.h
+++ b/include/arch/arm/arch/64/mode/machine/registerset.h
@@ -38,6 +38,7 @@
 #define ESR_EC_LEL_SVC64 0x15 // SVC from a lower EL in AArch64 state
 #define ESR_EC_LEL_HVC64 0x16 // HVC from EL1 in AArch64 state
 #define ESR_EL1_EC_ENFP 0x7 // Access to Advanced SIMD or floating-point registers
+#define ESR_EC_LEL_SERROR 0x2f // SError
 
 
 /* ID_AA64PFR0_EL1 register */
diff --git a/src/arch/arm/64/traps.S b/src/arch/arm/64/traps.S
index ed9aad3d..0f70cbc7 100644
--- a/src/arch/arm/64/traps.S
+++ b/src/arch/arm/64/traps.S
@@ -29,6 +29,9 @@
 
 #endif
 
+#ifdef CONFIG_AARCH64_PASS_SERROR
+#define lower_el_serr lower_el_sync
+#endif
 
 .macro lsp_i _tmp
     mrs     \_tmp, TPIDR
@@ -160,6 +163,10 @@ BEGIN_FUNC(lower_el_sync)
     b.eq    lel_ia
     cmp     x24, #ESR_EC_LEL_SVC64
     b.eq    lel_syscall
+#ifdef CONFIG_AARCH64_PASS_SERROR
+    cmp     x24, #ESR_EC_LEL_SERROR
+    b.eq    lel_da
+#endif
 #ifdef CONFIG_ARM_HYPERVISOR_SUPPORT
     cmp     x24, #ESR_EC_LEL_HVC64
     b.eq    lel_syscall
@@ -233,6 +240,8 @@ BEGIN_FUNC(lower_el_irq)
     b       c_handle_interrupt
 END_FUNC(lower_el_irq)
 
+#ifndef CONFIG_AARCH64_PASS_SERROR
+
 BEGIN_FUNC(lower_el_serr)
 #ifdef CONFIG_PLAT_TX2
     eret
@@ -240,3 +249,5 @@ BEGIN_FUNC(lower_el_serr)
     b       invalid_vector_entry
 #endif
 END_FUNC(lower_el_serr)
+
+#endif
diff --git a/src/arch/arm/config.cmake b/src/arch/arm/config.cmake
index 781e6833..d8770f49 100644
--- a/src/arch/arm/config.cmake
+++ b/src/arch/arm/config.cmake
@@ -179,6 +179,17 @@ config_option(
     DEFAULT_DISABLED OFF
 )
 
+config_option(
+    KernelAArch64PassSError AARCH64_PASS_SERROR
+    "By default any SError interrupt will halt the kernel. \
+SErrors may be caused by e.g. writes to read-only device registers or ECC errors. \
+When this option is enabled SErrors will be reported as a VM data fault for the currently running task. \
+Because SError is asynchronous it is not assured that the faulting task caused the error. "
+    DEFAULT OFF
+    DEPENDS "KernelSel4ArchAarch64;NOT KernelVerificationBuild"
+)
+mark_as_advanced(KernelAArch64PassSError)
+
 if(KernelAArch32FPUEnableContextSwitch OR KernelSel4ArchAarch64)
     set(KernelHaveFPU ON)
 endif()
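The message above advises that user-space fault handlers check for EC 0x2f before acting on a fault. A minimal sketch of that check follows. It assumes the fault status value delivered with the VM fault holds the raw ESR, whose exception class sits in bits [31:26]. The type fault_summary_t and the function names are hypothetical and are not part of seL4 or of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural ESR_ELx layout: the exception class (EC) is bits [31:26]. */
#define ESR_EC_SHIFT  26u
#define ESR_EC_MASK   0x3fu
#define ESR_EC_SERROR 0x2fu /* same value as ESR_EC_LEL_SERROR in the patch */

/* Hypothetical summary type, not part of seL4 or of the patch. */
typedef struct {
    bool is_serror;        /* fault was an asynchronous SError */
    bool context_reliable; /* address and PC can be attributed to the faulting thread */
} fault_summary_t;

/* Extract the exception class from a raw ESR value. */
uint32_t esr_exception_class(uint64_t esr)
{
    return (uint32_t)((esr >> ESR_EC_SHIFT) & ESR_EC_MASK);
}

/*
 * Classify a VM fault from its fault status value.
 * Assumption: fsr holds the raw ESR as delivered with the VM fault.
 */
void classify_vm_fault(uint64_t fsr, fault_summary_t *out)
{
    assert(out != NULL);
    out->is_serror = (esr_exception_class(fsr) == ESR_EC_SERROR);
    /* For an SError the reported address and PC may belong to another task. */
    out->context_reliable = !out->is_serror;
}
```

A handler built on this sketch would probably log an SError and resume the thread, and apply its usual policy (for example, killing or restarting the faulting thread) only when context_reliable is true.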
Hello,

On 2021-05-27 13:43, Indan Zupancic wrote:
I'll just create a Github account, that is the easiest solution long-term.
Seems like I already had an old Github account lying around. There already exists an issue about SError handling, so I just added a comment there instead of creating a new issue, see: https://github.com/seL4/seL4/issues/260

I also created a pull request: https://github.com/seL4/seL4/pull/369

Greetings,

Indan
Hi Indan,
On 27 May 2021, at 00:51, Indan Zupancic wrote:
I have an Atlassian account, but it seems you can't attach patches there, so I'm not sure how else to send this patch, other than creating a Github account. What is the preferred way to submit patches as an outsider?
As Axel says, the preferred method would be a pull request on GitHub, which also puts the code discussion directly next to the code.

It sounds like you might have already attached a patch to your message, but it didn't come through on the mailing list. This happened to me before as well -- I think the list is set up to strip attachments for security. If the GitHub option doesn't work for you, feel free to send me the patch directly and I'll raise a GitHub PR for you.

For the Jira tracker, there should be an "Attach" button on the top left of the issue screen once an issue is created. That should allow you to attach a file. Does that button appear for you? (If not, there might be some configuration option we might be able to tweak)

In terms of content: yes, halting the machine based on errors the user can trigger from user space is definitely not ideal, and we should do something about it. This case is at least somewhat constrained to users that have direct access to device memory, but if we can lock it down more, that would be good.

Cheers,

Gerwin
Hi guys,
just a note.
I understand that this is by design and, as pointed out, constrained to
users with access to device memory, but be aware that the media don't
understand (and don't care about) this kind of technical "detail".
Moreover, based on my experience, I expect that potential attackers will go
hunting for this kind of DoS by abusing "by design" features (just as they
will target info leaks, side channels, etc.). Expect a very high focus on
this kind of attack.
Cheers,
Participants (4): Axel Heider, Gerwin Klein, Hugo V.C., Indan Zupancic