x86 seL4 kernel bug - kernel page faults on seL4_VMEnter
Hello,
during some experimental work on Genode/seL4 and VM extensions, I
encountered the issue to trigger a kernel fault.
The symptom seems, that if a thread invokes seL4_VMEnter, but the actual
thread is not meant/supposed to run a VM, the kernel will (ever?/may?)
page fault with a nullpointer access - which effectively renders the
whole Genode system unusable.
A partial log is attached based on Genode on x86_64 with seL4 kernel
version 9.0.1.
The fault address points to [0], where the variable _expected_vmcs_ is
null and the kernel de-references it. I put a message before this line
and print the value of the variable (as visible in the log below).
The seL4 10.0.0 code looks similar, so I presume the same issue there .
(if the code is not verified - which according to the roadmap is not the
case [1] ?)
Cheers,
Alex B.
[0]
https://github.com/seL4/seL4/blob/0dd40b6c43a290173ea7782b97afbbbddfa23b36/s...
[1] https://sel4.systems/Info/Roadmap/home.pml
--
Alexander Boettcher
Genode Labs
https://www.genode-labs.com - https://www.genode.org
Genode Labs GmbH - Amtsgericht Dresden - HRB 28424 - Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
diff --git a/src/arch/x86/object/vcpu.c b/src/arch/x86/object/vcpu.c
index cf74fa4..68a1dca 100644
--- a/src/arch/x86/object/vcpu.c
+++ b/src/arch/x86/object/vcpu.c
@@ -1540,6 +1540,7 @@ restoreVMCS(void)
}
#ifndef CONFIG_KERNEL_SKIM_WINDOW
+ userError("opps expected_vmcs=%p", expected_vmcs);
if (getCurrentCR3().words[0] != expected_vmcs->last_host_cr3) {
expected_vmcs->last_host_cr3 = getCurrentCR3().words[0];
vmwrite(VMX_HOST_CR3, getCurrentCR3().words[0]);
Boot config: parsing cmdline '/boot/sel4 disable_iommu'
Boot config: console_port = 0x1808
Boot config: debug_port = 0x1808
Boot config: disable_iommu = true
Detected 1 boot module(s):
module #0: start=0xfd58000 end=0xffff330 size=0x2a7330
name='/boot/image.elf'
Parsing GRUB physical memory map
Physical Memory Region from 0 size 89400 type 1
Physical Memory Region from 89400 size 16c00 type 2
Physical Memory Region from d2000 size 2000 type 2
Physical Memory Region from dc000 size 24000 type 2
Physical Memory Region from 100000 size bb17c000 type 1
Adding physical memory region 0x100000-0xbb27c000
Physical Memory Region from bb27c000 size 6000 type 2
Physical Memory Region from bb282000 size dd000 type 1
Adding physical memory region 0xbb282000-0xbb35f000
Physical Memory Region from bb35f000 size 12000 type 2
Physical Memory Region from bb371000 size 81000 type 4
Physical Memory Region from bb3f2000 size 1d000 type 2
Physical Memory Region from bb40f000 size 60000 type 1
Adding physical memory region 0xbb40f000-0xbb46f000
Physical Memory Region from bb46f000 size 1f9000 type 2
Physical Memory Region from bb668000 size 80000 type 4
Physical Memory Region from bb6e8000 size 27000 type 2
Physical Memory Region from bb70f000 size 8000 type 1
Adding physical memory region 0xbb70f000-0xbb717000
Physical Memory Region from bb717000 size 8000 type 2
Physical Memory Region from bb71f000 size 4c000 type 1
Adding physical memory region 0xbb71f000-0xbb76b000
Physical Memory Region from bb76b000 size c000 type 4
Physical Memory Region from bb777000 size 3000 type 3
Physical Memory Region from bb77a000 size 7000 type 4
Physical Memory Region from bb781000 size 1000 type 3
Physical Memory Region from bb782000 size 9000 type 4
Physical Memory Region from bb78b000 size 1000 type 3
Physical Memory Region from bb78c000 size 13000 type 4
Physical Memory Region from bb79f000 size 60000 type 3
Physical Memory Region from bb7ff000 size 1000 type 1
Adding physical memory region 0xbb7ff000-0xbb800000
Physical Memory Region from bb800000 size 800000 type 2
Physical Memory Region from bc000000 size 4000000 type 2
Physical Memory Region from e0000000 size 10000000 type 2
Physical Memory Region from feaff000 size 1000 type 2
Physical Memory Region from fec00000 size 10000 type 2
Physical Memory Region from fed00000 size 400 type 2
Physical Memory Region from fed1c000 size 4000 type 2
Physical Memory Region from fed20000 size 70000 type 2
Physical Memory Region from fee00000 size 1000 type 2
Physical Memory Region from ff000000 size 1000000 type 2
Physical Memory Region from 100000000 size 38000000 type 1
Adding physical memory region 0x100000000-0x138000000
Multiboot gave us no video information
ACPI: RSDP paddr=0xf68e0
ACPI: RSDP vaddr=0xf68e0
ACPI: RSDT paddr=0xbb7f07a2
ACPI: RSDT vaddr=0xbb7f07a2
***WARNING*** SKIM window not enabled, this machine is probably
vulernable to Meltdown (https://www.meltdownattack.com), consider enabling
Kernel loaded to: start=0x200000 end=0xaa5000 size=0x8a5000 entry=0x201209
ACPI: RSDT paddr=0xbb7f07a2
ACPI: RSDT vaddr=0xbb7f07a2
ACPI: FADT paddr=0xbb7f0a00
ACPI: FADT vaddr=0xbb7f0a00
ACPI: FADT flags=0xc2ad
ACPI: MADT paddr=0xbb7feb45
ACPI: MADT vaddr=0xbb7feb45
ACPI: MADT apic_addr=0xfee00000
ACPI: MADT flags=0x1
ACPI: MADT_APIC apic_id=0x0
ACPI: MADT_APIC apic_id=0x1
ACPI: MADT_APIC apic_id=0x4
ACPI: MADT_APIC apic_id=0x5
ACPI: MADT_IOAPIC ioapic_id=1 ioapic_addr=0xfec00000 gsib=0
ACPI: MADT_ISO bus=0 source=0 gsi=2 flags=0x0
ACPI: MADT_ISO bus=0 source=9 gsi=9 flags=0xd
ACPI: 4 CPU(s) detected
ELF-loading userland images from boot modules:
size=0x110f000 v_entry=0x2000018 v_start=0x2000000 v_end=0x310f000
p_start=0x10000000 p_end=0x1110f000
Moving loaded userland images to final location: from=0x10000000
to=0xaa5000 size=0x110f000
Starting node #0 with APIC ID 0
Mapping kernel window is done
Starting node #1 with APIC ID 1
Starting node #2 with APIC ID 4
Starting node #3 with APIC ID 5
Booting all finished, dropped to user space
virtual address layout of core:
overall [0000000000002000,00000000c0000000)
core image [0000000002000000,000000000310f000)
ipc buffer [000000000310f000,0000000003110000)
boot_info [0000000003110000,0000000003112000)
stack area [0000000040000000,0000000050000000)
boot module 'timer' (113344 bytes)
boot module 'test-vmm_sel4' (9912 bytes)
boot module 'config' (609 bytes)
boot module 'ld.lib.so' (937616 bytes)
boot module 'init' (352672 bytes)
Genode sculpt_vc-2-ge2fd5e5
2909 MiB RAM and 261141 caps assigned to init
...
<
Hi Alex,
Yes, this is indeed a bug and no, the x86_64 VM code is not currently verified.
The following snippet should fix it, and give the misbehaving thread an 'unknown syscall' fault.
diff --git a/src/arch/x86/c_traps.c b/src/arch/x86/c_traps.c
index f17e5473..857896dd 100644
--- a/src/arch/x86/c_traps.c
+++ b/src/arch/x86/c_traps.c
@@ -111,7 +111,7 @@ slowpath(syscall_t syscall)
{
#ifdef CONFIG_VTX
- if (syscall == SysVMEnter) {
+ if (syscall == SysVMEnter && NODE_STATE(ksCurThread)->tcbArch.tcbVCPU) {
vcpu_update_state_sysvmenter(NODE_STATE(ksCurThread)->tcbArch.tcbVCPU);
if (NODE_STATE(ksCurThread)->tcbBoundNotification && notification_ptr_get_state(NODE_STATE(ksCurThread)->tcbBoundNotification) == NtfnState_Active) {
completeSignal(NODE_STATE(ksCurThread)->tcbBoundNotification, NODE_STATE(ksCurThread));
I'll create a task to investigate deeper and add a test for this to our test suite and confirm that the fix works. We won't have time to do this in the next week though. It would be great if you can let me know if the patch works for you.
Cheers
Anna.
________________________________________
From: Devel
Hi, thanks for the quick reply. The patch indeed helps to avoid the kernel crash, however the error result on 32 and 64 bit varies. On 64bit I get the promised invalid syscall: [init -> test-vmm_utils] try vmenter ... Error: unexpected exception during fault 'seL4_Fault_UnknownSyscall' - thread 'ep' in pd 'init -> test-vmm_utils' stopped On 32bit it looks like a page fault, instead of a unknown syscall: [init -> test-vmm_utils] try vmenter ... no RM attachment (READ pf_addr=0x0 pf_ip=0xffffffe7 from pager_object: pd='init -> test-vmm_utils' thread='ep') Warning: page-fault, pager_object: pd='init -> test-vmm_utils' thread='ep' ip=0xffffffe7 pf-addr=0x0 Cheers, Alex. On 26.09.2018 03:42, Anna.Lyons@data61.csiro.au wrote:
Yes, this is indeed a bug and no, the x86_64 VM code is not currently verified.
The following snippet should fix it, and give the misbehaving thread an 'unknown syscall' fault.
diff --git a/src/arch/x86/c_traps.c b/src/arch/x86/c_traps.c index f17e5473..857896dd 100644 --- a/src/arch/x86/c_traps.c +++ b/src/arch/x86/c_traps.c @@ -111,7 +111,7 @@ slowpath(syscall_t syscall) {
#ifdef CONFIG_VTX - if (syscall == SysVMEnter) { + if (syscall == SysVMEnter && NODE_STATE(ksCurThread)->tcbArch.tcbVCPU) { vcpu_update_state_sysvmenter(NODE_STATE(ksCurThread)->tcbArch.tcbVCPU); if (NODE_STATE(ksCurThread)->tcbBoundNotification && notification_ptr_get_state(NODE_STATE(ksCurThread)->tcbBoundNotification) == NtfnState_Active) { completeSignal(NODE_STATE(ksCurThread)->tcbBoundNotification, NODE_STATE(ksCurThread));
I'll create a task to investigate deeper and add a test for this to our test suite and confirm that the fix works. We won't have time to do this in the next week though. It would be great if you can let me know if the patch works for you.
Cheers Anna.
________________________________________ From: Devel
on behalf of Alexander Boettcher Sent: Wednesday, 26 September 2018 12:55 AM To: devel@sel4.systems Subject: [seL4] x86 seL4 kernel bug - kernel page faults on seL4_VMEnter Hello,
during some experimental work on Genode/seL4 and VM extensions, I encountered the issue to trigger a kernel fault.
The symptom seems, that if a thread invokes seL4_VMEnter, but the actual thread is not meant/supposed to run a VM, the kernel will (ever?/may?) page fault with a nullpointer access - which effectively renders the whole Genode system unusable.
A partial log is attached based on Genode on x86_64 with seL4 kernel version 9.0.1.
The fault address points to [0], where the variable _expected_vmcs_ is null and the kernel de-references it. I put a message before this line and print the value of the variable (as visible in the log below).
The seL4 10.0.0 code looks similar, so I presume the same issue there . (if the code is not verified - which according to the roadmap is not the case [1] ?)
Cheers,
Alex B.
[0] https://github.com/seL4/seL4/blob/0dd40b6c43a290173ea7782b97afbbbddfa23b36/s... [1] https://sel4.systems/Info/Roadmap/home.pml
-- Alexander Boettcher Genode Labs
https://www.genode-labs.com - https://www.genode.org
Genode Labs GmbH - Amtsgericht Dresden - HRB 28424 - Sitz Dresden Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
diff --git a/src/arch/x86/object/vcpu.c b/src/arch/x86/object/vcpu.c index cf74fa4..68a1dca 100644 --- a/src/arch/x86/object/vcpu.c +++ b/src/arch/x86/object/vcpu.c @@ -1540,6 +1540,7 @@ restoreVMCS(void) }
#ifndef CONFIG_KERNEL_SKIM_WINDOW + userError("opps expected_vmcs=%p", expected_vmcs); if (getCurrentCR3().words[0] != expected_vmcs->last_host_cr3) { expected_vmcs->last_host_cr3 = getCurrentCR3().words[0]; vmwrite(VMX_HOST_CR3, getCurrentCR3().words[0]);
Boot config: parsing cmdline '/boot/sel4 disable_iommu' Boot config: console_port = 0x1808 Boot config: debug_port = 0x1808 Boot config: disable_iommu = true Detected 1 boot module(s): module #0: start=0xfd58000 end=0xffff330 size=0x2a7330 name='/boot/image.elf' Parsing GRUB physical memory map Physical Memory Region from 0 size 89400 type 1 Physical Memory Region from 89400 size 16c00 type 2 Physical Memory Region from d2000 size 2000 type 2 Physical Memory Region from dc000 size 24000 type 2 Physical Memory Region from 100000 size bb17c000 type 1 Adding physical memory region 0x100000-0xbb27c000 Physical Memory Region from bb27c000 size 6000 type 2 Physical Memory Region from bb282000 size dd000 type 1 Adding physical memory region 0xbb282000-0xbb35f000 Physical Memory Region from bb35f000 size 12000 type 2 Physical Memory Region from bb371000 size 81000 type 4 Physical Memory Region from bb3f2000 size 1d000 type 2 Physical Memory Region from bb40f000 size 60000 type 1 Adding physical memory region 0xbb40f000-0xbb46f000 Physical Memory Region from bb46f000 size 1f9000 type 2 Physical Memory Region from bb668000 size 80000 type 4 Physical Memory Region from bb6e8000 size 27000 type 2 Physical Memory Region from bb70f000 size 8000 type 1 Adding physical memory region 0xbb70f000-0xbb717000 Physical Memory Region from bb717000 size 8000 type 2 Physical Memory Region from bb71f000 size 4c000 type 1 Adding physical memory region 0xbb71f000-0xbb76b000 Physical Memory Region from bb76b000 size c000 type 4 Physical Memory Region from bb777000 size 3000 type 3 Physical Memory Region from bb77a000 size 7000 type 4 Physical Memory Region from bb781000 size 1000 type 3 Physical Memory Region from bb782000 size 9000 type 4 Physical Memory Region from bb78b000 size 1000 type 3 Physical Memory Region from bb78c000 size 13000 type 4 Physical Memory Region from bb79f000 size 60000 type 3 Physical Memory Region from bb7ff000 size 1000 type 1 Adding physical memory region 0xbb7ff000-0xbb800000 Physical Memory Region from bb800000 size 800000 type 2 Physical Memory Region from bc000000 size 4000000 type 2 Physical Memory Region from e0000000 size 10000000 type 2 Physical Memory Region from feaff000 size 1000 type 2 Physical Memory Region from fec00000 size 10000 type 2 Physical Memory Region from fed00000 size 400 type 2 Physical Memory Region from fed1c000 size 4000 type 2 Physical Memory Region from fed20000 size 70000 type 2 Physical Memory Region from fee00000 size 1000 type 2 Physical Memory Region from ff000000 size 1000000 type 2 Physical Memory Region from 100000000 size 38000000 type 1 Adding physical memory region 0x100000000-0x138000000 Multiboot gave us no video information ACPI: RSDP paddr=0xf68e0 ACPI: RSDP vaddr=0xf68e0 ACPI: RSDT paddr=0xbb7f07a2 ACPI: RSDT vaddr=0xbb7f07a2 ***WARNING*** SKIM window not enabled, this machine is probably vulernable to Meltdown (https://www.meltdownattack.com), consider enabling Kernel loaded to: start=0x200000 end=0xaa5000 size=0x8a5000 entry=0x201209 ACPI: RSDT paddr=0xbb7f07a2 ACPI: RSDT vaddr=0xbb7f07a2 ACPI: FADT paddr=0xbb7f0a00 ACPI: FADT vaddr=0xbb7f0a00 ACPI: FADT flags=0xc2ad ACPI: MADT paddr=0xbb7feb45 ACPI: MADT vaddr=0xbb7feb45 ACPI: MADT apic_addr=0xfee00000 ACPI: MADT flags=0x1 ACPI: MADT_APIC apic_id=0x0 ACPI: MADT_APIC apic_id=0x1 ACPI: MADT_APIC apic_id=0x4 ACPI: MADT_APIC apic_id=0x5 ACPI: MADT_IOAPIC ioapic_id=1 ioapic_addr=0xfec00000 gsib=0 ACPI: MADT_ISO bus=0 source=0 gsi=2 flags=0x0 ACPI: MADT_ISO bus=0 source=9 gsi=9 flags=0xd ACPI: 4 CPU(s) detected ELF-loading userland images from boot modules: size=0x110f000 v_entry=0x2000018 v_start=0x2000000 v_end=0x310f000 p_start=0x10000000 p_end=0x1110f000 Moving loaded userland images to final location: from=0x10000000 to=0xaa5000 size=0x110f000 Starting node #0 with APIC ID 0 Mapping kernel window is done Starting node #1 with APIC ID 1 Starting node #2 with APIC ID 4 Starting node #3 with APIC ID 5 Booting all finished, dropped to user space virtual address layout of core: overall [0000000000002000,00000000c0000000) core image [0000000002000000,000000000310f000) ipc buffer [000000000310f000,0000000003110000) boot_info [0000000003110000,0000000003112000) stack area [0000000040000000,0000000050000000) boot module 'timer' (113344 bytes) boot module 'test-vmm_sel4' (9912 bytes) boot module 'config' (609 bytes) boot module 'ld.lib.so' (937616 bytes) boot module 'init' (352672 bytes)
Genode sculpt_vc-2-ge2fd5e5 2909 MiB RAM and 261141 caps assigned to init
...
<
> ========== KERNEL EXCEPTION ========== Vector: 0xe ErrCode: 0x0 IP: 0xffffffff80835236 SP: 0xffffffff8087ef18 FLAGS: 0x10086 CR0: 0x8001003b CR2: 0x3298 (page-fault address) CR3: 0xba043000 (page-directory physical address) CR4: 0x6620
Stack Dump: *0xffffffff8087ef18 == 0x10000c9 *0xffffffff8087ef20 == 0xffffffff80a93000 *0xffffffff8087ef28 == 0xffffffff80890040 *0xffffffff8087ef30 == 0x40 *0xffffffff8087ef38 == 0xffffff80ba295400 *0xffffffff8087ef40 == 0xffffffff8087efb8 *0xffffffff8087ef48 == 0xffffffffffffffe7 *0xffffffff8087ef50 == 0xffffffff80a93000 *0xffffffff8087ef58 == 0xffffffff80a93000 *0xffffffff8087ef60 == 0x0 *0xffffffff8087ef68 == 0x0 *0xffffffff8087ef70 == 0xffffffff80890040 *0xffffffff8087ef78 == 0xffffffffffffffff *0xffffffff8087ef80 == 0xffffffff808357fe *0xffffffff8087ef88 == 0xffffffff80a93000 *0xffffffff8087ef90 == 0xffffffff80838c93 *0xffffffff8087ef98 == 0x1001d4 *0xffffffff8087efa0 == 0x0 *0xffffffff8087efa8 == 0xffffffffffffffe7 *0xffffffff8087efb0 == 0xffffffff80839982
Halting... halting... Kernel entry via Interrupt, irq 157
_______________________________________________ Devel mailing list Devel@sel4.systems https://sel4.systems/lists/listinfo/devel
-- Alexander Boettcher Genode Labs https://www.genode-labs.com - https://www.genode.org Genode Labs GmbH - Amtsgericht Dresden - HRB 28424 - Sitz Dresden Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
participants (2)
-
Alexander Boettcher
-
Anna.Lyons@data61.csiro.au