Zynq UltraScale+ locks up after hours running
Hello,
We are running the seL4 microkernel on a 4-core Zynq UltraScale+ (zcu102 board). The implementation includes multiple processes, a hypervisor, and virtual machines running on dedicated cores. After several hours of running (it could be 2 or even 8 hours) the whole microkernel locks up. After some investigation I found that no interrupts are generated anymore; at least, no interrupts reach the ISR. Inside the ISR I monitored the PL2 Physical Timer Control register, which feeds the scheduler, and did not find any problems: it stays enabled and not masked.
I will appreciate any idea/direction for approaching this problem.
Thank you,
Leonid
________________________________ This message and all attachments are PRIVATE, and contain information that is PROPRIETARY to Intelligent Automation, Inc. You are not authorized to transmit or otherwise disclose this message or any attachments to any third party whatsoever without the express written consent of Intelligent Automation, Inc. If you received this message in error or you are not willing to view this message or any attachments on a confidential basis, please immediately delete this email and any attachments and notify Intelligent Automation, Inc.
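For readers following the "enabled and not masked" check, here is a minimal sketch of decoding a raw value read from an Arm generic timer control register. The bit positions (ENABLE, IMASK, ISTATUS) come from the Arm architecture; the type and function names are hypothetical, and reading the register itself (which depends on the exception level and execution state) is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions shared by the Arm generic timer control registers (CNT*_CTL). */
#define TIMER_CTL_ENABLE  (1u << 0) /* timer enabled */
#define TIMER_CTL_IMASK   (1u << 1) /* timer interrupt output masked */
#define TIMER_CTL_ISTATUS (1u << 2) /* timer condition met (read-only) */

/* Hypothetical helper type, not part of seL4. */
typedef struct {
    bool enabled;
    bool masked;
    bool condition_met;
    bool irq_asserted; /* enabled, unmasked, and condition met */
} timer_ctl_state_t;

/* Decode a raw control register value that was read elsewhere. */
static inline timer_ctl_state_t decode_timer_ctl(uint32_t ctl)
{
    timer_ctl_state_t s;
    s.enabled = (ctl & TIMER_CTL_ENABLE) != 0;
    s.masked = (ctl & TIMER_CTL_IMASK) != 0;
    s.condition_met = (ctl & TIMER_CTL_ISTATUS) != 0;
    s.irq_asserted = s.enabled && !s.masked && s.condition_met;
    return s;
}
```

If a value such as 0x5 (enabled, unmasked, condition met) were observed while no ISR runs, that would suggest the timer is signalling and the interrupt is being lost further downstream, for example in the interrupt controller or through CPU-level masking. That is only one possible reading, not a diagnosis of this system.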
I will appreciate any idea/direction for approaching this problem.
Are you able to access a hardware debugger? At the point where the cores appear to lock up, it's helpful to be able to see where all the threads are and what the hardware state is. Alternatively, are you able to simulate your application with the zcu102 QEMU platform?
I can connect JTAG, but I am not sure how to attach it to the running system. The seL4 project is pretty big and I don't know how to load it onto the board through JTAG.
I can run this project on the zcu102. It seems to me that the zcu102 runs a little faster than my target device. The system also locks up from time to time, but at a lower rate.
Thank you,
Leonid
-----Original Message----- From: Mcleod, Kent (Data61, Kensington NSW) <Kent.Mcleod@data61.csiro.au> Sent: Wednesday, October 2, 2019 11:50 PM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: Re: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,
Could you provide a bit more detail about your software configuration? For instance, do you have multiple VMs running on dedicated hardware cores? How are the VMs and processes configured? Also, do you mean there were no interrupts at all on any of the four cores?
Regards,
Yanyan
On Wed, 2019-10-02 at 16:01 +0000, Leonid Meyerovich wrote:
_______________________________________________ Devel mailing list Devel@sel4.systems https://sel4.systems/lists/listinfo/devel
Hi Yanyan,
I am running the initial task and the hypervisor on core 0. The hypervisor creates 2 VMs and runs them on cores 1 and 2.
Core 3 runs 7 processes that communicate through notification objects and shared memory (in pairs). One process on core 3 implements a UART-based connection (this is a PL UART; Rx uses an interrupt). One of the core 3 processes also runs the SADA driver (which also uses an interrupt).
The VMs communicate with the rest of the system through 'virtual channels': exceptions and shared memory.
All hardware interrupts are processed by core 0 (please correct me if I am wrong). But as far as I understand, the PL2 physical timer interrupt fires on every core.
Every process prints some messages on the terminal. I have never seen these messages printed partially when the system locks up: a process completes printing its message and is never scheduled again. I have also printed some messages from inside the ISR after getting the timer interrupt (printing the interrupt counter once a second), and I don't see these messages when the system locks up.
Thank you,
Leonid
-----Original Message----- From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au> Sent: Thursday, October 3, 2019 2:44 AM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: Re: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,
What do you mean by "hypervisor on core 0"? Do you mean the VMM? I assume you create a VMM for each running VM and also pin the VMMs to the corresponding physical cores? If so, core 1 and core 2 should also receive virtual timer interrupts and VGIC maintenance interrupts.
Is it possible for you to stop running the processes on core 3 and just keep running the VMs on different cores? Do the VMs have any access to physical hardware, for instance clocks or watchdogs?
Regards,
Yanyan
-----Original Message----- From: Leonid Meyerovich <lmeyerovich@i-a-i.com> Sent: Thursday, 3 October 2019 8:45 PM To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Yanyan,
Yes, the VMM is running on core 0; it creates 2 VMs and runs them on core 1 and core 2. The VMs don't have any access to hardware, and they receive virtual timer and maintenance interrupts.
I am not sure I can stop running all processes on core 3. I should think about it, but in that case the whole condition will change.
Also, I probably didn't mention that the R5 also runs an application, which communicates through OpenAMP with the application running on core 3.
Thanks,
Leonid
-----Original Message----- From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au> Sent: Friday, October 4, 2019 8:16 AM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,
I forgot to ask: what seL4 version are you using? Is it from the Data61 GitHub repo? If you also use CAmkES, which version is it?
It would be helpful if you could isolate the 2 VMs, so we would have a smaller system to understand. Based on your description, my understanding is that you use one VMM running on core 0 to manage two VMs running on core 1 and core 2?
Is the R5 the always-on core? What does the application do?
Regards,
Yanyan
-----Original Message----- From: Leonid Meyerovich <lmeyerovich@i-a-i.com> Sent: Friday, 4 October 2019 8:26 PM To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Yanyan,
I am running seL4 version 8.0.0.0. Core 3 is running 7 processes which communicate intensively with each other by sending messages. One of these processes sends/receives a lot of messages to/from both VMs ("virtual channels"); another sends/receives a lot of messages (~1000 per second) to/from the R5 (OpenAMP).
The VMM is running on core 0 and the VMs are running on cores 1 and 2.
Thanks,
Leonid
-----Original Message----- From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au> Sent: Friday, October 4, 2019 8:58 AM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,

It looks like seL4 v8.0.0 does not include Armv8 virtualization extension support yet ... So basically you wrote the VMM based on libsel4armvmm?

I am a bit confused by the interrupt log you provided. It looks like the log output depends on getting kernel timer interrupts.

    case IRQTimer:
        timerTick();
        resetTimer();
        if ((++tickCount % 200) == 0) {
            printInt(0);
            printf("- - - \n");
            printInt(1);
            printf("- - - \n");
            printInt(2);
            printf("- - - \n");
            printInt(3);
            printf("- - - - - - - \n");
            cleanIntStat();
        }
        break;

    ---irq stat: cpu=0 (2) 26_50_50 0_258_258
    - - -
    ---irq stat: cpu=1 (2) 26_50_50 0_258_258
    - - -
    ---irq stat: cpu=2 (2) 26_50_50 0_258_258
    - - -
    ---irq stat: cpu=3 (3) 26_50_50 65_1_1 59_1_1
    - - - - - - -
    ---irq stat: cpu=0 (2) 26_50_50 0_257_257
    - - -
    ---irq stat: cpu=1 (2) 26_50_50 0_280_280
    - - -
    ---irq stat: cpu=2 (2) 26_50_50 0_280_280

Correct me if I am wrong: if you do not get kernel timer interrupts, how could you get the output above? I guess at least one core must still be getting kernel timer interrupts to produce this output?

Regards,
Yanyan

-----Original Message-----
From: Leonid Meyerovich <lmeyerovich@i-a-i.com>
Sent: Friday, 4 October 2019 9:11 PM
To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Yanyan,

I am running seL4 version 8.0.0.0. Core 3 is running 7 processes that communicate intensively with each other by sending messages. One of these processes exchanges a lot of messages with both VMs ("virtual channels"); another exchanges a lot of messages (~1000 per second) with the R5 (OpenAMP). The VMM is running on core 0 and the VMs are running on cores 1 and 2.
Thanks,
Leonid

-----Original Message-----
From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>
Sent: Friday, October 4, 2019 8:58 AM
To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Leonid,

I forgot to ask: what seL4 version are you using? Is it from the Data61 GitHub repo? If you also use CAmkES, which version is it? It would be helpful if you could isolate the 2 VMs, so we have a smaller system to understand. Based on your description, my understanding is that you use one VMM running on core 0 to manage two VMs running on core 1 and core 2? Is the R5 the always-on core? What does the application do?

Regards,
Yanyan

-----Original Message-----
From: Leonid Meyerovich <lmeyerovich@i-a-i.com>
Sent: Friday, 4 October 2019 8:26 PM
To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Yanyan,

Yes, the VMM is running on core 0; it creates 2 VMs and runs them on core 1 and core 2. The VMs don't have any access to hardware, and they receive virtual timer and maintenance interrupts. I am not sure I can stop running all processes on core 3. I will think about it, but in that case the whole condition will change. Also, I probably didn't mention that the R5 also runs an application, which communicates through OpenAMP with the application running on core 3.

Thanks,
Leonid

-----Original Message-----
From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>
Sent: Friday, October 4, 2019 8:16 AM
To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Leonid,

What do you mean by "hypervisor on core 0"? Do you mean the VMM? I assume you create a VMM for each running VM, and also pin the VMMs to the corresponding physical cores?
If so, core 1 and core 2 should also receive virtual timer interrupts and VGIC maintenance interrupts. Is it possible to stop running the processes on core 3 and just keep running the VMs on different cores? Do the VMs have any access to physical hardware, for instance, clocks or watchdogs?

Regards,
Yanyan

-----Original Message-----
From: Leonid Meyerovich <lmeyerovich@i-a-i.com>
Sent: Thursday, 3 October 2019 8:45 PM
To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Yanyan,

I am running the initial task and the hypervisor on core 0.
The hypervisor creates 2 VMs and runs them on cores 1 and 2.
Core 3 runs 7 processes that communicate through notification objects and shared memory (in pairs).
One process on core 3 implements a UART-based connection (this is a PL UART; Rx uses an interrupt).
One of the core 3 processes also runs the SADA driver (which also uses an interrupt).
The VMs communicate with the rest of the system through 'virtual channels': exceptions and shared memory.

All hardware interrupts are processed by core 0 (please correct me if I am wrong). But as far as I understand, the PL2 physical timer interrupt runs on every core.

Every process prints some messages on the terminal. I have never seen these messages printed partially when the system locks up; a process completes printing its message and is never scheduled again. I have also printed some messages from inside the ISR after getting a timer interrupt (printing an interrupt counter once a second), and I don't see these messages once the system locks up.

Thank you,
Leonid

-----Original Message-----
From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>
Sent: Thursday, October 3, 2019 2:44 AM
To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems
Subject: Re: [seL4] Zynq UltraScale+ locks up after hours running

Hi Leonid,

Could you provide a bit more detail about your software configuration?
For instance, do you have multiple VMs running on dedicated hardware cores? How are the VMs and processes configured? Also, do you mean there were no interrupts at all on any of the four cores?

Regards,
Yanyan

On Wed, 2019-10-02 at 16:01 +0000, Leonid Meyerovich wrote:
Hello,
We are running seL4 microkernel on 4 cores Zynq UltraScale+ (zcu102 board). The implementation includes multiple processes, hypervisor and virtual machine running on dedicated core. After several hours running (it could be 2 or even 8 hours) the whole microkernel locks up. After some investigation I have found that no interrupts generated anymore - at least there is no interrupts coming to ISR. Inside ISR I have monitored PL2 Physical Timer Control register, which feeds a scheduler and didn't find any problems - it stays enabled and not masked.
I will appreciate any idea/direction for approaching this problem.
Thank you,
Leonid
Hi Yanyan,

Yes, our VMM is based on libsel4armvmm. My prints are implemented in the ISR, and they keep printing as long as the system receives timer interrupts. That is how I know that I am no longer getting a timer interrupt. Usually I print from the timer interrupt every second (the log quoted in your message was implemented for a different purpose).

BTW, the kernel is running on every core; how is it possible that all cores stop receiving timer interrupts (actually not just the timer, but also IPIs)?

Thanks,
Leonid
Hi Leonid,

With the big-lock kernel, one possibility is that one core is in kernel mode and, for some reason, does not release the big kernel lock. The other cores keep trying to grab the lock when they need to handle interrupts or system calls, but they fail because the lock is never released.

Regards,
Yanyan
Hi Yanyan,

That would explain everything. How can we prove or disprove it?

Thanks,
Leonid

On Wed, 2019-10-02 at 16:01 +0000, Leonid Meyerovich wrote:
Hello,
We are running seL4 microkernel on 4 cores Zynq UltraScale+ (zcu102 board). The implementation includes multiple processes, hypervisor and virtual machine running on dedicated core. After several hours running (it could be 2 or even 8 hours) the whole microkernel locks up. After some investigation I have found that no interrupts generated anymore - at least there is no interrupts coming to ISR. Inside ISR I have monitored PL2 Physical Timer Control register, which feeds a scheduler and didn't find any problems - it stays enabled and not masked.
I will appreciate any idea/direction for approaching this problem.
Thank you,
Leonid
_______________________________________________
Devel mailing list
Devel@sel4.systems
https://sel4.systems/lists/listinfo/devel
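The big-lock scenario discussed in this exchange can be pictured with a small host-side model. This is only a sketch under stated assumptions: `big_lock_t`, `big_lock_try_acquire`, `big_lock_release` and `count_blocked_cores` are hypothetical names invented for illustration, and the lock is a plain owner field rather than the queue lock the kernel actually uses. It shows only the symptom described above: once one core holds the lock and never releases it, every other core fails to enter the kernel, which from the outside looks like interrupts no longer being handled anywhere.

```c
#include <stdbool.h>

#define NUM_CORES 4

/* Toy big lock (hypothetical): -1 means free, otherwise the owning core. */
typedef struct {
    int owner;
} big_lock_t;

/* Non-blocking attempt; a real kernel entry path would spin here instead. */
static bool big_lock_try_acquire(big_lock_t *lock, int core)
{
    if (lock->owner != -1) {
        return false;
    }
    lock->owner = core;
    return true;
}

/* Release only succeeds for the current owner. */
static void big_lock_release(big_lock_t *lock, int core)
{
    if (lock->owner == core) {
        lock->owner = -1;
    }
}

/* Count how many of the other cores cannot enter the kernel
 * while `holder` keeps the lock and never releases it. */
static int count_blocked_cores(big_lock_t *lock, int holder)
{
    int blocked = 0;
    for (int core = 0; core < NUM_CORES; core++) {
        if (core == holder) {
            continue;
        }
        if (big_lock_try_acquire(lock, core)) {
            big_lock_release(lock, core);   /* entered and left the kernel */
        } else {
            blocked++;                      /* would spin forever */
        }
    }
    return blocked;
}
```

If core 3 takes the lock and gets stuck, `count_blocked_cores(&lock, 3)` reports the three remaining cores as blocked, which matches the "three cores spinning" picture.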
Hi Leonid,

It would be helpful if you could use JTAG, so you can observe the instruction pointers of the cores. If it is the case, three cores should execute several instructions repeatedly. If you do not have access to JTAG, maybe you could read CPU timestamps in the while loop of the inline function clh_lock_acquire in kernel/include/smp/lock.h. If a core has spent too much time in the loop (e.g. several seconds), and three cores report the same condition, we probably hit the scenario.

Regards,
Yanyan
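The timestamp check suggested for the lock's wait loop could look roughly like the following. This is a minimal host-side sketch, not the kernel's code: `spin_with_watchdog`, `spin_report_t`, `tick_fn_t` and `fake_ticks` are hypothetical names, the lock is reduced to a single flag, and the time source is passed in as a function pointer so the logic can be exercised without hardware. The sketch returns once the limit is exceeded so that it terminates in a test; inside the kernel one would presumably only record or print the stall and keep spinning, since leaving the loop without the lock would be incorrect.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical time source: returns a monotonically increasing tick count. */
typedef uint64_t (*tick_fn_t)(void);

typedef struct {
    uint64_t waited_ticks;  /* time spent spinning so far */
    bool     stalled;       /* true if the wait reached the limit */
} spin_report_t;

/* Spin until *lock_free becomes true or limit_ticks have elapsed. */
static spin_report_t spin_with_watchdog(volatile const bool *lock_free,
                                        tick_fn_t now, uint64_t limit_ticks)
{
    spin_report_t report = { 0, false };
    const uint64_t start = now();

    while (!*lock_free) {
        report.waited_ticks = now() - start;
        if (report.waited_ticks >= limit_ticks) {
            /* A kernel build would log the core id here and continue. */
            report.stalled = true;
            break;
        }
    }
    return report;
}

/* Fake counter for host-side testing: advances 10 ticks per read. */
static uint64_t fake_ticks_value;

static uint64_t fake_ticks(void)
{
    fake_ticks_value += 10;
    return fake_ticks_value;
}
```

With the fake counter and a lock that is never released, a limit of 100 ticks is reported as a stall after ten loop iterations. If three cores report a stall at the same time, that would be consistent with the big-lock hypothesis.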
Hi Yanyan,

Actually I do have JTAG, but I am not sure how I can use it without any development environment. Is there any JTAG application that I can use while connected to my device through the JTAG interface? The second method also looks promising. What is a CPU timestamp? I can use a simple counter to estimate the time spent waiting for the big lock.

Thanks,
Leonid
Hi Leonid,
Yes, I mean CPU cycle counters.
Regards, Yanyan
________________________________ From: Leonid Meyerovich <lmeyerovich@i-a-i.com> Sent: Friday, October 4, 2019 11:00:31 PM To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems <devel@sel4.systems> Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Yanyan,
Actually I do have JTAG, but I am not sure how I can use it without any development environment. Is there any JTAG application that I can use while connected to my device through the JTAG interface?
The second method also looks promising. What is a CPU timestamp? I can use a simple counter to estimate the time spent waiting for the big lock.
Thanks, Leonid
-----Original Message----- From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au> Sent: Friday, October 4, 2019 10:35 AM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,
It would be helpful if you could use JTAG, so you can observe the instruction pointers of the cores. If this is the case, three cores should be executing the same few instructions repeatedly.
If you do not have access to JTAG, maybe you could read CPU timestamps in the while loop of the inline function clh_lock_acquire in kernel/include/smp/lock.h. If a core has spent too much time in the loop (e.g. several seconds), and three cores report the same condition, we have probably hit this scenario.
Regards, Yanyan
-----Original Message----- From: Leonid Meyerovich <lmeyerovich@i-a-i.com> Sent: Friday, 4 October 2019 10:19 PM To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Yanyan,
It would explain everything. How can I prove or disprove it?
Thanks, Leonid
-----Original Message----- From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au> Sent: Friday, October 4, 2019 10:11 AM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,
For the big lock kernel, a possibility is that one core is in kernel mode and, for some reason, does not release the kernel big lock. Other cores keep trying to grab the lock when they need to handle interrupts or system calls, but they fail because the lock is not released.
Regards, Yanyan
-----Original Message----- From: Leonid Meyerovich <lmeyerovich@i-a-i.com> Sent: Friday, 4 October 2019 10:00 PM To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Yanyan,
Yes, our VMM is based on libsel4armvmm.
My prints are implemented in the ISR, and they keep printing as long as the system receives timer interrupts. That is how I understand that I don't have a timer interrupt anymore. Usually I print from the timer interrupt every second (the log below was implemented for a different purpose).
BTW, the kernel is running on every core. How is it possible that all cores stop receiving a timer interrupt (actually not just the timer, but also IPIs)?
Thanks, Leonid
-----Original Message----- From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au> Sent: Friday, October 4, 2019 9:49 AM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,
It looks like seL4 v8.0.0 does not include Armv8 virtualization extension support yet ... So basically you wrote the VMM based on libsel4armvmm?
I am a bit confused by the interrupt log you provided. It looks like the log output depends on getting kernel timer interrupts.
    case IRQTimer:
        timerTick();
        resetTimer();
        if ((++tickCount % 200) == 0) {
            printInt(0); printf("- - - \n");
            printInt(1); printf("- - - \n");
            printInt(2); printf("- - - \n");
            printInt(3); printf("- - - - - - - \n");
            cleanIntStat();
        }
        break;
    ---irq stat: cpu=0 (2) 26_50_50 0_258_258 - - -
    ---irq stat: cpu=1 (2) 26_50_50 0_258_258 - - -
    ---irq stat: cpu=2 (2) 26_50_50 0_258_258 - - -
    ---irq stat: cpu=3 (3) 26_50_50 65_1_1 59_1_1 - - - - - - -
    ---irq stat: cpu=0 (2) 26_50_50 0_257_257 - - -
    ---irq stat: cpu=1 (2) 26_50_50 0_280_280 - - -
    ---irq stat: cpu=2 (2) 26_50_50 0_280_280
Correct me if I am wrong: if you do not get kernel timer interrupts, how could you get the output above? I guess at least one core got kernel timer interrupts to produce the output?
Regards, Yanyan
-----Original Message----- From: Leonid Meyerovich <lmeyerovich@i-a-i.com> Sent: Friday, 4 October 2019 9:11 PM To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Yanyan,
I am running seL4 version 8.0.0.0. Core 3 is running 7 processes which communicate intensively with each other by sending messages. One of these processes sends/receives a lot of messages to/from both VMs ("virtual channels"); another sends/receives a lot of messages (~1000 per sec) to/from the R5 (openAmp).
The VMM is running on core 0 and the VMs are running on cores 1 and 2.
Thanks, Leonid
-----Original Message----- From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au> Sent: Friday, October 4, 2019 8:58 AM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,
I forgot to ask, what seL4 version are you using? Is it from the Data61 Github repo? If you also use CAmkES, which version is it? It would be helpful if you could isolate the 2 VMs, so we could have a smaller system to understand. Based on your description, my understanding is you use one VMM running on core 0 to manage two VMs running on core 1 and core 2?
R5 is the always-on core? What does the application do?
Regards, Yanyan
-----Original Message----- From: Leonid Meyerovich <lmeyerovich@i-a-i.com> Sent: Friday, 4 October 2019 8:26 PM To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Yanyan,
Yes, the VMM is running on core 0; it creates 2 VMs and runs them on core 1 and core 2. The VMs don't have any access to hardware, and they receive virtual timer and maintenance interrupts. I am not sure I can stop running all processes on core 3. I should think about it, but in that case the whole condition will change. Also, I probably didn't mention that the R5 also runs an application, which communicates through openAmp with the application running on core 3.
Thanks, Leonid
-----Original Message----- From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au> Sent: Friday, October 4, 2019 8:16 AM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,
What do you mean by "hypervisor on core 0"? Do you mean the VMM? I assume you create a VMM for each running VM, and also pin the VMMs to the corresponding physical cores? If so, core 1 and core 2 should also receive virtual timer interrupts and VGIC maintenance interrupts. Is it possible to stop running the processes on core 3 and just keep running the VMs on different cores? Do the VMs have any access to physical hardware, for instance, clocks or watchdogs?
Regards, Yanyan
-----Original Message----- From: Leonid Meyerovich <lmeyerovich@i-a-i.com> Sent: Thursday, 3 October 2019 8:45 PM To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Yanyan,
I am running the initial task and the hypervisor on core 0.
The hypervisor creates 2 VMs and runs them on cores 1 and 2.
Core 3 runs 7 processes that communicate through notification objects and shared memory (in pairs).
One process on core 3 implements a UART-based connection (this is a PL UART; Rx uses an interrupt).
One of the core 3 processes also runs a SADA driver (also uses an interrupt).
The VMs communicate with the rest of the system through 'virtual channels' - exceptions and shared memory.
All hardware interrupts are processed by core 0 (please correct me if I am wrong). But as far as I understand, the PL2 physical timer interrupt runs on every core.
Every process prints some messages on the terminal. I have never seen these messages printed partially when the system locks up; a process completes printing its message and is never scheduled again.
I have also printed some messages from inside the ISR after getting a timer interrupt (printing an interrupt counter once a second), and I don't see these messages when the system locks up.
Thank you, Leonid
-----Original Message----- From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au> Sent: Thursday, October 3, 2019 2:44 AM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: Re: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,
Could you provide a bit more about your software configuration? For instance, do you have multiple VMs running on dedicated hardware cores? How are the VMs and processes configured? Also, do you mean there were no interrupts at all on all four cores?
Regards, Yanyan
On Wed, 2019-10-02 at 16:01 +0000, Leonid Meyerovich wrote:
Hello,
We are running the seL4 microkernel on a 4-core Zynq UltraScale+ (zcu102 board). The implementation includes multiple processes, a hypervisor and virtual machines running on dedicated cores. After several hours of running (it could be 2 or even 8 hours) the whole microkernel locks up. After some investigation I have found that no interrupts are generated anymore - at least no interrupts are reaching the ISR. Inside the ISR I have monitored the PL2 Physical Timer Control register, which feeds the scheduler, and didn't find any problems - it stays enabled and not masked.
I will appreciate any idea/direction for approaching this problem.
Thank you,
Leonid
Hi Yanyan,
I thought about just using a long integer variable, incremented inside the clh_lock_acquire loop. I may also try enabling CONFIG_ENABLE_BENCHMARKS and using the timestamp() function.
Thanks, Leonid
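For readers following the thread, the spin-counter idea discussed here can be sketched outside the kernel roughly as follows. This is only an illustration under stated assumptions, not the seL4 CLH lock: every name in it (watched_lock_t, watched_lock_acquire, stuck_core_count, NUM_CORES) is hypothetical, it uses a plain C11 test-and-set flag in place of the kernel's lock, and it counts loop iterations rather than reading a cycle counter. It also gives up once the limit is reached so that the sketch terminates; inside a kernel one would more likely set the flag and keep spinning, then inspect the flags through a debugger or a print path that does not need the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical; this is a stand-alone illustration,
 * not the seL4 CLH lock from the kernel sources. */
#define NUM_CORES 4

typedef struct {
    atomic_flag locked;                     /* stands in for the big lock */
    _Atomic uint64_t max_spins[NUM_CORES];  /* longest wait seen per core */
    atomic_bool stuck[NUM_CORES];           /* sticky: wait hit the limit */
} watched_lock_t;

static void watched_lock_init(watched_lock_t *l)
{
    atomic_flag_clear(&l->locked);
    for (unsigned i = 0; i < NUM_CORES; i++) {
        atomic_store(&l->max_spins[i], 0);
        atomic_store(&l->stuck[i], false);
    }
}

/* Remember the longest wait (in loop iterations) seen by this core. */
static void record_wait(watched_lock_t *l, unsigned core, uint64_t spins)
{
    if (spins > atomic_load(&l->max_spins[core])) {
        atomic_store(&l->max_spins[core], spins);
    }
}

/* Spin for the lock, counting iterations. Returns true once the lock is
 * taken. If `limit` iterations pass first, marks the core as stuck and
 * returns false (assumption: a kernel would keep spinning instead). */
static bool watched_lock_acquire(watched_lock_t *l, unsigned core,
                                 uint64_t limit)
{
    assert(core < NUM_CORES);
    uint64_t spins = 0;
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire)) {
        if (++spins >= limit) {
            atomic_store(&l->stuck[core], true);
            record_wait(l, core, spins);
            return false;
        }
    }
    record_wait(l, core, spins);
    return true;
}

static void watched_lock_release(watched_lock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Number of cores that have ever waited past the limit. */
static unsigned stuck_core_count(watched_lock_t *l)
{
    unsigned n = 0;
    for (unsigned i = 0; i < NUM_CORES; i++) {
        if (atomic_load(&l->stuck[i])) {
            n++;
        }
    }
    return n;
}
```

If one core takes the lock and never releases it, the other three each reach the limit and stuck_core_count() reports 3, which matches the "three cores report the same condition" signature described in the thread. The limit would need tuning so that it corresponds to seconds rather than a normal kernel entry.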
Hi Leonid, Version 8.0.0.0 is from January 2018, almost 2 years old. We strongly recommend working off the latest release. Gernot
On 4 Oct 2019, at 23:11, Leonid Meyerovich <lmeyerovich@i-a-i.com> wrote:
Hi Yanyan,
I am running seL4 version 8.0.0.0. Core 3 is running 7 processes that communicate intensively with each other by sending messages. One of these processes sends/receives a lot of messages to/from both VMs ("virtual channels"); another sends/receives a lot of messages (~1000 per sec) to/from the R5 (OpenAMP).
The VMM is running on core 0 and the VMs are running on cores 1 and 2.
Thanks, Leonid
-----Original Message----- From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au> Sent: Friday, October 4, 2019 8:58 AM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,
I forgot to ask, what seL4 version are you using? Is it from the Data61 Github repo? If you also use CAmkES, which version is it? It would be helpful if you could isolate the 2 VMs, so we could have a smaller system to understand. Based on your description, my understanding is you use one VMM running on core 0 to manage two VMs running on core 1 and core 2?
R5 is the always-on core? What does the application do?
Regards, Yanyan
-----Original Message----- From: Leonid Meyerovich <lmeyerovich@i-a-i.com> Sent: Friday, 4 October 2019 8:26 PM To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Yanyan,
Yes, the VMM is running on core 0; it creates 2 VMs and runs them on core 1 and core 2. The VMs don't have any access to hardware, and they receive virtual timer and maintenance interrupts. I am not sure I can stop running all processes on core 3. I will think about it, but in that case the whole condition will change. Also, I probably didn't mention that the R5 also runs an application, which communicates through OpenAMP with the application running on core 3.
Thanks, Leonid
-----Original Message----- From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au> Sent: Friday, October 4, 2019 8:16 AM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,
What do you mean by "hypervisor on core 0"? Do you mean the VMM? I assume you create a VMM for each running VM, and also pin the VMMs to the corresponding physical cores? If so, core 1 and core 2 should also receive virtual timer interrupts and VGIC maintenance interrupts. Is it possible for you to stop running the processes on core 3 and just keep running the VMs on different cores? Do the VMs have any access to physical hardware, for instance, clocks or watchdogs?
Regards, Yanyan
-----Original Message----- From: Leonid Meyerovich <lmeyerovich@i-a-i.com> Sent: Thursday, 3 October 2019 8:45 PM To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Yanyan,
I am running the initial task and the hypervisor on core 0. The hypervisor creates 2 VMs and runs them on cores 1 and 2. Core 3 runs 7 processes that communicate through notification objects and shared memory (in pairs). One process on core 3 implements a UART-based connection (this is a PL UART; Rx uses an interrupt). One of the core 3 processes also runs the SADA driver (which also uses an interrupt).
The VMs communicate with the rest of the system through 'virtual channels' - exceptions and shared memory.
All hardware interrupts are processed by core 0 (please correct me if I am wrong). But as far as I understand, the PL2 physical timer interrupt fires on every core.
Every process prints some messages on the terminal. I have never seen one of these messages printed only partially when the system locks up: a process completes printing its message and is then never scheduled again. I have also printed some messages from inside the ISR after getting a timer interrupt (printing an interrupt counter once a second), and I don't see these messages when the system locks up.
Thank you, Leonid
-----Original Message----- From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au> Sent: Thursday, October 3, 2019 2:44 AM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: Re: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,
Could you provide a bit more about your software configuration? For instance, do you have multiple VMs running on dedicated hardware cores? How are the VM and processes configured?
Also, you mean there were no interrupts at all on all the four cores?
Regards, Yanyan
_______________________________________________ Devel mailing list Devel@sel4.systems https://sel4.systems/lists/listinfo/devel
Hi Yanyan,
Interrupts are disabled in kernel mode. Are there any circumstances in which they would not be re-enabled before returning to userland?
Thanks, Leonid
Hi Leonid,
Do you mean that all interrupts are disabled, or just the interrupts associated with the VMs? If it is the latter, one possibility I can think of is that the VMM does not handle the maintenance interrupts, so the corresponding interrupts are not acked and thus not re-enabled. Since you use one VMM for two VMs, is it possible that the VMM mixes up the maintenance interrupts from different VMs on different cores?
Regards, Yanyan
-----Original Message----- From: Leonid Meyerovich <lmeyerovich@i-a-i.com> Sent: Friday, 4 October 2019 9:43 PM To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Yanyan,
Interrupts are disabled in kernel mode. Are there any circumstances in which they would not be re-enabled before returning to userland?
Thanks, Leonid
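To make the "mixes up the maintenance interrupts" hypothesis above concrete, here is a toy model in C of a single VMM that tracks maintenance events separately for each VM, keyed by the core the event came from. It is only a sketch under stated assumptions: the struct and function names are hypothetical, nothing here is seL4 or CAmkES API, and the core layout (VMs on cores 1 and 2) is taken from the description in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_VMS 2

/* Hypothetical bookkeeping a single VMM might keep for each VM it manages. */
struct vm_state {
    int core;          /* physical core the VM is pinned to */
    unsigned pending;  /* maintenance events raised but not yet handled */
    unsigned handled;  /* maintenance events handled and acknowledged */
};

/* Layout described in the thread: VMM on core 0, VMs on cores 1 and 2. */
static struct vm_state vms[NUM_VMS] = {
    { .core = 1 },
    { .core = 2 },
};

/* Find the VM pinned to a given core, or NULL if there is none. */
static struct vm_state *vm_for_core(int core)
{
    assert(core >= 0);
    for (size_t i = 0; i < NUM_VMS; i++) {
        if (vms[i].core == core) {
            return &vms[i];
        }
    }
    return NULL;
}

/* Model a maintenance event arriving from the VM on `core`. */
static bool maintenance_raised(int core)
{
    struct vm_state *vm = vm_for_core(core);
    if (vm == NULL) {
        return false;          /* no VM runs on that core */
    }
    vm->pending++;
    return true;
}

/*
 * Handle one pending event for the VM on `core`.
 * Returns false when that VM has nothing pending, which in this model
 * means the event was attributed to the wrong VM.
 */
static bool maintenance_handled(int core)
{
    struct vm_state *vm = vm_for_core(core);
    if (vm == NULL || vm->pending == 0) {
        return false;
    }
    vm->pending--;
    vm->handled++;
    return true;
}

/* True when no VM is left with an unhandled maintenance event. */
static bool all_acknowledged(void)
{
    for (size_t i = 0; i < NUM_VMS; i++) {
        if (vms[i].pending != 0) {
            return false;
        }
    }
    return true;
}
```

In this model, an event raised on core 2 but handled against the VM on core 1 leaves core 2's counter stuck above zero, which is the kind of never-acknowledged interrupt the hypothesis describes.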
Hi Yanyan,
As far as I understand, the seL4 kernel runs with all interrupts disabled; please correct me if I am wrong. Even if a VM failed to acknowledge an interrupt, all other cores/processes should still keep working. Is that true? At least all processes running on core 3 should still get a timer interrupt and communicate with each other.
Thanks, Leonid
-----Original Message----- From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au> Sent: Friday, October 4, 2019 9:57 AM To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running
Hi Leonid,
Do you mean that all interrupts are disabled, or just the interrupts associated with the VMs? If it is the latter, one possibility I can think of is that the VMM does not handle the maintenance interrupts, so the corresponding interrupts are not acked and thus not re-enabled. Since you use one VMM for two VMs, is it possible that the VMM mixes up the maintenance interrupts from different VMs on different cores?
Regards, Yanyan
Hi Everybody,

I am not sure this is related to my previous question/e-mail, but I have collected some more data about the interrupts in my system. I observe a lot of interrupts with ID=0, and later with ID=1 (see below). Both IDs 0 and 1 are initialized as inactive while booting all cores:

    init_irqs(cap_t root_cnode_cap)
    {
        irq_t i;

        for (i = 0; i <= maxIRQ; i++) {
            setIRQState(IRQInactive, i);
        }
        ...

Could somebody explain what these interrupts mean and why I am getting them?

Thank you,
Leonid

I have printed the interrupt statistics from inside the handleInterrupt(irq_t irq) function. It has the following structure:

    void handleInterrupt(irq_t irq)
    {
        switch (intStateIRQTable[irq]) {
        case IRQSignal: {
            ...
        }
        case IRQTimer:
            timerTick();
            resetTimer();
            if ((++tickCount % 200) == 0) {
                printInt(0);
                printf("- - - \n");
                printInt(1);
                printf("- - - \n");
                printInt(2);
                printf("- - - \n");
                printInt(3);
                printf("- - - - - - - \n");
                cleanIntStat();
            }
            break;
    #ifdef ENABLE_SMP_SUPPORT
        case IRQIPI:
            handleIPI(irq, true);
            break;
    #endif /* ENABLE_SMP_SUPPORT */
        case IRQReserved:
    #ifdef CONFIG_IRQ_REPORTING
            //printf("Received reserved IRQ: %d", (int)irq);
    #endif
            handleReservedIRQ(irq);
            break;
        case IRQInactive:
            /*
             * This case shouldn't happen anyway unless the hardware or
             * platform code is broken. Hopefully masking it again should make
             * the interrupt go away.
             */
            maskInterrupt(true, irq);
    #ifdef CONFIG_IRQ_REPORTING
            printf("Received disabled IRQ: %d\n", (int)irq);
    #endif
            break;
        default:
            /* No corresponding haskell error */
            fail("Invalid IRQ state");
        }
        ackInterrupt(irq);
    }

The printInt function prints all collected interrupts for one core in the following format:

    ---irq stat: cpu=0 (2) 26_50_50 0_258_258

The number inside the parentheses is the number of interrupts for the particular time interval (1 sec); ignore it. Each interrupt statistic has the form ID_(enter counter)_(exit counter), and each counter covers a 1 sec interval.

    ---irq stat: cpu=0 (2) 26_50_50 0_258_258
    - - -
    ---irq stat: cpu=1 (2) 26_50_50 0_258_258
    - - -
    ---irq stat: cpu=2 (2) 26_50_50 0_258_258
    - - -
    ---irq stat: cpu=3 (3) 26_50_50 65_1_1 59_1_1
    - - - - - - -
    ---irq stat: cpu=0 (2) 26_50_50 0_257_257
    - - -
    ---irq stat: cpu=1 (2) 26_50_50 0_280_280
    - - -
    ---irq stat: cpu=2 (2) 26_50_50 0_280_280
    ....................................
    end of the first excerpt

    - - - - - - -
    ---irq stat: cpu=0 (1) 26_50_50
    - - -
    ---irq stat: cpu=1 (1) 26_50_50
    - - -
    ---irq stat: cpu=2 (1) 26_50_50
    - - -
    ---irq stat: cpu=3 (1) 26_50_50
    - - - - - - -
    ....................................
    end of another excerpt

    - - - - - - -
    ---irq stat: cpu=0 (1) 26_50_51
    - - -
    ---irq stat: cpu=1 (2) 26_50_50 0_4382_4382
    - - -
    ---irq stat: cpu=2 (2) 26_50_49 0_4382_4382
    - - -
    ---irq stat: cpu=3 (2) 26_50_50 0_4382_4382
    - - - - - - -
    ....................................
    end of another excerpt

    - - - - - - -
    ---irq stat: cpu=0 (2) 26_50_51 1_317_317
    - - -
    ---irq stat: cpu=1 (4) 26_50_50 25_34_34 1_132_132 27_16_16
    - - -
    ---irq stat: cpu=2 (4) 26_50_50 27_17_17 25_32_32 1_130_130
    - - -
    ---irq stat: cpu=3 (2) 26_50_49 1_28_28
    - - - - - - -
    ....................................
    end of another excerpt

-----Original Message-----
From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>
Sent: Thursday, October 3, 2019 2:44 AM
To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems
Subject: Re: [seL4] Zynq UltraScale+ locks up after hours running

Hi Leonid,

Could you provide a bit more about your software configuration? For instance, do you have multiple VMs running on dedicated hardware cores? How are the VM and processes configured? Also, you mean there were no interrupts at all on all the four cores?

Regards,
Yanyan

On Wed, 2019-10-02 at 16:01 +0000, Leonid Meyerovich wrote:
Hello,
We are running seL4 microkernel on 4 cores Zynq UltraScale+ (zcu102 board). The implementation includes multiple processes, hypervisor and virtual machine running on dedicated core. After several hours running (it could be 2 or even 8 hours) the whole microkernel locks up. After some investigation I have found that no interrupts generated anymore - at least there is no interrupts coming to ISR. Inside ISR I have monitored PL2 Physical Timer Control register, which feeds a scheduler and didn't find any problems - it stays enabled and not masked.
I will appreciate any idea/direction for approaching this problem.
Thank you,
Leonid
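For readers who want to reproduce the ID_(enter counter)_(exit counter) statistics described in the message above, here is a minimal sketch in C of how such per-core counters could be kept. It is not Leonid's actual printInt code, which was not posted: the names irq_stats, stat_enter, stat_exit, stat_in_flight and format_stat, as well as the MAX_IRQ bound, are hypothetical. Unlike the posted output, this sketch lists IDs in ascending order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_CPUS 4
#define MAX_IRQ  192   /* hypothetical upper bound on IRQ IDs */

/* Hypothetical per-core counters: one enter/exit pair per IRQ ID. */
typedef struct {
    unsigned enter_cnt[MAX_IRQ];
    unsigned exit_cnt[MAX_IRQ];
} irq_stat_t;

static irq_stat_t irq_stats[NUM_CPUS];

/* Call at the top of the interrupt handler. */
static void stat_enter(int cpu, int irq)
{
    assert(cpu >= 0 && cpu < NUM_CPUS && irq >= 0 && irq < MAX_IRQ);
    irq_stats[cpu].enter_cnt[irq]++;
}

/* Call just before the interrupt handler returns. */
static void stat_exit(int cpu, int irq)
{
    assert(cpu >= 0 && cpu < NUM_CPUS && irq >= 0 && irq < MAX_IRQ);
    irq_stats[cpu].exit_cnt[irq]++;
}

/* Entries minus exits: nonzero means a handler entered but has not returned. */
static int stat_in_flight(int cpu, int irq)
{
    return (int)irq_stats[cpu].enter_cnt[irq] - (int)irq_stats[cpu].exit_cnt[irq];
}

/* Format one core's line as "---irq stat: cpu=N (count) ID_enter_exit ...". */
static int format_stat(int cpu, char *buf, size_t len)
{
    int seen = 0;
    for (int irq = 0; irq < MAX_IRQ; irq++) {
        if (irq_stats[cpu].enter_cnt[irq] || irq_stats[cpu].exit_cnt[irq]) {
            seen++;
        }
    }

    int n = snprintf(buf, len, "---irq stat: cpu=%d (%d)", cpu, seen);
    for (int irq = 0; irq < MAX_IRQ && n < (int)len; irq++) {
        if (irq_stats[cpu].enter_cnt[irq] || irq_stats[cpu].exit_cnt[irq]) {
            n += snprintf(buf + n, len - (size_t)n, " %d_%u_%u", irq,
                          irq_stats[cpu].enter_cnt[irq],
                          irq_stats[cpu].exit_cnt[irq]);
        }
    }
    return n;
}
```

With counters like these, a gap between enter and exit that keeps growing for one ID would point at a handler that never returns, while an off-by-one such as 26_50_51 could simply be an interrupt that straddled the moment the counters were cleared.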
I forgot to mention: my board is zcu102, 4 cores A53
Hello Leonid,

I'm assuming you're pointing to code from src/arch/arm/kernel/boot.c? If that's the case, then you're right, every interrupt in the system is turned off initially. The kernel then turns on some PPIs for timer (and if in hypervisor mode, vgic maintenance). If you're running an SMP system, the irq_remote_call_ipi and irq_reschedule_ipi (interrupts 0 and 1) are turned back on such that cores can communicate with each other. See the links below.

https://github.com/seL4/seL4/blob/8234026c1fb827abee75731b2575ddb741687eff/i...
https://github.com/seL4/seL4/blob/7798b4767dcb7bcd3699f191f685435cfa645518/s...

Chris
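The boot-time sequence Chris describes can be modelled with a small standalone sketch in C. This is a simplified illustration, not the kernel source: irq_table, set_irq_state and init_irqs_model are hypothetical stand-ins, the MAX_IRQ value is made up, and KERNEL_TIMER_IRQ is assumed to be 26 only because that ID ticks a steady 50 times per second in the posted statistics. The state names and the two IPI names are taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IRQ          187  /* hypothetical; the real maxIRQ is platform-defined */
#define KERNEL_TIMER_IRQ 26   /* assumption based on the posted statistics */

/* IRQ states as named in the handleInterrupt() excerpt from the thread. */
typedef enum {
    IRQInactive,
    IRQSignal,
    IRQTimer,
    IRQIPI,
    IRQReserved
} irq_state_t;

/* IPI numbers as given in the thread: interrupts 0 and 1. */
enum {
    irq_remote_call_ipi = 0,
    irq_reschedule_ipi  = 1
};

/* Hypothetical stand-in for the kernel's IRQ state table. */
static irq_state_t irq_table[MAX_IRQ + 1];

static void set_irq_state(irq_state_t state, int irq)
{
    assert(irq >= 0 && irq <= MAX_IRQ);
    irq_table[irq] = state;
}

/* Model of the boot sequence: everything off, then selectively back on. */
static void init_irqs_model(bool smp)
{
    /* Step 1: mark every interrupt inactive (the loop Leonid quoted). */
    for (int i = 0; i <= MAX_IRQ; i++) {
        set_irq_state(IRQInactive, i);
    }

    /* Step 2: the kernel timer is turned back on. */
    set_irq_state(IRQTimer, KERNEL_TIMER_IRQ);

    /* Step 3: on SMP builds, the two IPIs are turned back on as well. */
    if (smp) {
        set_irq_state(IRQIPI, irq_remote_call_ipi);
        set_irq_state(IRQIPI, irq_reschedule_ipi);
    }
}
```

Under this model, IDs 0 and 1 end up in the IRQIPI state on an SMP build, so they would be dispatched to the IPI case of handleInterrupt() rather than the IRQInactive case.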
Thank you, Chris. Now I understand. I was confused, because in kernel/include/plat/zynqmp/plat/machine.h these interrupts are commented out.

Leonid
participants (5)
- Chris Guikema
- Heiser, Gernot (Data61, Kensington NSW)
- Leonid Meyerovich
- Mcleod, Kent (Data61, Kensington NSW)
- Shen, Yanyan (Data61, Kensington NSW)