Hi Yanyan,

 

I was thinking of just a long integer variable, incremented inside the clh_lock_acquire loop.

I may also try enabling CONFIG_ENABLE_BENCHMARKS and using the timestamp() function.
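
A minimal sketch of the counter idea, assuming hypothetical names (`spin_stats_t`, `spin_begin`, `spin_tick`, `NUM_CORES`) that are not part of the seL4 sources. Each core counts the iterations of its own wait loop and remembers the longest wait it has seen, so a core that is stuck shows an ever-growing value:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES 4  /* assumption: the four application cores of the zcu102 */

/* Hypothetical per-core statistics; not an seL4 data structure. */
typedef struct {
    uint64_t current;  /* iterations spent in the wait currently in progress */
    uint64_t longest;  /* longest wait (finished or ongoing) seen so far */
} spin_stats_t;

static spin_stats_t spin_stats[NUM_CORES];

/* Call once, just before entering the wait loop. */
static inline void spin_begin(unsigned core)
{
    assert(core < NUM_CORES);
    spin_stats[core].current = 0;
}

/* Call once per iteration of the wait loop. */
static inline void spin_tick(unsigned core)
{
    spin_stats_t *s = &spin_stats[core];

    s->current++;
    if (s->current > s->longest) {
        s->longest = s->current;
    }
}
```

Each core writes only its own slot, so the counters need no locking of their own. The iteration count is only a rough proxy for time, which is why a timestamp may still be worth trying.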

 

Thanks,

Leonid

 

 

From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>
Sent: Friday, October 4, 2019 11:23 AM
To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems
Subject: Re: [seL4] Zynq UltraScale+ locks up after hours running

 

Hi Leonid,

Yes, I mean CPU cycle counters.

Regards,

Yanyan


From: Leonid Meyerovich <lmeyerovich@i-a-i.com>
Sent: Friday, October 4, 2019 11:00:31 PM
To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems <devel@sel4.systems>
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

 

Hi Yanyan,

Actually, I do have JTAG, but I am not sure how I can use it without any development environment. Is there any JTAG application that I can use while connected to my device through the JTAG interface?

The second method also looks promising. What is a CPU timestamp? I could use a simple counter to estimate the time spent waiting for the big lock.

Thanks,
Leonid


-----Original Message-----
From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>
Sent: Friday, October 4, 2019 10:35 AM
To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Leonid,

It would be helpful if you could use JTAG, so that you can observe the instruction pointers of the cores. If this is the case, three of the cores should be executing the same few instructions repeatedly.

If you do not have access to JTAG, maybe you could read CPU timestamps in the while loop of the inline function clh_lock_acquire in kernel/include/smp/lock.h. If a core has spent too much time in the loop (e.g. several seconds), and three cores report the same condition, we have probably hit the scenario.
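
The timestamp check could be sketched roughly as follows. This is only an illustration: `spin_watch_t`, `spin_watch_begin` and `spin_watch_expired` are hypothetical names, not seL4 functions, and the caller is assumed to supply `now` from whatever cycle counter or timestamp facility the platform offers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical watchdog state for one wait; not part of seL4. */
typedef struct {
    uint64_t start;     /* counter value when the wait began */
    uint64_t limit;     /* ticks after which the wait counts as too long */
    bool     reported;  /* ensures at most one report per wait */
} spin_watch_t;

/* Call once, just before entering the wait loop. */
static inline void spin_watch_begin(spin_watch_t *w, uint64_t now, uint64_t limit)
{
    assert(w != NULL);
    w->start = now;
    w->limit = limit;
    w->reported = false;
}

/*
 * Call inside the wait loop with a fresh counter value.
 * Returns true exactly once, when the wait first exceeds the limit.
 * Unsigned subtraction keeps the elapsed value correct across one
 * wrap-around of the counter.
 */
static inline bool spin_watch_expired(spin_watch_t *w, uint64_t now)
{
    assert(w != NULL);
    if (!w->reported && (now - w->start) > w->limit) {
        w->reported = true;
        return true;
    }
    return false;
}
```

When `spin_watch_expired` returns true, the core would record or print its own ID. Three cores reporting at about the same time would match the scenario described above.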


Regards,
Yanyan


-----Original Message-----
From: Leonid Meyerovich <lmeyerovich@i-a-i.com>
Sent: Friday, 4 October 2019 10:19 PM
To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Yanyan,

That would explain everything.
How can I prove or disprove it?

Thanks,
Leonid

-----Original Message-----
From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>
Sent: Friday, October 4, 2019 10:11 AM
To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Leonid,

For the big-lock kernel, one possibility is that a core is in kernel mode and, for some reason, does not release the big kernel lock. The other cores keep trying to grab the lock whenever they need to handle interrupts or system calls, but they fail because the lock is never released.
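
To make the scenario concrete, here is a generic textbook CLH queue lock written with C11 atomics. It is a sketch only, not the seL4 implementation in kernel/include/smp/lock.h, and every name in it is hypothetical. Acquisition is split into an enqueue step and a "granted" check so that the stuck state can be inspected directly:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CORES 4  /* assumption: four cores */

typedef struct { atomic_bool locked; } clh_node_t;

typedef struct {
    clh_node_t nodes[NUM_CORES + 1];  /* one spare node circulates */
    _Atomic(clh_node_t *) tail;       /* most recently enqueued node */
    clh_node_t *mine[NUM_CORES];      /* node each core enqueues next */
    clh_node_t *pred[NUM_CORES];      /* node each core is waiting on */
} clh_lock_t;

static void clh_init(clh_lock_t *l)
{
    for (int i = 0; i <= NUM_CORES; i++) {
        atomic_init(&l->nodes[i].locked, false);
    }
    for (int i = 0; i < NUM_CORES; i++) {
        l->mine[i] = &l->nodes[i];
        l->pred[i] = NULL;
    }
    atomic_init(&l->tail, &l->nodes[NUM_CORES]);
}

/* Join the queue; the core then owns the lock once clh_granted() is true. */
static void clh_enqueue(clh_lock_t *l, int core)
{
    atomic_store(&l->mine[core]->locked, true);
    l->pred[core] = atomic_exchange(&l->tail, l->mine[core]);
}

/* A real acquire would spin on this until it returns true. */
static bool clh_granted(clh_lock_t *l, int core)
{
    return !atomic_load(&l->pred[core]->locked);
}

/* Hand the lock to the successor and recycle the predecessor's node. */
static void clh_release(clh_lock_t *l, int core)
{
    clh_node_t *done = l->mine[core];

    atomic_store(&done->locked, false);
    l->mine[core] = l->pred[core];
}
```

Each waiter watches only its predecessor's node. If the current holder never calls the release step, every core queued behind it keeps seeing `clh_granted` return false, which is the lock-up pattern described above.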


Regards,
Yanyan


-----Original Message-----
From: Leonid Meyerovich <lmeyerovich@i-a-i.com>
Sent: Friday, 4 October 2019 10:00 PM
To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Yanyan,

Yes, our VMM is based on libsel4armvmm.
My printing is implemented in the ISR, and it keeps printing as long as the system receives timer interrupts. That is how I know that I no longer get timer interrupts.
Usually I print from the timer interrupt every second (the log below was implemented for a different purpose).

BTW, the kernel is running on every core, so how is it possible that all cores stop receiving timer interrupts (actually not just timer interrupts, but also IPIs)?

Thanks,
Leonid




-----Original Message-----
From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>
Sent: Friday, October 4, 2019 9:49 AM
To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Leonid,

It looks like seL4 v8.0.0 does not include Armv8 virtualization extension support yet...
So basically you wrote the VMM based on the libsel4armvmm?

I am a bit confused by the interrupt log you provided.

It looks like the log output depends on getting kernel timer interrupts.

case IRQTimer:
        timerTick();
        resetTimer();
        if ((++tickCount % 200) == 0)
        {
                printInt(0);
                printf("- - - \n");
                printInt(1);
                printf("- - - \n");
                printInt(2);
                printf("- - - \n");
                printInt(3);
                printf("- - - - - - -  \n");
                cleanIntStat();
        }
        break;


 ---irq stat: cpu=0 (2) 26_50_50    0_258_258
- - -
---irq stat: cpu=1 (2) 26_50_50    0_258_258
- - -
---irq stat: cpu=2 (2) 26_50_50    0_258_258
- - -
---irq stat: cpu=3 (3) 26_50_50    65_1_1    59_1_1
- - - - - - -
---irq stat: cpu=0 (2) 26_50_50    0_257_257
- - -
---irq stat: cpu=1 (2) 26_50_50    0_280_280
- - -
---irq stat: cpu=2 (2) 26_50_50    0_280_280


Please correct me if I am wrong, but if you do not get kernel timer interrupts, how could you get the output above? I guess at least one core must have received kernel timer interrupts to produce it?


Regards,
Yanyan


-----Original Message-----
From: Leonid Meyerovich <lmeyerovich@i-a-i.com>
Sent: Friday, 4 October 2019 9:11 PM
To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Yanyan,

I am running seL4 version 8.0.0.0.
Core 3 is running 7 processes which communicate intensively with each other by sending messages.
One of these processes sends/receives a lot of messages to/from both VMs ("virtual channels"); another sends/receives a lot of messages (~1000 per sec) to/from the R5 (openAmp).

VMM is running on core 0 and VMs are running on core 1 and 2.

Thanks,
Leonid


-----Original Message-----
From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>
Sent: Friday, October 4, 2019 8:58 AM
To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Leonid,

I forgot to ask, what seL4 version are you using? Is it from the Data61 Github repo? If you also use CAmkES, which version is it? It would be helpful if you could isolate the 2 VMs, so we could have a smaller system to understand. Based on your description, my understanding is you use one VMM running on core 0 to manage two VMs running on core 1 and core 2?

R5 is the always-on core? What does the application do?

Regards,
Yanyan


-----Original Message-----
From: Leonid Meyerovich <lmeyerovich@i-a-i.com>
Sent: Friday, 4 October 2019 8:26 PM
To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Yanyan,

Yes, the VMM is running on core 0; it creates 2 VMs and runs them on core 1 and core 2. The VMs don't have any access to hardware, and they receive virtual timer and maintenance interrupts.
I am not sure I can stop all the processes on core 3. I need to think about it, but in that case the conditions would change completely. Also, I probably didn't mention that the R5 also runs an application, which communicates through openAmp with the application running on core 3.

Thanks,
Leonid

-----Original Message-----
From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>
Sent: Friday, October 4, 2019 8:16 AM
To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Leonid,

What do you mean by "hypervisor on core 0"? Do you mean the VMM? I assume you create a VMM for each running VM, and also pin the VMMs to the corresponding physical cores? If so, core 1 and core 2 should also receive virtual timer interrupts and VGIC maintenance interrupts. Is it possible to stop the processes on core 3 and just keep the VMs running on their cores? Do the VMs have any access to physical hardware, for instance, clocks or watchdogs?


Regards,
Yanyan


-----Original Message-----
From: Leonid Meyerovich <lmeyerovich@i-a-i.com>
Sent: Thursday, 3 October 2019 8:45 PM
To: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>; devel@sel4.systems
Subject: RE: [seL4] Zynq UltraScale+ locks up after hours running

Hi Yanyan,

I am running the initial task and the hypervisor on core 0.
The hypervisor creates 2 VMs and runs them on cores 1 and 2.
Core 3 runs 7 processes that communicate through notification objects and shared memory (in pairs).
One process on core 3 implements a UART-based connection (this is a PL UART; Rx uses an interrupt).
One of the core 3 processes also runs a SADA driver (which also uses an interrupt).

VMs communicate to the rest of the system through 'virtual channels' - exceptions and shared memory.

All hardware interrupts are processed by core 0 (please correct me if I am wrong). But as far as I understand, the PL2 physical timer interrupt fires on every core.

Every process prints some messages to the terminal. I have never seen one of these messages printed only partially when the system locks up: a process completes printing its message and is never scheduled again.
I have also printed some messages from inside the ISR after getting a timer interrupt (printing the interrupt counter once a second), and I don't see these messages when the system locks up.


Thank you,
Leonid


-----Original Message-----
From: Shen, Yanyan (Data61, Kensington NSW) <Yanyan.Shen@data61.csiro.au>
Sent: Thursday, October 3, 2019 2:44 AM
To: Leonid Meyerovich <lmeyerovich@i-a-i.com>; devel@sel4.systems
Subject: Re: [seL4] Zynq UltraScale+ locks up after hours running

Hi Leonid,

Could you provide a bit more detail about your software configuration? For instance, do you have multiple VMs running on dedicated hardware cores?
How are the VMs and processes configured?

Also, do you mean there were no interrupts at all on any of the four cores?


Regards,
Yanyan

On Wed, 2019-10-02 at 16:01 +0000, Leonid Meyerovich wrote:
> Hello,
>
> We are running the seL4 microkernel on a 4-core Zynq UltraScale+ (zcu102
> board). The implementation includes multiple processes, a hypervisor, and
> virtual machines running on dedicated cores. After several hours of running
> (it could be 2 or even 8 hours) the whole microkernel locks up. After
> some investigation I have found that no interrupts are generated anymore -
> at least no interrupts reach the ISR.
> Inside the ISR I have monitored the PL2 Physical Timer Control register,
> which feeds the scheduler, and didn't find any problems - it stays enabled
> and not masked.
>
> I would appreciate any ideas or direction for approaching this problem.
>
> Thank you,
>
> Leonid
>
>
>
>
> This message and all attachments are PRIVATE, and contain information
> that is PROPRIETARY to Intelligent Automation, Inc. You are not
> authorized to transmit or otherwise disclose this message or any
> attachments to any third party whatsoever without the express written
> consent of Intelligent Automation, Inc. If you received this message
> in error or you are not willing to view this message or any
> attachments on a confidential basis, please immediately delete this
> email and any attachments and notify Intelligent Automation, Inc.
> _______________________________________________
> Devel mailing list
> Devel@sel4.systems
> https://sel4.systems/lists/listinfo/devel

