Re: [seL4] SMP on Zynq7000 zc702

5 Oct 2017

Hesham,

I apologize for the delay. I have run sel4test (master) on the zc702 board
with the bamboo SMP config, and it works as expected; all is well in the
universe there.
I have also tried the hello-2 application from the CAmkES solutions (master)
with the config set to a maximum of 2 CPUs and with the component affinity
attributes changed in the top-level CAmkES file, but it isn't getting past:

ELF-loader started on CPU: ARM Ltd. Cortex-A9 r3p0
  paddr=[10000000..103cc81f]
ELF-loading image 'kernel'
  paddr=[0..27fff]
  vaddr=[e0000000..e0027fff]
  virt_entry=e0000000
ELF-loading image 'capdl-loader-experimental'
  paddr=[28000..382fff]
  vaddr=[8000..362fff]
  virt_entry=fe30
Bringing up 1 other cpus
Enabling MMU and paging

I'm going to do some more debugging, but I thought I'd let you know that
sel4test worked while multi-core CAmkES did not.

On 08/13/2017 11:51 PM, Hesham.Almatary@data61.csiro.au wrote:
...
Hi Jesse,
On sel4test/SMP, the TEST_FPU0002 test relies on a platform-dependent timer driver to implement sleep()/timestamp(). In the case of zynq7000, the TTC timers are used. We have recently added this functionality (basically ltimer/gettime()). From the top level of sel4test, it's at projects/util_libs/libplatsupport/:
commit b2670005df372ca7e22be4bd7bfa18c9bda10e89
Author: Hesham Almatary <hesham.almatary@data61.csiro.au>
Date:   Thu Jul 27 14:25:36 2017 +1000
ltimer/zynq: use ttc1_timer1 as a timestamp/gettime timer
Are you using the master branch of util_libs? Any test (e.g. TEST_FPU0002) that's using gettime() won't work without this commit. Note that it's not released yet.
Hope that helps.
Best,
Hesham
On 12/08/17 06:08, Jesse Millwood wrote:
...
Hesham,
Thank you for the response and the tips.
I tried out the master version of sel4test and still don't seem to be able
to complete all of the tests.
It looks like it crashed after TEST_FPU0002.
I still have some debugging to do, but I have a better idea of where to go now, thanks.
Jesse
On 2017-08-08 07:56 PM, Hesham.Almatary@data61.csiro.au wrote:
...
Hi Jesse
On 09/08/17 06:45, Jesse Millwood wrote:
...
Hello,
I realize that the only officially supported SMP platform is the i.mx6
but I did see some code on 6.0 to master for SMP and the zynq7000 so I
am trying to test out some SMP functionality on my zc702 board.
I am wondering if anyone has this working because my system is failing
when compiling from master and the 6.0 and compatible tags.
The 6.0.x release doesn't support SMP on Zynq; it's supposed to be in the
next release. However, you can still use the master branches to get SMP/Zynq
working. Another option is to use just the master branch of the elfloader,
since it fixes SMP-related bugs and adds reset functionality to reset
zynq's secondary core.
From the config you provided, it seems like you enabled printing and
debug mode. Zynq/SMP has only been tested in release mode on
sel4test/zc706. This means it might not work if you enabled printing
and/or debug mode. Furthermore, we haven't tested it on camkes-based
projects yet.
From your thorough output (thanks!), everything seems to be fine; core1
should be going to restore_user_context and proceed with the idleThread
after releasing the lock, and core0 should get it, and go to the root
thread.
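The lock hand-over described above can be sketched with a small single-threaded model of a CLH-style queue lock. This is only an illustration under stated assumptions, not seL4's implementation: the class and method names are hypothetical, and the real kernel uses an atomic exchange and a spin loop where this model uses plain assignments and a polling predicate.

```python
PENDING = "pending"
GRANTED = "granted"


class Node:
    """One queue node; a waiter polls the node of its predecessor."""

    def __init__(self, value=GRANTED):
        self.value = value


class CLHLockModel:
    """Hypothetical, single-threaded model of a CLH queue lock."""

    def __init__(self, num_cpus):
        self.tail = Node(GRANTED)  # dummy node: the lock starts out free
        self.my_node = [Node() for _ in range(num_cpus)]
        self.pred = [None] * num_cpus

    def enqueue(self, cpu):
        """Join the queue (an atomic exchange on real hardware)."""
        node = self.my_node[cpu]
        node.value = PENDING
        self.pred[cpu] = self.tail
        self.tail = node

    def may_enter(self, cpu):
        """True once the predecessor has granted the lock (call after enqueue)."""
        return self.pred[cpu].value == GRANTED

    def release(self, cpu):
        """Grant the lock to the successor and recycle the predecessor's node."""
        self.my_node[cpu].value = GRANTED
        self.my_node[cpu] = self.pred[cpu]


# core1 takes the lock first, core0 queues up behind it.
lock = CLHLockModel(2)
lock.enqueue(1)
lock.enqueue(0)
print(lock.may_enter(1), lock.may_enter(0))
lock.release(1)
print(lock.may_enter(0))
```

In this model core0 can only proceed after core1 calls release(), which matches the expectation that core0 keeps waiting until core1 lets go of the lock.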
I'd suggest you try disabling printing/debug mode and run on
sel4test/master (bamboo_zynq7000_smp_release_xml_defconfig), just to
make sure this works on your board.
Please let us know if you still have the same issue.
Cheers,
Hesham
...
My second core seems to be coming up but the system ultimately fails
and prints out:
Bo
ot
nKgE RaNlEl L fDinAiTsAh eAdBO, RdT!r
ppFaeud lttio nugs eirn stspraucctei o
n: 0xe001d1b0                       o
FAR: 0xfffffff8 DFSR: 0x807
halting...
Kernel entry via Syscall, number: 1, Call
Cap type: 1, Invocation tag: 37
Which seems to be “Booting all finished, dropped to user space” from
core0 and “KERNEL DATA ABORT!” from core1.
The “0xe001d1b0” seems to be the label of the “idle_thread” function.
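One way to sanity-check this reading of the interleaved output is to pull a known message out of a garbled line as a subsequence and look at what is left. The helper below is a minimal sketch; remove_subsequence is a hypothetical name, and the greedy left-to-right match is an assumption that can pick the wrong characters when both messages share letters.

```python
def remove_subsequence(mixed, known):
    """Greedily match `known` as a subsequence of `mixed`.

    Returns (matched_all, remainder), where remainder holds the characters
    of `mixed` that were not consumed by the match, in their original order.
    """
    remainder = []
    k = 0
    for ch in mixed:
        if k < len(known) and ch == known[k]:
            k += 1
        else:
            remainder.append(ch)
    return k == len(known), "".join(remainder)


# Third line of the garbled output quoted above.
line = "nKgE RaNlEl L fDinAiTsAh eAdBO, RdT!r"
found, rest = remove_subsequence(line, "KERNEL DATA ABORT!")
print(found, repr(rest))  # True 'ng all finished, dr'
```

The remainder is a fragment of the core0 message, which is consistent with two cores printing at the same time.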
While stepping through via JTAG, I have verified that core1 gets
through “init_kernel” and then enters “restore_user_context”. At some
point in “restore_user_context”, the fault registers are set as shown
in the printed output. I think it happens either in the c_exit_hook in
restore_user_context or after the program branches to “0xFFFF0010”,
which is “ldr pc,0xFFFF0030”. This branches to the
“arm_data_abort_exception” label, which goes to the “kernel_data_fault”
label and then to “kernel data abort”.
I’m having trouble pinpointing exactly where the fault occurs, but it
seems to be close to there.
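For reference, the addresses involved can be cross-checked with a little arithmetic. The sketch below assumes the ARMv7-A high exception vectors at base 0xFFFF0000 with the standard vector offsets, and the ARM-state rule that pc reads as the instruction address plus 8; the function names are hypothetical.

```python
HIGH_VECTOR_BASE = 0xFFFF0000

# Standard ARMv7-A exception vector offsets (offset 0x14 is not used here).
VECTOR_OFFSETS = {
    "reset": 0x00,
    "undefined_instruction": 0x04,
    "supervisor_call": 0x08,
    "prefetch_abort": 0x0C,
    "data_abort": 0x10,
    "irq": 0x18,
    "fiq": 0x1C,
}


def vector_address(name, base=HIGH_VECTOR_BASE):
    """Address of the vector table entry for the named exception."""
    return base + VECTOR_OFFSETS[name]


def ldr_pc_relative_offset(instr_addr, literal_addr):
    """Immediate used by an ARM-state `ldr pc, [pc, #imm]` at instr_addr
    that loads from literal_addr (pc reads as instr_addr + 8)."""
    return literal_addr - (instr_addr + 8)


print(hex(vector_address("data_abort")))                    # 0xffff0010
print(hex(ldr_pc_relative_offset(0xFFFF0010, 0xFFFF0030)))  # 0x18
```

So an instruction at the data abort vector that loads pc from 0xFFFF0030 is what one would expect to see immediately after a data abort has been taken.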
Has anyone had similar issues with SMP? It seems to get fairly far
without setting the fault registers.
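The DFSR value from the printed output can be decoded field by field. The following is a minimal sketch assuming the ARMv7-A short-descriptor DFSR layout (FS[3:0] in bits 3:0, FS[4] in bit 10, domain in bits 7:4, WnR in bit 11); the status table is deliberately partial and decode_dfsr is a hypothetical helper name.

```python
# Partial table of short-descriptor fault status (FS) encodings.
FAULT_STATUS = {
    0b00001: "alignment fault",
    0b00101: "translation fault, first level (section)",
    0b00111: "translation fault, second level (page)",
    0b00011: "access flag fault, first level (section)",
    0b00110: "access flag fault, second level (page)",
    0b01001: "domain fault, first level (section)",
    0b01011: "domain fault, second level (page)",
    0b01101: "permission fault, first level (section)",
    0b01111: "permission fault, second level (page)",
    0b01000: "synchronous external abort",
    0b10110: "asynchronous external abort",
}


def decode_dfsr(dfsr):
    """Split a short-descriptor DFSR value into its main fields."""
    fs = (((dfsr >> 10) & 1) << 4) | (dfsr & 0xF)
    return {
        "fs": fs,
        "status": FAULT_STATUS.get(fs, "not in this partial table"),
        "domain": (dfsr >> 4) & 0xF,
        "write": bool((dfsr >> 11) & 1),  # WnR: True means a write access
    }


print(decode_dfsr(0x807))
```

Under these assumptions, DFSR 0x807 decodes as a second-level (page) translation fault caused by a write, which would mean the write to FAR 0xfffffff8 hit an address with no valid page mapping.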
I have tried to step through the execution over JTAG and here are some
of my (verbose) notes
| CORE0 Address | Core0 Function              | Core0 Instruction      | CORE1 Address | Core1 Function                     | Core1 Instruction    |       DFSR |       DFAR | Note |
|---------------+-----------------------------+------------------------+---------------+------------------------------------+----------------------+------------+------------+------|
|    0x10000000 | label: start                | =cpsid aif=            |    0xFFFFFF34 |                                    | =mvn r0,#0x0f=       | 0x00000000 | 0x00005000 |      |
|    0x10003A2C | call: platform_init         | =bl 0x10003DD8=        |    0xFFFFFF30 |                                    | =wfe=                | 0x00000000 | 0x00005000 |      |
|    0x10003ACC | call: smp_boot              | =bl 0x100039FC=        |    0xFFFFFF34 |                                    | =mvn r0,#0x0f=       | 0x00000000 | 0x00005000 |      |
|    0x10003AD0 | ret: smp_boot               | =bl 0x10005C54=        |    0x10000020 | in: non_boot_core                  | =orr r0,r0,#0x40=    | 0x00000000 | 0x00005000 |    2 |
|    0x10003ADC | =if(is_hyp_mode())=         | =beq 0x10003AF0=       |    0x10002200 | label: arm_disable_dcaches         | =push {r14}=         | 0x00000000 | 0x00005000 |      |
|    0x10003AFC | call: arm_enable_mmu        | =bl 0x10002174=        |    0xE0006190 | in: try_init_kernel_secondary_core | =beq 0xE0001680=     | 0x00000000 | 0x00005000 |    1 |
|    0xE0001D70 | label: init_kernel          | =push {r11,r14}=       |    0xE0001680 | in: try_init_kernel_secondary_core | =beq 0xE0001680=     | 0x00000000 | 0x00005000 |    1 |
|    0xE0001814 | label: try_init_kernel      | =push {r11,r14}=       |    0xE0001690 | in: try_init_kernel_secondary_core | =beq 0xE0001680=     | 0x00000000 | 0x00005000 |    1 |
|    0xE0001B80 | call: create_initial_thread | =str r0, [r11,#-0x14]= |    0xE0001680 | in: try_init_kernel_secondary_core | =beq 0xE0001680=     | 0x00000000 | 0x00005000 |    1 |
|    0xE0001C48 | call: SMP_COND_STATEMENT    | =bl 0xE0003C20=        |    0xE0001680 | in: try_init_kernel_secondary_core | =beq 0xE0001680=     | 0x00000000 | 0x00005000 | 1, 3 |
|    0xE0001C4C | call: SMP_COND_STATEMENT    | =bl 0xE00017D8=        |    0xE0001680 | in: try_init_kernel_secondary_core | =beq 0xE0001680=     | 0x00000000 | 0x00005000 | 1, 4 |
|    0xE0001C50 | NODE_LOCK_SYS               | =bl 0xE0019280=        |    0xE0019288 | in: getCurrentCPUIndex             | =sub r13,r13,#0x8=   | 0x00000000 | 0x00005000 |    5 |
|    0xE0001D1C | call: arch_pause            | =bl 0xE0019DB0=        |    0xE0019290 | in: getCurrentCPUIndex             | =str r0,[r11,#-0x8]= | 0x00000000 | 0x00005000 |    6 |
|    0xE0001D40 | in: clh_lock_acquire        | =uxtb r3,r3=           |    0xE00037F4 | in: init_core_state                | =pop {r4,r11,pc}=    | 0x00000000 | 0x00005000 |      |
|    0xE0001D20 | in: clh_lock_acquire        | =mov r2,#0xE800=       |    0xE0003754 | in: init_core_state                | =movw r2,#0xE8E0=    | 0x00000000 | 0x00005000 |    7 |
|    0xE0001D40 | in: clh_lock_acquire        | =uxtb r3,r3=           |    0xE00017C4 | in: try_init_kernel_secondary      | =mov r3,#0x1=        | 0x00000000 | 0x00005000 |    8 |
|    0xE0001D40 | in: clh_lock_acquire        | =uxtb r3,r3=           |    0xE002A06C | in: schedule                       | =push {r11,r14}=     | 0x00000000 | 0x00005000 |    9 |
|    0xE0001D40 | in: clh_lock_acquire        | =uxtb r3,r3=           |    0xE002979C | in: activateThread                 | =push {r11, r14}=    | 0x00000000 | 0x00005000 |   10 |
|    0xE0001D40 | in: clh_lock_acquire        | =uxtb r3,r3=           |    0xE001D24C | label: Arch_activateIdleThread     | =push {r11}=         | 0x00000000 | 0x00005000 |   11 |
|    0xE0001D38 | in: clh_lock_acquire        | =ldr r3,[r3,#0x4]=     |    0xE0000054 | in: start                          | =b 0xE001CEC8=       | 0x00000000 | 0x00005000 |   12 |
Notes
1. Core1 is in a =while (!node_boot_lock)= loop
2. In =smp_boot=, CORE1 changes after =init_cpus= (branch location: ZSR:10003A08)
   - In =smp_boot=, =boot_cpus= is called
     - This sets the =CPU_JUMP_PTR= =*((volatile uint32_t*)CPU_JUMP_PTR) = (uint32_t)entry;=
     - calls =dsb= (data synchronization barrier)
       - After this call, CPU1 goes to =FFFFFF2C: dsb sy=
     - And then =sev=
       - After this call, CPU1 goes to the =non_boot_core= label
   - SEV
     - SEV causes an event to be signaled to all cores within a multiprocessor system. If SEV is implemented, WFE must also be implemented.
   - WFE
     - If the Event Register is not set, WFE suspends execution until one of the following events occurs:
       - an IRQ interrupt, unless masked by the CPSR I-bit
       - an FIQ interrupt, unless masked by the CPSR F-bit
       - an Imprecise Data abort, unless masked by the CPSR A-bit
       - a Debug Entry request, if Debug is enabled
       - an Event signaled by another processor using the SEV instruction.
     - If the Event Register is set, WFE clears it and returns immediately.
     - If WFE is implemented, SEV must also be implemented.
   - After CPU0 executes =arm_enable_mmu()= from the =main= function
   - by the end of =smp_boot= core1 is just starting =non_boot_main=
3. The =SMP_COND_STATEMENT= is calling =clh_lock_init=
4. The =SMP_COND_STATEMENT= is calling =release_secondary_cpus=
5. Right after Core0 returned from releasing secondary cpus
   - First time Core1 has exited the loop
   - Core1's stack is
     - =getCurrentCPUIndex=
     - =init_core_state=
     - =try_init_kernel_secondary_core=
     - =init_kernel=
6. Core0 is in a =while(big_kernel_lock.node_owners[cpu].next->value != CLHState_Granted)=
   - Core0 is in a static inline function =clh_lock_acquire= in =try_init_kernel=
   - Core1 is in =getCurrentCPUIndex= but being called from =tcbDebugAppend=
   - =tcbDebugAppend= is being called from a =for= loop in =init_core_state=
7. Core0 is still in the previously mentioned while loop
   - Core1 is in =init_core_state= and has exited the for loop that called =tcbDebugAppend(NODE_STATE_ON_CORE(ksIdleThread, i))=
8. Core0 is still in the previously mentioned while loop
   - Core1 has returned to =try_init_kernel_secondary_core= from =init_core_state= and is at the end of the function
9. Core0 is still in the previously mentioned while loop
   - Core1 has entered the =init_kernel= call and then the =schedule= function.
10. Core0 is still in the previously mentioned while loop
    - Core1 has entered the =activateThread= call after =schedule= in =init_kernel=
11. Core0 is still in the previously mentioned while loop
    - Core1 seems to have dropped into the =case ThreadState_IdleThreadState:= case when switching on =switch (thread_state_get_tsType(NODE_STATE(ksCurThread)->tcbState))=
    - This was in =activateThread=
12. Core0 is still in the previously mentioned while loop
    - Core1 has exited =init_kernel= and is now branching to =restore_user_context=
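The SEV/WFE behaviour summarised in note 2 can be modelled in a few lines. This is a rough sketch, not a description of the hardware: the Core class and sev function are hypothetical, only SEV is modelled as a wake-up source (no IRQ, FIQ, abort or debug events), and it assumes that waking a suspended core consumes the event.

```python
class Core:
    """Toy model of one core's Event Register and WFE state."""

    def __init__(self, name):
        self.name = name
        self.event_register = False
        self.suspended = False

    def wfe(self):
        """Return True if execution continues, False if the core suspends."""
        if self.event_register:
            # Event Register set: WFE clears it and returns immediately.
            self.event_register = False
            return True
        # Event Register clear: suspend until an event arrives.
        self.suspended = True
        return False


def sev(cores):
    """Signal an event to every core in the system."""
    for core in cores:
        if core.suspended:
            core.suspended = False  # assumption: the wake-up consumes the event
        else:
            core.event_register = True


core0, core1 = Core("core0"), Core("core1")
core1.wfe()          # core1 parks itself, as in the wait loop before boot
sev([core0, core1])  # core0 signals after writing the jump pointer
print(core1.suspended, core0.event_register)
```

In this model core1 resumes after the sev() call, and a later wfe() on core0 returns immediately once because its own Event Register was set by the same event.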
To test everything out I am using the “camkes-sols-master” manifest
and building the “CAmkES Hello World application with events and
dataports”.
The changes I made are:
- I edited the top-level CAmkES file to set the affinity for
  two separate cores.
- Upped the Max Number of CPU nodes to 2.
The rest of the config is pretty standard. I have it attached to this
message.
The FSBL and ps7_init script I use are the standard ones created for
the zc702 by the 2017.2 version of the Xilinx XSDK.
I am booting over JTAG: I first run the ps7_init script, then
flash the FSBL, and then the
“capdl-loader-experimental-image-arm-zynq7000” image that was built.
I am wondering if anyone is using a modified fsbl or ps7_init that
does something else, if there is config value that I missed, or if it
is still in development? If it is still in development I’d like to
work with whoever is
Thanks,
Jesse Millwood
_______________________________________________
Devel mailing list
Devel@sel4.systems
https://sel4.systems/lists/listinfo/devel
