Hesham,
I apologize for the delay.
I have run the seL4 tests master on the zc702 board with the bamboo smp config and that works as expected, all is well in the universe there.
I have tried the master CAmkES solutions hello-2 application with the config set to have 2 Max Cpus and changed the component affinity attributes in the top level CAmkES file but it isn't getting past:
ELF-loader started on CPU: ARM Ltd. Cortex-A9 r3p0
  paddr=[10000000..103cc81f]
ELF-loading image 'kernel'
  paddr=[0..27fff]
  vaddr=[e0000000..e0027fff]
  virt_entry=e0000000
ELF-loading image 'capdl-loader-experimental'
  paddr=[28000..382fff]
  vaddr=[8000..362fff]
  virt_entry=fe30
Bringing up 1 other cpus
Enabling MMU and paging
I'm going to do some more debugging but I thought I'd let you know that the sel4 tests worked but not multi-core CAmkES.
On 08/13/2017 11:51 PM, Hesham.Almatary@data61.csiro.au wrote:
Hi Jesse,
On sel4test/SMP, the TEST_FPU0002 test relies on a platform-dependent timer driver to implement sleep()/timestamp(). In the case of zynq7000, the TTC timers are used. We have recently added this functionality (basically ltimer/gettime()). From the top level of sel4test, it's at projects/util_libs/libplatsupport/:
commit b2670005df372ca7e22be4bd7bfa18c9bda10e89
Author: Hesham Almatary
Date: Thu Jul 27 14:25:36 2017 +1000
    ltimer/zynq: use ttc1_timer1 as a timestamp/gettime timer
Are you using the master branch of util_libs? Any test (e.g. TEST_FPU0002) that's using gettime() won't work without this commit. Note that it's not released yet.
Hope that helps.
Best, Hesham
On 12/08/17 06:08, Jesse Millwood wrote:
Hesham,
Thank you for the response and the tips.
I tried out the master version of the sel4-tests and still don't seem to be able to complete all of the tests.
Looks like it crashed after the TEST_FPU0002.
I still have some debugging to do, but I have a better idea of where to go now, thanks.
Jesse
On 2017-08-08 07:56 PM, Hesham.Almatary@data61.csiro.au wrote:
Hi Jesse
On 09/08/17 06:45, Jesse Millwood wrote:
Hello,
I realize that the only officially supported SMP platform is the i.mx6 but I did see some code on 6.0 to master for SMP and the zynq7000 so I am trying to test out some SMP functionality on my zc702 board.
I am wondering if anyone has this working because my system is failing when compiling from master and the 6.0 and compatible tags.
The 6.0.x release doesn't support SMP on Zynq; it's supposed to be in the next release. However, you can still use the master branches to get SMP/Zynq working. Another option is to use just the master branch of the elfloader, since it fixes SMP-related bugs and adds reset functionality to reset Zynq's secondary core.
From the config you provided, it seems like you enabled printing and debug mode. Zynq/SMP has only been tested in release mode on sel4test/zc706. This means it might not work if you enabled printing and/or debug mode. Furthermore, we haven't tested it on camkes-based projects yet.
From your thorough output (thanks!), everything seems to be fine; core1 should be going to restore_user_context and proceed with the idleThread after releasing the lock, and core0 should get it, and go to the root thread.
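The lock handover described here can be pictured with a small CLH-style queue lock. The following is only a sketch under stated assumptions, not the seL4 source: the names `clh_lock_t`, `clh_enqueue`, `clh_granted`, `clh_acquire` and `clh_release`, the spare seed node, and the fixed `NUM_CPUS` are all hypothetical. Each core publishes its own node as pending, swaps it into the queue head, and spins on its predecessor's node until that node reads "granted".

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CPUS 2 /* assumption: two cores, as on the Zynq-7000 */

enum { CLH_PENDING = 0, CLH_GRANTED = 1 };

typedef struct { atomic_int value; } clh_node_t;

typedef struct {
    clh_node_t nodes[NUM_CPUS + 1]; /* one spare node seeds the queue */
    clh_node_t *_Atomic head;       /* most recently enqueued node */
    clh_node_t *mine[NUM_CPUS];     /* node each core publishes */
    clh_node_t *pred[NUM_CPUS];     /* node each core spins on */
} clh_lock_t;

void clh_init(clh_lock_t *l)
{
    for (int i = 0; i < NUM_CPUS; i++) {
        atomic_init(&l->nodes[i].value, CLH_PENDING);
        l->mine[i] = &l->nodes[i];
        l->pred[i] = NULL;
    }
    /* The seed node is already granted, so the first core gets in at once. */
    atomic_init(&l->nodes[NUM_CPUS].value, CLH_GRANTED);
    atomic_init(&l->head, &l->nodes[NUM_CPUS]);
}

/* Join the queue: mark our node pending and remember our predecessor. */
void clh_enqueue(clh_lock_t *l, int cpu)
{
    atomic_store_explicit(&l->mine[cpu]->value, CLH_PENDING,
                          memory_order_relaxed);
    l->pred[cpu] = atomic_exchange_explicit(&l->head, l->mine[cpu],
                                            memory_order_acq_rel);
}

/* True once the predecessor has released the lock. */
bool clh_granted(clh_lock_t *l, int cpu)
{
    return atomic_load_explicit(&l->pred[cpu]->value,
                                memory_order_acquire) == CLH_GRANTED;
}

void clh_acquire(clh_lock_t *l, int cpu)
{
    clh_enqueue(l, cpu);
    while (!clh_granted(l, cpu)) {
        /* spin; a kernel would pause the core here */
    }
}

void clh_release(clh_lock_t *l, int cpu)
{
    clh_node_t *old = l->mine[cpu];
    l->mine[cpu] = l->pred[cpu]; /* recycle the predecessor's node */
    atomic_store_explicit(&old->value, CLH_GRANTED, memory_order_release);
}
```

In this model a core that never reaches `clh_release` leaves every later waiter spinning in `clh_acquire`, which would match a core0 that stays in its wait loop while core1 faults before handing the lock over.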
I'd suggest you try disabling printing/debug mode and run on sel4test/master (bamboo_zynq7000_smp_release_xml_defconfig), just to make sure this works on your board.
Please let us know if you still have the same issue.
Cheers, Hesham
My second core seems to be coming up but the system ultimately fails and prints out:
Bo
ot
nKgE RaNlEl L fDinAiTsAh eAdBO, RdT!r
ppFaeud lttio nugs eirn stspraucctei o
n: 0xe001d1b0 o
FAR: 0xfffffff8 DFSR: 0x807
halting...
Kernel entry via Syscall, number: 1, Call
Cap type: 1, Invocation tag: 37
Which seems to be “Booting all finished, dropped to user space” from core0 and “KERNEL DATA ABORT!” from core1.
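The reading of the overlapped output can be checked mechanically. The sketch below is a generic dynamic-programming test for whether one string is an order-preserving interleaving of two others; the function name `is_interleaving` is hypothetical and nothing here comes from the kernel or the loader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if `mixed` can be formed by interleaving `a` and `b`
 * while keeping the character order inside each of them. */
bool is_interleaving(const char *mixed, const char *a, const char *b)
{
    assert(mixed && a && b);

    size_t la = strlen(a), lb = strlen(b);
    if (strlen(mixed) != la + lb) {
        return false;
    }

    /* reach[j]: the first i+j characters of `mixed` are an interleaving
     * of the first i characters of `a` and the first j of `b`. */
    bool *reach = calloc(lb + 1, sizeof *reach);
    if (!reach) {
        return false;
    }

    reach[0] = true;
    for (size_t j = 1; j <= lb; j++) {
        reach[j] = reach[j - 1] && b[j - 1] == mixed[j - 1];
    }

    for (size_t i = 1; i <= la; i++) {
        reach[0] = reach[0] && a[i - 1] == mixed[i - 1];
        for (size_t j = 1; j <= lb; j++) {
            bool from_a = reach[j] && a[i - 1] == mixed[i + j - 1];
            bool from_b = reach[j - 1] && b[j - 1] == mixed[i + j - 1];
            reach[j] = from_a || from_b;
        }
    }

    bool ok = reach[lb];
    free(reach);
    return ok;
}
```

Applied to the third line of the output, `is_interleaving("nKgE RaNlEl L fDinAiTsAh eAdBO, RdT!r", "ng all finished, dr", "KERNEL DATA ABORT!")` holds, which is consistent with two cores writing to the same UART at once.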
The “0xe001d1b0” seems to be the label of the “idle_thread” function.
While stepping through via JTAG, I have verified that core1 gets through “init_kernel” and then enters “restore_user_context”. At some point in “restore_user_context” the fault registers are set as shown in the printed output. I think it is either in the c_exit_hook in restore_user_context or after the program branches to “0xFFFF0010”, which is “ldr pc,0xFFFF0030”. This branches to the “arm_data_abort_exception” label, which goes to the “kernel_data_fault” label and then to “kernel data abort”.
I’m having trouble pinpointing exactly where the fault occurs, but it seems to be close to there.
Has anyone had similar issues with SMP? It seems to get fairly far without setting the fault registers.
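The printed fault registers can be unpacked a little further. This is a minimal sketch assuming the ARMv7-A short-descriptor DFSR layout (bit 9 clear); the names `dfsr_fields_t`, `dfsr_decode` and `dfsr_fs_name` are hypothetical helpers, and only a handful of fault-status codes are listed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of the ARMv7-A DFSR, short-descriptor layout. */
typedef struct {
    uint8_t fs;     /* FS[4:0]: bit 10 followed by bits 3:0 */
    uint8_t domain; /* bits 7:4 */
    bool    lpae;   /* bit 9: set when the long-descriptor layout applies */
    bool    write;  /* bit 11 (WnR): the abort was caused by a write */
    bool    ext;    /* bit 12 (ExT): external abort qualifier */
} dfsr_fields_t;

dfsr_fields_t dfsr_decode(uint32_t dfsr)
{
    dfsr_fields_t f;
    f.fs     = (uint8_t)((((dfsr >> 10) & 0x1u) << 4) | (dfsr & 0xFu));
    f.domain = (uint8_t)((dfsr >> 4) & 0xFu);
    f.lpae   = ((dfsr >> 9) & 0x1u) != 0;
    f.write  = ((dfsr >> 11) & 0x1u) != 0;
    f.ext    = ((dfsr >> 12) & 0x1u) != 0;
    return f;
}

/* Names for a few short-descriptor fault-status codes. */
const char *dfsr_fs_name(uint8_t fs)
{
    switch (fs) {
    case 0x01: return "alignment fault";
    case 0x05: return "translation fault, first level";
    case 0x07: return "translation fault, second level";
    case 0x09: return "domain fault, first level";
    case 0x0B: return "domain fault, second level";
    case 0x0D: return "permission fault, first level";
    case 0x0F: return "permission fault, second level";
    default:   return "other (see the architecture manual)";
    }
}
```

Under that assumption, `dfsr_decode(0x807)` gives fault status 0x07 with the write bit set, so the abort at FAR 0xfffffff8 would be a write to an address with no second-level mapping.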
I have tried to step through the execution over JTAG and here are some of my (verbose) notes
| CORE0 Address | Core0 Function | Core0 Instruction | CORE1 Address | Core1 Function | Core1 Instruction | DFSR | DFAR | Note |
|---------------+-----------------------------+------------------------+---------------+------------------------------------+----------------------+------------+------------+------|
| 0x10000000 | label: start | =cpsid aif= | 0xFFFFFF34 | | =mvn r0,#0x0f= | 0x00000000 | 0x00005000 | |
| 0x10003A2C | call: platform_init | =bl 0x10003DD8= | 0xFFFFFF30 | | =wfe= | 0x00000000 | 0x00005000 | |
| 0x10003ACC | call: smp_boot | =bl 0x100039FC= | 0xFFFFFF34 | | =mvn r0,#0x0f= | 0x00000000 | 0x00005000 | |
| 0x10003AD0 | ret: smp_boot | =bl 0x10005C54= | 0x10000020 | in: non_boot_core | =orr r0,r0,#0x40= | 0x00000000 | 0x00005000 | 2 |
| 0x10003ADC | =if(is_hyp_mode())= | =beq 0x10003AF0= | 0x10002200 | label: arm_disable_dcaches | =push {r14}= | 0x00000000 | 0x00005000 | |
| 0x10003AFC | call: arm_enable_mmu | =bl 0x10002174= | 0xE0006190 | in: try_init_kernel_secondary_core | =beq 0xE0001680= | 0x00000000 | 0x00005000 | 1 |
| 0xE0001D70 | label: init_kernel | =push {r11,r14}= | 0xE0001680 | in: try_init_kernel_secondary_core | =beq 0xE0001680= | 0x00000000 | 0x00005000 | 1 |
| 0xE0001814 | label: try_init_kernel | =push {r11,r14}= | 0xE0001690 | in: try_init_kernel_secondary_core | =beq 0xE0001680= | 0x00000000 | 0x00005000 | 1 |
| 0xE0001B80 | call: create_initial_thread | =str r0, [r11,#-0x14]= | 0xE0001680 | in: try_init_kernel_secondary_core | =beq 0xE0001680= | 0x00000000 | 0x00005000 | 1 |
| 0xE0001C48 | call: SMP_COND_STATEMENT | =bl 0xE0003C20= | 0xE0001680 | in: try_init_kernel_secondary_core | =beq 0xE0001680= | 0x00000000 | 0x00005000 | 1, 3 |
| 0xE0001C4C | call: SMP_COND_STATEMENT | =bl 0xE00017D8= | 0xE0001680 | in: try_init_kernel_secondary_core | =beq 0xE0001680= | 0x00000000 | 0x00005000 | 1, 4 |
| 0xE0001C50 | NODE_LOCK_SYS | =bl 0xE0019280= | 0xE0019288 | in: getCurrentCPUIndex | =sub r13,r13,#0x8= | 0x00000000 | 0x00005000 | 5 |
| 0xE0001D1C | call: arch_pause | =bl 0xE0019DB0= | 0xE0019290 | in: getCurrentCPUIndex | =str r0,[r11,#-0x8]= | 0x00000000 | 0x00005000 | 6 |
| 0xE0001D40 | in: clh_lock_acquire | =uxtb r3,r3= | 0xE00037F4 | in: init_core_state | =pop {r4,r11,pc}= | 0x00000000 | 0x00005000 | |
| 0xE0001D20 | in: clh_lock_acquire | =mov r2,#0xE800= | 0xE0003754 | in: init_core_state | =movw r2,#0xE8E0= | 0x00000000 | 0x00005000 | 7 |
| 0xE0001D40 | in: clh_lock_acquire | =uxtb r3,r3= | 0xE00017C4 | in: try_init_kernel_secondary | =mov r3,#0x1= | 0x00000000 | 0x00005000 | 8 |
| 0xE0001D40 | in: clh_lock_acquire | =uxtb r3,r3= | 0xE002A06C | in: schedule | =push {r11,r14}= | 0x00000000 | 0x00005000 | 9 |
| 0xE0001D40 | in: clh_lock_acquire | =uxtb r3,r3= | 0xE002979C | in: activateThread | =push {r11, r14}= | 0x00000000 | 0x00005000 | 10 |
| 0xE0001D40 | in: clh_lock_acquire | =uxtb r3,r3= | 0xE001D24C | label: Arch_activateIdleThread | =push {r11}= | 0x00000000 | 0x00005000 | 11 |
| 0xE0001D38 | in: clh_lock_acquire | =ldr r3,[r3,#0x4]= | 0xE0000054 | in: start | =b 0xE001CEC8= | 0x00000000 | 0x00005000 | 12 |
Notes
1. Core1 is in a =while (!node_boot_lock)= loop
2. In =smp_boot=, CORE1 changes after =init_cpus= (branch location: ZSR:10003A08)
- In =smp_boot=, =boot_cpus= is called
  - This sets the =CPU_JUMP_PTR=: =*((volatile uint32_t*)CPU_JUMP_PTR) = (uint32_t)entry;=
  - calls =dsb= (data synchronization barrier)
    - After this call, CPU1 goes to =FFFFFF2C: dsb sy=
  - And then =sev=
    - After this call, CPU1 goes to the =non_boot_core= label
- SEV
  - SEV causes an event to be signaled to all cores within a multiprocessor system. If SEV is implemented, WFE must also be implemented.
- WFE
  - If the Event Register is not set, WFE suspends execution until one of the following events occurs:
    - an IRQ interrupt, unless masked by the CPSR I-bit
    - an FIQ interrupt, unless masked by the CPSR F-bit
    - an Imprecise Data abort, unless masked by the CPSR A-bit
    - a Debug Entry request, if Debug is enabled
    - an Event signaled by another processor using the SEV instruction.
  - If the Event Register is set, WFE clears it and returns immediately.
  - If WFE is implemented, SEV must also be implemented.
- After CPU0 executes =arm_enable_mmu()= from the =main= function
  - by the end of =smp_boot= core1 is just starting =non_boot_main=
3. The =SMP_COND_STATEMENT= is calling =clh_lock_init=
4. The =SMP_COND_STATEMENT= is calling =release_secondary_cpus=
5. Right after Core0 returned from releasing the secondary cpus
- First time Core1 has exited the loop
- Core1's stack is
  - =getCurrentCPUIndex=
  - =init_core_state=
  - =try_init_kernel_secondary_core=
  - =init_kernel=
6. Core0 is in a =while(big_kernel_lock.node_owners[cpu].next->value != CLHState_Granted)= loop
- Core0 is in a static inline function =clh_lock_acquire= in =try_init_kernel=
- Core1 is in =getCurrentCPUIndex= but being called from =tcbDebugAppend=
- =tcbDebugAppend= is being called from a =for= loop in =init_core_state=
7. Core0 is still in the previously mentioned while loop
- Core1 is in =init_core_state= and has exited the for loop that called =tcbDebugAppend(NODE_STATE_ON_CORE(ksIdleThread, i))=
8. Core0 is still in the previously mentioned while loop
- Core1 has returned to =try_init_kernel_secondary_core= from =init_core_state= and is at the end of the function
9. Core0 is still in the previously mentioned while loop
- Core1 has entered the =init_kernel= call and then the =schedule= function.
10. Core0 is still in the previously mentioned while loop
- Core1 has entered the =activateThread= call after =schedule= in =init_kernel=
11. Core0 is still in the previously mentioned while loop
- Core1 seems to have dropped into the =case ThreadState_IdleThreadState:= case when switching on =switch (thread_state_get_tsType(NODE_STATE(ksCurThread)->tcbState))=
- This was in =activateThread=
12. Core0 is still in the previously mentioned while loop
- Core1 has exited =init_kernel= and is now branching to =restore_user_context=
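The wake-up sequence from note 2 (write the jump pointer, =dsb=, =sev=, with the secondary core parked in =wfe=) can be sketched as below. This is an illustration under assumptions, not the elfloader code: the helper names `release_secondary` and `wait_for_entry` are hypothetical, a zero mailbox is assumed to mean "no entry published yet", and on non-ARM hosts the instructions are replaced by a compiler barrier or a no-op so the sketch still builds.

```c
#include <stdint.h>

/* Real instructions on 32-bit ARM; harmless stand-ins elsewhere (assumption). */
#if defined(__arm__)
#define dsb() __asm__ volatile("dsb sy" ::: "memory")
#define sev() __asm__ volatile("sev" ::: "memory")
#define wfe() __asm__ volatile("wfe" ::: "memory")
#else
#define dsb() __sync_synchronize()
#define sev() ((void)0)
#define wfe() ((void)0)
#endif

/* Boot core: publish the entry address, make the write visible,
 * then signal an event to wake any core waiting in wfe. */
static inline void release_secondary(volatile uint32_t *jump_ptr,
                                     uint32_t entry)
{
    *jump_ptr = entry;
    dsb();
    sev();
}

/* Secondary core: sleep until an entry address has been published.
 * Real boot code would branch to the address instead of returning it. */
static inline uint32_t wait_for_entry(volatile uint32_t *jump_ptr)
{
    uint32_t entry;
    while ((entry = *jump_ptr) == 0) {
        wfe();
    }
    return entry;
}
```

The barrier sits before the event so that a core woken by =sev= cannot observe a stale mailbox value; re-checking the mailbox after every wake-up covers the other reasons =wfe= can return that are listed in note 2.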
To test everything out I am using the “camkes-sols-master” manifest and building the “CAmkES Hello World application with events and dataports”.
The changes I made are
· I edited the top level CAmkES file to set the affinity for two separate cores.
· Upped the Max Number of CPU nodes to 2
The rest of the config is pretty standard. I have it attached to this message.
The FSBL and ps7_init script I use are the standard ones created for the zc702 from the 2017.2 version of the Xilinx XSDK.
I am booting from JTAG: I first run the ps7_init script, then flash the FSBL, and then the “capdl-loader-experimental-image-arm-zynq7000” that was built.
I am wondering if anyone is using a modified FSBL or ps7_init that does something else, if there is a config value that I missed, or if it is still in development? If it is still in development I’d like to work with whoever is
Thanks,
Jesse Millwood
_______________________________________________ Devel mailing list Devel@sel4.systems https://sel4.systems/lists/listinfo/devel