HI, I have question about IPC on SMP. I ran sel4bench on sabre lite(i.mx6, arm v7) which SMP is on. Sabre lite has 4 cores. I supposed that SMP improved IPC performances. But results were not. seL4_CALL SMP on 504 SMP off 336 Would you give me reason why these results are happened?
Hi, I attached two of the benchmark results that I pulled from the last run. Looking at the top of each of them I get similar results to you: "Benchmark": "One way IPC microbenchmarks", "Direction": "client->server", "Function": "seL4_Call", SMP on (mean, stddev) 444.125, 5.176871642217914 SMP off (mean, stddev) 305.125, 8.94427190999916 The thing to note is that the measurement is still IPC between two threads of the same address space on the same core. The overhead comes from a lock that is introduced in the SMP seL4 configuration. When any syscall happens, the core that the syscall is performed on has to acquire the lock before it can handle the syscall. If another core is performing a syscall, the first core will be blocked until the lock is released. This can be found in the kernel entry code "kernel/src/arch/arm/c_traps.c", look at the NODE_LOCK_SYS macro. Essentially the overhead comes from the memory barrier required to implement the lock, and the extra instructions for acquiring and releasing the lock on every syscall. Kind regards, Kent.
Thanks for the info. 2018년 10월 23일 (화) 오후 3:02, <Kent.Mcleod@data61.csiro.au>님이 작성:
Hi,
I attached two of the benchmark results that I pulled from the last run.
Looking at the top of each of them I get similar results to you:
"Benchmark": "One way IPC microbenchmarks", "Direction": "client->server", "Function": "seL4_Call", SMP on (mean, stddev) 444.125, 5.176871642217914 SMP off (mean, stddev) 305.125, 8.94427190999916
The thing to note is that the measurement is still IPC between two threads of the same address space on the same core. The overhead comes from a lock that is introduced in the SMP seL4 configuration. When any syscall happens, the core that the syscall is performed on has to acquire the lock before it can handle the syscall. If another core is performing a syscall, the first core will be blocked until the lock is released. This can be found in the kernel entry code "kernel/src/arch/arm/c_traps.c", look at the NODE_LOCK_SYS macro. Essentially the overhead comes from the memory barrier required to implement the lock, and the extra instructions for acquiring and releasing the lock on every syscall.
Kind regards, Kent.
participants (2)
-
Kent.Mcleod@data61.csiro.au
-
송대영