On Thu, Dec 2, 2021 at 9:28 PM Gernot Heiser <gernot@unsw.edu.au> wrote:
On 2 Dec 2021, at 16:17, yadong.li <yadong.li@horizon.ai> wrote:
First, I got the data for IMX8MM_EVK_64 and TX2 from https://github.com/seL4/sel4bench/actions/runs/1469475721#artifacts (the sel4bench-results-imx8mm_evk and sel4bench-results-tx2 files). After unpacking them, I found xxxx_SMP_64.json.
Second, the test is the SMP benchmark from the sel4bench-manifest project; the source file is sel4bench/apps/smp/src/main.c. The test scenario looks like this: a ping-pong pair of threads runs on the same core; the ping thread waits for "ipc_normal_delay", then sends a 0-length IPC message to the pong thread, which returns. I think the 500 cycles is how long ipc_normal_delay actually delays.
The above scenario is run on one core or on multiple cores. If we run 4 cores, every core has a ping thread and a pong thread running as described above, and we record the sum of the ping-pong counts across all cores.
ok, but what is the metric reported? [Apologies for not being on top of the details of our benchmarking setups.]
Looking at the sel4bench smp benchmark implementation, the metric is the total number of "operations" in a single second. An operation is a round-trip, intra-address-space seL4_Call + seL4_ReplyRecv between 2 threads on the same core, with each thread delaying for the configured cycle count before performing the next operation. After 1 second of all cores performing these operations continuously and maintaining a core-local count (on a separate cache line), the per-core counts are added together and reported as the final number. So you would expect the reported metric to scale following Amdahl's law, based on the proportion of an operation that is serialized inside the kernel lock, which would potentially vary across platforms.
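For readers without the source at hand, here is a minimal sketch of the per-core ping/pong pair just described. It is not the actual sel4bench code (see sel4bench/apps/smp/src/main.c for that): thread and endpoint setup, the cycle-accurate delay, and the collection of the per-core counters after the 1-second window are omitted; CYCLE_DELAY stands in for ipc_normal_delay; and the non-MCS seL4 API signatures are assumed.

    /* Sketch only -- not the actual benchmark code in
     * sel4bench/apps/smp/src/main.c. Non-MCS seL4 API assumed. */
    #include <sel4/sel4.h>

    #define CYCLE_DELAY 500          /* ipc_normal_delay: 500cy or 1000cy */

    static volatile int run = 1;     /* cleared when the 1-second window ends */
    static volatile seL4_Word count; /* per-core counter; the real code keeps
                                        it on its own cache line */

    static void delay_cycles(unsigned long n)
    {
        /* placeholder: the real benchmark spins on the cycle counter */
        for (volatile unsigned long i = 0; i < n; i++);
    }

    /* "ping": delay, then a 0-length seL4_Call to the pong thread on the same core */
    static void ping_fn(seL4_CPtr ep)
    {
        seL4_MessageInfo_t tag = seL4_MessageInfo_new(0, 0, 0, 0);
        while (run) {
            delay_cycles(CYCLE_DELAY);
            seL4_Call(ep, tag);      /* blocks until pong replies */
            count++;                 /* one completed operation */
        }
    }

    /* "pong": answer each call; seL4_ReplyRecv replies and waits in one syscall */
    static void pong_fn(seL4_CPtr ep)
    {
        seL4_Word badge;
        seL4_Recv(ep, &badge);       /* wait for the first call */
        while (run) {
            seL4_ReplyRecv(ep, seL4_MessageInfo_new(0, 0, 0, 0), &badge);
        }
    }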
It cannot simply be throughput, else the doubled delay should be reflected in a significantly reduced throughput, but it has almost no effect.
I think this experiment is used to illustrate that, with multiple cores, the seL4 kernel's big lock does not hurt multi-core performance, am I right?
Not quite. As there’s only one big lock, only one core can execute the kernel at any time. If one core is in the IPC while another core is trying to IPC, even though both IPCs are core-local, the second will have to wait until the first gets out of the lock.
As the delay is higher than the syscall latency, you’d expect perfect scalability from one core to two (with the lock essentially synchronising the threads). For 3 or 4 cores the combined latency of two IPCs is larger than the 500cy delay and you expect lock contention, resulting in reduced scaling, while it should still scale almost perfectly with the 1000cy delay. This is exactly what you see for the i.MX8 and the TX2.
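To make that arithmetic concrete, here is a back-of-envelope sketch (not part of sel4bench). LOCK_CYCLES is an assumed per-IPC lock-holding time of 300 cycles, somewhat below the ~370-400cy single-core seL4_Call/seL4_ReplyRecv latencies quoted below, since part of those latencies is spent outside the lock; treat the output as a qualitative trend rather than a prediction.

    /* Back-of-envelope model of big-lock contention in the ping-pong benchmark.
     * LOCK_CYCLES (300cy) is an assumed per-IPC lock-holding time, a bit below
     * the measured ~370-400cy single-core IPC latencies. While one core sits in
     * its delay, each of the other cores wants the lock for roughly one IPC. */
    #include <stdio.h>

    #define LOCK_CYCLES 300u

    int main(void)
    {
        const unsigned delays[] = { 500u, 1000u };

        for (int d = 0; d < 2; d++) {
            for (unsigned cores = 2; cores <= 4; cores++) {
                unsigned demand = (cores - 1) * LOCK_CYCLES;
                printf("delay %4ucy, %u cores: lock demand ~%4ucy -> %s\n",
                       delays[d], cores, demand,
                       demand > delays[d] ? "contention expected" : "should scale");
            }
        }
        return 0;
    }

With that assumed figure the verdict flips exactly where described above: the other cores' combined lock demand stays below a 1000cy delay but exceeds a 500cy delay once 3 or 4 cores are running.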
Addition: our seL4_Call performance is about the same as on the other platforms:

                    XXXX     IMX8MM_EVK_64  TX2_64
    seL4_Call       367(0)   378(2)         492(16)   client->server, same vspace, ipc_len is 0
    seL4_ReplyRecv  396(0)   402(2)         513(16)   server->client, same vspace, ipc_len is 0
OK, so baseline performance is good. But are these measured on a single-core or SMP kernel (i.e. is locking included)?
My test results are below (ARM platforms, mean(stddev)):

    Test item             XXX            IMX8MM_EVK_64   TX2
    500 cycles, 1 core    636545(46)     625605(29)      598142(365)
    500 cycles, 2 cores   897900(2327)   1154209(44)     994298(94)
    500 cycles, 3 cores   1301679(2036)  1726043(65)     1497740(127)
    500 cycles, 4 cores   1387678(549)   2172109(12674)  1545872(109)
    1000 cycles, 1 core   636529(42)     625599(22)      597627(161)
    1000 cycles, 2 cores  899212(3384)   1134110(34)     994437(541)
    1000 cycles, 3 cores  1297322(5028)  1695385(45)     1497547(714)
    1000 cycles, 4 cores  1387149(456)   2174605(81)     1545716(614)
I notice your standard deviations for 2 and 3 cores are surprisingly high (although still small in relative terms).
Did you try running the same again? Are the numbers essentially the same or are multiple runs all over the shop?
There are some issues with our benchmarking methodology. Fixing up sel4bench is one of the projects I’d like to do if I got a student for it, or maybe someone from the community would want to help?
But just from looking at the data I’m not sure that’s the issue here.
Gernot