On Thu, Apr 18, 2024 at 8:31 PM Gernot Heiser via Devel <devel@sel4.systems> wrote:
I’m not completely sure what you’re actually measuring here (actually I’m totally unsure what your setup really is, I can’t make sense of the above description in terms of what the seL4 system is doing). But measuring performance of a kernel inside a virtual machine is somewhat pointless at best.
I don't think I've got any way to run both QNX and seL4 on the same hardware without virtualization. I wouldn't think it would make that much of a difference here.
In any case it seems clear that you’re *not* measuring raw seL4 IPC. Your quoted cost of a round-trip on seL4 is say 120ms for 10,000 round trips, or 12µs for one round trip. You don’t specify the clock rate, but that would be somewhere between 12,000 and 50,000 cycles, when a round-trip IPC on x86 with MCS is <900 cycles
I'm running on a Core i7-960, which is 3.2GHz.
Note that using Send()/Recv()/Reply() is the wrong way to use seL4 IPC, see https://microkerneldude.org/2019/03/07/how-to-and-how-not-to-use-sel4-ipc/ However, this will only be about a factor two slower than using the correct Call()/ReplyWait() incantation.
Somehow I said Send()/Recv()/Reply() when I meant Call()/Recv()/Send() (MCS of course doesn't even have a distinct Reply() function). Here's the exact code for the benchmark loop I'm using on seL4: void c_test_client(seL4_CPtr endpoint) { struct seL4_MessageInfo info = seL4_MessageInfo_new(0, 0, 0, 2); int j; for (j = 0; j < 10000; j++){ seL4_Call(endpoint, info); } } void c_test_server(seL4_CPtr endpoint, seL4_CPtr reply) { struct seL4_MessageInfo info = seL4_MessageInfo_new(0, 0, 0, 2); while (1){ seL4_Recv(endpoint, (void *)0, reply); seL4_Send(reply, info); } } This is being called from a Rust wrapper that allocates the endpoint/reply and reads the TSC; the inner part of it looks like this: let start = x86_64::_rdtsc(); x86_64::__cpuid(0); c_test_client(endpoint.to_cap()); let end = x86_64::_rdtsc(); x86_64::__cpuid(0); info!("test finished: {} {} {}", start, end, end - start); I'm not quite sure what would be slowing it down. I would have thought this would be measuring only the cost of a round trip IPC and nothing more. Is there possibly some kind of scheduling issue adding overhead? The QNX/Linux equivalent is: int main() { char buf[100]; int f = open("/dev/null", O_WRONLY); int i; for (i = 0; i < 10; i++) { uint64_t start = __rdtsc(); uint64_t end; int j; for (j = 0; j < 10000; j++){ if (write(f, buf, 100) != 100){ printf("cannot write\n"); exit(1); } } end = __rdtsc(); printf("cycles: %u\n", end - start); } }