I think I found the reason, or at least I do not have errors now and I have an explanation for the fix. Please correct me if I am wrong.

Firstly, I should come back to my email about syscall.c (11 Oct. 16). That conversation was started by my message:

… For example, sometimes it uses only S0-S7 with some T0-T9 registers …

and ended with:

------
The point of the syscall.c tests is to check that registers are not being corrupted by our syscalls (i.e that the kernel ABI + stubs follows the calling convention of the architecture).
…and corruption of registers can happen only if syscalls modify the stack, since these values are popped from it after the end of a syscall routine, right?
-----

So, what I was trying to say is that this test, in my case, tests nothing, because variables like these:

    register int a00 = 0xdead0000; \
    register int a01 = 0xdead0001; \
    register int a02 = 0xdead0002; \
    register int a03 = 0xdead0003; \
    register int a04 = 0xdead0004; \

can be located anywhere. Since I (we/you/they) do not specify the register name explicitly, the compiler can do anything with these variables. And this is what I saw in my tests: my compiler uses different registers and saves their values on the stack before the syscall. After the syscall, the compiler reloads the previous values, and the tests pass without any problem.

Now I have specified the register name:

    #define TEST_REGISTERS(code) \
        do { \
            register int a00 asm("v0") = 0xdead00aa; \
            __asm__ __volatile__ ("" \
                    : "+r"(a00)); \
            code ; \
            __asm__ __volatile__ ("" \
                    : "+r"(a00)); \
            test_assert(a00 == 0xdead00aa); \
        } while (0)

and used only the Yield syscall as the test.
Btw, this is the implementation of Yield:

    static inline void
    seL4_Yield(void)
    {
        register seL4_Word scno asm("v0") = seL4_SysYield;
        __asm__ __volatile__ ("nop;syscall" : : "r"(scno));
    }

And this is what I see when I disassemble this test:

    00403080 <test_seL4_Yield>:
      403080: 3c04dead  lui   a0,0xdead     <==== a0 = 0xdead0000
      403084: 27bdffe0  addiu sp,sp,-32
      403088: 248400aa  addiu a0,a0,170     <==== a0 = 0xdead00aa
      40308c: 2403000a  li    v1,10
      403090: afbf001c  sw    ra,28(sp)
      403094: 00802821  move  a1,a0
      403098: 00801021  move  v0,a0         <==== v0 = a0 = 0xdead00aa
      40309c: 2402fff9  li    v0,-7         <==== v0 = -7
      4030a0: 00000000  nop
      4030a4: 0000000c  syscall
      4030a8: 14450008  bne   v0,a1,4030cc <test_seL4_Yield+0x4c>  <==== compare v0 with a1 (a copy of a0)
      4030ac: 2463ffff  addiu v1,v1,-1
      4030b0: 1460fffa  bnez  v1,40309c <test_seL4_Yield+0x1c>
      4030b4: 00801021  move  v0,a0
      4030b8: 0c107f4d  jal   41fd34 <sel4test_get_result>
      4030bc: 00000000  nop
      4030c0: 8fbf001c  lw    ra,28(sp)
      4030c4: 03e00008  jr    ra
      4030c8: 27bd0020  addiu sp,sp,32
      4030cc: 3c040043  lui   a0,0x43
      4030d0: 3c050043  lui   a1,0x43
      4030d4: 2484d8a0  addiu a0,a0,-10080
      4030d8: 24a5d8b4  addiu a1,a1,-10060
      4030dc: 0c107f25  jal   41fc94 <_sel4test_failure>
      4030e0: 2406009f  li    a2,159
      4030e4: 00001021  move  v0,zero
      4030e8: 8fbf001c  lw    ra,28(sp)
      4030ec: 03e00008  jr    ra
      4030f0: 27bd0020  addiu sp,sp,32

So, as one can see, we load 0xdead00aa into a0, then copy it to v0, then load scno into v0 as well, and after the syscall we compare the values. And of course, this test fails.

So my mistake was assuming that these registers are preserved across a syscall. I have already changed my message registers to S0-S3 (callee-saved registers), and I do not see the original issue anymore. But, of course, callee-saved registers add overhead, and the size of my binary changes. Thus, maybe I still have an error, but I cannot trigger it with the current tests.
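The failure in the listing does not depend on what the kernel does. A minimal sketch that replays the relevant instructions on a toy register file shows this. The names `ideal_syscall` and `replay_yield_test` are hypothetical, and the syscall is assumed to preserve every register, which is the best case and not a claim about the real kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy register file holding only the registers relevant to the check. */
enum reg { V0, A0, A1, NUM_REGS };

/* Hypothetical best-case kernel: the syscall preserves every register.
 * This is an assumption for illustration, not the real seL4 behaviour. */
static void ideal_syscall(uint32_t regs[NUM_REGS])
{
    (void)regs; /* nothing is modified */
}

/* Replays one loop iteration of test_seL4_Yield from the listing.
 * Returns true if the bne at 0x4030a8 would fall through (test passes). */
static bool replay_yield_test(uint32_t scno)
{
    uint32_t r[NUM_REGS] = {0};

    r[A0] = 0xdead0000u;          /* lui   a0,0xdead            */
    r[A0] += 170u;                /* addiu a0,a0,170            */
    assert(r[A0] == 0xdead00aau);
    r[A1] = r[A0];                /* move  a1,a0                */
    r[V0] = r[A0];                /* move  v0,a0  (a00 in v0)   */
    r[V0] = scno;                 /* li    v0,-7  (scno in v0)  */
    ideal_syscall(r);             /* nop; syscall               */
    return r[V0] == r[A1];        /* bne   v0,a1,<failure path> */
}
```

With the syscall number from the listing, `replay_yield_test((uint32_t)-7)` returns false even though the modelled syscall changes nothing: the magic value in v0 is overwritten by scno before the syscall is issued. Pinning the test variable to a register that the stub does not use avoids this particular conflict.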
On Tue 01-Nov-2016 7:37 AM, Vasily A. Sartakov wrote:
as you might see, this is a slightly modified version of the trivial.c tests. Usually, when I run all the tests, there is no problem with this test. But when I run this file alone, I have a problem. Also, you might see that there are several empty lines, numbered 16 to 22. This is not accidental; it is the source of the error. If test_allocator(env_t env) is located on the 23rd line, this test has no problem. But if I add one more empty line, I have an error like this:

If adding whitespace gives you a different compilation result then that is one bizarre compiler you have. I would check and make sure that this is really what is going on, because it seems fairly improbable to me. Maybe do some multiple runs/builds, 'make clean' between each build etc.

Also, I see that there is a correlation between the size of the image and the faults:
1349892 ./sel4test-tests.bin_23
1349900 ./sel4test-tests.bin_24
The borderline is 1349900: if the size of the image is below this value, there is no problem. Unfortunately, 1349900 is not a 'round' value that is somehow related to TLB sizes or anything else I know of.
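Whether a size is 'round' can be checked mechanically against a few power-of-two boundaries. The sketch below is only an illustration: the helper names are hypothetical, the boundaries tried (4 KiB page, 32-byte line) are assumptions rather than properties of the actual hardware, and file sizes are not the same thing as the virtual address layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if boundary is a non-zero power of two. */
static bool is_pow2(uint32_t boundary)
{
    return boundary != 0 && (boundary & (boundary - 1)) == 0;
}

/* Offset of size within its boundary-sized block (0 means aligned). */
static uint32_t offset_within(uint32_t size, uint32_t boundary)
{
    assert(is_pow2(boundary));
    return size & (boundary - 1);
}

/* True if the two sizes fall into different boundary-sized blocks,
 * i.e. growing from old_size to new_size crosses a boundary. */
static bool crosses_boundary(uint32_t old_size, uint32_t new_size,
                             uint32_t boundary)
{
    assert(is_pow2(boundary));
    return (old_size / boundary) != (new_size / boundary);
}
```

For the two images above, `offset_within(1349900, 4096)` is 2316, and `crosses_boundary(1349892, 1349900, 4096)` is false, as it is for a 32-byte boundary. The 8-byte growth crosses only an 8-byte boundary, which agrees with the observation that the threshold is not a page-sized or line-sized value.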
If virtual address layout changes seem to coincide with faults then I would be checking things like
* Context switching code / address space management
* TLB/cache/ASID maintenance
* Branch predictor / any other hardware state that tracks virtual addresses
Note that I'm saying this as someone who knows basically nothing about MIPS, hence the broad suggestions.
Adrian
--
Vasily A. Sartakov
sartakov@ksyslabs.org