Re: [seL4] need ideas

2 Nov 2016

I think I found the reason, or at least I no longer get errors and I have an explanation for the fix. Please correct me if I am wrong.

First, I should return to my email about syscall.c (11 Oct. 16). That conversation started with my message:

…  For example, sometimes it uses only S0-S7 with some T0-T9 registers … 

and ended with:

------
...
The point of the syscall.c tests is to check that registers are not being corrupted by our syscalls (i.e that the kernel ABI + stubs follows the calling convention of the architecture).
…and corruption of registers can happen only if syscalls modify stack, since these values are popped from it after the end of a syscall routine, right? 

-----

So, what I was trying to say is that this test, in my case, tests nothing, because variables like these:

        register int a00 = 0xdead0000; \
        register int a01 = 0xdead0001; \
        register int a02 = 0xdead0002; \
        register int a03 = 0xdead0003; \
        register int a04 = 0xdead0004; \

can be located anywhere. Since I (we/you/they) do not specify an exact register name, the compiler can do anything with these variables. And this is what I saw in my tests: my compiler uses different registers and saves their values on the stack before the syscall. After the syscall, the compiler reloads the previous values, and the tests pass without any problem. Now I have specified the register name:

#define TEST_REGISTERS(code) \
    do { \
        register int a00 asm("v0") = 0xdead00aa; \
        __asm__ __volatile__ ("" \
                : "+r"(a00)); \
        code ; \
        __asm__ __volatile__ ("" \
                : "+r"(a00)); \
        test_assert(a00 == 0xdead00aa); \
    } while (0)

and used only the Yield syscall as the test. By the way, this is the implementation of Yield:

static inline void
seL4_Yield(void)
{
    register seL4_Word scno asm("v0") = seL4_SysYield;
    __asm__ __volatile__ ("nop;syscall" : : "r"(scno));
}

And this is what I see when I disassemble this test: 

00403080 <test_seL4_Yield>:
  403080:	3c04dead 	lui	a0,0xdead           <====  a0 = 0xdead0000
  403084:	27bdffe0 		addiu	sp,sp,-32
  403088:	248400aa 	addiu	a0,a0,170     <====  a0 = 0xdead00aa
  40308c:	2403000a 	li	v1,10
  403090:	afbf001c 		sw	ra,28(sp)
  403094:	00802821 	move	a1,a0
  403098:	00801021 	move	v0,a0           <====  v0 = a0 = 0xdead00aa
  40309c:	2402fff9 		li	v0,-7                    <====  v0 = -7
  4030a0:	00000000 	nop 
  4030a4:	0000000c 	syscall
  4030a8:	14450008 	bne	v0,a1,4030cc <test_seL4_Yield+0x4c>     <==== compare v0 with a1 (a copy of a0)
  4030ac:	2463ffff 		addiu	v1,v1,-1
  4030b0:	1460fffa 		bnez	v1,40309c <test_seL4_Yield+0x1c>
  4030b4:	00801021 	move	v0,a0
  4030b8:	0c107f4d 	jal	41fd34 <sel4test_get_result>
  4030bc:	00000000 	nop
  4030c0:	8fbf001c 		lw	ra,28(sp)
  4030c4:	03e00008 	jr	ra
  4030c8:	27bd0020 	addiu	sp,sp,32
  4030cc:	3c040043 	lui	a0,0x43
  4030d0:	3c050043 	lui	a1,0x43
  4030d4:	2484d8a0 	addiu	a0,a0,-10080
  4030d8:	24a5d8b4 	addiu	a1,a1,-10060
  4030dc:	0c107f25 	jal	41fc94 <_sel4test_failure>
  4030e0:	2406009f 	li	a2,159
  4030e4:	00001021 	move	v0,zero
  4030e8:	8fbf001c 		lw	ra,28(sp)
  4030ec:	03e00008 	jr	ra
  4030f0:	27bd0020 	addiu	sp,sp,32

So, as one can see, 0xdead00aa is loaded into a0 and copied to a1 and v0; then v0 is overwritten with scno (-7), and after the syscall v0 is compared with the saved copy. Of course, this test fails.

So my mistake was assuming that these registers are preserved across a syscall. I have already changed my message registers to S0-S3 (callee-saved registers), and I no longer see the original issue. Of course, callee-saved registers add overhead, and the size of my binary changes. So I may still have an error that I simply cannot trigger with the current tests.
...
On Tue 01-Nov-2016 7:37 AM, Vasily A. Sartakov wrote:
...
As you might see, this is a slightly modified version of the trivial.c tests. Usually, when I run all the tests, there is no problem with this test. But when I run this file alone, I have a problem. Also, you might see that there are several blank lines, numbered 16 to 22. This is not accidental; it is the source of the error. If test_allocator(env_t env) is located on line 23, this test has no problem. But if I add one more blank line, I get an error like this:
If adding whitespace gives you a different compilation result then that is one bizarre compiler you have. I would check and make sure that this is really what is going on, because it seems fairly improbable to me. Maybe do some multiple runs/builds, 'make clean' between each build etc.
Also, I see that there is a correlation between the size of the image and the faults:
1349892    ./sel4test-tests.bin_23
1349900    ./sel4test-tests.bin_24
The borderline is 1349900: if the image size is below that value, there is no problem. Unfortunately, 1349900 is not a 'round' value related to TLB sizes or anything else I know of.
If virtual address layout changes seem to coincide with faults then I would be checking things like
* Context switching code / address space management
* TLB/cache/ASID maintenance
* Branch predictor / any other hardware state that tracks virtual addresses
Note that I'm saying this as someone who knows basically nothing about MIPS, hence the broad suggestions.
Adrian
-- 
Vasily A. Sartakov
sartakov@ksyslabs.org