need ideas

1 Nov 2016


      Hello. 

I have a problem, and it is a complicated one. I understand that it is hard to give advice without looking at the sources, but I am stuck and need some ideas, in 'brainstorm' format, as input. So, here is the problem:

Many tests from the sel4test suite are working. This is what it looks like:

http://pastebin.com/vvnDaUe9

Unfortunately, I found several test configurations that cause crashes with stable but differing symptoms. One of them looks like this:

    11    #include <sel4test/test.h>
    12    #include "../test.h"
    13    #include "../helpers.h"
    14    
    15    #define MIN_EXPECTED_ALLOCATIONS 100
    16    
    17    
    18    
    19    
    20    
    21    
    22    
    23    int test_allocator(env_t env)
    24    {
    25        /* Perform a bunch of allocations and frees */
    26        vka_object_t endpoint;
    27        int error;
    28    
    29        for (int i = 0; i < MIN_EXPECTED_ALLOCATIONS; i++) {
    30            error = vka_alloc_endpoint(&env->vka, &endpoint);
    31            test_assert(error == 0);
    32            test_assert(endpoint.cptr != 0);
    33            vka_free_object(&env->vka, &endpoint);
    34        }
    35    
    36        return sel4test_get_result();
    37    }
    38    DEFINE_TEST(TRIVIAL0001, "Ensure the allocator works", test_allocator)
    39    DEFINE_TEST(TRIVIAL0002, "Ensure the allocator works more than once", test_allocator)


As you can see, this is a slightly modified version of the trivial.c tests. Usually, when I run all the tests, there is no problem with this one. But when I run this file alone, I have a problem. You can also see several empty lines, numbered 16 to 22. This is not accidental: they are the source of the error. If test_allocator(env_t env) is located on line 23, the test has no problem. But if I add one more empty line, I get an error like this:

Starting test suite sel4test
Starting test 0: TEST_TRIVIAL0001
  TEST_TRIVIAL0001

 726:trap!, address error on store!

$0   : 0 fffffff8 7 10089000
$4   : 6a6ffc 3f 0 ffffffff
$8   : 1 7 1 20
$12   : 70f 0 2 6ae
$16   : 6a6ffc 5a6a50 ffffff30 6a6ffc
$20   : 3f 5a6e58 5a6a50 6a6fdc
$24   : 20 41bab0 0 0
$28   : 0 100848d8 fffffffe 41faf0
Hi    : 20
Lo    : 0
epc   : 41f818
fi    : 41f814
ra    : 41faf0
Status: 20000002
Cause : 414
Config: 80008482
BadVaddr: f


which means that the test driver (I know the ASID) tries to store something to address 0x0000000f with the instruction located at 0x41f818 (EPC). Also, it fails before the trivial.c test even starts!

This is what I see at that address:

0041f7b4 <unbin>:
  41f7b4:    8c83000c     lw    v1,12(a0)
  41f7b8:    27bdffd8     addiu    sp,sp,-40
  41f7bc:    8c820008     lw    v0,8(a0)
  41f7c0:    afb0001c     sw    s0,28(sp)
  41f7c4:    00808021     move    s0,a0
  41f7c8:    afbf0024     sw    ra,36(sp)
  41f7cc:    1462000e     bne    v1,v0,41f808 <unbin+0x54>
  41f7d0:    afb10020     sw    s1,32(sp)
  41f7d4:    00a03021     move    a2,a1
  41f7d8:    00002021     move    a0,zero
  41f7dc:    0c107afa     jal    41ebe8 <.pic.__ashldi3>
  41f7e0:    24050001     li    a1,1
  41f7e4:    3c04005a     lui    a0,0x5a
  41f7e8:    00022827     nor    a1,zero,v0
  41f7ec:    24846a50     addiu    a0,a0,27216
  41f7f0:    0c107dac     jal    41f6b0 <a_and>
  41f7f4:    00038827     nor    s1,zero,v1
  41f7f8:    3c04005a     lui    a0,0x5a
  41f7fc:    02202821     move    a1,s1
  41f800:    0c107dac     jal    41f6b0 <a_and>
  41f804:    24846a54     addiu    a0,a0,27220
  41f808:    8e030008     lw    v1,8(s0)
  41f80c:    8e02000c     lw    v0,12(s0)
  41f810:    8fbf0024     lw    ra,36(sp)
  41f814:    8fb10020     lw    s1,32(sp)
  41f818:    ac430008     sw    v1,8(v0)

I am quite sure that the problem is not in the unbin function itself, but somewhere in contexts, alignment, or something else. Also, I map this area RO, so I am also sure that the user-space regions are not being corrupted.

Also, I see a correlation between the size of the image and the faults:

1349892    ./sel4test-tests.bin_23
1349900    ./sel4test-tests.bin_24

The borderline is 1349900: if the image size is below that value, there is no problem. Unfortunately, 1349900 is not a 'round' value that is somehow related to TLB sizes or anything else I know of.

I should also mention that the 'key' value which changes when I add a new line is an argument of _sel4test_failure:

mips-mti-linux-gnu-objdump -d sel4test-tests.bin_23 > bin23.asm
mips-mti-linux-gnu-objdump -d sel4test-tests.bin_24 > bin24.asm
 diff -ua bin23.asm bin24.asm 

 Disassembly of section .init:
@@ -3143,7 +3143,7 @@
   4030f0:    3c050043     lui    a1,0x43
   4030f4:    2484c248     addiu    a0,a0,-15800
   4030f8:    24a5c964     addiu    a1,a1,-13980
-  4030fc:    2406001f     li    a2,31
+  4030fc:    24060020     li    a2,32
   403100:    0c107b51     jal    41ed44 <_sel4test_failure>
   403104:    0000f021     move    s8,zero
   403108:    8fbf0074     lw    ra,116(sp)
@@ -3383,7 +3383,7 @@
   4034b0:    2484c950     addiu    a0,a0,-14000
   4034b4:    24a5c964     addiu    a1,a1,-13980
   4034b8:    0c107b51     jal    41ed44 <_sel4test_failure>
-  4034bc:    24060020     li    a2,32
+  4034bc:    24060021     li    a2,33
   4034c0:    03c01021     move    v0,s8
   4034c4:    8fbf0074     lw    ra,116(sp)
   4034c8:    8fbe0070     lw    s8,112(sp)

That is all I have for now. Any ideas? At the moment I see no other choice but to gradually reduce the amount of code while preserving this 'borderline' situation, until I am left with a very small set of system functions.

Thank you 


-- 
Vasily A. Sartakov
sartakov@ksyslabs.org

    
