Re: [seL4] Wandboard Port

16 Mar 2015

Hi Robert,

Responses inline.

On Sun, 2015-03-15 at 15:33 +0100, Robert Kaiser wrote:
...
Hi Alex,
thanks for your quick response (on a Sunday evening!)
On 15.03.2015 at 11:23, Alexander Kroh wrote:
...
Hi Robert,
The FSR value of 0x1c06 represents an asynchronous abort. In this case, the address reported cannot be trusted! 
That's good to know. However, after some more checking (see below), I'm
still inclined to believe the address is indeed correct in my case.
...
The abort occurs when a physical address is accessed that has no valid backing RAM or device register.
So, could it also happen when accessing a virtual address that is mapped
to an invalid physical address (that might explain what I'm seeing)?
The virtual to physical address translation has been completed
successfully, else you would get a synchronous abort. The key here is
that there was a problem with the underlying physical address.
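
For reference, the 0x1c06 status discussed here can be decoded by hand. The sketch below assumes the ARMv7-A short-descriptor fault status layout (FS[3:0] in bits [3:0], FS[4] in bit 10, WnR in bit 11); the function and macro names are made up for illustration and are not taken from the seL4 sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ARMv7-A short-descriptor encoding of "asynchronous external abort". */
#define FS_ASYNC_EXTERNAL_ABORT 0x16u /* 0b10110 */

/* Assemble the 5-bit fault status: bit 10 supplies FS[4], bits [3:0] supply FS[3:0]. */
uint32_t fsr_fault_status(uint32_t fsr)
{
    return (((fsr >> 10) & 0x1u) << 4) | (fsr & 0xfu);
}

/* WnR (bit 11): set when the aborted access was a write. */
bool fsr_is_write(uint32_t fsr)
{
    return ((fsr >> 11) & 0x1u) != 0;
}

/* True when the status field reports an asynchronous external abort. */
bool fsr_is_async_external_abort(uint32_t fsr)
{
    return fsr_fault_status(fsr) == FS_ASYNC_EXTERNAL_ABORT;
}
```

Under these assumptions, 0x1c06 yields a fault status of 0b10110 with the write bit set, which matches the "asynchronous abort" reading given above.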
...
...
We have had lots of fun with this feature on the SabreLite. Common causes are:
* Accessing device registers that do not exist (some devices have voids in the middle of their address map).
* If you (for some reason) map a device with the cacheable attribute, all addresses which would be used to fill the cache line must be valid (again, watch out for voids).
* Some UART registers are unavailable when the appropriate enable bits are not set.
My advice to you is to check that you are using the correct physical address for your device mappings (including the kernel IRQ controller and timer).
Also, the first printf at userspace may trigger the initialisation of the default UART (which will be incorrect in your case).
https://github.com/seL4/libplatsupport/blob/master/plat_include/imx6/platsup...
Thanks for this hint! That would have been the next thing for me to
stumble over. However, quickly fixing it had no effect on my current
problem.
...
There may also be slight differences in the availability of device registers between the 2 SoCs.
Is that really a possibility, given that U-boot reports the same chip
revision on both boards?
It is unlikely, but it is still a possibility. Is it only the ARM chip
revisions that match or also the i.MX6 chip revisions?
...
Here's what I have tried in the meantime:
- write a small helper routine to dump all registers in
tptr->tcbContext.registers[0-15]
- call that routine from handleFault()
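
A helper of this kind might look roughly like the sketch below. It is a standalone illustration, not the code actually used: `format_regs` and `NUM_REGS` are hypothetical names, and the function formats into a caller-supplied buffer so that it can be compiled outside the kernel. Inside the kernel one would more likely print each register directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_REGS 16 /* hypothetical: covers registers[0-15] */

/*
 * Format sixteen saved register values as "Rnn:hex" pairs, eight per line.
 * Returns the length of the string written to buf (excluding the NUL).
 * Output is truncated if buf is too small.
 */
size_t format_regs(const uint32_t regs[NUM_REGS], char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0) {
        return 0;
    }
    buf[0] = '\0';

    for (int i = 0; i < NUM_REGS; i++) {
        char sep = (i % 8 == 7) ? '\n' : ' ';
        int n = snprintf(buf + used, len - used, "R%02d:%x%c",
                         i, (unsigned int)regs[i], sep);
        if (n < 0 || (size_t)n >= len - used) {
            /* snprintf truncated the output; buf now holds len - 1 chars. */
            return len - 1;
        }
        used += (size_t)n;
    }
    return used;
}
```

The caller would pass the saved register array of the faulting thread and then print the resulting buffer.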
With this modification in place, I get:
R00:300000 R01:0 R02:0 R03:0 R04:0 R05:0 R06:0 R08:0
R09:0 R10:0 R11:0 R12:0  SP:0  LR:0 LRs:1329c CPS:50
Caught cap fault in send phase at address 0x0
while trying to handle:
vm fault on data at address 0x1f11c2e0 with status 0x1c06
in thread 0xffdfad00 at address 0x13294
Now I added a "mov r10,#-1" in head.S, right before the "rfeia sp" that
invokes the user space code. To that user space code at _sel4_start I
added a "mov r9,#-1". This is now the very first instruction to be
executed in user mode. Running the code with these changes, I get:
R00:300000 R01:0 R02:0 R03:0 R04:0 R05:0 R06:0 R08:0
R09:0 R10:ffffffff R11:0 R12:0  SP:0  LR:0 LRs:1329c CPS:50
Caught cap fault in send phase at address 0x0
while trying to handle:
vm fault on data at address 0x1f11c2e0 with status 0x1c06
in thread 0xffdfad00 at address 0x13294
So R10 has received its value, but R9 has not. Both mov instructions use
immediate data, so do not cause any memory access other than opcode
fetches. That sort of indicates to me that the crash happens between
those two mov instructions, i.e. on the way to user mode.
Wish I had a JTAG-debugger....
What I am still uncertain about is whether a fault upon entering user
code is to be expected, i.e. do those pages get mapped in by a page
fault handler or are they pre-mapped before the code is invoked?
The fault is unexpected. The pages are pre-mapped by the kernel, but
again, this is not a virtual memory mapping issue.
However, one thing that is typical is the occurrence of an IRQ exception
as soon as the mode switch to user space occurs.

One thing to try is to insert an "isb" instruction just before switching
to user space. This will ensure that all memory accesses are completed
before continuing and it will force the asynchronous abort to occur at
this instruction rather than some future instruction, when the
load/store buffer finally drains.
You should also add an isb here in case you are returning from an IRQ:
https://github.com/seL4/seL4/blob/master/src/arch/arm/traps.S#L49

 - Alex
...
Again, thanks for any help
Cheers
Robert
...
- Alex
________________________________________
From: Devel [devel-bounces@sel4.systems] on behalf of Robert Kaiser [robert.kaiser@hs-rm.de]
Sent: Sunday, 15 March 2015 19:03
To: devel@sel4.systems
Subject: [seL4] Wandboard Port
Hello,
in an attempt to familiarize myself with the seL4 code, I am trying to
"port" it to the Wandboard (see www.wandboard.org). This should be an
easy task for a beginner (I thought) since the board is very similar to
the SabreLite, and seL4 is already running well on that board. I have
access to a SabreLite and a Wandboard Quad, both (according to U-boot)
have the same revision of the iMX6 SoC installed.
Differences between the Sabre and the Wand I have noticed so far are:
- 2GB of RAM (from 0x10000000 to 0x90000000) on the Wand (SabreLite has 1GB)
- Wand uses UART1 for debug output, SabreLite: UART2
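
The RAM ranges listed above can be turned into a quick sanity check on physical addresses. This is only a sketch: the macro and function names are hypothetical, and it assumes the SabreLite's 1GB starts at the same base address as the Wandboard's RAM.

```c
#include <stdbool.h>
#include <stdint.h>

#define RAM_BASE           0x10000000u /* start of RAM, as listed above */
#define WANDBOARD_RAM_SIZE 0x80000000u /* 2GB: 0x10000000 to 0x90000000 */
#define SABRELITE_RAM_SIZE 0x40000000u /* 1GB, assuming the same base */

/* True when paddr lies inside [base, base + size). */
bool paddr_in_ram(uint32_t paddr, uint32_t base, uint32_t size)
{
    /* Subtract first so that base + size cannot overflow 32 bits. */
    return paddr >= base && (paddr - base) < size;
}
```

With these numbers, an address such as 0x9f11c2e0 lies above the top of the Wandboard's RAM, while 0x1f11c2e0 lies inside it.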
I compiled an sel4test project where I adapted the UART port in
kernel/include/plat/imx6/plat/machine/devices.h and
elfloader/src/arch-arm/plat-imx6/platform.h and the RAM size in
kernel/src/plat/imx6/machine/hardware.c. When I boot this system, I get:
Jumping to kernel-image entry point...
Bootstrapping kernel
Caught cap fault in send phase at address 0x0
while trying to handle:
vm fault on data at address 0x9f11c2e0 with status 0x1c06
in thread 0xffdfad00 at address 0x13294
(Needless to say, "all is well in the universe" on the SabreLite... )
What is not shown here are a ton of other debug messages which I have
added to convince myself that kernel initialization completes as
expected. The crash seems to happen upon entry into user code. The
address 0x13294 is the virtual address of the entry point:
$ nm build/arm/imx6/sel4test-driver/sel4test-driver.bin | grep 13294
00013294 T _sel4_start
I suspect that this fault happens on opcode fetch, because the user code
is not properly mapped when invoked. Does "status 0x1c06" confirm this?
If so, *should* the code be mapped at this point or are these mappings
expected to be installed "on demand", i.e. through page fault handling?
Thanks for any help...
Robert
--
Robert Kaiser
Computer Engineering
RheinMain University of Applied Sciences