I'm chasing an issue that looks like retyped memory has nonsense data. If I read the kernel code correctly, the object returned by an seL4_Untyped_Retype syscall should be zeroed (this looks to happen when an untyped memory object is reset here https://github.com/seL4/seL4/blob/master/src/object/untyped.c#L254). Is that correct? I don't see anything called out in the manual.
-Sam
Kent Mcleod replied to Sam Leffler's message of Sat, 11 Mar 2023, 09:40:
If the untyped isn't device untyped then it should be zeroed before it is typed into an object. Device untyped is not allowed to be accessed by the kernel and so is not written to.
When are you observing the odd behaviour?
-Sam _______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems
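A toy Python model of the rule Kent describes may help when reasoning about what a freshly retyped object should contain. Everything here is hypothetical: `ToyUntyped` and its `retype` method are illustrative names, not seL4's C code or its user-level API. The model also simplifies by clearing at retype time, whereas Sam's reading of the kernel is that the clearing happens when the untyped is reset.

```python
class ToyUntyped:
    """Hypothetical model of an untyped region; not the real seL4 API."""

    def __init__(self, size: int, is_device: bool = False):
        self.is_device = is_device
        # Backing memory; may still hold stale data from a previous user.
        self.memory = bytearray(size)
        self.watermark = 0  # offset of the next free byte

    def retype(self, obj_size: int) -> memoryview:
        """Carve obj_size bytes out of the region and return a view of them."""
        if self.watermark + obj_size > len(self.memory):
            raise MemoryError("untyped region exhausted")
        start = self.watermark
        self.watermark += obj_size
        view = memoryview(self.memory)[start:start + obj_size]
        if not self.is_device:
            # Normal untyped: memory is cleared before it is handed out.
            view[:] = bytes(obj_size)
        # Device untyped: the kernel must not touch it, so stale bytes remain.
        return view
```

Under this model, unexpected non-zero bytes in a new non-device object point away from the retype path itself and toward something like stale cache lines or stale translations.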
Sam Leffler replied to Kent Mcleod's message of Fri, Mar 10, 2023 at 3:28 PM:
I've got a stress test that forces lots of memory recycling by creating, running & tearing down applications. I repeatedly see a particular point in the test (after memory starts being recycled) where an app gets an instruction fault. Narrowing the issue has been challenging so I'm questioning everything (including cache handling). This is all anonymous memory.
Indan Zupancic replied to Sam Leffler's message of 2023-03-11 10:56:

Hello,
What platform are you running this on? Does it have a VIPT L1 cache, and if so, is KernelArmICacheVIPT enabled for your CPU?

Another thing that could trigger this is re-using ASIDs. It could be either a bug in your application code or an ASID re-use bug in seL4.

Greetings,
Indan
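Indan's ASID point can be illustrated with a toy TLB whose entries are tagged by ASID. In this sketch (`ToyMMU` and its methods are hypothetical; real TLBs and seL4's ASID management are more involved), tearing down an address space and handing its ASID to a new one without invalidating the TLB leaves the old translation live, so the new process can fetch through a stale mapping.

```python
class ToyMMU:
    """Hypothetical ASID-tagged TLB in front of per-ASID page tables."""

    def __init__(self):
        self.page_tables = {}  # asid -> {virtual page: physical frame}
        self.tlb = {}          # (asid, virtual page) -> physical frame

    def map(self, asid, vpage, frame):
        self.page_tables.setdefault(asid, {})[vpage] = frame

    def translate(self, asid, vpage):
        key = (asid, vpage)
        if key not in self.tlb:
            # TLB miss: walk the page table and cache the result.
            self.tlb[key] = self.page_tables[asid][vpage]
        return self.tlb[key]

    def destroy_address_space(self, asid, flush_tlb=True):
        self.page_tables.pop(asid, None)
        if flush_tlb:
            # Drop every cached translation tagged with this ASID.
            self.tlb = {k: v for k, v in self.tlb.items() if k[0] != asid}
```

The `flush_tlb` flag only exists to make the two outcomes easy to compare; a kernel would be expected to invalidate by ASID (or invalidate everything) before an ASID is recycled.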
Sam Leffler replied to Indan Zupancic's message of Fri, Mar 10, 2023 at 4:23 PM:

RISC-V (32-bit), no L2 cache. All of user space shares one ASID.
Hi Sam, this issue caught my attention... If you are right, you may have found a potential security issue, which is an area I am very interested in.

I'm curious... How are you "debugging"? I guess you just set some checkpoints in your code and inspect the memory? Have you tried to:
1) automate repeating the process that causes the dirty memory many times
2) dump the offending bytes in memory
3) try to make sense of those bytes (to find out where they come from)

I know this may not directly help solve the issue, but at some point it may shed some light on it...

Best,
On Sat, 11 Mar 2023 at 1:06, Sam Leffler via Devel wrote:
[...]
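Steps 2 and 3 of Hugo's checklist could start with a small helper that scans a buffer which should be all zeros (for example, a copy of a freshly retyped frame), reports each non-zero run, and hex-dumps it so the bytes can be compared against known code or data. The function names and the 16-byte row width are arbitrary choices for this sketch.

```python
def nonzero_runs(buf):
    """Yield (offset, run) for each maximal run of non-zero bytes in buf."""
    start = None
    for i, b in enumerate(buf):
        if b != 0 and start is None:
            start = i  # a run begins
        elif b == 0 and start is not None:
            yield start, bytes(buf[start:i])  # a run ends
            start = None
    if start is not None:
        yield start, bytes(buf[start:])  # run reaches end of buffer


def hexdump(offset, data, width=16):
    """Format data as 'offset: hex bytes' lines, width bytes per line."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        lines.append(f"{offset + i:08x}: {chunk.hex(' ')}")
    return "\n".join(lines)


page = bytearray(4096)
page[0x100:0x104] = b"\xde\xad\xbe\xef"  # pretend stale data
for off, run in nonzero_runs(page):
    print(hexdump(off, run))  # prints "00000100: de ad be ef"
```

Zero bytes inside real code or data split it into several runs; dumping every whole row that contains any non-zero byte is a coarser alternative that keeps context together.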
Kent Mcleod replied to Sam Leffler's message of Sat, Mar 11, 2023 at 10:56 AM:
Is the instruction fault triggered deterministically when the system is stressed? What sort of instruction fault is it: a prefetch data fault or an undefined instruction? Because it's happening with teardown and re-creation of applications, it seems more likely to be caused by bad page-table cache maintenance than by the clean on the untyped (UT) retype.

Kent.
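Kent's page-table point can be made concrete with a toy model, assuming a write-back data cache and a page-table walker that reads physical memory directly instead of through that cache (as on ARM cores whose table walks are not coherent with the L1 data cache). `ToySystem` and its methods are hypothetical. If a new page-table entry is written but its cache line is never cleaned, the walker still sees the old entry.

```python
class ToySystem:
    """Hypothetical write-back D-cache with a page-table walker that bypasses it."""

    def __init__(self):
        self.memory = {}  # physical address -> value
        self.dcache = {}  # cached (possibly dirty) lines: address -> value

    def cpu_write(self, addr, value):
        # Write-back cache: the store only reaches the cache.
        self.dcache[addr] = value

    def cpu_read(self, addr):
        # The CPU sees its own cached data first.
        return self.dcache.get(addr, self.memory.get(addr))

    def clean(self, addr):
        # Cache maintenance: write the line back to memory; it stays cached.
        if addr in self.dcache:
            self.memory[addr] = self.dcache[addr]

    def walker_read(self, addr):
        # A non-coherent table walker sees only what is in memory.
        return self.memory.get(addr)
```

TLB invalidation is a separate step on top of this and is left out of the model.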
Gerwin Klein replied (Fri, Mar 10, 2023 at 3:55 PM) to Sam Leffler's message of 11 Mar 2023, at 09:19:
Yes, like Kent said: unless it's a device untyped, where the user is supposed to zero the memory, it should be 0.

One cause I could imagine would be caching (which would be a bug), e.g. something like https://github.com/seL4/seL4/pull/485
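One caching failure that would fit an instruction fault on recycled memory is a stale instruction cache: new code is written through the data side, but the instruction cache still holds lines from the frame's previous use. The toy model below assumes separate, non-coherent I- and D-caches; `ToyHart` and `sync_icache` are hypothetical names, with `sync_icache` standing in for roughly what a RISC-V FENCE.I provides on the local hart.

```python
class ToyHart:
    """Hypothetical core with separate, non-coherent I- and D-caches."""

    def __init__(self):
        self.memory = {}  # physical address -> instruction word
        self.dcache = {}  # write-back data cache
        self.icache = {}  # instruction cache, filled from memory only

    def store(self, addr, word):
        # Loader or kernel writes go through the data side only.
        self.dcache[addr] = word

    def fetch(self, addr):
        # Instruction fetch never looks at the D-cache.
        if addr not in self.icache:
            self.icache[addr] = self.memory.get(addr, 0)
        return self.icache[addr]

    def sync_icache(self):
        # Clean the D-cache to memory, then drop all I-cache lines.
        self.memory.update(self.dcache)
        self.icache.clear()
```

In this model, a frame that previously held code keeps returning the old instructions after being reloaded until `sync_icache` runs, which is one way an application could end up executing garbage from recycled memory.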
Sam Leffler replied:

Interesting. I've been validating page contents but was worried about page tables. Thanks for the pointer.
-Sam
Participants (6):
- Demi Marie Obenour
- Gerwin Klein
- Hugo V.C.
- Indan Zupancic
- Kent Mcleod
- Sam Leffler