Unblocking the caller of seL4_Call when the receiver is killed
Hello, I'll give a brief introduction since this is my first message here. I'm an undergraduate research assistant at the University of British Columbia, and I'm working with Margo Seltzer (https://www.seltzer.com/margo/) in the Systopia Lab. I'm helping with development of Sid Agrawal's (https://sid-agrawal.ca/) project built on seL4. I'm trying to determine what is the correct way to handle this situation: 1. Process 1 makes an seL4_Call to an endpoint that Process 2 is listening on. 2. Process 2 receives the message, and since we are running in non-MCS kernel, it will get a reply capability in its TCB. 3. Process 2 dies, for whatever reason, before it can reply to Process 1. If we wanted to unblock Process 1, what would be the correct approach? - I can use seL4_CNode_SaveCaller in Process 2 as soon as it receives a message, and store the slot number in some shared memory with the root task. Then when Process 2 dies, the root task is able to move the reply capability into its own cspace and send a reply indicating an error. This is not ideal for our purpose, because we do not want Process 2 to be able to perform any operations on its own cspace, but it needs its own root cnode capability in order to perform seL4_CNode_SaveCaller. - Is seL4_Call the wrong function for this purpose? We also considered using NBRecv and considering the call a failure after some timeout period. This is also not ideal since we would lose the convenience of the Call's reply cap, etc. - Is there some other solution that we are missing? As an aside: I notice that for the MCS kernel, if the reply cap is deleted, Process 1's thread state would be moved from ThreadState_BlockedOnReceive to ThreadState_Inactive. In non-MCS kernel, it seems that the state would remain ThreadState_BlockedOnReceive? Is there a reason for this? Best, Arya
Hi Arya, You are correct that when a caller is in the BlockedOnReply state and it doesn't get a reply it will block indefinitely. Some options: - Use a separate timeout that checks on behalf of process 1 that it's gotten a reply. By performing a seL4_TCB_Suspend and then seL4_TCB_Resume, the blocked thread can be restarted. (It will reattempt the system call again as its first instruction executed, so you may need to also change its program counter). - If process 1's thread is restarted, but not deleted, then when it restarts it could still reply to the blocked thread using the reply cap. This requires that the reply cap isn't deleted as part of the restart process for the thread. - You can still use seL4_CNode_SaveCaller in process 2 without requiring a cap to its own CSpace. If you use a separate CNode object, then a seL4_CPtr cap to that Cnode can be used as the CPtr argument to seL4_CNode_SaveCaller. If this CNode is also installed in process 2's cspace then it can still address the copied reply cap using a different seL4_CPtr from its own root CNode. (This would mean that process 2 is able to modify the slots of the separate CNode object, but not other slots in its cspace.
As an aside: I notice that for the MCS kernel, if the reply cap is deleted, Process 1's thread state would be moved from ThreadState_BlockedOnReceive to ThreadState_Inactive. In non-MCS kernel, it seems that the state would remain ThreadState_BlockedOnReceive? Is there a reason for this?
This is because the parent reply cap on non-mcs is held in the TCB object of the thread that is blocked. The only way to delete this cap is to revoke the TCB object. So if all derived caps to this reply cap are gone or inaccessible, then the only ways to recover the thread is to explicitly cause it to be restarted via the TCB invocations that either change its registers or its running state. Hope this helps, Kent. On Wed, Jun 12, 2024 at 10:51 AM Arya Stevinson <arya.stevinson@gmail.com> wrote:
Hello,
I'll give a brief introduction since this is my first message here. I'm an undergraduate research assistant at the University of British Columbia, and I'm working with Margo Seltzer (https://www.seltzer.com/margo/) in the Systopia Lab. I'm helping with development of Sid Agrawal's (https://sid-agrawal.ca/) project built on seL4.
I'm trying to determine what is the correct way to handle this situation: 1. Process 1 makes an seL4_Call to an endpoint that Process 2 is listening on. 2. Process 2 receives the message, and since we are running in non-MCS kernel, it will get a reply capability in its TCB. 3. Process 2 dies, for whatever reason, before it can reply to Process 1.
If we wanted to unblock Process 1, what would be the correct approach? - I can use seL4_CNode_SaveCaller in Process 2 as soon as it receives a message, and store the slot number in some shared memory with the root task. Then when Process 2 dies, the root task is able to move the reply capability into its own cspace and send a reply indicating an error. This is not ideal for our purpose, because we do not want Process 2 to be able to perform any operations on its own cspace, but it needs its own root cnode capability in order to perform seL4_CNode_SaveCaller. - Is seL4_Call the wrong function for this purpose? We also considered using NBRecv and considering the call a failure after some timeout period. This is also not ideal since we would lose the convenience of the Call's reply cap, etc. - Is there some other solution that we are missing?
As an aside: I notice that for the MCS kernel, if the reply cap is deleted, Process 1's thread state would be moved from ThreadState_BlockedOnReceive to ThreadState_Inactive. In non-MCS kernel, it seems that the state would remain ThreadState_BlockedOnReceive? Is there a reason for this?
Best, Arya _______________________________________________ Devel mailing list -- devel@sel4.systems To unsubscribe send an email to devel-leave@sel4.systems
Hi Kent, Thank you, your suggestions are helpful. I didn't realize we could use a separate CNode to save the reply cap with seL4_CNode_SaveCaller. Do you think this is an acceptable use of seL4_Call? There is a sequence diagram here: https://github.com/sid-agrawal/OSmosis/blob/fd3aa0be5baa29684b9872c83e73de90.... I want to make sure that we're not breaking any design principles by moving the reply cap to the root task and having the root task reply on behalf of Process 2. Best, Arya
participants (2)
-
Arya Stevinson
-
Kent Mcleod