Mitchell Johnston *created* an issue

RFCs / RFC-14: MCS: Adding budget limit thresholds to endpoints for SC Donation
Issue Type: RFC  Reporter/Assignee: Mitchell Johnston  Created: 15/Aug/23 10:30 PM
*******
Summary
*******
This is a two-stage RFC that proposes to add a new `budget threshold' to endpoints, which would only allow scheduling context donation to occur if the SC's available budget exceeds the threshold. This would provide a mechanism to prevent budget-expiry from occurring in passive servers.
As a second optional stage, this threshold value could also restrict the budget that a passive server is permitted to consume on a donated SC. We refer to this as a `budget limit'. This would help bound priority inversion in a system as well as reducing the trust that clients must place in passive servers they call.
**********
Motivation
**********
Thresholds
This change regarding thresholds is motivated by a desire to prevent budget expiry from occurring inside passive servers, improving worst-case performance and response time analysis. Additionally, it would shift timeout exceptions into being a true error case, rather than an unavoidable, but routine occurrence on many systems. This is likely to simplify system design.
Currently, there are almost no restrictions on the amount of budget required to enter a passive server. This almost inevitably leads to budget expiry, resulting in undesirable longer blocking times for other clients of the passive server. This must also be accounted for in schedulability analysis of a system and is particularly important for servers with clients of varying criticality.
The kernel currently supports a mechanism to manage budget expiry: timeout handlers. These allow an error-handling thread to intervene and resolve the budget expiry. However, as they address the issue after it occurs, they have some limitations. A timeout handler can provide extra budget; however, rewarding a client with additional budget precisely because it is blocking the rest of the system is undesirable. Alternatively, the server can be reverted to a previous state, but this wastes any work the server performed and can be expensive or impractical, depending on the nature of the passive server.
This RFC proposes a new mechanism, endpoint thresholds, which aim to prevent budget expiry from occurring in passive servers, rather than reacting to it after it occurs.
-------------
Budget limits
-------------
The second stage of budget limits is motivated by the current implicit requirement that clients must completely trust passive servers that they call. Under the current model, a server can consume as much budget as it desires on a donated SC and is not forced to return it. A malfunctioning or malicious passive server can therefore run indefinitely while also possessing a higher priority than its clients, significantly impeding their ability to make progress.
The impact of this extends to any thread with a priority between that of the passive server and its clients, even threads that are not themselves clients of the passive server. Enforcing stronger bounds on passive server SC consumption would reduce the level of trust threads in a system need to have in one another.
***********************
Guide-level explanation
***********************
----------
Thresholds
----------
This change involves adding a new attribute to endpoint objects, known as a threshold. By default, this threshold value is zero and has no effect. Therefore, existing code will not need to be updated to accommodate thresholds.
If the threshold is set to a non-zero value, sending invocations performed on the endpoint will only be permitted if they donate a scheduling context with available budget greater than or equal to the threshold. Available budget is the amount of budget ready for use in the SC (the sum of refills whose release time is no later than the current time). Setting the threshold to the WCET of the passive server associated with the endpoint guarantees that the server will always receive an SC with sufficient budget to complete its execution, preventing budget expiry from occurring. This also ensures that timeout exceptions will only occur as a result of a true error, such as a misconfigured threshold value or a fault in the server code.
If a client has insufficient available budget in its SC, its refills will be deferred and merged until its head refill has sufficient budget at some point in the future. Once the thread resumes, the IPC invocation will transparently continue. If a client's maximum budget is less than the threshold value, an error will instead be returned to the client.
The time required to check an SC's available budget and to defer and merge are both bounded by the maximum number of refills in an SC. When designing a system using an endpoint with a threshold, this will be an additional factor to consider regarding the optimal choice for the maximum number of SC refills.
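The available-budget check described above can be sketched in plain C. This is a simplified model: `refill_t`, `available_budget`, and `threshold_permits_donation` are hypothetical names standing in for the kernel's MCS refill structures, not the actual seL4 implementation.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ticks_t;

/* Simplified stand-in for an MCS refill: an amount of budget that
 * becomes usable at its release time. Hypothetical names. */
typedef struct {
    ticks_t rTime;   /* release time */
    ticks_t rAmount; /* budget amount */
} refill_t;

/* Available budget: the sum of all refills already released at `now`. */
static ticks_t available_budget(const refill_t *refills, unsigned n, ticks_t now)
{
    ticks_t sum = 0;
    for (unsigned i = 0; i < n; i++) {
        if (refills[i].rTime <= now) {
            sum += refills[i].rAmount;
        }
    }
    return sum;
}

/* Donation over a threshold endpoint is permitted only if the available
 * budget meets the threshold; a zero threshold disables the check. */
static int threshold_permits_donation(const refill_t *refills, unsigned n,
                                      ticks_t now, ticks_t threshold)
{
    return threshold == 0 || available_budget(refills, n, now) >= threshold;
}
```

Note the loop over the refills: this is the source of the dependence of kernel time on the maximum number of refills in an SC.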
The threshold value of an endpoint can be set via an object invocation, performed indirectly through an invocation on the CNode, similar to the existing Cancel Badged Sends invocation. However, we restrict the right to set the endpoint threshold to the original unbadged capability only.
Changes to the threshold value are propagated weakly, meaning that clients only interact with the threshold value at the moment they invoke the endpoint. If a threshold value is increased, clients already enqueued on the endpoint are unaffected, even if they now possess insufficient budget. Similarly, threads which have had their budget deferred will not be woken early, even if the threshold value is reduced. As the threshold is intended to reflect the WCET of the passive server, changing it should be a rare operation. Therefore, we consider this weak propagation behaviour acceptable.
Additionally, invocations which do not support scheduling context donation are no longer permitted to invoke an endpoint with a non-zero threshold set. They instead fail immediately as an invalid operation. This prevents a passive server from performing a rendezvous with a client that did not donate an SC, which would leave the server unable to run, causing similar issues to budget expiry. Invoking a passive server without donating an SC is fundamentally an invalid operation, so it is desirable for the IPC to fail upon the initial send, rather than causing an exception only after rendezvousing with the server.
As an optional extension, the new defer-and-merge behaviour of SCs can also be made available to threads as an explicit system call. We have provisionally called this YieldUntilBudget, denoting it as a variant of Yield. This would allow threads to specify a desired quantity of budget, after which the kernel will merge and defer refills until the thread's head refill meets the desired budget. We emphasise that this additional system call is not required for the functionality of thresholds. However, it will essentially be implemented regardless as part of thresholds, and exposing it to users is an easy change.
-------------
Budget Limits
-------------
The optional budget limit change extends the meaning of an endpoint threshold value to also restrict budget consumption by a passive server on a donated SC. An additional bit is added to endpoint objects to toggle budget limit behaviour. By default, budget limits are disabled and only the aforementioned threshold behaviour applies. If the threshold is non-zero and budget limits are enabled, then both threshold and budget limit behaviour will be in effect. A budget limit has no effect if the threshold is zero, regardless of the value of the budget limit bit.
If a passive server returns a donated SC having consumed less than the budget limit (equal to the threshold value), then the budget limit has no effect. However, if the server's consumption reaches the budget limit, the kernel will forcibly return the SC to the caller, ensuring that the server’s consumption does not exceed the limit. This will almost certainly leave the server in a stuck state, so the server's timeout handler will be invoked to recover it.
When an SC is donated over an endpoint with budget limits enabled, that SC is marked as possessing a budget limit. An SC with a budget limit can only be further donated over endpoints that also have budget limits set.
During such a donation, the kernel must still determine whether the donated SC has a sufficient budget (greater than the threshold). However, rather than considering the total available budget in the SC, the kernel considers the budget remaining until the existing budget limit would be reached. This allows the kernel to guarantee that every server is able to use its full budget limit allocation, while also being able to enforce the limit if any server exceeds its allocated budget limit. The SC is only marked as no longer possessing a budget limit once it is returned over the original budget limit endpoint.
***************************
Reference-level explanation
***************************
----------
Thresholds
----------
This RFC proposes to add a ticks_t value to endpoint objects as the `threshold'. It is also viable to associate the threshold with endpoint capabilities; this is discussed in the rationale and alternatives section below. For every sending IPC invocation, fastpath or slowpath, the kernel checks the threshold value of the relevant object. If it is set to zero, threshold behaviour is disabled and the IPC continues with no further behaviour changes.
However, if the threshold value is non-zero, then the kernel calculates the available budget of the thread (released refills in the SC) and compares it against the budget required to pass the threshold. The intention is for the threshold value to represent the required runtime of a passive server. Therefore, the budget to pass the threshold is the threshold value plus twice the kernel WCET, to account for the call and reply system calls. Additionally, the kernel must account for time consumed by the client but not yet charged to its SC. If the available budget is sufficient, the IPC again continues without further behaviour changes.
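As a concrete reading of this calculation, the required budget can be expressed as a small helper. This is a sketch: `budget_to_pass` and its parameter names are illustrative, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ticks_t;

/* Budget a client needs for a Call over a threshold endpoint: the server's
 * advertised WCET (the threshold) plus two kernel entries (call and reply),
 * plus any time the client has consumed but not yet been charged for. */
static ticks_t budget_to_pass(ticks_t threshold, ticks_t kernel_wcet,
                              ticks_t uncharged_consumption)
{
    return threshold + 2 * kernel_wcet + uncharged_consumption;
}
```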
However, if the available budget is insufficient, alternate action needs to be taken. At this point, the fastpath and slowpath diverge. If the available budget check on the fastpath fails, the kernel switches over to the slowpath to reduce complexity on the fastpath.
Returning to the slowpath behaviour: after determining that the thread has insufficient available budget, the thread's maximum budget is compared to the threshold. If the maximum budget is insufficient, an error is returned to the client. Otherwise, the client's usage is first charged to it, and then its SC's refills are merged and deferred until the head refill is sufficient to pass the threshold. This will most likely be at some point in the future, so the client is inserted into the release queue and the system call is restarted once the refill is released.
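The defer-and-merge step can be sketched as follows. This is a deliberately simplified model under hypothetical names: the real MCS refill arithmetic must also preserve the sliding-window bandwidth constraint, which is ignored here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ticks_t;

/* Simplified refill: an amount of budget released at rTime. */
typedef struct {
    ticks_t rTime;   /* release time */
    ticks_t rAmount; /* budget amount */
} refill_t;

/* Fold successive refills into the head until the head alone covers
 * `required`, deferring the head's release time as refills are merged.
 * Returns 0 if the SC's total (maximum) budget is below `required`,
 * modelling the error returned to the client in that case. */
static int defer_and_merge(refill_t *refills, unsigned *n, ticks_t required)
{
    while (refills[0].rAmount < required) {
        if (*n < 2) {
            return 0; /* maximum budget insufficient: report error */
        }
        refills[0].rAmount += refills[1].rAmount;
        if (refills[1].rTime > refills[0].rTime) {
            refills[0].rTime = refills[1].rTime; /* defer the head release */
        }
        /* shift the remaining refills down */
        for (unsigned i = 1; i + 1 < *n; i++) {
            refills[i] = refills[i + 1];
        }
        (*n)--;
    }
    return 1; /* head sufficient at rTime: enqueue on the release queue */
}
```

On success, the thread would be inserted into the release queue until the merged head refill's release time, after which the IPC is restarted.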
-------------
Budget Limits
-------------
Supporting budget limits requires multiple changes in addition to those required for thresholds. Each endpoint object requires an additional bit to represent the budget limit toggle; however, this does not require an increase in the object size. SCs also require additional fields to track whether they currently possess a budget limit, along with a budget consumption field. Finally, reply objects have an additional limit field added to track the budget limit currently in effect.
When a thread whose SC has no budget limit in effect calls over an endpoint with a budget limit set, the donated SC is marked as possessing a budget limit and its budget consumption is reset to zero. The reply object's limit field is then set to the threshold value. When setting the timer interrupt, the kernel additionally compares the active SC's consumed budget field against the limit in the reply object. This allows the kernel to preempt the thread if it exceeds the budget limit.
When a thread whose SC has a budget limit in effect calls over an endpoint, the kernel only permits the operation if the endpoint has a budget limit set. The remaining budget limit is then compared to the endpoint's threshold value and the operation is only permitted if the remaining budget exceeds the threshold value. The SC's consumed budget field is not reset, but the reply object's limit field is set to the SC's consumed budget plus the endpoint threshold value. This allows the kernel to track and enforce the new budget limit, while not affecting the information required to enforce the original budget limit.
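The nested-donation check reduces to a small amount of arithmetic. A sketch under hypothetical names (`consumed` stands for the SC's consumed-budget field; `current_limit` for the limit in the outer reply object):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ticks_t;

/* Nested donation over a second budget-limit endpoint: permitted only if
 * the budget remaining under the limit already in effect covers the inner
 * endpoint's threshold. On success, the new reply object's limit is
 * expressed in the SC's (not reset) consumed-budget units. */
static int nested_limit_permitted(ticks_t consumed, ticks_t current_limit,
                                  ticks_t inner_threshold, ticks_t *new_limit)
{
    if (consumed >= current_limit ||
        current_limit - consumed < inner_threshold) {
        return 0; /* cannot guarantee the inner server its full allocation */
    }
    *new_limit = consumed + inner_threshold;
    return 1;
}
```

Because the new limit can never exceed the outer limit, preempting at the nearest limit also enforces every limit further up the chain.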
If a thread with a budget limit enabled exceeds its budget limit, the kernel preempts it and returns the SC one layer down to the caller. In a multi-level call chain, other budget limits further down the chain remain in effect.
This functionality requires additional overhead on the fastpath, in particular reprogramming the timer interrupt.
--------------------
Performance summary:
--------------------
In this section we present a summary of performance results, measuring the increased overhead that would result from implementing these proposed changes. Three broad cases were tested:
* Baseline: The baseline MCS kernel, without any changes.
* Threshold disabled: A kernel with our proposed changes, but tested over an endpoint that does not enforce threshold or budget limit behaviour.
* Threshold/Budget limit enabled: A kernel with the proposed changes, over an endpoint that enforces threshold/budget limit behaviour.
Thresholds Only
---------------
First, we compare successful IPC cost of thresholds only (no budget limits) against the baseline kernel using the standard seL4bench IPC tests.
These tests were performed with an SC with a single refill.
                      *Fastpath*   *Fastpath overhead*   *Slowpath*   *Slowpath overhead*
Baseline              269 (3)      N/A                   945 (12)     N/A
Thresholds disabled   276 (4)      3%                    959 (10)     1%
Thresholds enabled    304 (2)      13%                   976 (12)     3%

IPC Call overhead from endpoint thresholds. Results are cycles, presented as: mean (standard deviation).
Thresholds introduce a new case to IPC, whereby instead of succeeding immediately, the thread's budget is deferred.
We measured the cost of deferring budget for SC's with various numbers of refills.
*Operation*       *Extra Refills*   *Fastpath*   *Slowpath*
IPC call          N/A               272 (2)      962 (15)
IPC block         N/A               852 (11)     796 (17)
Threshold defer   0                 1150 (24)    1015 (17)
Threshold defer   10                1391 (18)    1300 (26)
Threshold defer   20                1665 (25)    1545 (19)
Threshold defer   30                1942 (30)    1819 (20)
Threshold defer   40                2174 (18)    2079 (23)
Threshold defer   50                2446 (24)    2331 (29)
IPC Call defer costs. Results are cycles, presented as: mean (standard deviation)
Budget Limits
-------------
In this section, we consider the costs on a system that supports both thresholds and budget limits.
We compare the increase in overhead for both IPC Call and ReplyRecv.
First, overheads for IPC Call
                       *Fastpath*   *Fastpath overhead*   *Slowpath*   *Slowpath overhead*
Baseline               269 (3)      N/A                   945 (12)     N/A
Thresholds disabled    291 (2)      8%                    1068 (14)    13%
Budget limit enabled   327 (0)      22%                   1108 (18)    17%
IPC Call overhead from budget limit thresholds. Results are cycles, presented as: mean (standard deviation)
Next, the overheads for ReplyRecv
                       *Fastpath*   *Fastpath overhead*   *Slowpath*   *Slowpath overhead*
Baseline               290 (0)      N/A                   972 (16)     N/A
Thresholds disabled    298 (3)      3%                    1131 (20)    16%
Budget limit enabled   352 (5)      21%                   1208 (17)    24%
IPC ReplyRecv overhead from budget limit thresholds. Results are cycles, presented as: mean (standard deviation)
----------------------------
Other implementation details
----------------------------
An example implementation and draft pull request, for clarity of changes, is available here:
* https://github.com/Yermin9/seL4/tree/threshold_rfc
* https://github.com/Yermin9/seL4/pull/6
The core of this change is additional code on the IPC path that checks whether a calling client has sufficient budget compared to the threshold and, if not, defers and merges its refills until its head refill has sufficient budget. The size of endpoint objects has been increased to make room to store the threshold value, which is a ticks_t; on a 64-bit system, this requires one additional size bit, doubling the object from 16 bytes to 32 bytes.
The calculation of available budget requires summing over all of an SC's released refills. This is not a major issue, but it does introduce a new dependence of kernel execution time on the number of refills in an SC. The same is true of deferring and merging refills. There already exists a performance trade-off regarding the maximum number of refills; this is simply an additional factor for system designers to consider.
*********
Drawbacks
*********
The primary drawback is that this change increases the IPC path cost, imposing an additional cost on all IPC invocations. This is true even for endpoints with thresholds set to zero, as the kernel still needs to check whether it should impose the threshold behaviour restrictions; this checking cost is non-zero but small. Where thresholds are enforced, there will be a modest performance impact. However, as outlined above, budget limits impose a more significant performance cost.
**************************
Rationale and alternatives
**************************
A key benefit is that this change will support better schedulability analysis and reduced server complexity. The threshold change trades slightly slower average-case performance for much improved worst-case performance. However, this more predictable behaviour is of particular benefit for schedulability analysis. In particular, this predictability is highly important for real-time systems that the MCS kernel is targeted at. Further, this change also moves timeout faults into a true error case, rather than an unavoidable consequence of passive servers.
The current model supports changes to the threshold value; however, changes are propagated very weakly. As described above, budgets are only compared to the threshold value at the point where a thread invokes an endpoint. Threads already enqueued on the endpoint or waiting for sufficient budget are only affected by a change to the threshold value when they invoke the endpoint again.
We had considered stronger threshold propagation models, whereby all threads enqueued on the endpoint or waiting for sufficient budget were affected by changes to the threshold value immediately. Upon a change to the threshold value, threads would be enqueued or dequeued from the endpoint to reflect the change. However, this behaviour involved significant additional kernel complexity, in particular requiring two additional release queues. Briefly, this is because deferring and merging budget would not be possible, as a thread's available budget must be preserved in case the threshold value is decreased. This required significant additional infrastructure to track the threads and their budgets, while also waiting for additional refill releases. Given that changing an endpoint threshold is expected to be a rare operation, we do not consider this compromise of kernel minimality worth the benefit. Therefore, we implemented the weak propagation behaviour described above.
Alternative associations for the threshold value were considered, namely the passive server's TCB and endpoint capabilities, rather than objects.
* Association with the server's TCB would add significant kernel complexity with little benefit. Threshold values would need to be checked at the point of IPC rendezvous, rather than at the point of entering the endpoint. This could require endpoints to have two queues: one containing servers that are ready to receive, and another with clients that are ready to send but unable to rendezvous because of insufficient budget.
* Association with endpoint capabilities would allow different clients' entry points to have different thresholds. There is potentially a use for different threshold values representing different operations on a passive server, with correspondingly different WCETs. However, this would require increasing the size of endpoint capabilities. Due to the nature of the capability system, this would also require increasing the size of all capabilities. This is not an insurmountable issue and may be the best design.
-----------------------------------
Budget limit call stack restriction
-----------------------------------
For budget limits, we introduced the restriction that threads with a budget limit in effect can only donate their SC over an endpoint with a budget limit. Originally, we considered a model that did not restrict how users could construct call stacks; however, there were significant downsides. We illustrate this with an example.
Consider a client (C) that calls a server (S) over a budget limit endpoint. Then, over endpoints without budget limits set, S donates its SC to s~1~, s~1~ to s~2~ and so on to s~n~.
We outline two system behaviour assumptions that we believe are reasonable:
* The original budget limit guarantee made from S to C should still be enforced regardless of which thread the SC is bound to at the time. If this is not enforced then the usefulness of the budget limit to guarantee scheduling properties is greatly diminished.
* Further, any thread that is skipped over by the returning SC needs to have its timeout handler invoked, to restore them to a sane state. This is required for the system to remain usable. Otherwise, some mechanism must exist to notify some user-level error-handling thread of which server threads are stuck and allow it to recover them. This is essentially the same concept as a timeout/error handler regardless.
Therefore, if the budget limit is exceeded by s~n~, the reply stack needs to be traversed until the original client is found, triggering timeout handlers at each level. The result is that kernel execution time would be linear in the size of the call stack created, which we consider unreasonable.
Further, S is making a guarantee to C that only a certain amount of budget will be consumed on the donated SC, that being the semantic meaning of the budget limit. It would be a poor design for S to then donate that SC to another server (s~1~) with no timing guarantees, as if s~1~ consumes too much budget, S will also be skipped over when the SC is returned to C. This means that S cannot guarantee it will meet its own scheduling obligations. Therefore, we consider it a reasonable design restriction to only allow S to donate its SC to servers that also provide a budget limit guarantee.
If S needs to donate its SC to a server (s~1~) that cannot provide a budget limit guarantee, we believe it is most sensible for S to not offer a budget limit either, as S does not know how much budget will be required. Alternatively, we believe adding a budget limit to s~1~ would also be a sensible design pattern.
*********
Prior art
*********
See also this thesis ( https://trustworthy.systems/publications/theses_public/22/Johnston%3Abe.abs…), where this proposed change was discussed in more detail.
********************
Unresolved questions
********************
What is the best API for the new Yield system call?
* The cleanest API would probably be to change seL4_Yield() to accept a single parameter. If the parameter is zero, the existing Yield behaviour occurs, otherwise, a non-zero value triggers the new defer and merge behaviour.
Unfortunately, this would be a breaking API change. However, the change required would be fairly simple, replacing every seL4_Yield() with seL4_Yield(0).
* Same as above, but make the true system call seL4_YieldUntilBudget(time_t val). This would allow seL4_Yield() to be made a macro for seL4_YieldUntilBudget(0). This seems to be a messier API, but maintains backward API compatibility. It would, however, break existing binary compatibility.
* Two true system calls, the existing seL4_Yield() and the new seL4_YieldUntilBudget. This seems to be less preferable from an API perspective (and potentially performance, though this hasn't been tested), but also maintains backward compatibility.
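The second option above can be sketched with a stub standing in for the real system call; the stub and its bookkeeping variable are purely illustrative, not part of libsel4.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t seL4_Time;

/* Stub standing in for the real YieldUntilBudget system call; it records
 * the requested budget so the macro's behaviour can be observed. */
static seL4_Time last_requested_budget;
static void seL4_YieldUntilBudget(seL4_Time val)
{
    last_requested_budget = val;
}

/* Option two: seL4_Yield becomes a macro over the one true system call.
 * Source-compatible with existing code, but not binary-compatible. */
#define seL4_Yield() seL4_YieldUntilBudget(0)
```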
We restrict sending invocations on an endpoint with a threshold set to only those that permit SC donation. There are some possible extensions of this:
* Should receiving invocations also be restricted to only those that allow SC donation? For a passive server, receiving in a manner that does not permit SC donation is an invalid configuration.
This change would cause an explicit error upon invocation, rather than the server getting stuck after IPC rendezvous. This is potentially desirable.
Robbie VanVossen *updated* an issue

RFCs / RFC-9: Add new capability for seL4 SMC Forwarding
Change By: Robbie VanVossen
h1. Summary
This RFC proposes to add a new, badged capability which certain threads can invoke so seL4 can make SMC calls on ARM platforms in EL2 (virtualized mode) on the thread's behalf.
h1. Motivation
Currently, seL4 running in EL2 traps all SMC calls. Then, if set up, seL4 calls a handler to deal with the call. Some SMC calls are emulated, such as PSCI for the camkes-vm, but most are just ignored. However, there may be some SMC calls that actually need to reach the Supervisor, and this configuration does not allow that to happen. This RFC proposes a solution to this problem by creating a new, badged capability which can be given to specific threads to allow them to forward configured SMC calls to seL4, so that the microkernel will make the actual SMC call on the thread's behalf.
h1. Guide-level explanation
This change will allow SMC calls to be forwarded to the microkernel, emulated, or blocked based on platform-specific configuration. When an SMC call is made by a VM, the microkernel intercepts the call, as it does for all SMC calls made at EL1. It then calls {{handle_smc()}} for the corresponding VMM thread. That function decodes the specific SMC call and decides whether to block it, emulate it, or forward it back into seL4. There are already examples of emulation and blocking for multiple platforms. The zynqmp platform will have an example of forwarding the SMC with the new API call, {{seL4_ARM_SMC_Call()}}.
h2. New concepts
h3. seL4_ARM_SMC_Call()
{{seL4_ARM_SMC_Call()}} is the new ObjectInvocation API call and can be called in two ways:
# A VM attempts to make an SMC call which traps into the microkernel. The microkernel informs the VMM’s handler and the VMM decides that the SMC call really needs to be made. The VMM then calls {{seL4_ARM_SMC_Call()}} with the SMC capability and all the arguments that the VM provided.
# Directly by constructing the SMC call from the thread with the SMC capability. This means that a trap isn’t needed.
Our expected use case is the first approach, used in the camkes-vm by VMM handlers on behalf of their VMs. I am not sure if there is a use case for the second approach, but it is certainly possible. In either case, the function still requires the caller to set all 8 registers, even if the specific SMC call uses fewer arguments.
h3. seL4_ARM_SMC
{{seL4_ARM_SMC}} is the new capability. It needs to be given to any threads that you expect to call {{seL4_ARM_SMC_Call()}}, otherwise it will be denied. We expect that in most use cases, it will be given to each VMM thread.
The capability is badged, and the badge represents the SMC Function Identifier. Therefore, the specific SMC calls that each thread can make are configurable and checked by seL4.
h2. For users
We plan to implement this configuration as an example configuration of camkes-vm for the zynqmp platform. Due to the WCET implications of this approach, it will not be the default for all platforms. It should only be enabled when actually needed for a specific platform or application. Without this change, Linux is unable to run as a guest on camkes-vm for the zynqmp platform.
Here is an example of how to enable SMC calls for some VMs in the {{devices.camkes}} file implemented for a platform in {{camkes-vm-apps}}:
{code:none}vm0.allow_smc = true;
vm0.allowed_smc_functions = [
0xC2000001, // PM_GET_API_VERSION
0xC2000017 // PM_FPGA_GET_STATUS
];
vm1.allow_smc = true;
vm1.allowed_smc_functions = [
0xC2000001, // PM_GET_API_VERSION
0xC200000D, // PM_REQUEST_NODE
0xC200000E // PM_RELEASE_NODE
];
vm2.allow_smc = false;
vm2.allowed_smc_functions = [];{code}
As you can see in this example, you can configure each VM to allow or disallow SMC calls. For each VM that has SMC calls enabled, you then specify which SMC function calls it can make. These are integers that are defined per platform. VMs can share the same functions, and they can have different functions as well.
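The filtering implied by this configuration can be sketched in plain C. {{smc_call_allowed}} is our own illustrative helper, not part of the camkes-vm or kernel code; it models each VM's allowed list, where each allowed identifier would correspond to a badged {{seL4_ARM_SMC}} capability handed to that VM's VMM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A call is forwarded only if its SMC Function Identifier appears in the
 * VM's allowed list; otherwise it is blocked (or emulated by the VMM). */
static int smc_call_allowed(const uint32_t *allowed, size_t n, uint32_t fid)
{
    for (size_t i = 0; i < n; i++) {
        if (allowed[i] == fid) {
            return 1;
        }
    }
    return 0;
}
```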
h1. Reference-level explanation
This implementation will meet the SMC Calling Convention specification from ARM ([ARM DEN 0028E v1.4|https://developer.arm.com/documentation/den0028/latest?_ga=2.116565828…]) for 32-bit SMC calls. 64-bit SMC calls are partially supported, but full support could be added later.
h2. seL4_ARM_SMC_Call()
{code:c}static int seL4_ARM_SMC_Call(seL4_ARM_SMC _service,
                             seL4_ARM_SMCContext *smc_args,
                             seL4_ARM_SMCContext *smc_response);{code}
{noformat}Parameters
[in] _service Capability to allow threads to interact with SMC calls.
[in] smc_args The structure that has the provided arguments.
[out] smc_response The structure to capture the responses.
Returns{noformat}
h2. struct seL4_ARM_SMCContext
{code:c}typedef struct seL4_ARM_SMCContext_ {
/* register arguments */
seL4_Word x0, x1, x2, x3, x4, x5, x6, x7;
} seL4_ARM_SMCContext;{code}
h1. Drawbacks
There are a couple of drawbacks to allowing SMC calls to actually go through. Both relate to allowing a supervisor to do anything in the system past initialization time.
One drawback is that the worst case execution time (WCET) is no longer guaranteed for the system. Each SMC call can take an arbitrarily long amount of time. This means a thread can essentially lock up all of the cores for an unspecified amount of time. The core the SMC call is made on is obviously blocked because the supervisor will not return to seL4 until it completes its function. The other cores will become locked as soon as seL4 runs and tries to grab the Big Kernel Lock (BKL). One potential fix for the multicore issue is to release the BKL before making the SMC call. However, that is out of the scope of this RFC as it has a lot of implications. The current approach will be to have this config option off by default and add documentation about how enabling it affects WCET.
Another drawback is the effect this has on assurance. Since the Supervisor runs at a higher privilege level than seL4, it can make changes to the system of which seL4 is unaware. If the proofs were ever extended to cover this configuration, they would have to assume that the allowed SMC calls do not invalidate them. It is up to the user to examine what each allowed SMC call does and check that it won’t interfere with the proof.
h1. Rationale and alternatives
This design is the best because it gives users flexibility by allowing them to make some SMC calls while still being able to emulate other SMC calls. It is the preferred method, because the microkernel shouldn't need to be updated with each new platform, VMM configuration, and secure application implementation. It can all be configured in user space.
h2. Alternative approaches
SMC filtering could be done in the microkernel so that developers are less likely to configure the system incorrectly. However, there are some cases where the filtering depends on specific threads or on the application level (secure-world communication) instead of just the platform. On the first point, instead of creating a cap for making SMC calls, there could be a cap for each numbered SMC call, so that the root thread could configure which SMC calls each thread can make. However, that is quite a bit of work, as the set of calls differs for each ARM platform, and it adds another step when porting the microkernel to new ARM platforms. That could also solve the second point, but then users would likely need to modify the microkernel to support their specific secure-world application, which is something that should really be avoided.
Another part of the implementation that could be done differently is using the VCPU capability with an access right instead of creating a new type of capability. The kinds of access rights are already determined as read, write, grant, and grantreply. A write access right would probably make the most sense, but it still seems obtuse that a write access on a VCPU capability would allow the invoker to make an SMC call. Besides that, there are two other issues I see with this approach:
# I can foresee a use-case where you have a health monitor component that isn’t associated with a VCPU but needs to be able to forward an SMC call to actually interact with EL3 or secure applications.
# If the VCPU capability ever needs read/write access rights that make more sense with the type of object it is, for example, limiting which threads can actually read or write that VCPUs registers.
For those reasons we believe a new capability makes the most sense.
h2. Impact of not implementing this
Not implementing this will limit the ARM platforms we can use for the camkes-vm and/or the guests that can run on those platforms. It will also limit support for applications that run on the secure side of TrustZone and need to communicate with the non-secure side (where seL4, the VMMs, and the VMs run).
h1. Prior art
Xen has a similar implementation for the Zynqmp where arbitration is done to allow/disallow/emulate SMC calls from guest VMs. The arbitration is quite specific and is automatically generated based on resource allocation for each VM. Xen does not have VMM pieces that run in Userspace. Basically all of the management runs in EL2. Therefore, all of the arbitration is done in Xen itself. This approach does not make sense for seL4 for the reasons stated above in the [#rationale|#rationale] section.
However, this does show a need for this feature in hypervisors on ARMv8 platforms.
If necessary, we could reference the Xen arbitration code to help improve the default arbitration for the zynqmp in our implementation.
h1. Unresolved questions
# How do we address the WCET issue? (Doesn’t need to be solved by this RFC because we are documenting the effects)
Robbie VanVossen ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab7eb5… ) *updated* an issue
RFCs ( https://sel4.atlassian.net/browse/RFC?atlOrigin=eyJpIjoiMDA0MjRmMzBkMmVjNDQ… ) / RFC ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiMDA0MjRmMzBkMmVjN… ) RFC-9 ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiMDA0MjRmMzBkMmVjN… ) Add new capability for seL4 SMC Forwarding ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiMDA0MjRmMzBkMmVjN… )
Change By: Robbie VanVossen ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab7eb5… )
h1. Summary
This RFC proposes to add a new capability which certain threads can invoke so seL4 can make SMC calls on ARM platforms in EL2 (virtualized mode) on the thread’s behalf.
h1. Motivation
Currently, seL4 running in EL2 traps all SMC calls and decides what to do with them. Some SMC calls are emulated, such as PSCI, but most are just ignored. However, some SMC calls may actually need to reach the Supervisor, and this configuration does not allow that to happen. This RFC proposes a solution to this problem: a new badged capability that allows threads to forward SMC calls to seL4, so that the microkernel makes the actual SMC call on the thread's behalf.
h1. Guide-level explanation
This change will allow you to forward to the microkernel, emulate, or block SMC calls based on platform specific configuration. When an SMC call is made by a VM, the microkernel intercepts the call, which it does for all SMC calls made at EL1. It then calls {{handle_smc()}} for the corresponding VMM thread. That function will decode the specific SMC call and decide whether to block it, emulate it, or forward the call back into seL4. There are examples already for multiple platforms on emulation and blocking. The zynqmp platform will have an example of forwarding the SMC with the new API call, {{seL4_ARM_SMC_Call()}}.
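The arbitration flow described above can be sketched as a small classifier. This is not the camkes-vm implementation; the PSCI range check and the forwarded zynqmp function identifiers are illustrative assumptions:

```c
#include <stdint.h>

typedef uint64_t seL4_Word;

/* Possible outcomes of VMM-side SMC arbitration, as described above. */
typedef enum { SMC_BLOCK, SMC_EMULATE, SMC_FORWARD } smc_action;

/* Hypothetical per-platform policy: PSCI calls are emulated,
 * whitelisted platform services are forwarded via seL4_ARM_SMC_Call(),
 * and everything else is blocked. The ranges and IDs are illustrative. */
static smc_action classify_smc(seL4_Word function_id)
{
    /* SMC64 PSCI function range -> emulate in the VMM */
    if (function_id >= 0xC4000000 && function_id <= 0xC400001F) {
        return SMC_EMULATE;
    }
    /* SiP service calls the policy allows -> forward to EL3 */
    switch (function_id) {
    case 0xC2000001: /* PM_GET_API_VERSION */
    case 0xC2000017: /* PM_FPGA_GET_STATUS */
        return SMC_FORWARD;
    default:
        return SMC_BLOCK;
    }
}
```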
h2. New concepts
h3. seL4_ARM_SMC_Call()
{{seL4_ARM_SMC_Call()}} is the new ObjectInvocation API call and can be called in two ways:
# A VM attempts to make an SMC call which traps into the microkernel. The microkernel informs the VMM’s handler and the VMM decides that the SMC call really needs to be made. The VMM then calls {{seL4_ARM_SMC_Call()}} with the SMC capability and all the arguments that the VM provided.
# Directly by constructing the SMC call from the thread with the SMC capability. This means that a trap isn’t needed.
Our expected use case is the first approach, used in the camkes-vm by VMM handlers on behalf of their VMs. We are not sure whether there is a use case for the second approach, but it is certainly possible. In either case, the function requires the caller to set all 8 registers, even if the specific SMC call uses fewer arguments.
h3. seL4_ARM_SMC
{{seL4_ARM_SMC}} is the new capability. It needs to be given to any threads that you expect to call {{seL4_ARM_SMC_Call()}}, otherwise it will be denied. We expect that in most use cases, it will be given to each VMM thread.
The capability is badged, and the badge represents the SMC Function Identifier. Therefore, the specific SMC calls that each thread can make are configurable and checked by seL4.
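As a rough sketch of the check this implies (the actual kernel-side decode is not specified in this RFC), the badge carried by the capability can be compared against the Function Identifier the caller requests. How an unbadged capability behaves is an assumption here; this sketch simply denies it:

```c
#include <stdint.h>

typedef uint64_t seL4_Word;

/* Hypothetical kernel-side permission check: the SMC cap's badge carries
 * the one SMC Function Identifier its holder may invoke, and the
 * requested identifier arrives in x0 of the SMCContext. Returns nonzero
 * if the invocation is permitted. */
static int smc_invocation_permitted(seL4_Word badge, seL4_Word requested_fid)
{
    /* Assumption: an unbadged (badge == 0) cap permits nothing. */
    return badge != 0 && badge == requested_fid;
}
```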
h2. For users
We plan to implement this configuration as an example configuration of camkes-vm for the zynqmp platform. Due to the WCET implications of this approach, it will not be the default for all platforms. It should only be enabled when actually needed for a specific platform or application. Without this change, Linux is unable to run as a guest on camkes-vm for the zynqmp platform.
Here is an example of how to enable SMC calls for some VMs in the {{devices.camkes}} file implemented for a platform in {{camkes-vm-apps}}:
{code:none}vm0.allow_smc = true;
vm0.allowed_smc_functions = [
0xC2000001, // PM_GET_API_VERSION
0xC2000017 // PM_FPGA_GET_STATUS
];
vm1.allow_smc = true;
vm1.allowed_smc_functions = [
0xC2000001, // PM_GET_API_VERSION
0xC200000D, // PM_REQUEST_NODE
0xC200000E // PM_RELEASE_NODE
];
vm2.allow_smc = false;
vm2.allowed_smc_functions = [];{code}
In this example, each VM can be configured to allow or disallow SMC calls. For each VM that has SMC calls enabled, you then specify which SMC function calls it can make; these are integers that are defined per platform. VMs can share the same calls, and they can also have different ones.
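For illustration, the policy expressed by that configuration can be mirrored in C as a per-VM whitelist lookup. The struct and function names here are invented for the sketch and are not part of camkes-vm:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t seL4_Word;

/* C mirror of the devices.camkes settings above (illustrative only). */
typedef struct {
    bool allow_smc;             /* vmN.allow_smc */
    const seL4_Word *allowed;   /* vmN.allowed_smc_functions */
    size_t n_allowed;
} vm_smc_policy;

static const seL4_Word vm0_fns[] = { 0xC2000001, 0xC2000017 };
static const seL4_Word vm1_fns[] = { 0xC2000001, 0xC200000D, 0xC200000E };

static const vm_smc_policy policies[] = {
    { true,  vm0_fns, 2 },   /* vm0 */
    { true,  vm1_fns, 3 },   /* vm1 */
    { false, NULL,    0 },   /* vm2: SMC disabled entirely */
};

/* Returns true if the given VM may make the given SMC function call. */
static bool vm_may_call(unsigned vm, seL4_Word fn)
{
    if (vm >= 3 || !policies[vm].allow_smc) {
        return false;
    }
    for (size_t i = 0; i < policies[vm].n_allowed; i++) {
        if (policies[vm].allowed[i] == fn) {
            return true;
        }
    }
    return false;
}
```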
h1. Reference-level explanation
This implementation will meet the SMC Calling Convention specification from ARM ([ARM DEN 0028E v1.4|https://developer.arm.com/documentation/den0028/latest?_ga=2.116565828…]) for 32-bit SMC calls. 64-bit SMC calls are partially supported, but full support could be added later.
h2. seL4_ARM_SMC_Call()
{code:c}static int seL4_ARM_SMC_Call(seL4_ARM_SMC _service,
                             seL4_ARM_SMCContext *smc_args,
                             seL4_ARM_SMCContext *smc_response);{code}
{noformat}Parameters
[in] _service Capability to allow threads to interact with SMC calls.
[in] smc_args The structure that has the provided arguments.
[out] smc_response The structure to capture the responses.
Returns{noformat}
h2. struct seL4_ARM_SMCContext
{code:c}typedef struct seL4_ARM_SMCContext_ {
/* register arguments */
seL4_Word x0, x1, x2, x3, x4, x5, x6, x7;
} seL4_ARM_SMCContext;{code}
h1. Drawbacks
There are a couple of drawbacks to allowing SMC calls to actually go through. Both relate to allowing a supervisor to do anything in the system past initialization time.
One drawback is that the worst case execution time is no longer guaranteed for the system. Each SMC call can take an arbitrarily long amount of time. This means a thread can essentially lock up all of the cores for an unspecified amount of time. The core the SMC call is made on is obviously blocked because the supervisor will not return to seL4 until it completes its function. The other cores will become locked as soon as seL4 runs and tries to grab the Big Kernel Lock (BKL). One potential fix for the multicore issue is to release the BKL before making the SMC call. However, that is out of the scope of this RFC as it has a lot of implications. The current approach will be to have this config option off by default and add documentation about how enabling it affects WCET.
Another drawback is the effect this has on assurance. Since the Supervisor runs at a higher privilege level than seL4, it can make changes to the system of which seL4 is unaware. If the proofs were ever extended to cover this configuration, they would have to assume that the allowed SMC calls do not invalidate them. It is up to the user to examine what each allowed SMC call does and check that it won’t interfere with the proof.
h1. Rationale and alternatives
This design is the best because it gives users flexibility by allowing them to make some SMC calls while still being able to emulate other SMC calls. It is the preferred method, because the microkernel shouldn't need to be updated with each new platform, VMM configuration, and secure application implementation. It can all be configured in user space.
h2. Alternative approaches
SMC filtering could be done in the microkernel so that developers are less likely to configure the system incorrectly. However, there are some cases where the filtering depends on specific threads or on the application level (secure-world communication) instead of just the platform. On the first point, instead of creating a cap for making SMC calls, there could be a cap for each numbered SMC call, so that the root thread could configure which SMC calls each thread can make. However, that is quite a bit of work, as the set of calls differs for each ARM platform, and it adds another step when porting the microkernel to new ARM platforms. That could also solve the second point, but then users would likely need to modify the microkernel to support their specific secure-world application, which is something that should really be avoided.
Another part of the implementation that could be done differently is using the VCPU capability with an access right instead of creating a new type of capability. The kinds of access rights are already determined as read, write, grant, and grantreply. A write access right would probably make the most sense, but it still seems obtuse that a write access on a VCPU capability would allow the invoker to make an SMC call. Besides that, there are two other issues I see with this approach:
# I can foresee a use-case where you have a health monitor component that isn’t associated with a VCPU but needs to be able to forward an SMC call to actually interact with EL3 or secure applications.
# If the VCPU capability ever needs read/write access rights that make more sense with the type of object it is, for example, limiting which threads can actually read or write that VCPUs registers.
For those reasons we believe a new capability makes the most sense.
h2. Impact of not implementing this
Not implementing this will limit the ARM platforms we can use for the camkes-vm and/or the guests that can run on those platforms. It will also limit support for applications that run on the secure side of TrustZone and need to communicate with the non-secure side (where seL4, the VMMs, and the VMs run).
h1. Prior art
Xen has a similar implementation for the Zynqmp where arbitration is done to allow/disallow/emulate SMC calls from guest VMs. The arbitration is quite specific and is automatically generated based on resource allocation for each VM. Xen does not have VMM pieces that run in Userspace. Basically all of the management runs in EL2. Therefore, all of the arbitration is done in Xen itself. This approach does not make sense for seL4 for the reasons stated above in the [#rationale|#rationale] section.
However, this does show a need for this feature in hypervisors on ARMv8 platforms.
If necessary, we could reference the Xen arbitration code to help improve the default arbitration for the zynqmp in our implementation.
h1. Unresolved questions
# Should this implementation be configurable?
## I would argue it makes the most sense to have it automatically enabled if you are using the VMM on ARM and {{KernelAllowSMCCalls}} is disabled. Then your default whitelist in the SMC handler, for most platforms, can be to allow no SMC calls through, which would mean the default configuration would essentially be the same as the second configuration that is defined in the motivation section. However, I am wondering if there is a case where you still just want to disable it completely?
# Should the microkernel have an overriding blacklist?
## How and who decides what the blacklist is?
## Is it configurable?
## I don't think this question blocks the RFC, as the blacklist could be implemented easily enough after the proposed implementation.
Robbie VanVossen ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab7eb5… ) *updated* an issue
RFCs ( https://sel4.atlassian.net/browse/RFC?atlOrigin=eyJpIjoiNzQwZjI2NzE0M2Y2NDh… ) / RFC ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiNzQwZjI2NzE0M2Y2N… ) RFC-9 ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiNzQwZjI2NzE0M2Y2N… ) Add new capability for seL4 SMC Forwarding ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiNzQwZjI2NzE0M2Y2N… )
Change By: Robbie VanVossen ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab7eb5… )
h1. Summary
This RFC proposes to add a new capability which certain threads can invoke so seL4 can make SMC calls on ARM platforms in EL2 (virtualized mode) on the thread’s behalf.
h1. Motivation
Currently, seL4 running in EL2 traps all SMC calls and decides what to do with them. Some SMC calls are emulated, such as PSCI, but most are just ignored. However, some SMC calls may actually need to reach the Supervisor, and this configuration does not allow that to happen. This RFC proposes a solution to this problem: a new badged capability that allows threads to forward SMC calls to seL4, so that the microkernel makes the actual SMC call on the thread's behalf.
h1. Guide-level explanation
This change will allow you to forward to the microkernel, emulate, or block SMC calls based on platform specific configuration. When an SMC call is made by a VM, the microkernel intercepts the call, which it does for all SMC calls made at EL1. It then calls {{handle_smc()}} for the corresponding VMM thread. That function will decode the specific SMC call and decide whether to block it, emulate it, or forward the call back into seL4. There are examples already for multiple platforms on emulation and blocking. The zynqmp platform will have an example of forwarding the SMC with the new API call, {{seL4_ARM_SMC_Call()}}.
h2. New concepts
h3. seL4_ARM_SMC_Call()
{{seL4_ARM_SMC_Call()}} is the new ObjectInvocation API call and can be called in two ways:
# A VM attempts to make an SMC call which traps into the microkernel. The microkernel informs the VMM’s handler and the VMM decides that the SMC call really needs to be made. The VMM then calls {{seL4_ARM_SMC_Call()}} with the SMC capability and all the arguments that the VM provided.
# Directly by constructing the SMC call from the thread with the SMC capability. This means that a trap isn’t needed.
Our expected use case is the first approach, used in the camkes-vm by VMM handlers on behalf of their VMs. We are not sure whether there is a use case for the second approach, but it is certainly possible. In either case, the function requires the caller to set all 8 registers, even if the specific SMC call uses fewer arguments.
h3. seL4_ARM_SMC
{{seL4_ARM_SMC}} is the new capability. It needs to be given to any threads that you expect to call {{seL4_ARM_SMC_Call()}}, otherwise it will be denied. We expect that in most use cases, it will be given to each VMM thread.
The capability is badged, and the badge represents the SMC Function Identifier. Therefore, the specific SMC calls that each thread can make are configurable and checked by seL4.
h2. For users
We plan to implement this configuration as an example configuration of camkes-vm for the zynqmp platform. Due to the WCET implications of this approach, it will not be the default for all platforms. It should only be enabled when actually needed for a specific platform or application. Without this change, Linux is unable to run as a guest on camkes-vm for the zynqmp platform.
Here is an example of how to enable SMC calls for some VMs in the {{devices.camkes}} file implemented for a platform in {{camkes-vm-apps}}:
{code:none}vm0.allow_smc = true;
vm0.allowed_smc_functions = [
0xC2000001, // PM_GET_API_VERSION
0xC2000017 // PM_FPGA_GET_STATUS
];
vm1.allow_smc = true;
vm1.allowed_smc_functions = [
0xC2000001, // PM_GET_API_VERSION
0xC200000D, // PM_REQUEST_NODE
0xC200000E // PM_RELEASE_NODE
];
vm2.allow_smc = false;
vm2.allowed_smc_functions = [];{code}
In this example, each VM can be configured to allow or disallow SMC calls. For each VM that has SMC calls enabled, you then specify which SMC function calls it can make; these are integers that are defined per platform. VMs can share the same calls, and they can also have different ones.
h1. Reference-level explanation
This implementation will meet the SMC Calling Convention specification from ARM ([ARM DEN 0028E v1.4|https://developer.arm.com/documentation/den0028/latest?_ga=2.116565828…]) for 32-bit SMC calls. 64-bit SMC calls are partially supported, but full support could be added later.
h2. seL4_ARM_SMC_Call()
{code:c}static int seL4_ARM_SMC_Call(seL4_ARM_SMC _service,
                             seL4_ARM_SMCContext *smc_args,
                             seL4_ARM_SMCContext *smc_response);{code}
{noformat}Parameters
[in] _service Capability to allow threads to interact with SMC calls.
[in] smc_args The structure that has the provided arguments.
[out] smc_response The structure to capture the responses.
Returns{noformat}
h2. struct seL4_ARM_SMCContext
{code:c}typedef struct seL4_ARM_SMCContext_ {
/* register arguments */
seL4_Word x0, x1, x2, x3, x4, x5, x6, x7;
} seL4_ARM_SMCContext;{code}
h1. Drawbacks
The main drawback of this approach is that you are allowing a supervisor in the system past initialization. This means that the worst case execution time is no longer guaranteed for the system. Each SMC call can take an arbitrarily long amount of time. This means a thread can essentially lock up all of the cores for an unspecified amount of time. The core the SMC call is made on is obviously blocked because the supervisor will not return to seL4 until it completes its function. The other cores will become locked as soon as seL4 runs and tries to grab the Big Kernel Lock (BKL). One potential fix for the multicore issue is to release the BKL before making the SMC call. However, that is out of the scope of this RFC as it has a lot of implications. The current approach will be to have this config option off by default and add documentation about how enabling it affects WCET.
h1. Rationale and alternatives
This design is the best because it gives users flexibility by allowing them to make some SMC calls while still being able to emulate other SMC calls. It is the preferred method, because the microkernel shouldn't need to be updated with each new platform, VMM configuration, and secure application implementation. It can all be configured in user space.
h2. Alternative approaches
SMC filtering could be done in the microkernel so that developers are less likely to configure the system incorrectly. However, there are some cases where the filtering depends on specific threads or on the application level (secure-world communication) instead of just the platform. On the first point, instead of creating a cap for making SMC calls, there could be a cap for each numbered SMC call, so that the root thread could configure which SMC calls each thread can make. However, that is quite a bit of work, as the set of calls differs for each ARM platform, and it adds another step when porting the microkernel to new ARM platforms. That could also solve the second point, but then users would likely need to modify the microkernel to support their specific secure-world application, which is something that should really be avoided.
Another part of the implementation that could be done differently is using the VCPU capability with an access right instead of creating a new type of capability. The kinds of access rights are already determined as read, write, grant, and grantreply. A write access right would probably make the most sense, but it still seems obtuse that a write access on a VCPU capability would allow the invoker to make an SMC call. Besides that, there are two other issues I see with this approach:
# I can foresee a use-case where you have a health monitor component that isn’t associated with a VCPU but needs to be able to forward an SMC call to actually interact with EL3 or secure applications.
# If the VCPU capability ever needs read/write access rights that make more sense with the type of object it is, for example, limiting which threads can actually read or write that VCPUs registers.
For those reasons we believe a new capability makes the most sense.
h2. Impact of not implementing this
Not implementing this will limit the ARM platforms we can use for the camkes-vm and/or the guests that can run on those platforms. It will also limit support for applications that run on the secure side of TrustZone and need to communicate with the non-secure side (where seL4, the VMMs, and the VMs run).
h1. Prior art
Xen has a similar implementation for the Zynqmp where arbitration is done to allow/disallow/emulate SMC calls from guest VMs. The arbitration is quite specific and is automatically generated based on resource allocation for each VM. Xen does not have VMM pieces that run in Userspace. Basically all of the management runs in EL2. Therefore, all of the arbitration is done in Xen itself. This approach does not make sense for seL4 for the reasons stated above in the [#rationale|#rationale] section.
However, this does show a need for this feature in hypervisors on ARMv8 platforms.
If necessary, we could reference the Xen arbitration code to help improve the default arbitration for the zynqmp in our implementation.
h1. Unresolved questions
# Should this implementation be configurable?
## I would argue it makes the most sense to have it automatically enabled if you are using the VMM on ARM and {{KernelAllowSMCCalls}} is disabled. Then your default whitelist in the SMC handler, for most platforms, can be to allow no SMC calls through, which would mean the default configuration would essentially be the same as the second configuration that is defined in the motivation section. However, I am wondering if there is a case where you still just want to disable it completely?
# Should the microkernel have an overriding blacklist?
## How and who decides what the blacklist is?
## Is it configurable?
## I don't think this question blocks the RFC as the blacklist could be implemented easily enough after the proposed implementation.
Robbie VanVossen ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab7eb5… ) *updated* an issue
RFCs ( https://sel4.atlassian.net/browse/RFC?atlOrigin=eyJpIjoiOTY3ZDUzM2E0NDNlNDJ… ) / RFC ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiOTY3ZDUzM2E0NDNlN… ) RFC-9 ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiOTY3ZDUzM2E0NDNlN… ) Add new capability for seL4 SMC Forwarding ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiOTY3ZDUzM2E0NDNlN… )
Change By: Robbie VanVossen ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab7eb5… )
h1. Summary
This RFC proposes to add a new capability which certain threads can invoke so seL4 can make SMC calls on ARM platforms in EL2 (virtualized mode) on the thread’s behalf.
h1. Motivation
Currently, seL4 running in EL2 traps all SMC calls and decides what to do with them. Some SMC calls are emulated, such as PSCI, but most are just ignored. However, some SMC calls may actually need to reach the Supervisor, and this configuration does not allow that to happen. This RFC proposes a solution to this problem: a new badged capability that allows threads to forward SMC calls to seL4, so that the microkernel makes the actual SMC call on the thread's behalf.
h1. Guide-level explanation
This change will allow you to forward to the microkernel, emulate, or block SMC calls based on platform specific configuration. When an SMC call is made by a VM, the microkernel intercepts the call, which it does for all SMC calls made at EL1. It then calls {{handle_smc()}} for the corresponding VMM thread. That function will decode the specific SMC call and decide whether to block it, emulate it, or forward the call back into seL4. There are examples already for multiple platforms on emulation and blocking. The zynqmp platform will have an example of forwarding the SMC with the new API call, {{seL4_ARM_SMC_Call()}}.
h2. New concepts
h3. seL4_ARM_SMC_Call()
{{seL4_ARM_SMC_Call()}} is the new ObjectInvocation API call and can be called in two ways:
# A VM attempts to make an SMC call which traps into the microkernel. The microkernel informs the VMM’s handler and the VMM decides that the SMC call really needs to be made. The VMM then calls {{seL4_ARM_SMC_Call()}} with the SMC capability and all the arguments that the VM provided.
# Directly by constructing the SMC call from the thread with the SMC capability. This means that a trap isn’t needed.
Our expected use case is the first approach, used in the camkes-vm by VMM handlers on behalf of their VMs. We are not sure whether there is a use case for the second approach, but it is certainly possible. In either case, the function requires the caller to set all 8 registers, even if the specific SMC call uses fewer arguments.
h3. seL4_ARM_SMC
{{seL4_ARM_SMC}} is the new capability. It needs to be given to any threads that you expect to call {{seL4_ARM_SMC_Call()}}, otherwise it will be denied. We expect that in most use cases, it will be given to each VMM thread.
The capability is badged, and the badge represents the SMC Function Identifier. Therefore, the specific SMC calls that each thread can make are configurable and checked by seL4.
h2. For users
We plan to implement this configuration as an example configuration of camkes-vm for the zynqmp platform. Due to the WCET implications of this approach, it will not be the default for all platforms. It should only be enabled when actually needed for a specific platform or application. Without this change, Linux is unable to run as a guest on camkes-vm for the zynqmp platform.
Here is an example of how to enable SMC calls for some VMs in the {{devices.camkes}} file implemented for a platform in {{camkes-vm-apps}}:
{code:none}vm0.allow_smc = true;
vm0.allowed_smc_functions = [
0xC2000001, // PM_GET_API_VERSION
0xC2000017 // PM_FPGA_GET_STATUS
];
vm1.allow_smc = true;
vm1.allowed_smc_functions = [
0xC2000001, // PM_GET_API_VERSION
0xC200000D, // PM_REQUEST_NODE
0xC200000E // PM_RELEASE_NODE
];
vm2.allow_smc = false;
vm2.allowed_smc_functions = [];{code}
In this example, each VM can be configured to allow or disallow SMC calls. For each VM that has SMC calls enabled, you then specify which SMC function calls it can make; these are integers that are defined per platform. VMs can share the same calls, and they can also have different ones.
h1. Reference-level explanation
This implementation will meet the SMC Calling Convention specification from ARM ([ARM DEN 0028E v1.4|https://developer.arm.com/documentation/den0028/latest?_ga=2.116565828…]) for 32-bit SMC calls. 64-bit SMC calls are partially supported, but full support could be added later.
h2. seL4_ARM_SMC_Call()
{code:c}static int seL4_ARM_SMC_Call(seL4_ARM_SMC _service,
                             seL4_ARM_SMCContext *smc_args,
                             seL4_ARM_SMCContext *smc_response);{code}
{noformat}Parameters
[in] _service Capability to allow threads to interact with SMC calls.
[in] smc_args The structure that has the provided arguments.
[out] smc_response The structure to capture the responses.
Returns{noformat}
h2. struct seL4_ARM_SMCContext
{code:c}typedef struct seL4_ARM_SMCContext_ {
/* register arguments */
seL4_Word x0, x1, x2, x3, x4, x5, x6, x7;
} seL4_ARM_SMCContext;{code}
h1. Drawbacks
The main drawback of this approach is that it allows the supervisor to run in the system past initialization. This means that the worst-case execution time (WCET) is no longer guaranteed for the system: each SMC call can take an arbitrarily long time, so a thread can essentially lock up all of the cores for an unspecified amount of time. The core the SMC call is made on is blocked because the supervisor will not return to seL4 until it completes its function. The other cores become blocked as soon as seL4 runs and tries to acquire the Big Kernel Lock (BKL). One potential fix for the multicore issue is to release the BKL before making the SMC call, but that is out of scope for this RFC as it has many implications. The current approach is to have this config option off by default and to document how enabling it affects WCET.
h1. Rationale and alternatives
This design is the best because it gives users flexibility by allowing them to make some SMC calls while still being able to emulate other SMC calls. It is the preferred method, because the microkernel shouldn't need to be updated with each new platform, VMM configuration, and secure application implementation. It can all be configured in user space.
h2. Alternative approaches
SMC filtering could be done in the microkernel so that developers are less likely to configure the system incorrectly. However, there are some cases where the filtering is dependent on specific threads and the application level (Secure world communication) instead of just platform dependent. On the first point, instead of creating a cap for making the SMC call, you could have a cap for each numbered SMC call so that you could configure what SMC call each thread can make using the root thread. However, that is quite a bit of work as that will differ for each ARM platform and adds another step when porting the microkernel to new ARM platforms. That could also solve the second point, but then users would likely need to modify the microkernel to support their specific secure world application, which is something that should really be avoided.
Another part of the implementation that could be done differently is using the VCPU capability with an access right instead of creating a new type of capability. The existing access rights are read, write, grant, and grantreply. A write access right would probably make the most sense, but it still seems unintuitive that write access on a VCPU capability would allow the invoker to make an SMC call. Besides that, there are two other issues with this approach:
# I can foresee a use-case where you have a health monitor component that isn’t associated with a VCPU but needs to be able to forward an SMC call to actually interact with EL3 or secure applications.
# The VCPU capability may later need read/write access rights that better match the type of object it is, for example, limiting which threads can actually read or write that VCPU's registers.
For those reasons we believe a new capability makes the most sense.
h2. Impact of not implementing this
Not implementing this will limit the ARM platforms we can use for the camkes-vm and/or the guests that can run on those platforms. It will also limit support for applications that run on the secure side of TrustZone and need to communicate with the non-secure side (where seL4, the VMMs, and the VMs run).
h1. Prior art
Xen has a similar implementation for the zynqmp, where arbitration is done to allow, disallow, or emulate SMC calls from guest VMs. The arbitration is quite specific and is automatically generated based on resource allocation for each VM. Xen does not have VMM pieces that run in userspace; essentially all of the management runs in EL2, so all of the arbitration is done in Xen itself. This approach does not make sense for seL4 for the reasons stated above in the [#rationale|#rationale] section.
However, this does show a need for this feature in hypervisors on ARMv8 platforms.
If necessary, we could reference the Xen arbitration code to help improve the default arbitration for the zynqmp in our implementation.
h1. Unresolved questions
# Should this implementation be configurable?
## I would argue it makes the most sense to have it automatically enabled if you are using the VMM on ARM and {{KernelAllowSMCCalls}} is disabled. Then your default whitelist in the SMC handler, for most platforms, can be to allow no SMC calls through, which would mean the default configuration would essentially be the same as the second configuration that is defined in the motivation section. However, I am wondering if there is a case where you still just want to disable it completely?
# Should the microkernel have an overriding blacklist?
## How and who decides what the blacklist is?
## Is it configurable?
## I don't think this question blocks the RFC as the blacklist could be implemented easily enough after the proposed implementation.
Robbie VanVossen ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab7eb5… ) *updated* an issue
RFCs ( https://sel4.atlassian.net/browse/RFC?atlOrigin=eyJpIjoiZjk5MTU0ODY3ODdhNDJ… ) / RFC ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiZjk5MTU0ODY3ODdhN… ) RFC-9 ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiZjk5MTU0ODY3ODdhN… ) Add new capability for seL4 SMC Forwarding ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiZjk5MTU0ODY3ODdhN… )
Change By: Robbie VanVossen ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab7eb5… )
h1. Summary
This RFC proposes to add a new capability which certain threads can invoke so seL4 can make SMC calls on ARM platforms in EL2 (virtualized mode) on the thread’s behalf.
h1. Motivation
Currently, seL4 as a hypervisor on ARM in EL2 traps all SMC calls. Then seL4 decides what to do with these calls. Some SMC calls are emulated, such as PSCI, but most are just ignored. However, there may be some SMC calls that do actually need to make it to the Supervisor, and this configuration does not allow that to happen.
This RFC proposes a solution to this problem by giving threads a new badged capability which allows them to forward SMC calls to seL4, so that the microkernel will make the actual SMC call on the thread's behalf.
h1. Guide-level explanation
This change will allow you to forward to the microkernel, emulate, or block SMC calls based on platform-specific configuration. When an SMC call is made by a VM, the microkernel intercepts the call, which it does for all SMC calls made at EL1. It then calls {{handle_smc()}} for the corresponding VMM thread. That function will decode the specific SMC call and decide whether to block it, emulate it, or forward the call back into seL4. There are already examples of emulation and blocking for multiple platforms. The zynqmp platform will have an example of forwarding the SMC with the new API call, {{seL4_ARM_SMC_Call()}}.
h2. New concepts
h3. seL4_ARM_SMC_Call()
{{seL4_ARM_SMC_Call()}} is the new ObjectInvocation API call and can be called in two ways:
# A VM attempts to make an SMC call, which traps into the microkernel. The microkernel informs the VMM’s handler and the VMM decides that the SMC call really needs to be made. The VMM then calls {{seL4_ARM_SMC_Call()}} with the SMC capability and all the arguments that the VM provided.
# Directly by constructing the SMC call from the thread with the SMC capability. This means that a trap isn’t needed.
Our expected use case is the first approach, used in the camkes-vm by VMM handlers on behalf of their VMs. I am not sure if there is a use case for the second approach, but it is certainly possible. In either case, the function still requires the caller to set all 8 registers, even if the specific SMC call uses fewer arguments.
h3. seL4_ARM_SMC
{{seL4_ARM_SMC}} is the new capability. It needs to be given to any threads that you expect to call {{seL4_ARM_SMC_Call()}}, otherwise the call will be denied. We expect that in most use cases, it will be given to each VMM thread.
The capability is badged and the badge represents the SMC Function Identifier. Therefore, the specific SMC calls that each Thread can make is configurable and checked by seL4.
h2. For users
We plan to implement this configuration as the default for users of camkes-vm for the zynqmp platform. Due to the WCET implications of this approach, it will not be the default for all platforms. It should only be enabled when actually needed for a specific platform or application. Without this change, Linux is unable to run as a guest on camkes-vm for the zynqmp platform.
If a user enables SMC forwarding and needs to update the configuration for their platform, they can make changes to {{projects/seL4_projects_libs/libsel4vmmplatsupport/src/plat/$KernelPlatform/devices/smc.c}}.
h1. Reference-level explanation
This implementation will meet the SMC Calling Convention specification from ARM ([ARM DEN 0028E v1.4|https://developer.arm.com/documentation/den0028/latest?_ga=2.116565828…]) for 32-bit SMC calls. 64-bit SMC calls are partially supported, but full support could be added later.
h2. seL4_ARM_SMC_Call()
{code:c}
static int seL4_ARM_SMC_Call(seL4_ARM_SMC _service,
                             seL4_ARM_SMCContext *smc_args,
                             seL4_ARM_SMCContext *smc_response)
{code}
{noformat}
Parameters
[in] _service Capability to allow threads to interact with SMC calls.
[in] smc_args The structure that has the provided arguments.
[out] smc_response The structure to capture the responses.
Returns
{noformat}
h2. struct seL4_ARM_SMCContext
{code:c}
typedef struct seL4_ARM_SMCContext_ {
/* register arguments */
seL4_Word x0, x1, x2, x3, x4, x5, x6, x7;
} seL4_ARM_SMCContext;
{code}
h1. Drawbacks
The main drawback of this approach is that you are allowing the user to make the decision on which SMC calls should go through or not. This will require a fairly advanced configuration and allows supervisor in the user to open themselves up to system -level vulnerabilities past initialization. However, I think This means that the need worst case execution time is no longer guaranteed for the feature outweighs the risk system. Also, to help mitigate the risk in the camkes-vm, I think we Each SMC call can setup take an arbitrarily long amount of time. This means a default configuration that allows as few SMC calls as possible thread can essentially lock up all of the cores for most platforms an unspecified amount of time. We have done some testing with Linux The core the SMC call is made on is obviously blocked because the zynqmp platform supervisor will not return to seL4 until it completes its function. The other cores will become locked as soon as seL4 runs and have identified tries to grab the minimum SMC calls we need Big Kernel Lock (BKL). One potential fix for the VMM's multicore issue is to support just to boot release the VMs BKL before making the SMC call. So However, that platform configuration will be more permissive but will meet is out of the needs scope of this RFC as it has a Linux guest VM lot of implications.
One other way The current approach will be to mitigate have this issue is identifying some SMC calls that should never be made config option off by a thread default and add documentation about how enabling it affects WCET. Then create a platform-dependent blacklist in the microkernel that would disallow certain SMC calls even from threads that have the SMC capability.
h1. Rationale and alternatives
{anchor:rationale}
This design is the best because it gives users flexibility by allowing them to make some SMC calls while still being able to emulate other SMC calls. It is the preferred method, because the microkernel shouldn't need to be updated with each new platform, VMM configuration, and secure application implementation. It can all be configured in user space.
h2. Alternative approaches
SMC filtering could be done in the microkernel so that developers are less likely to configure the system incorrectly. However, there are some cases where the filtering is dependent on specific threads and the application level (Secure world communication) instead of just platform dependent. On the first point, instead of creating a cap for making the SMC call, you could have a cap for each numbered SMC call so that you could configure what SMC call each thread can make using the root thread. However, that is quite a bit of work as that will differ for each ARM platform and adds another step when porting the microkernel to new ARM platforms. That could also solve the second point, but then users would likely need to modify the microkernel to support their specific secure world application, which is something that should really be avoided.
Another part of the implementation that could be done differently is using the VCPU capability with an access right instead of creating a new type of capability. The existing access rights are read, write, grant, and grantreply. A write access right would probably make the most sense, but it still seems unintuitive that write access on a VCPU capability would allow the invoker to make an SMC call. Besides that, there are two other issues with this approach:
# I can foresee a use-case where you have a health monitor component that isn’t associated with a VCPU but needs to be able to forward an SMC call to actually interact with EL3 or secure applications.
# The VCPU capability may later need read/write access rights that better match the type of object it is, for example, limiting which threads can actually read or write that VCPU's registers.
For those reasons we believe a new capability makes the most sense.
h2. Impact of not implementing this
Not implementing this will limit the ARM platforms we can use for the camkes-vm and/or the guests that can run on those platforms. It will also limit support for applications that run on the secure side of TrustZone and need to communicate with the non-secure side (where seL4, the VMMs, and the VMs run).
h1. Prior art
Xen has a similar implementation for the zynqmp, where arbitration is done to allow, disallow, or emulate SMC calls from guest VMs. The arbitration is quite specific and is automatically generated based on resource allocation for each VM. Xen does not have VMM pieces that run in userspace; essentially all of the management runs in EL2, so all of the arbitration is done in Xen itself. This approach does not make sense for seL4 for the reasons stated above in the [#rationale|#rationale] section.
However, this does show a need for this feature in hypervisors on ARMv8 platforms.
If necessary, we could reference the Xen arbitration code to help improve the default arbitration for the zynqmp in our implementation.
h1. Unresolved questions
# Should this implementation be configurable?
## I would argue it makes the most sense to have it automatically enabled if you are using the VMM on ARM and {{KernelAllowSMCCalls}} is disabled. Then your default whitelist in the SMC handler, for most platforms, can be to allow no SMC calls through, which would mean the default configuration would essentially be the same as the second configuration that is defined in the motivation section. However, I am wondering if there is a case where you still just want to disable it completely?
# Should the microkernel have an overriding blacklist?
## How and who decides what the blacklist is?
## Is it configurable?
## I don't think this question blocks the RFC as the blacklist could be implemented easily enough after the proposed implementation.