Matthew Brecknell ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ad9d2f… ) *created* an issue
RFCs ( https://sel4.atlassian.net/browse/RFC?atlOrigin=eyJpIjoiMjk0NWJlYzljZjFiNGF… ) / RFC ( https://sel4.atlassian.net/browse/RFC-13?atlOrigin=eyJpIjoiMjk0NWJlYzljZjFi… ) RFC-13 ( https://sel4.atlassian.net/browse/RFC-13?atlOrigin=eyJpIjoiMjk0NWJlYzljZjFi… ) MCS: Improve constraints on grant via reply ( https://sel4.atlassian.net/browse/RFC-13?atlOrigin=eyJpIjoiMjk0NWJlYzljZjFi… )
Issue Type: RFC Assignee: Matthew Brecknell ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ad9d2f… ) Created: 04/Nov/22 10:46 AM Reporter: Matthew Brecknell ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ad9d2f… )
-------
Summary
-------
In MCS configurations, the ability to grant capabilities when replying to a call is determined entirely by the grant right on the reply capability. If a server can create reply objects by retyping untyped memory, then it can also grant capabilities to its clients. Consequently, many realistic systems would not have the authority confinement property that we would expect.
This RFC proposes two alternatives:
* Both alternatives restrict the server's ability to grant via reply according to whether or not the grant right is present on the receive endpoint capability at the time the server called seL4_Recv.
* The first alternative ( https://github.com/seL4/seL4/pull/874 ) keeps the grant bit on reply object capabilities. Servers that want to grant via reply must have grant rights on both the receive endpoint capability and the reply object capability.
* The second alternative ( https://github.com/seL4/seL4/pull/945 ) removes the grant bit from reply object capabilities. Servers that want to grant via reply only need a grant right on the receive endpoint, but can't make reply object capabilities that mask those grant rights.
----------
Motivation
----------
The seL4 integrity theorems ( https://trustworthy.systems/publications/nictaabstracts/Sewell_WGMAK_11.abs… ) show that seL4, in non-MCS configurations, can enforce access control policies. For example, one can construct a system in which untrusted client and server components can communicate through an endpoint, but cannot write to each other's private memory.
The theorems are conditional: They require system setups that constrain the propagation of authority. In particular, grant rights are too powerful for the access control model, so one of the conditions is that untrusted components cannot grant or be granted capabilities.
We would like to prove similar integrity theorems for MCS configurations, and therefore need appropriate mechanisms to limit capability grant rights between untrusted components. Currently, the only way to prevent an MCS server granting to its clients is to prevent it from retyping untyped memory. That's too severe a limitation. It's also an indirect mechanism, which would be difficult to express in an access control model.
This RFC proposes a more direct mechanism for limiting an MCS server's ability to grant to its clients, and should provide a clearer path to useful integrity theorems for MCS configurations.
-----------------------
Guide-level explanation
-----------------------
We give a brief account of the history that led to the current situation.
In the non-MCS kernel, there are no reply objects. Reply capabilities are generated by the kernel when a call is received through an endpoint, and inserted into a slot in the receiver's TCB. The kernel sets the grant right on the reply capability iff the grant right was set on the endpoint cap the receiver used when it called seL4_Recv.
The intention of the design is that the receiver's endpoint capability controls the receiver's ability to grant capabilities when it replies. The implementation is the grant right on the reply capability, but the kernel ensures that the grant right on the generated reply capability is determined by the grant right on the receiver's endpoint capability. The intention is important, because endpoint rights are crucial to the ability to construct systems with useful access control properties.
MCS changed the object-capability model, introducing reply objects and scheduling contexts. Reply objects make protected procedure call stacks more explicit, allowing scheduling contexts to be passed from caller to receiver and back again, potentially recursively. Reply object capabilities are supplied by the receiver, and the kernel internally links the reply object with the calling thread when a call is received.
With respect to grant rights for replies, MCS reused parts of the implementation from the non-MCS kernel, but did not capture its intention. In both MCS and non-MCS, the ability to grant when replying is controlled by the grant bit on the reply capability. However, the way that bit is set is very different. In MCS, the grant right on a reply object capability is typically controlled by the receiver, if it can create reply objects. The grant right on the receiver's endpoint capability has no effect. Consequently, it is harder to construct MCS systems with useful access control properties.
In this RFC, we propose changes that limit an MCS receiver's ability to grant when replying, according to the rights on the receiver's endpoint capability. The kernel propagates the endpoint grant right via the reply object.
---------------------------
Reference-level explanation
---------------------------
We propose to add a boolean canGrant field to the MCS reply object. It is controlled by the kernel, and is only meaningful when the reply object is associated with either a receiver waiting for a caller, or a caller waiting for a reply. In those cases, the canGrant field indicates whether the capCanGrant bit was set on the receiver's endpoint capability when it called seL4_Recv or seL4_ReplyRecv.
Regardless of whether a caller is waiting on the endpoint, the canGrant field of the reply object is set in receiveIPC , which is part of either seL4_Recv or seL4_ReplyRecv. In receiveIPC , the kernel has access to both the receiver's endpoint capability and its reply object capability, so it can directly copy the capCanGrant flag from the receive endpoint capability to the canGrant field of the reply object.
If there is already a caller waiting, then the send phase of the call completes immediately, and the caller becomes blocked on the reply object.
If the receiver has to block waiting for a caller, then either:
* The receiver's IPC is cancelled before a caller arrives, and the reply object becomes dissociated from the receiver. In this case, the canGrant value that was set on the reply object is never used.
* The receiver remains blocked until a caller arrives, and the canGrant field of the reply object still reflects the capCanGrant bit that was set on the receiver's endpoint capability. The send phase completes, and the caller becomes blocked on the reply object.
Therefore, regardless of how the caller becomes blocked on the reply object, the canGrant field of the reply object reflects the capCanGrant bit on the endpoint capability used to receive that call. This is the property we'll need to prove a useful integrity theorem.
This RFC includes two alternatives:
* The first alternative ( https://github.com/seL4/seL4/pull/874 ) only makes the change just described. The grant bit on the reply object capability (which is separate from the canGrant field of the reply object) is retained, and acts as a mask against the grant right on the receiver's endpoint capability. Therefore, to grant via a reply, a receiver would need grant rights on both the receive endpoint capability and the reply object capability.
* The second alternative ( https://github.com/seL4/seL4/pull/945 ) makes the change just described, and additionally removes the grant bit on reply object capabilities. To grant via a reply, a receiver only needs a grant right on the receive endpoint, but can't make a reply object capability that masks a grant right conferred by the receive endpoint.
---------
Drawbacks
---------
The two alternatives are compared in the next section.
Both alternatives have the potential to break existing MCS systems that include threads that:
* receive calls on endpoint capabilities without grant rights, yet
* expect to be able to grant capabilities when replying.
Any such systems would need to be updated to add grant rights to the receive endpoint capabilities.
Unfortunately, the breakage will only become evident when the caller tries to the use the capabilities it should have been granted.
A mitigating factor is that non-MCS systems already require the grant bit to be set appropriately on receive endpoint capabilities, so systems ported from non-MCS to MCS should not be affected.
--------------------------
Rationale and alternatives
--------------------------
We've already covered the rationale, so it only remains to compare the two alternatives.
Retaining the grant bit in reply object capabilities
----------------------------------------------------
The first alternative ( https://github.com/seL4/seL4/pull/874 ) retains the grant bit in the reply object capability. The code change is minimal. Any existing MCS systems that make use of the ability to mask grant rights via the reply object capability will continue to enjoy that ability, so the change does not give any thread permissions it did not already have.
Removing the grant bit in reply object capabilities
---------------------------------------------------
The second alternative ( https://github.com/seL4/seL4/pull/945 ) removes the grant bit in the reply object capability. This results in a simpler implementation and API semantics, but removes a feature that might be used in some MCS systems. There is no corresponding feature in the non-MCS kernel ( https://github.com/seL4/seL4/pull/945#issuecomment-1302684909 ) , so the feature removal is only potentially relevant to MCS systems.
Unfortunately, the only reasonable way to remove the feature is silently. This means that an existing MCS system that currently uses the feature to mask the grant rights of reply object capabilities will continue to work (assuming the corresponding receive endpoint capabilities have grant rights), but might include threads which gain permissions that the system designer did not intend. That is, there might threads that did not previously have the right to grant capabilities when replying, that could gain that right when this alternative is implemented.
The rationale for this alternative is that we think it is unlikely that there are any such systems. We are not aware of any real use cases, and have not been able to construct a hypothetical use case that is not more easily implemented without this feature ( https://github.com/seL4/seL4/pull/945#issuecomment-1301609528 ).
If there are any use cases, we would like to hear about them.
---------
Prior art
---------
The main inspiration for this change is the non-MCS version of seL4, which:
* already uses the grant bit on a receiver's endpoint capability to govern the receiver's ability to grant capabilities when replying.
* does not provide any facility to remove the grant right from a reply capability ( https://github.com/seL4/seL4/pull/945#issuecomment-1302684909 ).
--------------------
Unresolved questions
--------------------
Is there a realistic use case for a grant bit on the reply object capability, that can be used to mask a grant right conferred by the receive endpoint capability?
This is the main factor in choosing between the two alternatives. So far, the discussion on this issue has only shown a hypothetical use case ( https://github.com/seL4/seL4/pull/874#issuecomment-1174670248 ) , which did not survive scrutiny ( https://github.com/seL4/seL4/pull/945#issuecomment-1301609528 ) , and which for non-MCS systems, was based on a mistaken understanding of the API semantics ( https://github.com/seL4/seL4/pull/945#issuecomment-1302684909 ).
If you do use seL4_CNode_Mint to mask the grant rights of MCS reply object capabilities, we would like to hear from you.
( https://sel4.atlassian.net/browse/RFC-13#add-comment?atlOrigin=eyJpIjoiMjk0… ) Add Comment ( https://sel4.atlassian.net/browse/RFC-13#add-comment?atlOrigin=eyJpIjoiMjk0… )
Get Jira notifications on your phone! Download the Jira Cloud app for Android ( https://play.google.com/store/apps/details?id=com.atlassian.android.jira.co… ) or iOS ( https://itunes.apple.com/app/apple-store/id1006972087?pt=696495&ct=EmailNot… ) This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100210- sha1:21ade21 )
Lucy Parker ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=624efd15f6a269… ) *updated* an issue
RFCs ( https://sel4.atlassian.net/browse/RFC?atlOrigin=eyJpIjoiODg3ZDg0NzZkMjU0NDd… ) / RFC ( https://sel4.atlassian.net/browse/RFC-12?atlOrigin=eyJpIjoiODg3ZDg0NzZkMjU0… ) RFC-12 ( https://sel4.atlassian.net/browse/RFC-12?atlOrigin=eyJpIjoiODg3ZDg0NzZkMjU0… ) The seL4 Device Driver Framework ( https://sel4.atlassian.net/browse/RFC-12?atlOrigin=eyJpIjoiODg3ZDg0NzZkMjU0… )
Change By: Lucy Parker ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=624efd15f6a269… )
h1. Summary
The seL4 Device Driver Framework (sDDF) provides libraries, interfaces and protocols for writing/porting device drivers to run as performant user level programs on seL4. Specifically, the sDDF aims to:
* support a wide class of devices, including network, USB, graphics, storage, serial ports etc;
* achieve performance comparable to Linux in-kernel drivers, as measured by throughput, latency and CPU load, at least for latency-sensitive high-throughput devices (network cards and USB);
* be robust against defined threats;
* extend to a device virtualisation framework (sDVF) for sharing devices between virtual machines and native components on seL4;
* provide strong separation of concerns;
* enable the formal verification of its components.
h1. Motivation
The seL4 microkernel prescribes device drivers to run in user level. While this provides strong fault containment, it can lead to performance degradation due to the extra context switches required. Furthermore, writing/porting device drivers to seL4 is tedious and error prone. The sDDF aims to ease this development by defining a general device model that is performant, eliminates common causes of driver bugs and aids formal verification. It is based on an asynchronous transport mechanism that minimises overheads while keeping driver interfaces simple.
The goals of the sDDF are to:
* simplify drivers by making them single-threaded and event-based
* further simplify drivers by having them serve no purpose other than abstracting device hardware
* exclusively use lock-free shared data structures
* avoid any data copying in the driver
* minimise trust in the driver by enabling configurations where I/O data is not mapped in the driver's address space
h1. Guide-level explanation
The sDDF will consist of:
* a general device model that enables formal reasoning and simplifies implementation of performant user level device drivers.
* An asynchronous transport mechanism that provides a means of communication between a driver and other components in the system.
* User level libraries to ease development of I/O systems such as a network stack and file system.
* Sample device drivers.
The sDDF currently supports network interfaces on the seL4 Core Platform and provides a sample Gigabit Ethernet driver as well a test configuration where the driver's client consists of an IP stack (lwip) co-located with an echo server. The configuration outperforms an equivalent Linux configuration on the same hardware by a large margin.
h1. Reference-level explanation
For simplicity, the following explanation assumes a 2-component structure, comprising a device driver running in one address space, and a server utilising the driver's services in a separate address space. Multiple clients sharing a device will require a multiplexer component that implements a sharing policy, this leaves the driver focussed on hardware abstraction and largely policy-free. Sample multiplexers will be provided in the near future.
The transport layer consists of a number of shared memory regions, data structures and access protocols. The three memory regions are:
# The control region shared between the driver and server
# The metadata region, shared between the device and driver.
# The data region, shared between the device and server.
The device takes instructions from its metadata region, and reads/writes data to the data region. The data region can be considered as an array of I/O buffers. The server and driver communicate and share the buffers via the control region.
The control region consists of single producer, single consumer, lockless bounded queues implemented as ring buffers. There are two queues per direction. The transmit free (TxF) queue keeps track of buffers that are ready to be re-used for transmit. The transmit available (TxA) queue keeps track of buffers with data available for transmit. Likewise, the receive free (RxF) queue and receive free (RxA) queue store buffers to be re-used and with newly available data. The corresponding interfaces to dequeue and enqueue buffers are in libsharedringbuffer. This library can be easily expanded to include interfaces for request ring buffers to support storage device classes in the future.
The driver itself is event driven and acts in response to:
* Transmit requests from another component.
* Data available interrupts from the device.
* Transmit complete interrupts from the device.
* Receive buffers available signals from another component.
In this way, the driver can be a simple, single threaded component that reacts to the above events. The driver thread runs at highest priority to ensure the timely handling of interrupts as well as immediate response to server requests. However, the driver could be active (with a scheduling context bound to it’s TCB) or passive (with a scheduling context bound to its interrupt handling notification object). The case for active vs passive drivers is yet to be fully researched and likely depends on other component needs.
The active model uses seL4 notifications for all its interfaces. The notifications are used to signal updates to the shared memory regions. However, in the passive model, the server must instead hold a badged capability to the drivers endpoint in order to use IPC to signal updates to the control region.
The sDDF currently supports both driver models for a network device driver running on the iMX8MM hardware, with a combined echo client and IP stack (lwIP) operating as a server that communicate via the transport architecture as outlined above.
h1. Drawbacks
While the zero-copy transport architecture enables high performance, it implies that multiple clients would have to trust each other, which is unreasonable for most use cases where a device is shared by multiple clients. Such cases require a multiplexer that copies data as appropriate, or re-maps DMA buffers frequently. The trade-offs here are not yet fully understood and are likely dependent on the hardware platform and device class. More research is required to fully understand these issues, this research is planned for the next calendar year.
The sDDF is based on the newly developed seL4 Core Platform (seL4CP) and the MCS version of seL4. At this stage, the seL4CP is still immature and only supportx a small number of hardware platforms. Furthermore, the sDDF is still missing many components, including but not limited to:
* multiplexers;
* support for further device classes, particularly storage;
* support for other architectures/platforms;
* evaluation of more realistic use cases.
Finally, there are still many open research questions, including:
* What scenarios favour an active vs passive driver?
* What are appropriate budget/period configurations for a device driver servicing client/s with differing needs (eg. client-initiated I/O vs device-initiated, asymmetric traffic)?
* How will the design perform when distributed over multiple cores? The relatively fine-grained modularity, combined with the lock-free, asynchronous transport, lends itself to using multiple cores without requiring extra effort or even code, but we have not yet evaluated that.
h1. Rationale and alternatives
The intention of the sDDF is that utilising the design will ease development of user level device drivers on the seL4 Core Platform that achieve performance competitive with in kernel drivers. Alternatives presented to date do not achieve the level of performance we aim for, and generally result in more complex drivers, although some ease the integration of legacy drivers.
h1. Prior art
The sDDF was originally implemented on the CAmkES framework. However, the rigidity of that framework, and hidden overheads, made development difficult and competitive performance infeasible, leading to re-targeting to the seL4CP. The simplified, event driven model enforced by the Core Platform framework further eases the development of device drivers on seL4.
h1. Unresolved questions
There is still outstanding research to be done in order to completely satisfy the sDDF’s aims. These are outlined as drawbacks.
h1. Implementation
A proof of concept system can be found [here|https://github.com/lucypa/sDDF].
A more detailed design document can be found [here|https://trustworthy.systems/projects/TS/drivers/].
( https://sel4.atlassian.net/browse/RFC-12#add-comment?atlOrigin=eyJpIjoiODg3… ) Add Comment ( https://sel4.atlassian.net/browse/RFC-12#add-comment?atlOrigin=eyJpIjoiODg3… )
Get Jira notifications on your phone! Download the Jira Cloud app for Android ( https://play.google.com/store/apps/details?id=com.atlassian.android.jira.co… ) or iOS ( https://itunes.apple.com/app/apple-store/id1006972087?pt=696495&ct=EmailNot… ) This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100207- sha1:b857cee )
Corey Lewis ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=5f3c7e83323607… ) *created* an issue
RFCs ( https://sel4.atlassian.net/browse/RFC?atlOrigin=eyJpIjoiYTdkYmE0NmQxOTNlNGY… ) / RFC ( https://sel4.atlassian.net/browse/RFC-11?atlOrigin=eyJpIjoiYTdkYmE0NmQxOTNl… ) RFC-11 ( https://sel4.atlassian.net/browse/RFC-11?atlOrigin=eyJpIjoiYTdkYmE0NmQxOTNl… ) MCS: set fault and timeout handler parameters while configuring TCBs ( https://sel4.atlassian.net/browse/RFC-11?atlOrigin=eyJpIjoiYTdkYmE0NmQxOTNl… )
Issue Type: RFC Assignee: Corey Lewis ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=5f3c7e83323607… ) Created: 22/Jul/22 12:34 PM Reporter: Corey Lewis ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=5f3c7e83323607… )
*******
Summary
*******
This RFC proposes to modify TCB configuration functions so that it is possible to directly set the badge and rights of the fault and timeout handler Endpoint Caps. This would make it consistent with how the CSpace and VSpace of threads are set.
**********
Motivation
**********
This change is motivated by wanting to simplify the process of configuring fault and timeout handlers when using a system with the MCS extensions enabled. Without this change, setting the badge or rights of a fault handler requires first minting a new copy of the Endpoint Cap with the desired parameters, and then installing it in the TCB. In many systems this additional copy of the cap ends up being unnecessary.
A longer term motivation is that we would like to port how fault handlers are configured on MCS to mainline seL4. This would resolve https://sel4.atlassian.net/browse/SELFOUR-29 , improve consistency, and provide increased functionality to systems that do not use MCS. However, this next step would be very difficult without this change, due to technicalities in the existing verification approach of the CapDL Loader.
***********************
Guide-level explanation
***********************
On systems with the MCS extensions enabled, caps to a TCB's fault and timeout handler endpoints are installed directly in the TCB. This change adds two new parameters to the relevant API calls used to install these caps (TCBSetSchedParams, TCBSetTimeoutEndpoint, TCBSetSpace). These parameters can be used to set the badge and rights of the fault and timeout handler endpoints being installed. In cases where the badge and rights do not need to be set, the existing approach is still supported by setting both parameters to 0.
See https://github.com/seL4/capdl/pull/47 for an example of this new functionality being used. The CapDL Loader previously minted a new endpoint cap for the fault handler with the required badge and rights. With this change it no longer needs to mint the extra cap and instead seL4 sets the fields while installing the endpoint caps in the TCB.
https://github.com/seL4/sel4test/pull/78 is an example of updating existing code without using the new functionality. As the seL4 tests generally do not need to set the badge and rights, they are able to be updated for this change by just using seL4_NilData and seL4_NoRights as the parameters.
***************************
Reference-level explanation
***************************
The API calls TCBSetSchedParams, TCBSetTimeoutEndpoint and TCBSetSpace have each had two new parameters added, and their corresponding decode functions expect two additional syscall arguments. These arguments are used in the same pattern as used when setting the CSpace and VSpace of a TCB. When the parameters are non-zero then the provided cap is modified with updateCapData and maskCapRights, and then it is confirmed that it is safe to copy the capability by calling deriveCap. Following this, the decode functions then confirm that the modified cap is a valid cap through validFaultHandler. https://github.com/seL4/seL4/pull/889 is the current proposal for what these changes would look like.
In addition to the patch to the kernel, there are also patches for seL4_libs, sel4test, and capdl which show the required changes to migrate these projects to the new API. The patch to sel4test is an example of a minimal migration which does not use the new functionality, while the patch for capdl is a migration that does make use of it.
*********
Drawbacks
*********
On some platforms, the extra parameters that this change adds might exceed the hardware limit on the number of message registers, which would cause the syscall to go into the IPC buffer. In the worst case, if this does become an issue then it could possibly be avoided by modifying the API so that each syscall performs fewer operations and hence does not require as many arguments.
This implementation also means that seL4 will silently downgrade some incorrect capabilities to NullCaps if a user attempts to set them as a TCBs fault or timeout handler, instead of returning an error. This behaviour currently occurs when setting the CSpace and VSpace and is generally safe, although could cause confusion if it hides a user error. If this behaviour is not desired then we could separately check for this happening and explicitly return the error.
**************************
Rationale and alternatives
**************************
This design provides increased consistency within seL4, as it makes setting the fault and timeout handlers of TCBs exactly the same as setting their CSpace and VSpace.
An alternative design with a minor difference that was considered was combining the badge and right parameters into one word, in the same fashion as the seL4_CNode_CapData structure. While this would lead to a slightly simpler API due to only needing one parameter, on 64-bit platforms it would limit the badge to 60 bits instead of the full 64. Since existing users often make use of all available bits for the badges, this alternative was not chosen.
********************
Unresolved questions
********************
There are existing pull-requests that demonstrate the proposed changes.
https://github.com/seL4/seL4/pull/889https://github.com/seL4/seL4_libs/pull/65https://github.com/seL4/sel4test/pull/78https://github.com/seL4/capdl/pull/47
There is also some initial work on updating the seL4 proofs for this proposal.
https://github.com/seL4/l4v/pull/505
While initial verification efforts suggest that no further implementation changes are required to maintain verifiability, this work is not yet completed and progress might lead to issues being discovered.
( https://sel4.atlassian.net/browse/RFC-11#add-comment?atlOrigin=eyJpIjoiYTdk… ) Add Comment ( https://sel4.atlassian.net/browse/RFC-11#add-comment?atlOrigin=eyJpIjoiYTdk… )
Get Jira notifications on your phone! Download the Jira Cloud app for Android ( https://play.google.com/store/apps/details?id=com.atlassian.android.jira.co… ) or iOS ( https://itunes.apple.com/app/apple-store/id1006972087?pt=696495&ct=EmailNot… ) This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100202- sha1:0ba1f75 )
Kent McLeod ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab1217… ) *created* an issue
RFCs ( https://sel4.atlassian.net/browse/RFC?atlOrigin=eyJpIjoiYzYwNzJhMTU1NjZhNDc… ) / RFC ( https://sel4.atlassian.net/browse/RFC-10?atlOrigin=eyJpIjoiYzYwNzJhMTU1NjZh… ) RFC-10 ( https://sel4.atlassian.net/browse/RFC-10?atlOrigin=eyJpIjoiYzYwNzJhMTU1NjZh… ) AArch64: remove VSpace object types, seL4_ARM_PageDirectory and seL4_ARM_PageUpperDirectory ( https://sel4.atlassian.net/browse/RFC-10?atlOrigin=eyJpIjoiYzYwNzJhMTU1NjZh… )
Issue Type: RFC Assignee: Kent McLeod ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab1217… ) Created: 11/Mar/22 10:11 PM Reporter: Kent McLeod ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab1217… )
*******
Summary
*******
This RFC proposes to remove two AArch64 VSpace object types, seL4_ARM_PageDirectory and seL4_ARM_PageUpperDirectory. The functionality previously provided by these types will be provided by the existing seL4_ARM_PageTable object type. This allows for a simpler API and enables a smaller kernel implementation that will be easier to verify.
**********
Motivation
**********
This change is motivated by an ongoing verification project to verify the AArch64 design and implementation. Unified PageTable object types was attempted during the RISCV64 verification project and it successfully reduced verification complexity. By applying the same design strategy to Aarch64, the verification can become easier.
The change is also motivated by an reducing complexity of the kernel user API where there are less object and capability types and less different invocation-object pairs while maintaining the same functionality. The ability to use PageTable objects for any level of the page-table hierarchy can also make virtual address space resource management easier as objects are more interchangeable.
***********************
Guide-level explanation
***********************
This change only affects the AArch64 architecture, but affects every configuration.
This change requires removing the seL4_ARM_PageDirectory and seL4_ARM_PageUpperDirectory object types. This means that it will no longer be possible to create these objects from Untyped capabilities. To make up for this, seL4_ARM_PageTable objects will be able to be mapped into an seL4_ARM_VSpace object or other seL4_ARM_PageTable objects.
This change will require modifications to existing user code. Every usage of an seL4_ARM_PageDirectory or seL4_ARM_PageUpperDirectory will need to be updated to instead use a seL4_PageTable. This update should be reasonably direct as all three object types have the same kind of invocations, are the same size, and contain the same number of table entries as each other.
***************************
Reference-level explanation
***************************
The entire sections of the seL4_ARM_PageDirectory and seL4_ARM_PageUpperDirectory interfaces have been removed from the libsel4 API XML interface definitions. The kernel will no longer recognize them as objects.
It's possible to redefine all of the deleted constants and symbols in the libsel4 implementation using CPP macro definitions in a deprecated.h header file. This can allow migration with minimal manual updating. Manual updates may still be required in places where the programming language used expects each object type to have a different number such as in a C `switch` statement.
In addition to a patch to the kernel, there is a patch to seL4_libs showing the minimal set of changes required to migrate the seL4test and seL4bench applications to the new API.
Internally, the kernel will use a single pte_t datatype to represent all page table entries in the VSpace implementation. This allows for considerably less code duplication for mapping and unmapping operations as well as for object creation and revocation. This new implementation follows the design taken with the RISC-V architecture.
*********
Drawbacks
*********
The main drawback for this change is that is API breaking - existing user space programs will require their sources and binaries to be updated to use the newer kernel version. However, this will only affect configurations that previously have not been verified which should already be expecting API breakages as part of a verification process.
**************************
Rationale and alternatives
**************************
As AArch64 will be the third 64-bit architecture to go through a verification process we are able to make decisions that are informed by the experiences of the x86_64 and RISCV64 projects. In particular, the reduction in effort that this type unification allows was directly seen when comparing the x86_64 project, which used a separate object type for each level of the page table hierarchy, to the RISCV64 project which uses a single type.
One change that has been made compared to the RISC-V design is using a different type for the seL4_ARM_VSpace object. This object serves as the root page table of the address space. The AArch64 virtual memory architecture allows the top level page table to be sized differently depending on the size of the address space. This object also supports additional invocations that can't be applied to intermediate page table objects and it is also likely to gain more invocations in the future.
*********
Prior art
*********
Having a single page table object type is based on the design of the AArch64 virtual memory system architecture which sets out to reuse page table descriptor formats at all levels of the virtual address space.
********************
Unresolved questions
********************
There are existing pull-requests that show-case the proposed changes.
https://github.com/seL4/seL4/pull/801https://github.com/seL4/seL4_libs/pull/59
As always, verification progress can lead to design or implementation changes as bugs are found or verification blockers are encountered.
( https://sel4.atlassian.net/browse/RFC-10#add-comment?atlOrigin=eyJpIjoiYzYw… ) Add Comment ( https://sel4.atlassian.net/browse/RFC-10#add-comment?atlOrigin=eyJpIjoiYzYw… )
Get Jira notifications on your phone! Download the Jira Cloud app for Android ( https://play.google.com/store/apps/details?id=com.atlassian.android.jira.co… ) or iOS ( https://itunes.apple.com/app/apple-store/id1006972087?pt=696495&ct=EmailNot… ) This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100197- sha1:b96bc1b )
Robbie VanVossen ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab7eb5… ) *created* an issue
RFCs ( https://sel4.atlassian.net/browse/RFC?atlOrigin=eyJpIjoiNzE4MDc2MTZkNDQ4NDc… ) / RFC ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiNzE4MDc2MTZkNDQ4N… ) RFC-9 ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiNzE4MDc2MTZkNDQ4N… ) Add new capability for seL4 SMC Forwarding ( https://sel4.atlassian.net/browse/RFC-9?atlOrigin=eyJpIjoiNzE4MDc2MTZkNDQ4N… )
Issue Type: RFC Assignee: Robbie VanVossen ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab7eb5… ) Created: 06/Nov/21 6:31 AM Labels: arm-hyp new-feature Reporter: Robbie VanVossen ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ab7eb5… )
*******
Summary
*******
This RFC proposes to add a new capability which certain threads can invoke so seL4 can make SMC calls on ARM platforms in EL2 (virtualized mode) on the thread’s behalf.
**********
Motivation
**********
There are currently two ways to handle SMC calls when using seL4 as a hypervisor on ARM.
* You can configure the system so that any thread running at EL1/0 can directly make SMC calls. (Configure by enabling KernelAllowSMCCalls for the microkernel)
* You can configure the system so that SMC calls are trapped into seL4. This allows seL4 to decide what to do with the calls. In the current implementation
Both approaches leave something to be desired. The first option has various security implications and is essentially unprovable as it would allow any thread to make secure calls that could put the system in a bad state.
The second option is more secure and allows you to emulate some SMC calls, such as PSCI, however, there may be some SMC calls that do need to be made on behalf of a thread, and this configuration does not allow that to happen.
We want to essentially combine the two approaches by configuring the system for the second approach, and then allowing threads that have a new capability to forward SMC calls so that seL4 will make the actual SMC call on the thread's behalf.
***********************
Guide-level explanation
***********************
This change will allow you to forward to the microkernel, emulate, or block SMC calls based on platform specific configuration. When an SMC call is made by a VM, the microkernel intercepts the call. It then calls handle_smc() for the corresponding VMM thread. That function will decode the specific SMC call and decide whether to block it, emulate it, or forward the call back into seL4. There are examples already for multiple platforms on emulation and blocking. The zynqmp platform will have an example of forwarding the SMC with the new API call, seL4_ARM_SMC_Forward().
------------
New concepts
------------
seL4_ARM_SMC_Forward()
----------------------
seL4_ARM_SMC_Forward() is the new ObjectInvocation API call and can be called in two ways:
* A VM attempts to make an SMC call which traps into the microkernel. The microkernel informs the VMM’s handler and the VMM decides that the SMC call really needs to be made. The VMM then calls seL4_ARM_SMC_Forward() with the SMC capability and all the arguments that the VM provided.
* Directly by constructing the SMC call from the thread with the SMC capability. This means that a trap isn’t needed.
Our expected use case has been using the first approach in the camkes-vm for VMM handlers on the behalf of its VM. I am not sure if there is a use case for the second approach, but it is certainly possible. In either case, the function still requires the caller to set all 8 registers, even if the specific SMC call uses less arguments.
seL4_ARM_SMC
------------
seL4_ARM_SMC is the new capability. It needs to be given to any threads that you expect to call seL4_ARM_SMC_Forward() , otherwise it will be denied. In most use cases, it will be given to each VMM thread.
---------
For users
---------
We plan to implement this configuration as the default for users of camkes-vm for ARM platforms. However, it should be quite seamless. We will implement a default configuration for each platform that denies forwarding on any SMC calls so that functionality will remain the same. The exception will be for the zynqmp platform, which up until now has not been a supported platform for the camkes-vm. This was actually due to the fact that this platform required KernelAllowSMCCalls to be enabled to get Linux working as a VM. This change will fix that issue so that zynqmp will be a valid platform. This platform will have a different default, which allows specific SMC calls to be forwarded on so that Linux guests will boot correctly.
If a user needs to update the configuration for their platform, they can make changes to projects/seL4_projects_libs/libsel4vmmplatsupport/src/plat/$KernelPlatform/devices/smc.c.
***************************
Reference-level explanation
***************************
This implementation will meet the SMC Calling Convention specification from ARM ( ARM DEN 0028E v1.4 ( https://developer.arm.com/documentation/den0028/latest?_ga=2.116565828.3903… ) ) for 32-bit SMC calls. 64-bit SMC calls are partially supported, but full support could be added later.
----------------------
seL4_ARM_SMC_Forward()
----------------------
static int seL4_ARM_SMC_Forward ( seL4_ARM_SMC _service,
seL4_ARM_SMCContext * smc_args,
seL4_ARM_SMCContext * smc_response
)
Parameters
[in] _service Capability to allow threads to interact with SMC calls.
[in] smc_args The structure that has the provided arguments.
[out] smc_response The structure to capture the responses.
Returns
--------------------------
struct seL4_ARM_SMCContext
--------------------------
typedef struct seL4_ARM_SMCContext_ {
/* register arguments */
seL4_Word x0, x1, x2, x3, x4, x5, x6, x7;
} seL4_ARM_SMCContext;
*********
Drawbacks
*********
The main drawback of this approach is that you are allowing the user to make the decision on which SMC calls should go through or not. This will require a fairly advanced configuration and allows the user to open themselves up to system-level vulnerabilities. However, I think the need for the feature outweighs the risk. Also, to help mitigate the risk in the camkes-vm, I think we can setup a default configuration that allows as few SMC calls as possible for most platforms. We have done some testing with Linux on the zynqmp platform and have identified the minimum SMC calls we need the VMM's to support just to boot the VMs. So that platform configuration will be more permissive but will meet the needs of a Linux guest VM.
One other way to mitigate this issue is identifying some SMC calls that should never be made by a thread. Then create a platform-dependent blacklist in the microkernel that would disallow certain SMC calls even from threads that have the SMC capability.
**************************
Rationale and alternatives
**************************
This design is the best because it gives users flexibility by allowing them to make some SMC calls while still being able to emulate other SMC calls. It is the preferred method, because the microkernel shouldn't need to be updated with each new platform, VMM configuration, and secure application implementation. It can all be configured in user space.
----------------------
Alternative approaches
----------------------
SMC filtering could be done in the microkernel so that developers are less likely to configure the system incorrectly. However, there are some cases where the filtering is dependent on specific threads and the application level (Secure world communication) instead of just platform dependent. On the first point, instead of creating a cap for making the SMC call, you could have a cap for each numbered SMC call so that you could configure what SMC call each thread can make using the root thread. However, that is quite a bit of work as that will differ for each ARM platform and adds another step when porting the microkernel to new ARM platforms. That could also solve the second point, but then users would likely need to modify the microkernel to support their specific secure world application, which is something that should really be avoided.
Another part of the implementation that could be done differently is using the VCPU capability with an access right instead of creating a new type of capability. The kinds of access rights are already determined as read, write, grant, and grantreply. A write access right would probably make the most sense, but it still seems obtuse that a write access on a VCPU capability would allow the invoker to make an SMC call. Besides that, there are two other issues I see with this approach:
* I can foresee a use-case where you have a health monitor component that isn’t associated with a VCPU but needs to be able to forward an SMC call to actually interact with EL3 or secure applications.
* If the VCPU capability ever needs read/write access rights that make more sense with the type of object it is, for example, limiting which threads can actually read or write that VCPUs registers.
For those reasons we believe a new capability makes the most sense.
-------------------------------
Impact of not implementing this
-------------------------------
Not implementing this will limit the ARM platforms we can use for the camkes-vm and/or the guests that can run on those platforms. It will also limit support for applications that run in secure side of the trustzone that need to communicate with the nonsecure side (Where sel4, VMMs, and the VMs run).
*********
Prior art
*********
Xen has a similar implementation for the Zynqmp where arbitration is done to allow/disallow/emulate SMC calls from guest VMs. The arbitration is quite specific and is automatically generated based on resource allocation for each VM. Xen does not have VMM pieces that run in Userspace. Basically all of the management runs in EL2. Therefore, all of the arbitration is done in Xen itself. This approach does not make sense for seL4 for the reasons stated above in the rationale ( #rationale ) section.
However, this does show a need for this feature in hypervisors on ARMv8 platforms.
If necessary, we could reference the Xen arbitration code to help improve the default arbitration for the zynqmp in our implementation.
********************
Unresolved questions
********************
* Should this implementation be configurable? * I would argue it makes the most sense to have it automatically enabled if you are using the VMM on ARM and KernelAllowSMCCalls is disabled. Then your default whitelist in the SMC handler, for most platforms, can be to allow no SMC calls through, which would mean the default configuration would essentially be the same as the second configuration that is defined in the motivation section. However, I am wondering if there is a case where you still just want to disable it completely?
* Should the microkernel have an overriding blacklist? * How and who decides what the blacklist is?
* Is it configurable?
* I don't think this question blocks the RFC as the blacklist could be implemented easily enough after the proposed implementation.
( https://sel4.atlassian.net/browse/RFC-9#add-comment?atlOrigin=eyJpIjoiNzE4M… ) Add Comment ( https://sel4.atlassian.net/browse/RFC-9#add-comment?atlOrigin=eyJpIjoiNzE4M… )
Get Jira notifications on your phone! Download the Jira Cloud app for Android ( https://play.google.com/store/apps/details?id=com.atlassian.android.jira.co… ) or iOS ( https://itunes.apple.com/app/apple-store/id1006972087?pt=696495&ct=EmailNot… ) This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100183- sha1:8e458bc )
Matthew Brecknell ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ad9d2f… ) *updated* an issue
RFCs ( https://sel4.atlassian.net/browse/RFC?atlOrigin=eyJpIjoiZDhlODg3ZmQwYWMxNDA… ) / RFC ( https://sel4.atlassian.net/browse/RFC-8?atlOrigin=eyJpIjoiZDhlODg3ZmQwYWMxN… ) RFC-8 ( https://sel4.atlassian.net/browse/RFC-8?atlOrigin=eyJpIjoiZDhlODg3ZmQwYWMxN… ) Remove support for ARMv6, i.MX31 and KZM ( https://sel4.atlassian.net/browse/RFC-8?atlOrigin=eyJpIjoiZDhlODg3ZmQwYWMxN… )
Change By: Matthew Brecknell ( https://sel4.atlassian.net/secure/ViewProfile.jspa?accountId=557058%3Ad9d2f… )
h1. Summary
This RFC proposes to remove seL4 support for KMC's KZM evaluation board, which used the freescale (now NXP) i.MX31 processor with an ARM1136 core. Since this is the only ARMv6 platform supported by seL4, this proposal would also remove support for ARMv6.
h1. Motivation
The KZM board has been unavailable for purchase for some years. The i.MX31 processor exited NXP's product longevity program in 2015. We previously asked the seL4 community \[[1|https://sel4.discourse.group/t/should-we-continue-to-support-armv6-and-kzm-imx31/46],[2|https://lists.sel4.systems/hyperkitty/list/devel@sel4.systems/thread/EQ27WYPJIN3KKRGB4VNXUZKJYL65IZ6J/],[3|https://github.com/seL4/seL4/issues/578]] if there are current users of the KZM platform. From the responses, we believe there is already consensus that there is little value in continuing support for KZM.
With no known users, we are motivated to eliminate the overhead of maintaining support for KZM. Unlike all other platforms supported by seL4, KZM lacks sufficient registers for thread-local identifiers such as TLS_BASE and the IPC buffer address. For KZM alone, seL4 implements a workaround by mapping a special frame into each thread (the globals frame) to store these identifiers.
As an example of the cost of maintaining support for KZM, the globals frame currently makes it difficult to uniformly describe the address space available to user-space programs.
Additionally, the KZM board the seL4 Foundation has used for testing KZM support is no longer reliable, so we are currently only able to test using QEMU simulation. Although KZM was the original seL4 verification target, KZM support has not been verified since early 2016.
Since we believe there is already consensus around the removal of KZM support, this RFC can be considered a formal last call for objections.
h1. Guide-level explanation
We propose to remove from the seL4:
* seL4 kernel build targets for the KZM board, i.MX31 processor, and ARMv6 architecture;
* any platform-specific code that is only required for those build targets, including drivers platform-specific definitions , drivers and device-tree specifications.
Prior to removing support from seL4, we will remove code from other repositories that depend on seL4 support for KZM and ARMv6 , including tests, benchmarks, and tools such as CAmkES.
We propose to update the documentation on the [ docs.sel4.systems |https://docs.sel4.systems/Hardware/] website, to show that the KZM board is no longer supported, and to show the last seL4 release version that included support.
The impact of this change is that any current users of seL4 on the KZM board would not be able to use new releases of seL4 on the KZM board. They would most likely need to freeze the version of seL4 they use, and would therefore not be able to expect the usual level of support from the seL4 community.
Since we are not aware of any users, we do not expect significant impact.
h1. Reference-level explanation
We have drafted a pull request to remove seL4 support for KZM, i.MX31 and ARMv6:
* [https://github.com/seL4/seL4/pull/580|https://github.com/seL4/seL4/pull/580]
A number of related pull requests remove references from related repositories. Some of these are already merged, since the changes they make do not require an RFC process:
* [https://github.com/seL4/seL4_libs/pull/46|https://github.com/seL4/seL4_libs…]
* [https://github.com/seL4/sel4test/pull/51|https://github.com/seL4/sel4test/p…]
* [https://github.com/seL4/sel4runtime/pull/13|https://github.com/seL4/sel4run…]
* [https://github.com/seL4/sel4bench/pull/7|https://github.com/seL4/sel4bench/…]
* [https://github.com/seL4/camkes-tool/pull/92|https://github.com/seL4/camkes-…]
* [https://github.com/seL4/seL4_projects_libs/pull/22|https://github.com/seL4/…]
h1. Drawbacks
Since the KZM board is the only ARMv6-based platform supported by seL4, this proposal implies the removal of ARMv6, even though later ARMv6 boards might would not require the same workaround as the KZM board. Removing ARMv6 will make it slightly harder to add a new ARMv6-based platform in the future. But it would not be prohibitively difficult, since the main difference between ARMv6 and ARMv7 support in seL4 is the workaround required for KZM.
In any case, we believe that new ARM-based appliations applications of seL4 will most likely use ARMv7 or later, so the benefits of removing KZM support currently outweigh the drawbacks.
h1. Rationale and alternatives
We've already addressed the rationale for removing KZM support.
One might ask why it's necessary to remove all references to ARMv6 when we remove support for KZM, since retaining ARMv6 references might make it easier to support other ARMv6 boards in the future. However, we do not think it makes sense to retain references to ARMv6 without a specific ARMv6 board to exercise them, and without a clear demand for ARMv6 support. Again, the cost of maintaining those references outweighs any clear benefit.
h1. Prior art
The Linux kernel removed support for the KZM board and i.MX31 processor in 2015.
h1. Unresolved questions
We are not aware of any unresolved questions, but welcome discussion on this RFC.
( https://sel4.atlassian.net/browse/RFC-8#add-comment?atlOrigin=eyJpIjoiZDhlO… ) Add Comment ( https://sel4.atlassian.net/browse/RFC-8#add-comment?atlOrigin=eyJpIjoiZDhlO… )
Get Jira notifications on your phone! Download the Jira Cloud app for Android ( https://play.google.com/store/apps/details?id=com.atlassian.android.jira.co… ) or iOS ( https://itunes.apple.com/app/apple-store/id1006972087?pt=696495&ct=EmailNot… ) This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100177- sha1:149c709 )