high-availability system should have these software services
Hi,

In general, a high-availability system should have the following software services (http://electronicdesign.com/boards/high-availability-rtoss-deliver-five-nines-reliability):

- Heartbeat support for each server and each application.
- Event management capability for change notification.
- Alarm management for error handling.
- Transactions capability for check-pointing and rollback/restart.
- Clustering for server management and application links.
- Reliable storage support for RAIDs and for journaling file systems.

I want to develop a high-availability system on seL4. Can anyone give me some suggestions? QNX has high-availability support (http://www.qnx.com/developers/docs/6.3.0SP3/neutrino/sys_arch/ham.html); can we learn something from them?

Xilong Pei
Tongji University
2015/6/11
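As a rough illustration of the heartbeat item in the list above, a heartbeat service can be sketched as a monitor that records when each component last reported in and flags the ones that have gone quiet. This is plain Python showing the bookkeeping only, not seL4 code; the class name `HeartbeatMonitor`, the timeout value, and the component names are all hypothetical.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat of each registered component (hypothetical API)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock      # injectable, so the logic can be exercised without sleeping
        self.last_seen = {}     # component name -> time of last heartbeat

    def register(self, name):
        # Registration counts as the first heartbeat.
        self.last_seen[name] = self.clock()

    def beat(self, name):
        if name not in self.last_seen:
            raise KeyError(f"unknown component: {name}")
        self.last_seen[name] = self.clock()

    def expired(self):
        # Components whose last heartbeat is older than the timeout.
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)


# Usage with a simulated clock (component names are made up).
now = [0.0]
monitor = HeartbeatMonitor(timeout_s=2.0, clock=lambda: now[0])
monitor.register("fs_server")
monitor.register("net_server")
now[0] = 1.5
monitor.beat("fs_server")       # only fs_server reports in
now[0] = 3.0
print(monitor.expired())        # net_server has been silent for 3.0 s
```

What to do with an expired component (raise an alarm, restart it, fail over) is a separate policy decision that the sketch deliberately leaves out.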
In general, a high-availability system should have the following software services I want to develop a high-availability system on seL4, can anyone give me some suggestions?
Does that all have to be in the OS? To what degree could it be a library independent of the OS?
On 12 Jun 2015, at 7:11 , Raoul Duke <raould@gmail.com> wrote:
Does that all have to be in the OS? To what degree could it be a library independent of the OS?
What the “OS” is becomes a bit relative with a true microkernel (which QNX isn’t). But you’re right if you think this should certainly not be part of the *kernel*.

Gernot

________________________________
The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.
It would be particularly neat to do a fault-tolerant implementation of the seL4 API and a refinement proof that it implemented the spec when the number of concurrent failures was limited.

On Jun 12, 2015 4:40 AM, "Gernot Heiser" <gernot@nicta.com.au> wrote:
What the “OS” is becomes a bit relative with a true microkernel (which QNX isn’t). But you’re right if you think this should certainly not be part of the *kernel*.
Gernot
_______________________________________________
Devel mailing list
Devel@sel4.systems
https://sel4.systems/lists/listinfo/devel
On 12 Jun 2015, at 19:10, Harry Butterworth <heb1001@gmail.com> wrote:
It would be particularly neat to do a fault-tolerant implementation of the seL4 API and a refinement proof that it implemented the spec when the number of concurrent failures was limited.

Do you mean something like this? https://ssrg.nicta.com.au/projects/TS/cots.pml

Gernot
I hadn't noticed you'd done that work, and it is interesting, but no. I was thinking more along the lines of a loosely coupled cluster of heterogeneous architectures (say ARM, x86 and Power), perhaps using Byzantine fault-tolerant replication so as to lift the API over any n failures, compromised machines or hardware design bugs.
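For a sense of scale: the classic bound for Byzantine fault-tolerant state-machine replication is that tolerating f arbitrary faults takes at least 3f + 1 replicas, and a client can accept a result once f + 1 replicas report the same value, since at least one of those must be correct. The sketch below covers only that counting and reply-voting step, in plain Python; it is not an agreement protocol and uses no seL4 API, and the function names are hypothetical.

```python
from collections import Counter


def min_replicas(f):
    """Minimum number of replicas to tolerate f Byzantine faults (3f + 1 bound)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def accept_reply(replies, f):
    """Return the value reported by at least f + 1 replicas, else None.

    Assumes at most f replicas are faulty, so a wrong value can gather
    at most f matching replies.
    """
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None


# Usage: four replicas (say, on different architectures), one of them faulty.
f = 1
replies = ["ok", "ok", "corrupted", "ok"]
print(min_replicas(f))           # replicas needed for f = 1
print(accept_reply(replies, f))  # value backed by at least f + 1 replicas
```

Running the replicas on different architectures, as suggested above, is what would make a shared hardware design bug count as one of the f faults rather than hitting every replica at once.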
I am also curious to know whether the seL4 kernel supports fault tolerance for system-level services. Although those services are not implemented in the kernel, they are still critical for the whole system. Can we achieve such fault tolerance without kernel support?

Thanks.
Yuxin

On Wed, Jun 10, 2015 at 9:59 PM, XilongPei(裴喜龙) <pei_xilong@tongji.edu.cn> wrote:
I want to develop a high-availability system on seL4. Can anyone give me some suggestions? QNX has high-availability support (http://www.qnx.com/developers/docs/6.3.0SP3/neutrino/sys_arch/ham.html); can we learn something from them?
Hi All,

We’re in the middle of finalising the camera-ready version of our latest publication in the area of fault detection (fault tolerance is future work). It should be available in the next week or so.

I don’t want to delve into too much detail beforehand, but basically the kernel features new (non-verified) mechanisms to enable the redundant execution of seL4-based systems, including replicas of seL4 itself. There are some assumptions and caveats, the most prominent being that non-redundant hardware devices are a single point of failure, which is also obvious. However, except for a small part of the drivers and kernel code, the rest of the system is within the sphere of replication.

Stay tuned for more details when the camera-ready version is finished.

- Kevin

From: Devel [mailto:devel-bounces@sel4.systems] On Behalf Of Yuxin Ren
Sent: Monday, 15 June 2015 6:47 AM
To: XilongPei(裴喜龙)
Cc: devel@sel4.systems
Subject: Re: [seL4] high-availability system should have these software services

I am also curious to know whether the seL4 kernel supports fault tolerance for system-level services. Although those services are not implemented in the kernel, they are still critical for the whole system. Can we achieve such fault tolerance without kernel support?

Thanks.
Yuxin
participants (6)
- Gernot Heiser
- Harry Butterworth
- Kevin Elphinstone
- Raoul Duke
- XilongPei(裴喜龙)
- Yuxin Ren