Hi,
Now that the fix for this issue has made it through our integration
pipeline, I can explain what happened, since some of you might be
interested. The fix is here:
https://github.com/seL4/seL4/commit/d12bb374ab47d2b13f438969d2eb5dde1021af84
At line 56, the target variable was originally declared uninitialised.
If the loop at lines 57 to 64 executes at least once, then the variable
is initialised at line 58. But if the loop does not execute, then the
variable would remain uninitialised when it is used at line 65. This
would be the case if infer_cpu_gic_id was ever called with nirqs <= 0.
This is *not* a problem in the released kernel, becuase infer_cpu_gic_id
is only ever called once in dist_init, with 32 <= nirqs <= 1024.
However, the binary translation validation test that failed does not do
the inter-procedural analysis that would be necessary to realise this.
When looking at infer_cpu_gic_id, the analysis allows nirqs to take any
value. It considers that the loop might not execute, and therefore
target might be uninitialised when it is used at line 65. The dataflow
model that is produced by the analysis cannot model such undefined
behaviour, and so the test fails. With the fix, the variable is never
uninitialised, so the analysis does not run into this problem.
In summary, the fix you see here keeps our proofs happy, but it does not
increase the safety of the kernel. This function, as it is actually
used, was already safe.
In terms of our development process, this kind of failure in the binary
translation validation is quite rare and difficult to predict. And since
our proofs take quite a long time to run, we don't always insist that
our kernel developers run the proofs for changes in boot code, which is
currently unverified except for binary translation validation. We accept
that these failures will occasionally happen, and we deal with them when
they do. The functions dist_init and infer_cpu_gic_id are boot-only
code, so this was one of those rare failures.
Of course, we should insist on running all the tests for versions we
intend to formally release, and we will for future releases.
Best regards,
Matt
________________________________________
From: Brecknell, Matthew (Data61, Kensington NSW)
Sent: Thursday, 8 November 2018 11:25 AM
To: devel@sel4.systems
Cc: Mcleod, Kent (Data61, Kensington NSW)
Subject: Re: seL4 10.1.0 and camkes-3.6.0
Hi,
Due to an oversight on my part, seL4 10.1.0 was released without passing
the full verification test suite.
We have no reason to believe that there is any problem in the verified
parts of this seL4 version.
The test that failed is part of the binary translation validation, which
produces a dataflow graph from the Isabelle/HOL model of the C code, and
then attempts to automatically prove a refinement relation between the C
and that dataflow graph. The failure occurred in boot code, which in any
case is currently not verified. The failure does not mean that there is
any problem with the C code. Most likely, it just means that some new
code uses some C idiom which is currently not handled by the automatic
translation process.
All other components of the verification test suite pass, including
security proofs and functional correctness proofs for the Isabelle/HOL
model of the C code.
Nevertheless, we intend to rectify this failure as soon as possible, and
we'll make further announcements when we understand what happened.
I want to reiterate that this was entirely my fault, and was not at all
Kent's fault.
Best regards,
Matt
________________________________________
From: Devel