Copilot Verifier is an add-on to the Copilot Stream
DSL for verifying the correctness of C
code generated by the copilot-c99
package.
The main idea of the verifier is to use the Crucible symbolic simulator to interpret the semantics of the generated C program and and to produce verification conditions sufficient to guarantee that the meaning of the generated program corresponds in a precise way to the meaning of the original stream specification. The generated verification conditions are then dispatched to SMT solvers to be automatically solved. We will have more to say about exactly what is meant by this correspondence below.
Copilot Verifier is described in the ICFP 2023 paper Trustworthy Runtime Verification via Bisimulation (Experience Report).
To build the verifier from source, first make sure you have met the following prerequisites:
-
Ensure that you have the
cabal
andghc
executables in yourPATH
. If you don't already have them, we recommend usingghcup
to install them: https://www.haskell.org/ghcup/. We recommendCabal
3.10 or newer, and one of GHC 9.4, 9.6, or 9.8. -
Ensure that you have the
clang
andllvm-link
utilities from LLVM in yourPATH
. Currently, LLVM versions up to 16 are supported. LLVM binaries are available at https://github.com/llvm/llvm-project/releases. -
Ensure that you have the
z3
SMT solver in yourPATH
.z3
binaries are available at https://github.com/Z3Prover/z3/releases.
Then, clone the repo and run:
$ git submodule update --init
$ cabal test copilot-verifier
This will clone the repository, build the verifier, and run the associated test suite. If you have performed all of the steps above correctly, the test suite should pass.
We also provide a Dockerfile which automates the process of installing Copilot and the verifier. The Dockerfile can be built and run using the following commands:
$ docker build -t <tag> .
$ docker run -it <tag>
Where <tag>
is a unique name for the Docker image.
The main interface to the verifier is the Copilot.Verifier.verify
function, which takes some options to drive the code generation
process and a Copilot Spec
object. It will invoke the Copilot
code generator to obtain C sources, compile the C sources into
LLVM bitcode using the clang
compiler front-end, then
parse and interpret the bitcode using crucible
. After generating
verification conditions, it will dispatch them to an SMT solver,
and the result of those queries will be presented to the user.
There are a number of examples (based on similar examples
from the main Copilot repository) demonstrating how to
incorporate the verifier into a Copilot program.
See the copilot-verifier/examples
subdirectory of this repository.
The Copilot DSL represents a certain nicely-behaved
class of discrete-time stream programs. Each Stream
in Copilot denotes an infinite stream of values; one may
just as well think that Stream a
represents a pure mathematical
function ℕ → a
from natural numbers to values of type a
.
See the
Copilot manual
for more details of the Copilot language itself and its semantics.
One of the central design considerations for Copilot is that is should be possible to implement stream programs using a fixed, finite amount of storage. As a result, Copilot will only accept stream programs that access a bounded amount of their input streams (including any recursive stream references). This allows an implementation strategy where the generated C code can use fixed-size ring buffers to store a limited number of previous stream values.
The execution model for the generated programs is that the program
starts in a state corresponding to stream values at time t = 0
;
"external" stream input values are placed in distinguished global
variables by the calling environment, which then executes a step()
function to move to the next time step. The step()
function captures
the values of these external inputs and does whatever computation is
necessary to update its internal state from time t=n
to time
t=n+1
. In addition, it will call any defined "trigger" functions
if the stream state at that time step satisfies the defined guard condition.
In short, the generated C program steps one moment at a time through
the stream program, consuming a sequence of input values provided by
a cooperating environment and calling handler functions whenever
states of interest occur.
What does it mean, then, for a generated C program in this style
to correctly implement a given stream program? The intuition
is basically that after n
calls to the step
function,
the state of the ring-buffers of the C program should correctly
compute the value of the corresponding stream expression
evaluated at index n
, assuming the C program has been fed
inputs corresponding to the first n
values of the external stream
inputs. Moreover, the trigger functions should be called from
the step
function exactly at the time values when the stream expressions
evaluate to true.
The notion of correspondence for the values flowing in streams is relatively straightforward: these values consist of fixed-width machine integers, floating-point values, structs and fixed-length arrays. For each, the appropriate notion of equality is fairly clear.
Both the original Stream
program and the generated C program
can be viewed straightforwardly as a transition system, and under
this view, the correspondence we want to establish is a bisimulation
between the states of the high-level stream program and the low-level
C program. The proof method for bisimulation requires us to provide
a "correspondence" relation between the program states, and then prove
three things about this relation:
- that the initial states of the programs are in the relation;
- if we assume two arbitrary program states begin in the relation and each takes a single transition (consuming corresponding inputs), the resulting states are back in the relation;
- that any observable actions taken by one program are exactly mirrored by the other.
On the high-level side of the bisimulation, the program
state is essentially just the value of the current time step n
,
whereas on the C side it consists of the regions of global memory that
contain the ring-buffers and their current indices. The transition
relation for the high-level program just increments the time value by
one, and the transition for the C program is defined by the action
of the generated step()
function.
Suppose s
is one of the stream definitions in the original Copilot
program which is required to retain k
previous values;
let buf
be the global variable name of the ring-buffer in the C
program, and idx
be the global variable name maintaining the
current index into the ring buffer. Then the correspondence
relation is basically that 0 <= idx < k
and
s[n+i] = buf[(idx+i) mod k]
as i
ranges from 0 .. k-1
.
By abuse of notation, here we mean that s[j]
is
the value of the stream expression s
evaluated at index j
,
whereas buf[j]
means the value obtained by reading the j
th value
of the buffer buf
from memory. The overall correspondence relation
is a conjunction of statements like this, one for each stream
expression that is realized via a buffer.
The kind of program correspondence property we desire is a largely mechanical affair. As the code under consideration is automatically generated, it has a very regular structure and is specifically intended to implement the semantics we wich to verify it against. As such, we expect these proofs to be highly automatable.
The proof principle of bisimulation itself is not amenable
to reduction to SMT, as if falls outside the first-order theories
those solvers understand. Likewise, the semantics of Copilot
and C might possibly be reducible directly to SMT, but it would be
impractical to do so. However, we can reduce the individual
proof obligations mentioned above into a series of lower-level
logical statements that are amenable to SMT proof by
defining the logical semantics of stream programs, and using
symbolic simulation techniques to interpret the semantics of the
C program. Performing this reduction is the key contribution
of copilot-verifier
.
The proof first obligation we must discharge is to show that
the initial states of the two programs correspond. For each
stream s
there is a corresponding k
, which is the length of
the ring-buffer implementing it. We must simply verify that
the C program initializes its buffer with the first k
values
of the stream s
, and that the idx
value starts at 0.
Because of the restrictions Copilot places on programs, these
first k
values must be specified concretely and will not be
able to depend on any external inputs. As such, this step
is quite easily performed, as it requires only direct evaluation
of concrete inputs.
The bulk of the proof effort consists of demonstrating that
the bisimulation relation is preserved by transitions.
In order to do this step, we must begin with totally symbolic
initial states: we know nothing except that we are at some
arbitrary time value n
, and that the C program buffers
correspond to their stream values as required by the relation.
Thus, we create fresh symbolic variables for the streams
from n
to n + k-1
, and for the values of all the involved
external streams at time n
. Then, we run forward the Copilot
program by evaluating the stream recurrence expression to
compute the value of each stream at time n+k
.
Next we set up an initial state of the C program by choosing,
for each ring buffer, an arbitrary value for its current index
within its allowed range, and then writing the variables
corresponding to each stream value into the buffers at
their appropriate offsets. The symbolic simulator is then
invoked to compute the state update effects of the step()
function. Afterward, we read the poststate values from the
ring-buffers and verify that they correspond to the stream
values from n+1
up to n+k
.
As part of symbolic simulation, Crucible may also generate side-conditions that relate to memory safety of the program, or to error conditions that must be avoided. All of the bisimulation equivalence conditions and the safety side-conditions will be submitted to an SMT solver.
For our purposes, the only observable effects of a Copilot program
relate to any "trigger" functions defined in the spec. Our task is to
show that the generated C code calls the external trigger functions if
and only if the corresponding guard condition is true, and that the
arguments passed to those functions are as expected.
This proof obligation is proved in the same phase along with
the transition relation proof above because the step()
function
is responsible for both invoking triggers and for performing state
updates.
The technique we use to perform this aspect of the proof is to install "override" functions to the external symbols corresponding to the C trigger functions before we begin symbolic simulation. In a real system, the external environment would be responsible for implementing these functions and taking whatever appropriate action is necessary when the triggers fire. However, we are verifying the generated code in isolation from its environment, so we have no implementation in hand. Instead, the override will essentially implement a stub function that simply captures its arguments and the path condition under which it was called. After simulation completes, the captured arguments and path condition are required to be equivalent to the corresponding trigger guard and arguments from the Copilot spec. These conditions are discharged to the SMT solver at the same time as the transition relation obligations.
Because of the way we model the trigger functions, we make a number of implicit assumptions about how the actual implementations of those functions must behave. The most important of those assumptions is that the trigger functions must not modify any memory under the control of the Copilot program, including its ring buffers and stack. We also assume that the trigger functions are well defined, i.e. they are memory safe and do not perform any undefined behavior.
A generated C program may make use of partial operations. These range from division, which can fail if the second argument is zero, to signed integer arithmetic, which can overflow and result in undefined behavior. The verifier has two modes for dealing with partial operations:
-
Any invocation of a partial operation on undefined inputs in the generated C program will result in an error, provided that the user did not add an assertion that assumes this behavior will not occur.
-
If the generated C program invokes a partial operation on undefined inputs, the verifier will check if this coincides with a corresponding invocation of a partial operation in the Copilot spec. If so, the verification will succeed. In other words, the verifier will check that the spec and the C program are "crash-equivalent".
The verifier uses mode (1) by default, but mode (2) can be enabled by using
Copilot.Verifier.verifyWithOptions sideCondVerifierOptions
. In this mode,
the verifier will analyze any invocation of a operation which could be partial
and generate a side condition that this operation will only be invoked on
well defined inputs. During the transition step of the bisimulation proof,
the verifier will add these side conditions as assumptions. Therefore, if
simulating the C program generates any side conditions due to invoking partial
operations, these side conditions from the C program should be dischargeable
using the corresponding side conditions from the Copilot spec.
Mode (2) has the caveat that clang
may compile C code to LLVM bitcode in
which a partial function is no longer applied to undefined inputs. For
instance, clang
will sometimes promote 16-bit integer values to 32 bits
before performing arithmetic on them. This can turn an operation that would
result in signed 16-bit integer overflow into a 32-bit integer operation
that does not overflow, for instance.
We rely on the clang
compiler front-end to consume C source files
and produce LLVM intermediate language, which then becomes the input
to the later verification steps. To the extent that the input program
is not fully-portable C, clang
may make implementation-specific
decisions about how to compile the program which might be made
different if compiled by a different compiler (e.g. gcc
). We expect
this aspect to be mitigated by the fact that Copilot programs are
automatically generated into a rather simple subset of the C language,
and is designed to be as simple as possible.
Any code-generation bugs in clang
itself may affect the soundness
of our verifier. Again, however, Copilot generates a well-understood
subset of the language, and we expect clang
to be well-tested on
the code patterns produced.
The semantics of LLVM bitcode, as encoded in the crucible-llvm
package, may have errors that affect soundness. We mitigate this risk
by testing our semantics against a corpus of verification problems
produced for the SV-COMP verification competition, paying special
attention to any soundness issues that arise. Crux
, a standalone
verification system based on crucible-llvm
, was a participant in the
2022 edition of SV-COMP.
The semantics of Copilot programs, as encoded in the
Copilot.Theorem.What4
module may have errors that affect soundness.
For the moment we do not have an effective mitigation strategy for
this risk other than manual examination and comparison against the
intended semantics of Copilot, as encoded in the interpreter.
There is limited SMT solver support for floating-point values,
especially for transcendental functions like the trig primitives. As a
result, we reason about floating point expressions via uninterpreted
functions. In other words, we leave the semantics of the
floating-point operations totally abstract, and simply verify that the
Copilot program and the corresponding C program apply the same
operations in the same order. This is sound, but leaves the possibility
that the compiler will apply some correct transformation to
floating-point expressions that we are nonetheless unable to verify.
However, on low optimizations and without the --fast-math
flag,
compilers generally (and clang
in particular) are very reluctant to
rearrange floating-point code, and the verifications generally succeed.