Overview¶
3d supports several fixed-width integer base types, (nested) structs, constraints, enums, parameterized data types, tagged or otherwise value-dependent unions, fixed-size arrays, and variable-size arrays.
In addition to data validation, 3d also supports local actions to pass values read from the data structure, or pointers to them, to the caller code.
3d also supports validating structures that contain pointers, probing pointers for validity and traversing them in order to validate the data they refer to.
Base types¶
The primitive types in 3d include unsigned integers of the following flavors:
UINT8
: 8-bit unsigned little-endian integer- UINT16: 16-bit unsigned little-endian integer
- Literal values of
UINT8
are integers with theuy
suffix, e.g.,0uy, 8uy 255uy
etc. - Literals can also be specified in hexadecimal, with hex digits in either lower or upper case``0xabuy, 0x1Auy, 0xFFuy, 0xFfuy`` etc.
- Literal values of
UINT8
are integers with theuy
suffix, e.g.,0uy, 8uy 255uy
etc. - Literals can also be specified in hexadecimal, e.g.,
0xabuy, 0xffuy
etc.
- Literal values of
- UINT32: 32-bit unsigned little-endian integer
- UINT64: 64-bit unsigned little-endian integer
Literal values of unsigned integer types can be specified in either decimal or
hexadecimal, e.g., 0, 1, 255
etc. in decimal; or in with hexadecimal with
either lower or upper case, e.g., 0x0, 0x01, 0xff, 0xFF, 0xFf
etc.
3d infers the smallest width of a literal based on type information, but the width can also be specified explicitly.
For UINT8
, a literal has the uy
suffix, e.g., 0uy, 1uy, 0xffuy,
255uy
etc.
For UINT16
, a literal has the us
suffix, e.g., 0us, 1us, 0xffffus,
255us, 65535us
etc.
For UINT32
, a literal has the ul
suffix, e.g., 0ul, 1ul, 0xfffffffful,
255ul, 4294967295ul
etc.
For UINT64
, a literal has the uL
suffix, e.g., 0uL, 1uL, 0xffffffffffffffffuL,
255uL
etc.
We also provide big-endian unsigned integers:
- UINT16BE: 16-bit unsigned big-endian integer
- UINT32BE: 32-bit unsigned big-endian integer
- UINT64BE: 64-bit unsigned big-endian integer
Big-endian integers are often useful in describing network message formats. 3d ensures suitable endianness conversions are applied when comparing or equating integers represented in different endianness. We show an example of this below.
Structs¶
We would like to extract a validator for a simple point type, with X
and Y coordinates. So we create a file, HelloWorld.3d
, with the
following 3d data format description:
entrypoint typedef struct _point {
UINT16 x;
UINT16 y;
} point;
This data format is very similar to a C type description, where
UINT16
denotes the type of unsigned 16-bit integers, represented
as little-endian; with the addition of the entrypoint
keyword,
which tells 3d to expose the validator for the type to the final user.
Once we run 3d
with this file, we obtain several files:
HelloWorld.c
andHelloWorld.h
, which contain the actual verified validators;HelloWorldWrapper.c
andHelloWorldWrapper.h
, which contain entrypoint function definitions for data validators that the user should ultimately use in their client code. More precisely, here is the contents ofHelloWorldWrapper.h
:
BOOLEAN HelloWorldCheckPoint(uint8_t *base, uint32_t len);
The HelloWorldCheckPoint
function is the actual validator for the
point
type. It takes an array of bytes, base
, and its length
len
, and it returns 1 if base
starts with valid point
data
that fits in len
bytes or less, and 0 otherwise.
More generally, for a given Module.3d
, a type definition typ
marked with entrypoint
tells 3d to expose its validator in
ModuleWrapper.h
which will bear the name ModuleCheckTyp
.
Structs can be nested, such as in the following instance:
typedef struct _point {
UINT16 x;
UINT16 y;
} point;
entrypoint typedef struct _triangle {
point a;
point b;
point c;
} triangle;
Then, since in this file the definition of point
is not prefixed
with entrypoint
, only triangle
will have its validator exposed
in TriangleWrapper.h
.
There can be multiple definitions marked entrypoint
in a given
.3d
file.
Warning
By default, 3d does not enforce any alignment constraints, and does not introduce any implicit alignment padding. So, for instance, in the following data format description:
typedef struct _point {
UINT16 x;
UINT16 y;
} point;
entrypoint typedef struct _coloredPoint1 {
UINT8 color;
point pt;
} coloredPoint1;
entrypoint typedef struct _coloredPoint2 {
point pt;
UINT8 color;
} coloredPoint2;
- in
coloredPoint1
, 3d will not introduce any padding between thecolor
field and thept
field; - in
coloredPoint2
, 3d will not introduce any padding after thecolor
field.
This is in the spirit of keeping 3d specifications explicit. However, 3d does support an option to add alignment padding to a structure, as described below.
Constraints¶
Validators for structs alone are only layout-related. Beyond that, 3d provides a way to actually check for constraints on their field values:
entrypoint typedef struct _smoker {
UINT32 age { age >= 21 };
UINT8 cigarettesConsumed;
} smoker;
In this example, the validator for smoker
will check that the
value of the age
field is at least 21.
The constraint language includes integer arithmetic (+
, -
,
*
, /
, ==
, !=
, <=
, <
, >=
, >
) and
Boolean propositional logic (&&
, ||
, !
)
Constraints on a field can also depend on the values of the previous fields of the struct. For instance, here is a type definition for a pair ordered by increasing values:
entrypoint typedef struct _orderedPair {
UINT32 lesser;
UINT32 greater { lesser <= greater };
} orderedPair;
Warning
Arithmetics on constraints are evaluated in machine integers, not mathematical integers. Thus, the following naive definition:
entrypoint typedef struct _boundedSum {
UINT32 left;
UINT32 right { left + right <= 42 };
} boundedSum;
will fail at F* verification because the expression left + right
must be proven to not overflow before evaluating the condition. The
correct way of stating the condition is as follows:
entrypoint typedef struct _boundedSum {
UINT32 left;
UINT32 right { left <= 42 && right <= 42 - left };
} boundedSum;
This verifies because F* evaluates the right-hand-side
condition of a &&
in a context where the left-hand-side
condition is assumed to be true (thus 42 - left
will not underflow.)
Bitfields¶
Like in C, the fields of a struct type in 3d can include bitfields, i.e., unsigned integer types of user-specified width represented packed within unsigned integer fields of the canonical sizes UINT16, UINT32 and UINT64.
Consider the following example:
typedef struct _BF {
UINT32 x : 6;
UINT32 y : 10 { y <= 900 };
UINT32 z : 16 { y + z <= 60000 };
} BF;
This defines a struct BF
occupying 32 bits of memory, where the
first 6 bits are for the field x
; the next 10 bits are for the
field y
; and the following 16 bits are for the field z
.
The fields x
, y
, and z
can all be used in specifications
and are implicitly promoted to the underlying integer type, UINT32
in this case, although the 3d verifier is aware of suitable bounds on
the types, e.g., that 0 <= x < 64
.
For types UINT8
. UINT16
, UINT32
and UINT64
, 3d
implements MSVC’s rules for packing bit fields:
least-significant bit first. For instance:
typedef struct _BF2 {
UINT16 x : 6;
UINT16 y : 12;
UINT8 z;
} BF2;
In BF2
, although x
, y
and z
cumulatively consume only
26 bits, the layout of BF2
is actually as shown below, consuming
40 bits, since a given field must be represented within the bounds of
a single underlying type—we have 10 unused bits before x
and 4
unused bits before y
.
counting from most-significant bits to least-significant bits:
0 1 2 3 4
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
| Unused | x | Unused| y | z |
| | | | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
EverParse version 2023.04.17 and beyond encode bitfields with
big-endian integer types UINT16BE
, UINT32BE
and UINT64BE
with the most-significant bit (MSB) first, which is necessary for many
IETF network protocols (“network byte order”.) Those versions of
EverParse introduce the integer type UINT8BE
to read a 8-bit
bitfield as MSB-first.
Constants and Enumerations¶
3d provides a way to define numerical constants:
#define red 1
#define green 2
#define blue 42
Alternatively, 3d provides a way to define enumerated types:
UINT32 enum color {
red = 1,
green,
blue = 42
}
entrypoint typedef struct _coloredPoint {
color col;
UINT32 x;
UINT32 y;
} coloredPoint;
The validator for coloredPoint
will check that the value of
the field col
is either 1, 2 (for green
), or 42.
Contrary to structs, enum types cannot be marked entrypoint
.
The first enum label must be associated with a value.
The support type (here UINT32
) must be the same type as the type
of the values associated to each label.
Due to a limitation in the way 3d currently checks for the absence of double-fetches, values with enum type cannot be used in constraints. For example, the following code is currently rejected.
UINT32 enum color {
red = 1,
green,
blue = 42
};
typedef struct _enum_constraint {
color col;
UINT32 x
{
x == 0 || color == green
};
} _enum_constraint ;
With the following error message:
(Error) The type of this field does not have a reader, either because its values are too large or because reading it may incur a double fetch; subsequent fields cannot depend on it
One must instead write:
UINT32 enum color {
red = 1,
green,
blue = 42
};
entrypoint
typedef struct _enum_constraint {
UINT32 col
{
col == red ||
col == green ||
col == blue
};
UINT32 x
{
x == 0 || col == green
};
} _enum_constraint ;
We expect to lift this limitation soon.
Parameterized data types¶
3d also offers the possibility to parameterize a data type description
with arguments that can then be reused in constraints. For instance,
the following BoundedSum.3d
data description defines a pair of two
integers whose sum is bounded by a bound provided by the user as
argument:
entrypoint typedef struct _boundedSum (UINT32 bound) {
UINT32 left;
UINT32 right { left <= bound && right <= bound - left };
} boundedSum;
Then, these arguments will show up as arguments of the
corresponding validator in the BoundedSumWrapper.h
header produced
by 3d:
BOOLEAN BoundedSumCheckBoundedSum(uint32_t bound, uint8_t *base, uint32_t len);
Parameterized data types can also be instantiated within the .3d
file itself, including by the value of the field of a struct:
entrypoint typedef struct mySum {
UINT32 bound;
boundedSum(bound) sum;
} mySum;
A parameterized data type can also check whether a condition on its arguments holds before even trying to check its contents:
entrypoint typedef struct _boundedSum (UINT32 bound)
where bound <= 1729
{
UINT32 left;
UINT32 right { left <= bound && right <= bound - left };
} boundedSum;
In this case, the validator for boundedSum
would check
that bound <= 1729
, before validating its fields.
Tagged unions or casetype
¶
3d supports tagged unions: a data type can store a value named tag and a payload whose type depends on the tag value. The tag does not need to be stored with the payload (e.g. it could be a parameter to the type).
For instance, the following description defines the type of an integer prefixed by its size in bits.
#define size8 8
#define size16 16
#define size32 32
casetype _int_payload (UINT32 size) {
switch(size) {
case size8: UINT8 value8;
case size16: UINT16 value16;
case size32: UINT32 value32;
}
} int_payload;
entrypoint typedef struct _integer {
UINT32 size;
int_payload(size) payload;
} integer;
Warning
3d does not enforce that all cases of a union be of the same size, and 3d does not introduce any implicit padding to enforce it. Nor does 3d introduce any alignment padding. This is in the spirit of keeping 3d specifications explicit: if you want padding, you need to add it explicitly. See also the section on alignment.
A casetype
type actually defines an untagged union type dependent
on an argument value, which can be reused, e.g. for several types that
put different constraints on the value of the tag.
A casetype
type can also be marked entrypoint
.
Rather than defining a top-level casetype
, one can define a type by cases as
a field in a struct. For example, the following type is equivalent to the one
before:
typedef struct _integer_alt {
UINT32 size;
switch (size) {
case size8: UINT8 value8;
case size16: UINT16 value16;
case size32: UINT32 value32;
} payload;
} integer_alt;
Arrays¶
A field in a struct or a casetype can be an array.
Arrays in 3d differ from arrays in C in a few important ways:
- Rather than counting elements, the size of an array in 3d is always given in bytes.
- Array sizes need not be a constant expression: any integer
expression is permissible for an array, so long as it fits in
UINT32
. This allows expressing variable-sized arrays.
3d supports several kinds of arrays.
Byte-sized arrays¶
T f[:byte-size n]
The notation T f[:byte-size n]
describes a field named f
holding an
array of elements of type T
whose cumulative size in bytes is n
.
When sizeof(T) = 1
, 3d supports the notation T f[n]
, i.e., for
byte arrays, since the byte size and element count coincide, you need
not qualify the size of the array with a :byte-size
qualifier.
Singleton arrays of variable size¶
T f[:byte-size-single-element-array n]
In some cases, it is required to specify that a field contains a
single variable-sized element whose size in bytes is equal to a given
expression. The notation above introduces a field f
that contains
a single element of type T
occupying exactly n
bytes.
A variation is:
T f[:byte-size-single-element-array-at-most n]
which describes a field f
that contains a single element of type
T
occupying at most n
bytes, followed by padding bytes to make
up to n
bytes. This construct thus always consumes exactly n
bytes.
We expect to add several other kinds of variable-length array-like types in the uture.
Actions¶
In addition to semantic constraints on types, 3d allows augmenting the the fields of a struct or union with imperative actions, which allows running certain user-chosen commands at specified points during validation.
Let’s start with a simple example. Suppose you want to validate that a byte array contains a pair of integers, and then read them into a couple of mutable locations of your choosing. Here’s how:
entrypoint
typedef struct _Pair(mutable UINT32* x,
mutable UINT32* y)
{
UINT32 first
{:on-success
*x = first;
return true;
};
UINT32 second
{:on-success
*y = second;
return true;
};
} Pair;
The struct Pair
takes two out-parameters, x
and y
. Out
parameters are signified by the mutable
keyword and have pointer
types—in this case UINT32*
. Each field in the struct is
augmented with an on-success
action, where the action’s body is a
runs a small program snippet, which writes the first
field into
x
; the second
field into y
; and returns true
in both
cases. Actions can also return false
, to signal that validation
should halt and signal failure.
Now, in greater detail:
Action basics¶
An action is a program composed from a few elements:
- Atomic actions: Basic commands to read and write variables
- Variable bindings and sequential composition
- Conditionals
- External function calls
An action is evaluated with respect to a handler (e.g.,
on-success
) associated with a given field. We refer to this field
as the “base field” of the action.
An action is invoked by EverParse immediately after validating the
base field of the action. The action of the on-success
handler is
invoked in case the field is valid (if preseent), otherwise the
on-error
handler is invoked (if present).
The on-success handler can influence whether or not validation of the other fields continues by returning a boolean—in case an on-success action returns false, the validation of the type halts with an error.
The on-error handler also returns a boolean: in case it returns false, the error code associated with the validator mentions that an on-error handler failed; if it returns true, the error code is the error code associated with the failed validation of the base field. In both cases, validation halts with an error.
Atomic actions¶
- Expression
e
consist of variables and constants. *i = e
: Assigns the value of the expressionse
to the memory referenced by the pointeri
.*i
: Dereferences the pointeri
field_pos
: Returns the offset of base field of the action from the base of the validation buffer as aUINT32
value.field_ptr
: Returns a pointer to base field of the action as aPUINT8
, i.e., an abstract pointer to aUINT8
.return e
Returns a boolean valuee
abort
: Causes the current validation process to fail.
Composing atomic actions sequentially and conditionally¶
Composite actions can be built in a few ways:
- Atomic actions: For any atomic action
a
,a;
(with a trailing semicolon) is a composite actionp
. - Sequential composition:
`a; p
Given an atomic actiona
, and a compositite actionp
, the forma;p
runsa
thenp
. - Variable binding:
var x = a; p
Given an atomic actiona
, and a compositite actionp
, the formvar x = a; p
runsa
, stores its result in the new variablex
(local to the action), and then runsp
(wherep
may mention x - Conditionals:
if (e) { p }
is a conditional action that runs the composite actionsp
only if the conditione
is true. Additionally,if (e) { p0 } else { p1 }
is also legal, with the expected semantics, i.e.,p1` is run in case ``e
is false. - External function calls: an action can call a user-defined callback
C function. The user first needs to declare the callback function with
a top-level declaration in the 3D file: for instance,
extern UINT8 myFunction(UINT32 myArg1, UINT16 myArg2)
Then the user can call this function in an action, withvar myReturnValue = myFunction(e1, e2);
The user can also declare a function that returns no value, withvoid
as its return type. Then, the user can call this function directly as a statement in an action, withoutvar
:myFunction(e1, e2);
An external function can also accept out-parameters, usingmutable myType* myArg
in its 3D declaration.
Another example¶
Consider the following definition:
entrypoint
typedef struct _T(mutable PUINT8* out) {
UINT8 f1[10];
UINT8 f2[20]
{:on-success
var x = field_ptr;
*out = x;
return true;
};
} T;
Here, we define a type T
with an out-parameter PUINT8* out
,
i.e., pointer to a pointer to a UINT8
.
Associated with field f2
we have an on-success handler, where we
read a pointer to the field f2
into the local variable x
;
then, we write x
into *out
; and finally return true
.
Note
The return type of field_ptr
is PUINT8
. In
EverParse, a PUINT8
is a pointer into the input buffer, unlike
a UINT8*
which may point to any other memory. This distinction
between pointers into the input buffer and other pointers allows
EverParse to prove that validators never read the contents of the
input buffer more than once, i.e., there are no double fetches from
the input buffer. As such, the out
parameter should have type
PUINT8*
rather than UINT8*
.
Actions that always succeed¶
For actions that always succeed, 3d supports a more concise notation, using the
:act
form, as shown below:
entrypoint
typedef struct _TAct(mutable PUINT8* out) {
UINT8 f1[10];
UINT8 f2[20]
{:act var x = field_ptr; *out = x; };
} TAct;
This is equivalent to the prior :on-success
action shown earlier.
Restrictions¶
- Actions can only be associated with fields.
- Actions cannot be associated with the elements of an array, unless the array elements themselves are defined types, in which case they can be associated with the fields of that type, if any.
- Actions cannot be associated with bit fields.
Alignment¶
As mentioned previously, 3d does not introduce any implicit alignment padding.
However, it is often convenient to use 3d to model the in-memory layout of a C
structure, including the alignment padding that a C compiler would insert.
Rather than requiring the user to manually insert padding fields, 3d allows
decorating a struct with an aligned
attribute, which instructs the 3d
frontend to add padding between fields modeling the behavior of a C compiler.
This page provides a useful reference about alignment padding in C.
For example, consider the following structs:
aligned
typedef struct _point {
UINT16 x;
UINT16 y;
} point;
aligned
entrypoint typedef struct _coloredPoint1 {
UINT8 color;
point pt;
} coloredPoint1;
aligned
typedef struct _coloredPoint2 {
point pt;
UINT8 color;
} coloredPoint2;
The aligned
attribute to each struct adds alignment padding
between fields.
- Both fields of the type
point
are aligned at 2-byte boundaries; so no padding is inserted between them.- In type
coloredPoint1
the fieldcolor
is aligned at a 1-byte boundary, whilepr
is aligned at a 2-byte boundary; so 1 byte of padding is inserted between them. So, the whole struct consumes six bytes, aligned at a 2-byte boundary.- In type
coloredPoint2
the fieldpt
is aligned at a 2-byte boundary, but the typecolor
is aligned at a 1-byte boundary. So, no padding is inserted between them. But, the resulting type must be aligned at a 2-byte boundary, so 1 byte of padding is inserted after thecolor
field—so, the whole struct consumes six bytes.
The 3d compiler emits diagnostics describing the padding it inserts:
Adding padding field in Align._coloredPoint1 for 1 bytes at (preceding field Align.point pt;)
Adding padding field in Align._coloredPoint2 for 1 bytes at (end padding)
The aligned
attribute can also be adding to casetypes, though the behavior is
more limited:
aligned
casetype _Value (UINT8 tag) {
switch (tag) {
case 0:
coloredPoint1 cp1;
case 1:
coloredPoint2 cp2;
default:
struct {
point pt;
UINT16 other;
} cp3;
}
} Value;
Here, we have an aligned “union”. 3d checks that every branch of the union
describes field with the same size. It does not insert padding to make every
type have the same size. For example, if the UINT16 other
fields were left out
of the default case of Value
, the 3d compiler would emit the following error
message:
With the 'aligned' qualifier, all cases of a union with a fixed size must have the same size; union padding is not yet supported
Note, the aligned
attribute is not allowed on typedefs, enums, or other kinds
of 3d declarations.
Variable-length Types¶
The aligned
attribute is also supported on variable-length types, but use it
with care. 3d inserts alignment padding to mimic the behavior of a C compiler;
but C does not support variable-length types! There are many idioms that C
programmers use to model variable-length types, e.g., a zero-length array at the
end of a struct; sometimes a 1-length array; and sometimes and array with no
length at all, using C99 flexible array members
A rule of thumb for aligned
on variable-length types is that it only applies
to the fixed-size prefix of the type. For example, consider the following type:
aligned
typedef struct _tlv {
UINT8 tag;
UINT32 length;
UINT8 other;
Value(tag) payload[:byte-size length];
UINT32 other2;
} TLV;
The output from the 3d compiler includes the following diagnostic:
Adding padding field in Align._tlv for 3 bytes at (preceding field dependent UINT32 length;)
Adding padding field in Align._tlv for 1 bytes at (preceding field Align.Value(tag) payload[:byte-size length];)
- 3d adds 3 bytes after
tag
and precedinglength
, sincelength
is 4-byte aligned. - It adds 1 byte after
other
and preceding thepayload
field, since the payload is aValue
array and is 2-byte aligned. - But, notice, it does not add a padding field after
payload
to align theother2
field, sinceother2
follows a variable-length field.
In contrast, consider the following type in C:
typedef struct _tlv {
UINT8 tag;
UINT32 length;
UINT8 other;
Value payload[0];
} TLV;
The alignment of the Value payload[0]
field is 2, but its size is zero. So,
a C compiler adds the following padding:
- 3 bytes after
tag
and precedinglength
, sincelength
is 4-byte aligned. - 1 byte after
other
and preceding thepayload
field, since the payload is 2-byte aligned. - 2 bytes of end padding, since the alignment of
TLV
is 4
In such cases, if one really intends to model a variable-length C type, it is better to explicitly insert alignment padding to match a given layout, rather than relying on 3d to insert alignment padding that may not match the layout you have in mind.
typedef struct _tlv_alt {
UINT8 tag;
UINT8 padding0[3];
UINT32 length;
UINT8 other;
UINT8 padding1;
Value(tag) payload[:byte-size length];
UINT16 padding2;
} TLV_ALT;
Static Assertions to Validate Alignment¶
To confirm that the alignment 3d inserts matches the layout of a C type, 3d
automatically generates C types corresponding to every type with an aligned
attribute, and emits C static assertions to check that the sizes and field
offsets computed by 3d match what the C compiler computes.
For example, for the types defined in this section, 3d generates a file
AlignAutoStaticAssertions.c
with the contents below:
#include <stddef.h>
#include <stdint.h>
typedef uint8_t UINT8;
typedef uint16_t UINT16;
typedef uint32_t UINT32;
typedef uint64_t UINT64;
typedef struct _point
{
UINT16 x;
UINT16 y;
} point;
typedef struct _coloredPoint1
{
UINT8 color;
point pt;
} coloredPoint1;
typedef struct _coloredPoint2
{
point pt;
UINT8 color;
} coloredPoint2;
typedef union _Value
{
coloredPoint1 cp1_0;
coloredPoint2 cp2_1;
struct { point pt;
UINT16 other; } cp3_2;
} Value;
typedef struct _tlv
{
UINT8 tag;
UINT32 length;
UINT8 other;
} TLV;
#define EVERPARSE_STATIC_ASSERT(e) typedef char __EVERPARSE_STATIC_ASSERT__[(e)?1:-1];
EVERPARSE_STATIC_ASSERT(sizeof(point) == 4);
EVERPARSE_STATIC_ASSERT(offsetof(point, x) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(point, y) == 2);
EVERPARSE_STATIC_ASSERT(sizeof(coloredPoint1) == 6);
EVERPARSE_STATIC_ASSERT(offsetof(coloredPoint1, color) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(coloredPoint1, pt) == 2);
EVERPARSE_STATIC_ASSERT(sizeof(coloredPoint2) == 6);
EVERPARSE_STATIC_ASSERT(offsetof(coloredPoint2, pt) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(coloredPoint2, color) == 4);
EVERPARSE_STATIC_ASSERT(sizeof(TLV) == 10);
EVERPARSE_STATIC_ASSERT(offsetof(TLV, tag) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(TLV, length) == 4);
EVERPARSE_STATIC_ASSERT(offsetof(TLV, other) == 8);
Notice that in the C transcription of the type TLV
, 3d simply omits the
suffix of the type starting with the variable-length payload field.
If you compile this file with a C compiler (e.g., gcc -c
AlignAutoStaticAssertions.c
), you will see that the following static assertion
will fail, an indicator that for this variable-length type, you will be better
off removing the align
attribute and explicitly modeling the alignment
padding yourself:
EVERPARSE_STATIC_ASSERT(sizeof(TLV) == 10);
This is becase the C compiler inserts 3 bytes of end padding after the field
other
, whereas with the variable-length field, 3d adds 1 byte of padding.
Note
Arguably, due to these potential discrepancies, 3d could simply forbid the
aligned
attribute on variable-length types.
Explicitly checking 3d types for correspondence with existing C types¶
In addition to automatically generating static assertions for types with the
aligned
attribute, 3d allows one to explicitly assert the correspondence
between some 3d typed and C types.
A typical scenario is that you have an existing C program with some collection
of types defined in a file Types.h
. You’ve written a Types.3d
file to
defined validators for byte buffers containing those types, typically refining
the C types with additional semantic constraints and also with actions. Now, you
may want to make sure that types you defined in Types.3d
correspond to the
types in Types.h
, e.g., to ensure that you didn’t forget to include a field
in a struct, or that you’ve made explicit in your Types.3d
the alignment
padding between struct fields that a C compiler is sometimes requires to insert.
To assist with this, 3d provides the following feature:
// SNIPPET_START: GetFieldPtr.T
entrypoint
typedef struct _T(mutable PUINT8* out) {
UINT8 f1[10];
UINT8 f2[20]
{:on-success
var x = field_ptr;
*out = x;
return true;
};
} T;
// SNIPPET_END: GetFieldPtr.T
// SNIPPET_START: GetFieldPtr.T act$
entrypoint
typedef struct _TAct(mutable PUINT8* out) {
UINT8 f1[10];
UINT8 f2[20]
{:act var x = field_ptr; *out = x; };
} TAct;
// SNIPPET_END: GetFieldPtr.T act$
refining "GetFieldPtrBase.h" {
S as T
}
Following the type definitions, the refining
section states that
the type S
defined in the C header file GetFieldPtrBase.h
is
refined by the type T
defined here. As a result of this
declaration, 3d emits a static assertion in the C code of the form
#include "GetFieldPtrBase.h"
C_ASSERT(sizeof(S) == 30);
checking that the sizeof(S)
as computed by the C compiler matches
3d’s computation of the sizeof(T)
.
In generality, the refining declaration takes the following form:
refining "I1.h", ..., "In.h" {
S1 as T1, ...
Sm as Tm
}
where each Si
is a type defined in one of the C header files
“I1.h”, …, “In.h”, and the Ti
are types defined in the current
3d file. In case the types have the same names, one can simply write
T
instead of T as T
.
As a second example, let’s revisit the type from the alignment section, aiming to show that the 3d type corresponds to the C types defined in the header file AlignBase.h shown below:
#include <stdint.h>
typedef struct _point {
uint16_t x;
uint16_t y;
} point;
typedef struct _coloredPoint1 {
uint8_t color;
point pt;
} coloredPoint1;
typedef struct _coloredPoint2 {
point pt;
uint8_t color;
} coloredPoint2;
typedef union _Value {
coloredPoint1 cp1;
coloredPoint2 cp2;
struct {
point pt;
uint16_t other;
} cp3;
} Value;
typedef struct _tlv {
uint8_t tag;
uint32_t length;
uint8_t other;
Value payload[0];
} TLV;
Let’s also decorate the 3d file with the following refining declaration:
refining "AlignBase.h" {
TLV, Value, coloredPoint2, coloredPoint1, point
}
The 3d compiler emits the following static assertions:
#include <stddef.h>
#include <stdint.h>
#include "AlignBase.h"
typedef uint8_t UINT8;
typedef uint16_t UINT16;
typedef uint32_t UINT32;
typedef uint64_t UINT64;
#define EVERPARSE_STATIC_ASSERT(e) typedef char __EVERPARSE_STATIC_ASSERT__[(e)?1:-1];
EVERPARSE_STATIC_ASSERT(sizeof(TLV) == 10);
EVERPARSE_STATIC_ASSERT(offsetof(TLV, tag) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(TLV, length) == 4);
EVERPARSE_STATIC_ASSERT(offsetof(TLV, other) == 8);
EVERPARSE_STATIC_ASSERT(sizeof(Value) == 6);
EVERPARSE_STATIC_ASSERT(sizeof(coloredPoint2) == 6);
EVERPARSE_STATIC_ASSERT(offsetof(coloredPoint2, pt) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(coloredPoint2, color) == 4);
EVERPARSE_STATIC_ASSERT(sizeof(coloredPoint1) == 6);
EVERPARSE_STATIC_ASSERT(offsetof(coloredPoint1, color) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(coloredPoint1, pt) == 2);
EVERPARSE_STATIC_ASSERT(sizeof(point) == 4);
EVERPARSE_STATIC_ASSERT(offsetof(point, x) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(point, y) == 2);
As before, the sizeof(TLV)==10
fails, because of differences in alignment
padding with variable-length structures, a good indication to use explicit
alignment padding for variable-length types.
Generating code with for several compile-time configurations¶
Sometimes one wants to write a single format specification to
accommodate several compile-time configurations, e.g., to support
multiple architectures. 3D offers some limited support for decorating
a specification with compile-time conditionals familiar to C
programmers, e.g., #if
and #else
, and to generate C code
while preserving these C preprocessor directives.
For example, the listing below shows an integer type that can either
be represented using 64 bits (if ARCH64
is true) or 32 bits.
typedef struct _INT {
#if ARCH64
UINT64 x;
#else
UINT32 x;
#endif
} INT;
entrypoint
typedef struct _POINT {
INT x;
INT y;
} POINT;
To compile such a file using 3D, we also need to provide a
.3d.config
file that declares all the compile-time flags used in
the specification and mentions a C header file in which to find
definitions for those flags. Here is a sample config file:
{ "compile_time_flags":
{
"flags":["ARCH64"],
"include_file":"arch_flags.h"
}
}
Then, one can invoke 3d.exe --config Arch.3d.config
PointArch_32_64.3d --batch
, which produces the following C code as
output.
In the header file PointArch_32_64.h
, we see an include for
the header file mentioned in the config:
#include "arch_flags.h"
In the generated C file, PointArch_32_64.c
, we see the code below,
with the suitable preprocessor directives protecting the two variants
of the the Int
type declared in the source 3d file.
static inline uint64_t
ValidateInt(...) {
{
#if ARCH64
/* Validating field x */
/* Checking that we have enough space for a UINT64, i.e., 8 bytes */
...
#else
/* Validating field x */
/* Checking that we have enough space for a UINT32, i.e., 4 bytes */
...
#endif
}
Validating Data with Indirections or Pointers¶
In some cases, rather than parsing from a flat array of contiguous memory, one wants to parse a structure with indirections, i.e., the input buffer may contain pointers to other chunks of memory containing sub-structures to be parsed.
Parsing such pointer-rich structures is delicate, since before following a pointer, one needs to check that the pointer references valid memory. Given a raw pointer (just a memory address), one cannot, in general, check that the pointer is valid. However, in some scenarios, such checks are possible, e.g., in kernel code, it may be possible to probe a pointer to check that it is valid, and only then proceed to read from it. 3d supports parsing structures containing pointers, provided safe probing functions can be provided by the caller.
Pointer Types¶
Let’s start by looking at some basic support for pointer types in 3d.
As in C, any field of a structure can be marked as a pointer: here, below, the
second field y
is marked as pointer to a UINT32
.
typedef struct _TypeWithPointer {
UINT32 x;
UINT32 *y;
} TypeWithPointer;
By default, a pointer type is simply treated as an unsigned 64-bit integer and
3d will not dereference the pointer when validating a type. One can mark any
field as a pointer, not just fields with base type, and a pointer field can be
associated with a constraint, as any other field of a base type. The example
below shows a constraint on a pointer field checking that it is
non-null—notice that in the constraint, the ptr
value is just treated as
having type UINT64
.
typedef struct _NonNullPointer {
UINT32 len;
TypeWithPointer *ptr
{ ptr != 0uL };
} NonNullPointer;
One can also associate an action with a pointer field, e.g., reading its value into an out parameter.
typedef struct _MoreTypesWithPointer(mutable UINT64 *out) {
UINT32 len;
TypeWithPointer *ptr
{:act *out = ptr;};
} MoreTypesWithPointer;
One can also explicitly mark the size of a pointer, giving it the type
UINT64
(the default) or UINT32
, as shown in the examples below.
typedef struct _ExplicitPointerSize64(mutable UINT64 *out) {
UINT32 len;
TypeWithPointer *pointer(UINT64) ptr
{ ptr != 0uL }
{:act *out = ptr;};
} ExplicitPointerSize64;
typedef struct _ExplicitPointerSize32(mutable UINT32 *out) {
UINT32 len;
TypeWithPointer *pointer(UINT32) ptr
{ ptr != 0ul }
{:act *out = ptr;};
} ExplicitPointerSize32;
Note, in all these examples, the type of pointed-to data is irrelevant: 3d simply treats the pointer as an integer value.
Probing Pointers: A first example¶
We now look at how to traverse pointers, dereferencing them and validating the data they point to.
Let’s start with a simple example. Our goal is to validate a format that
contains a single indirection: a structure S
containing a pointer to a
structure T
.
extern probe ProbeAndCopy
extern probe (INIT) ProbeInit
typedef struct _T(UINT32 bound) {
UINT16 x { x >= bound };
UINT16 y { y >= x };
} T;
entrypoint
typedef struct _S(EVERPARSE_COPY_BUFFER_T dest) {
UINT8 bound;
T(bound) *tpointer probe ProbeAndCopy(length = sizeof(T), destination = dest);
} S;
The first line declares a probe
function called ProbeAndCopy
. This is a
requirement on the user of the generated parser to link the generated code with
a function called ProbeAndCopy
. In fact, the generated code contains an
extern declaration with the signature shown below (in Probe_ExternalAPI.h
):
extern BOOLEAN ProbeAndCopy(
uint64_t size, //The number of bytes to copy
uint64_t read_offset, //starting at this offset from the src pointer
uint64_t write_offset, //writing to this offset in the dst buffer
uint64_t src, //the source address to be probed and checked for validity
EVERPARSE_COPY_BUFFER_T dst //the target buffer into which the data is to be copied
);
That is, the ProbeAndCopy
function is expected to check probe the source address,
check its validity at read_offset
, and copy size
bytes into the dst
buffer starting at write_offset
.
Note, although a typical implementation may choose to copy memory into the
destination buffer, this not strictly required. In particular, the type
EVERPARSE_COPY_BUFFER_T
is also left to the user to define. In particular,
in EverParseEndianness.h
, we have
typedef void* EVERPARSE_COPY_BUFFER_T;
While in EverParse.h
, we further have:
extern uint8_t *EverParseStreamOf(EVERPARSE_COPY_BUFFER_T buf);
extern uint64_t EverParseStreamLen(EVERPARSE_COPY_BUFFER_T buf);
That is, the client code can choose any definition for
EVERPARSE_COPY_BUFFER_T
(since it is just a void*
), so long as it can
also provide two functions: EverParseStreamOf
to extract a buffer of bytes
from a EVERPARSE_COPY_BUFFER_T
; and EverParseStreamLen
to extract the
length of the buffer.
The second line defines another extern callback for extern probe (INIT)
ProbeInit
. This results in the following extern C declaration in
Probe_ExternalAPI.h
:
extern BOOLEAN ProbeInit(
uint64_t size,
EVERPARSE_COPY_BUFFER_T dst
)
The ProbeInit
callback allows a caller to initialize the destination buffer,
preparing it to contain up to size
bytes of data, e.g., if need be, one
could use this callback to allocate memory.
Let’s return to the example to see how the ProbeAndCopy
function is used.
The type T
is just a struct with two fields, constrained by a lower bound.
The type S
is more interesting. It starts with a UINT8 bound
and then
contains a pointer t
to a T`(bound)
struct. To emphasize the point,
the following picture illustrates the layout:
bytes
0........1........2........3........4........5........6........7........8........9
S: { bound | tpointer }
|
|
.-------------------------.
|
v
0........1........2........3........4........5........6........7........8.........9
T: { x | y }
The input buffer represents the S
structure in 5 bytes, beginning with one
byte for the bound
, and following by 8 bytes for the tpointer
field—currently, 3D treats pointer fields as always 8 bytes long.
The tpointer
field contains a memory address that points to the a T
structure, which is represented in 4 bytes, with 2 bytes each for its x
and
y
fields, as usual.
The 3D notation below:
T(bound) *tpointer probe ProbeAndCopy(length = sizeof(T), destination = dest);
Instructs the parser to:
- First, read the contents of the
tpointer
field into a local vairablesrc
- Then, call
ProbeInit(4, dest)
to prepare the destination buffer, wheresizeof(T)=4
.- Then, if
ProbeInit
succeeds, useProbeAndCopy(sizeof(T), 0, 0, src, destS)
check that thesizeof(T)
bytes starting at the address pointed to bytpointer
is valid memory, using thedest
parameter as its copy buffer.- Finally, if
ProbeAndCopy
succeeds, then validate thatEverParseStreamOf(dest)
buffer contains a validT(bound)
structure, inEverParseStreamLen(dest)
bytes.
Ultimately, the generated code provides the following signature for a C caller,
in ProbeWrapper.h
, to validate a buffer pointers to by base
, containing
at least len
bytes, checking that it contains a valid S(dest)
.
BOOLEAN ProbeCheckS(EVERPARSE_COPY_BUFFER_T dest, uint8_t *base, uint32_t len);
Probing Multiple Indirections¶
Continuing our simple example, let’s add another layer of indirection:, with a
structure U
containing a pointer to S
. This can be specified as shown
below:
entrypoint
typedef struct _U(EVERPARSE_COPY_BUFFER_T destS, EVERPARSE_COPY_BUFFER_T destT) {
UINT8 tag;
S(destT) *spointer probe ProbeAndCopy(length = sizeof(S), destination = destS);
} U;
- The type
U
packages a pointer to anS
structure with a tag. - The specification of
U
is parameterized by two destintation buffers:destS
to receive the contents of the memory referenced byspointer
; anddestT
to receive the contents of the memory referenced bytpointer
.
Operationally, the validator for U
proceeds by:
- First, validating the tag field (in this case, it is a noop)
- Then, reading
spointer
intosrcS
and- Calling
ProbeInit(sizeof(S), destS)
- Then,
ProbeAndCopy(sizeof(S), 0, 0, srcS, destS)
- Then, validate
EverParseStreamOf(destS)
contains a validS(destT)
.
- Calling
The validation of S(destT)
proceeds as described before, copying the
contents of tpointer
into destT
and validating it.
The C interface includes the following:
BOOLEAN ProbeCheckU(EVERPARSE_COPY_BUFFER_T destS, EVERPARSE_COPY_BUFFER_T destT, uint8_t *base, uint32_t len);
Note, one can also reuse the same copy buffer for multiple probe, so long as the
probes are done sequentially. For instance, we use several probes below, reusing
destT
multiple times to parser the nested T
structure within sptr
,
and again for tptr
and t2ptr
.
entrypoint
typedef struct _V(EVERPARSE_COPY_BUFFER_T destS, EVERPARSE_COPY_BUFFER_T destT) {
UINT8 tag;
S(destT) *sptr probe ProbeAndCopy(length = 9, destination = destS);
T(17) *tptr probe ProbeAndCopy(length = 8, destination = destT);
T(tag) *t2ptr probe ProbeAndCopy(length = 8, destination = destT);
} V;
This is allowed since the probes are done sequentially, and the copy buffer is
not reused before the probe & validation are complete. On the other hand, if
one were to try to reuse a copy buffer before its probe & validation are
complete (e.g., by using destT
as the destination buffer for sptr
) 3D
issues an error message:
./Probe.3d:(30,16): (Error) Nested mutation of the copy buffer [destT]
Top-level Probes¶
Rather than attaching a probe to a field, one can also attach a probe to an entire type. For instance, one can write the following:
entrypoint probe ProbeAndCopy(length=sizeof(this))
typedef struct _Indirect {
UINT32 fst;
UINT32 snd;
UINT8 tag;
} Indirect;
This type specifies the following layout, with an input buffer containing a single pointer which refers to a buffer containing a valid struct with three fields.:
bytes
0........1........2........3........4........5........6........7........8
I:{ pointer }
|
|
.-------------------------.
|
v
0........1........2........3........4........5........6........7........8.........9
TT:{ x | y | tag }
This yields the following C interface, with two entry points. The first is to
probe and validate a pointer to the type Indirect
, while the second is to
validate a struct Indirect
.
BOOLEAN ProbeProbeAndCopyCheckIndirect(EVERPARSE_COPY_BUFFER_T probeDest, uint64_t probeAddr);
BOOLEAN ProbeCheckIndirect(uint8_t *base, uint32_t len);
The specification is equivalent to the following, though more concise:
typedef struct _TT {
UINT32 fst;
UINT32 snd;
UINT8 tag;
} TT;
entrypoint
typedef struct _I(EVERPARSE_COPY_BUFFER_T dest) {
TT *ttptr probe ProbeAndCopy(length=sizeof(TT), destination=dest);
} I;
Coercing Pointer Types¶
One can also add a probe to a pointer with an explicit pointer size, so long as
one also provides a callback to convert a value from that explicit pointer size
to UINT64
, as in the example below:
extern PURE UINT64 UlongToPtr(UINT32 ptr)
entrypoint
typedef struct _CoercePtr(EVERPARSE_COPY_BUFFER_T dest) {
UINT32 Bound;
T(Bound) *pointer(UINT32) ptr probe ProbeAndCopy(length=sizeof(T), destination=dest);
} CoercePtr;
Here, we first define a callback UlongToPtr
to convert a UINT32
to a
UINT64
.
In Probe_ExternalAPI.h
, this extern declaration produces the following C
signature:
extern uint64_t UlongToPtr(uint32_t ptr);
Then, in CoercePtr
, we can qualify our pointer type to 32-bits: the 3d
compiler will automatically insert the UlongToPtr
coercion on the 32-bit
pointer value before called ProbeAndCopy
with the coerced pointer value.
Note
If you declare more than one coercion to coerce between a pair of types, 3d will likely complain with the following error:
./Probe.3d:(132,33): (Error) Multiple extern coercions found for UINT32 -> UINT64: Probe.UlongToPtr, Probe.UlongToPtr2
This is because 3d applies coercions implicitly, and if multiple coercions are found between a pair of types, it cannot choose which coercion to apply.
Multiple Probe Callbacks¶
Sometimes, it can be useful to have several probe callbacks, e.g., some of them may copy, while for others it might be safe to validate the data in place.
The example below shows how to use multiple probes:
extern probe ProbeAndCopyAlt
entrypoint probe ProbeAndCopy(length=sizeof(this))
entrypoint probe ProbeAndCopyAlt(length=sizeof(this))
typedef struct _MultiProbe(EVERPARSE_COPY_BUFFER_T destT1, EVERPARSE_COPY_BUFFER_T destT2) {
UINT32 fst;
UINT32 snd;
UINT8 tag;
T(17) *tptr1 probe ProbeAndCopy(length=sizeof(T), destination=destT1);
T(42) *tptr2 probe ProbeAndCopyAlt(length=sizeof(T), destination=destT2);
} MultiProbe;
- The extern declaration of
ProbeAndCopyAlt
produces a second extern declaration inProbe_ExternalAPI.h
for the client code to provide and link with. - One can associate multiple entrypoint probe attributes on a type, each with a different probe and copy function.
- One can also associate different probes on each field of a type.
The resulting C interface contains multiple entry points, one for each variant of probing entry point, and one for the non-probing variant:
BOOLEAN ProbeProbeAndCopyCheckMultiProbe(
EVERPARSE_COPY_BUFFER_T destT1,
EVERPARSE_COPY_BUFFER_T destT2,
EVERPARSE_COPY_BUFFER_T probeDest,
uint64_t probeAddr);
BOOLEAN ProbeProbeAndCopyAltCheckMultiProbe(
EVERPARSE_COPY_BUFFER_T destT1,
EVERPARSE_COPY_BUFFER_T destT2,
EVERPARSE_COPY_BUFFER_T probeDest,
uint64_t probeAddr);
BOOLEAN ProbeCheckMultiProbe(
EVERPARSE_COPY_BUFFER_T destT1,
EVERPARSE_COPY_BUFFER_T destT2,
uint8_t *base,
uint32_t len);
Note
External declarations for probe callbacks and for pointer coercions are in scope for the entire file, since 3d can call these functions implicitly when elaborating a specification. So, unless you are an expert and have a particular need for multiple probe callbacks, it is possible that declaring multiple probe callbacks can result in errors such as the one below, especially when using multiple probe callbacks in conjuction with specialization.
./Specialize1.3d:(10,30): (Error) Found multiple probe functions: Specialize1.ProbeAndCopyAlt, Specialize1.ProbeAndCopy
Nullable Pointers¶
By default, all probed pointer fields are expected to be non-null. If a pointer value happens to be null, then either
- The
ProbeAndCopy
function can return false, in which case validation fails - Or, the
ProbeAndCopy
function can return true, in which the generated code would proceed to try to validate the contents of the destination buffer, which will likely fail.
If a pointer in a data structure is allowed to be null, then one can mark it as
such with a nullable qualifier, pointer?
as shown below.
entrypoint
typedef struct _MaybeT(EVERPARSE_COPY_BUFFER_T dest) {
UINT32 Bound;
T(Bound) *pointer? ptr probe ProbeAndCopy(length=sizeof(T), destination=dest);
} MaybeT;
For a pointer with a nullable qualifier, the generated code first checks if the pointer is non-null:
- If the pointer is null, validation succeeds without calling the probe function
- If the pointer is non-null, the probe function is called, and validation proceeds as in the non-null case.
An End-to-end Executable Example¶
A small but fully worked out example is available in the EverParse repository.
It shows the use of multiple probe functions, linked with callbacks implemented in C, as well as a main C driver program that validates several example inputs containing pointers.
Specialization for Different Pointer Sizes¶
Consider writing a specification to handle messages that could be sent from both 32- and 64-bit machines, particularly if those messages contain pointers. This scenario happens in practice, e.g., when a 64-bit OS kernel shares memory with user-mode processes that may be either native 64-bit processes or emulated 32-bit processes.
3d supports a form of compile-time specialization that allows one to write a specification with 64-bit clients in mind, and then have the compiler specialize the 64-bit specification also for use with a 32-bit clients. There are many subtle elements to consider, and we describe them gradually, starting with a simple first example.
A First Example¶
As in the previous section, we have a format with two levels of indirection:
R64
with a pointer to S64
, and S64
with a pointer to T
.
extern probe ProbeAndCopy
extern probe (INIT) ProbeInit
aligned
typedef struct _T(UINT32 bound) {
UINT32 t1;
UINT32 t2 { t2 <= bound };
} T;
aligned
typedef struct _S64(UINT32 bound, EVERPARSE_COPY_BUFFER_T dest) {
UINT32 s1 { s1 <= bound };
T(s1) *ptrT probe ProbeAndCopy(length=sizeof(T), destination=dest);
UINT32 s2;
} S64;
aligned
typedef struct _R64(EVERPARSE_COPY_BUFFER_T destS, EVERPARSE_COPY_BUFFER_T destT) {
UINT32 r1;
S64(r1, destT) *ptrS probe ProbeAndCopy(length=sizeof(S64), destination=destS);
} R64;
A First Manual Attempt¶
If we wanted to specify a variant of R32
with a 32-bit pointer to a S32
which in turn had a 32-bit pointer to a T
, we could explicitly rewrite our
entire specification, as shown below.
aligned
typedef struct _S32_Attempt(UINT32 bound, EVERPARSE_COPY_BUFFER_T dest) {
UINT32 f { f <= bound };
T(f) *pointer(UINT32) ptrT probe ProbeAndCopy(length=sizeof(T), destination=dest);
UINT32 g;
} S32Attempt;
aligned
typedef struct _R32_Attempt(EVERPARSE_COPY_BUFFER_T destS, EVERPARSE_COPY_BUFFER_T destT) {
UINT32 f;
S32Attempt(f, destT) *pointer(UINT32) ptrS probe ProbeAndCopy(length=sizeof(S32Attempt), destination=destS);
} R32_Attempt;
entrypoint
typedef struct _RMux(Bool requestor32, EVERPARSE_COPY_BUFFER_T destS, EVERPARSE_COPY_BUFFER_T destT) {
switch (requestor32) {
case true: R32_Attempt(destS, destT) r32;
default: R64(destS, destT) r64;
} field;
} RMux;
At the top-level of the specification, one could then define RMux
, a
multiplexing layer, which depending on the value of requestor32
, validates
either an R32_Attempt
or an R64
.
This looks plausible, even though it is verbose and leads to a lot of duplication. However, even aside from the verbosity this revised specification has a deeper problem.
Consider the case where requestor32=true
: if the probe on ptrS
runs
successfully, it will have copied sizeof(S32Attempt)=12
bytes, and then
checked that the bytes copied into destS
is a valid S32Attempt
. If after
this validation, a caller wants to, say, read the value of the s2
field,
then they would need to read 4
bytes at offset 8
from destS
buffer.
On the other hand, when requestor32=false
: if the probe on ptrS
runs
successfully, it will have copied sizeof(S64)=24
bytes (including the 4
bytes of padding between s1
and ptrT
and then 4 bytes of padding after
the field s2
), and then checked that the bytes copied into destS
is a
valid S64
. If after this validation, a caller wants to read the value of the
s2
field, then they would need to read 4
bytes at offset 16
from
destS
buffer.
That is, even after reading and validating the input, the caller has to
distinguish the cases of requestor32
. We would prefer instead to have a way
to handle either 32- or 64-bit inputs, but after validation, we would like the
contents of the destination buffers to always be in 64-bit form, for easy
manipulation by native 64-bit kernel code, without needing to bifurcate further
handling of 32- and 64-bit inputs.
A Second Manual Attempt¶
Here’s another attempt at specializing R64
:
typedef struct _R32Manual(UINT32 bound, EVERPARSE_COPY_BUFFER_T destS, EVERPARSE_COPY_BUFFER_T destT) {
UINT32 r1 { r1 <= bound };
S64(r1, destT) *pointer(UINT32) ptrS probe
(length=sizeof(S64), destination=destS) {
ProbeAndCopy(4); //copy field s1
SkipWrite(4); //4 bytes padding
var ptrT = ProbeAndReadU32(); //read ptrT
var ptrT64 = UlongToPtr(ptrT); //write (uint64_t)ptrT
WriteU64(ptrT64); //write ptrT64
ProbeAndCopy(4); //copy field s2
SkipWrite(4); //padding
};
} R32Manual;
This time, we have an even more verbose specification, but we’ll see that it actually achieves what we want. This specification is legal 3d, and it uses an explicit, low-level form of probing functions that we do not typically expect users to write. But, it’s a useful intermediate language to explain things.
The first field, r1
, is as before.
The second field is an explicitly qualified pointer, qualified to
pointer(UINT32)
and with probe block associated with it. Our goal is to
coerce probe the input pointer ptrS
and read its contents while coercing it
into a 64-bit layout, and then to validate that the copied bytes is a valid
S64(r1, destT)
.
The probe block will first call ProbeInit
, initializing the destS
buffer, preparing it to receive sizeof(S64)
bytes. Then, within the probe
block, it executes a sequence of actions:
- Copy the first 4 bytes references by
ptrS
–this is thes1
field - Skip 4 bytes of alignment padding
- Then read a 32 bit pointer
ptrT
, coerce it to 64-butes, and write it into the destination buffer - Then copy the next 4 bytes from the input buffer (field
s2
) - Finally, skip 4 bytes of padding at the end
After the probe block executed, we validate the EverParseStreamOf(destS)
to
contain a valid S64(r1, destT)
, which in turn will probe ptrT
etc.
This does what we want, in the sense that if validation succeeds, then destS
contains a 64-bit representation of the input, regardless of requestor32
,
and the caller can then proceed uniformly, without needing to bifurcate its
handling of 32- and 64-bit clients.
However, writing low-level coercions like this is impractical and error prone.
Automated Specialization¶
Instead, 3d offers a specialize
directive, to automatically rewrite a tree
of definitions rooted at a given type, replacing each occurrence of an
unqualified pointer type with an pointer(UINT32)
.
specialize (pointer(*), pointer(UINT32)) R64 R32;
This instructs 3d to automatically generate the definition for R32
from
R64
, in the style of R32Manual
. In doing so, 3d will also try to
specialize the probe function on the nested ptrT
in the S64
, but in
doing so it will discover that there is nothing to specialize in T
and the
behavior of probing ptrT
will be unchanged.
In order to use the specialize directive, we need to define a few additional callbacks.
specialize (pointer(*), pointer(UINT32)) R64 R32;
The first callback, a UlongToPtr
coercions, we saw before: we’ll need it to coerce a 32-bit pointer to 64-bits. It
produces the following C signature:
extern uint64_t UlongToPtr(uint32_t ptr);
The next callback, ProbeAndReadU32
produces the following C signature:
extern uint32_t ProbeAndReadU32(
BOOLEAN *failed,
uint64_t read_offset,
uint64_t src,
EVERPARSE_COPY_BUFFER_T dest);
It probes a pointer src
at a given read_offset
, checks its validity, and
reads 4 bytes from that offset and returns it as a uint32_t
. If the validity
check fails, it must set its out parameter failed
to true. It typically does
not use its last parameter dest
, though this can be used by the caller to
provide useful contextual information.
Finally, the callback WriteU64
produces the following C signature:
extern BOOLEAN WriteU64(
uint64_t value,
uint64_t write_offset,
EVERPARSE_COPY_BUFFER_T destination);
It allows writing a single uint64_t
value
at a given write_offset
into a destination buffer EVERPARSE_COPY_BUFFER_T
. If the write fails, e.g.,
because write_offset
is out of bounds, then it must return false
,
otherwise true
.
With R32
now automatically defined, we can easily define our multiplexing
layer, as shown below.
entrypoint
typedef struct R(Bool requestor32, EVERPARSE_COPY_BUFFER_T destS, EVERPARSE_COPY_BUFFER_T destT) {
switch (requestor32) {
case true: R32(destS, destT) r32;
default: R64(destS, destT) r64;
} field;
} R;
First Example in its Entirety¶
We’ve gone through several iterations to arrive at our first example of specialization. Here is the final specification in its entirety:
extern probe ProbeAndCopy
extern probe (INIT) ProbeInit
extern PURE UINT64 UlongToPtr(UINT32 ptr)
extern probe (READ UINT32) ProbeAndReadU32
extern probe (WRITE UINT64) WriteU64
aligned
typedef struct _T(UINT32 bound) {
UINT32 t1;
UINT32 t2 { t2 <= bound };
} T;
aligned
typedef struct _S64(UINT32 bound, EVERPARSE_COPY_BUFFER_T dest) {
UINT32 s1 { s1 <= bound };
T(s1) *ptrT probe ProbeAndCopy(length=sizeof(T), destination=dest);
UINT32 s2;
} S64;
aligned
typedef struct _R64(EVERPARSE_COPY_BUFFER_T destS, EVERPARSE_COPY_BUFFER_T destT) {
UINT32 r1;
S64(r1, destT) *ptrS probe ProbeAndCopy(length=sizeof(S64), destination=destS);
} R64;
specialize (pointer(*), pointer(UINT32)) R64 R32;
entrypoint
typedef struct R(Bool requestor32, EVERPARSE_COPY_BUFFER_T destS, EVERPARSE_COPY_BUFFER_T destT) {
switch (requestor32) {
case true: R32(destS, destT) r32;
default: R64(destS, destT) r64;
} field;
} R;
What is Proven About Specialization¶
Safety
As with all code produced by 3d, we prove that the generated code is:
- memory safe
- arithmetically safe
- has no undefined behaviors
- is double-fetch free
For specifications with probes, these guarantees are, of course, conditional on
the behavior of the extern callbacks. In particular, the callbacks implemnted in
C must themselves be memory safe, e.g., ProbeAndCopy(n, rd, wr, src, dest)
must safely check that src + rd
points to n
bytes of valid memory and
that safely copies those n
bytes into dest
at offset wr
is safe.
Soundness
We also prove that if validation succeeds, then the destination buffers contain
valid representations of their specified types. For instance, in the example
above, we prove that destS
contains a valid representation of an S64(r1,
destT)
and that destT
contains a valid representation of T(s1)
.
Completeness
For non-specialized specifications, we usually prove a completeness property,
namely that if the input contains a well-formatted representation of a type
T
, then the generated validator is guaranteed to accept that input.
Our proof for specialization does not cover this property: in particular, formally, we have not yet proven that a well-formatted 32-bit input will always be correctly coerced to a 64-bit layout and then accepted by the validator.
More concretely, an ideal result would be that an input correctly formatted
according to R32Attempt
will always be accepted by the type R32
computed
as a specialization of R64
. However, this equivalence is not yet covered by
our proofs.
We are working to enrich our proofs to cover this property.
Static assertions
Computing coercions between different layouts of types based on architecture-specific details of compilers is delicate. Rather than trusting outright 3d’s implementation of the layout of structures including alignment padding for 32- and 64-bit layouts, 3d emits static assertions to check that the layout it computes corresponds to the layout for the corresponding structures computed by whatever C compiler one uses to compile the code.
For our example above, 3d automatically generates a refining
block and emits
the following C code:
#include <stddef.h>
#include <stdint.h>
typedef uint8_t UINT8;
typedef uint16_t UINT16;
typedef uint32_t UINT32;
typedef uint64_t UINT64;
typedef struct _T
{
UINT32 t1;
UINT32 t2;
} T;
typedef struct ___specialized32__T
{
UINT32 t1;
UINT32 t2;
} ___specialized32_T;
typedef struct _S64
{
UINT32 s1;
UINT64 ptrT;
UINT32 s2;
} S64;
typedef struct ___specialized32__S64
{
UINT32 s1;
UINT32 ptrT;
UINT32 s2;
} ___specialized32_S64;
typedef struct _R64
{
UINT32 r1;
UINT64 ptrS;
} R64;
typedef struct ___specialized_R32
{
UINT32 r1;
UINT32 ptrS;
} R32;
#define EVERPARSE_STATIC_ASSERT(e) typedef char __EVERPARSE_STATIC_ASSERT__[(e)?1:-1];
EVERPARSE_STATIC_ASSERT(sizeof(T) == 8);
EVERPARSE_STATIC_ASSERT(offsetof(T, t1) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(T, t2) == 4);
EVERPARSE_STATIC_ASSERT(sizeof(___specialized32_T) == 8);
EVERPARSE_STATIC_ASSERT(offsetof(___specialized32_T, t1) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(___specialized32_T, t2) == 4);
EVERPARSE_STATIC_ASSERT(sizeof(S64) == 24);
EVERPARSE_STATIC_ASSERT(offsetof(S64, s1) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(S64, ptrT) == 8);
EVERPARSE_STATIC_ASSERT(offsetof(S64, s2) == 16);
EVERPARSE_STATIC_ASSERT(sizeof(___specialized32_S64) == 12);
EVERPARSE_STATIC_ASSERT(offsetof(___specialized32_S64, s1) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(___specialized32_S64, ptrT) == 4);
EVERPARSE_STATIC_ASSERT(offsetof(___specialized32_S64, s2) == 8);
EVERPARSE_STATIC_ASSERT(sizeof(R64) == 16);
EVERPARSE_STATIC_ASSERT(offsetof(R64, r1) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(R64, ptrS) == 8);
EVERPARSE_STATIC_ASSERT(sizeof(R32) == 8);
EVERPARSE_STATIC_ASSERT(offsetof(R32, r1) == 0);
EVERPARSE_STATIC_ASSERT(offsetof(R32, ptrS) == 4);
An End-to-end Executable Example¶
A small but fully worked out example of specialization is available in the EverParse repository.
It shows an example similar to the one developed above, but linked with a main C program and test driver. It also illustrates the use of nullable pointers in conjunct with probing and specialization.
Limitations on Variable-length Structures¶
Automated specialization has only limited support for variable-length structures. The main restriction is that a coercion between types cannot depend on the data being coerced. We illustrate with a couple of examples:
Consider the following canonical tag-length-value encoding:
typedef struct _UNION(UINT8 tag) {
switch (tag) {
case 0:
UINT8 case0;
case 1:
UINT16 case1;
default:
UINT32 other;
} field;
} UNION;
typedef struct _TLV
{
UINT8 tag;
UINT32 length;
UNION(tag) payload[:byte-size length];
} TLV;
Now, let’s say one wanted to prove a pointer to a TLV
, one could attempt
this:
typedef struct _WRAPPER(EVERPARSE_COPY_BUFFER_T Output)
{
TLV *tlv probe ProbeAndCopy(length=???, destination=Output);
} WRAPPER;
But, this type is not expressible: when writing a probe, one needs to provide a
length
, bounding the amount of data to be copied into the Output
buffer.
But, in this case, there is no length to provide.
We could try another approach by expecting the context to bound the length in advance:
typedef struct _UNION(UINT8 tag) {
switch (tag) {
case 0:
UINT8 case0;
case 1:
UINT16 case1;
default:
UINT32 other;
} field;
} UNION;
typedef struct _TLV(UINT32 Len)
{
UINT8 tag;
UINT32 length { length == Len };
UNION(tag) payload[:byte-size length];
} TLV;
typedef struct _WRAPPER(UINT32 Len, EVERPARSE_COPY_BUFFER_T Output)
where (Len > 5)
{
TLV(Len - 5) *tlv probe ProbeAndCopy(length=Len, destination=Output);
} WRAPPER;
This works: if the caller can bound the entire size of the pointed to data and
pass it to WRAPPER
as a parameter Len
, then we can use that as a bound.
However, if now we try to specialize WRAPPER
as follows:
specialize (pointer(*), pointer(UINT32)) WRAPPER WRAPPER_32;
We get the following error:
./SpecializeDep1.3d:(22,11): (Error) Cannot coerce a type with data-dependency;
field payload of type SpecializeDep1.UNION(tag) may depend on the field `tag`
That is, in order to coerce a UNION(tag)
from a 32- to 64-bit
representation, in genneral, a coercion would have to read the tag
field of
TLV
, then branch on it, and then coerce each case of UNION
accordingly.
As mentioned earlier, coercions cannot depend on the data being coerced—so 3d
rejects this specification.
Note, in this case, one could argue that UNION(tag)
has the same
representation in 32- and 64-bits, so 3d could accept the specification:
however, 3d does not yet recognize such special cases.
One could play the same game again, and try to get the caller to provide the
tag
as a parameter (although as we do this, increasingly, the caller has to
be able to predict the cases of the data). The listing below works:
extern probe ProbeAndCopy
extern probe (READ UINT32) ProbeAndReadU32
extern probe (WRITE UINT64) WriteU64
extern probe (INIT) ProbeInit
extern PURE UINT64 UlongToPtr(UINT32 ptr)
typedef struct _UNION(UINT8 tag) {
switch (tag) {
case 0:
UINT8 case0;
case 1:
UINT16 case1;
default:
UINT32 other;
} field;
} UNION;
typedef struct _TLV(UINT8 Expected, UINT16 Len)
{
UINT8 tag { tag == Expected };
UINT32 length { length == Len };
UNION(Expected) payload[:byte-size Len];
} TLV;
typedef struct _WRAPPER(UINT8 Expected, UINT16 Len, EVERPARSE_COPY_BUFFER_T Output)
where (Len > 5)
{
TLV(Expected, Len - 5) *tlv
probe ProbeAndCopy(length=Len, destination=Output);
} WRAPPER;
specialize (pointer(*), pointer(UINT32)) WRAPPER WRAPPER_32;
entrypoint
casetype _ENTRY(
Bool Requestor32,
UINT8 Expected,
UINT16 Len,
EVERPARSE_COPY_BUFFER_T Output)
{
switch (Requestor32)
{
case true:
WRAPPER_32(Expected, Len, Output) w32;
case false:
WRAPPER(Expected, Len, Output) w64;
}
} ENTRY;
However, we remark on a few points:
First, this specification is far from ideal, since it requires the caller to predict both the
tag
andlength
of the TLV message.Second, 3d’s determination of data dependence is a syntactic criterion: notice how we have changed the type of the payload field to only mention the parameters in scope, rather than the fields.
UNION(Expected) payload[:byte-size Len];
Note, one does not always need the calling context to pass in arguments like
Len
or Expected
: these just need to be values in scope at the point at
which the probe is used. For instance, the following would work too, for a
pointer to a variable length array, each of whose elements is a UNION(tag)
:
typedef struct _UNION_ARRAY(UINT8 tag, UINT32 len)
{
UNION(tag) payload[:byte-size len];
} UNION_ARRAY;
typedef struct _TLV_ALT(EVERPARSE_COPY_BUFFER_T Output)
{
UINT8 tag;
UINT32 length;
UNION_ARRAY(tag, length) *ptr
probe ProbeAndCopy(length=length, destination=Output);
} TLV_ALT;
specialize (pointer(*), pointer(UINT32)) TLV_ALT TLV_ALT_32;
Data Dependency for Well-formedness¶
There is another form of data dependency that is also not supported: dependence on data constraints that ensure well-formedness of specifications. The following variant of our previous example illustrates.
typedef struct _TLV(UINT8 Expected, UINT16 Len)
{
UINT8 tag { tag == Expected };
UINT32 length { Len > 5 && length == Len - 5 };
UNION(Expected) payload[:byte-size (Len - 5)];
} TLV;
typedef struct _WRAPPER(UINT8 Expected, UINT16 Len, EVERPARSE_COPY_BUFFER_T Output)
{
TLV(Expected, Len) *tlv
probe ProbeAndCopy(length=Len, destination=Output);
} WRAPPER;
In this variant, rather than constrain Len > 5
in the Wrapper
, we add a
constraint on the length
field enforcing Len > 5
, and then using Len -
5
for the length of payload
—the constraint ensures that the subtraction
Len - 5
does not underflow.
However, if we try to specialize WRAPPER
to 32 bits:
specialize (pointer(*), pointer(UINT32)) WRAPPER WRAPPER_32;
We get the following verification error:
* Error 19 at out/SpecializeDep1Fail.fst(195,2-205,63):
- Cannot verify u16 subtraction
- The SMT solver could not prove the query. Use --query_stats for more
details.
- Also see: SpecializeDep1Fail.3d(22,40-22,40)
3d accepts the specification and then translates it to F* for well-formedness
checking: but F* rejects the specification saying it cannot prove that the
subtraction Len - 5
does not underflow. This is because, again, the coercion
of TLV
from 32- to 64-bits is not data dependent, and as such, the
constraint on the length
field is not enforced by the coercion, and F*
rightfully rejects the subtraction as unsafe.
And End-to-end Example with Variable-length Structures¶
Although data dependency is forbidden in coercions, there are many cases where variable-length structures fit well with 3d’s support for auto-specialization.
A small but fully worked out example of specialization with variable-length structures is available in the EverParse repository, including a main file driving the generated code with test input.
Other forms of Specialization¶
Today, 3d only supports specialized 64-bit pointer types to 32-bit pointers. In the future, we envision adding support for other forms of specialization, e.g., automatically specializing little-endian to big-endian types.
Comments¶
The user can insert comments in their .3d
file, some of which will
be inserted into the .c
file:
There are three kinds of comments:
- Block comments are delimited by
/*
and*/
and do not nest. These are never propagated to the C code. - Line comments begin with
//
and are propagated to C heuristically close to the translated source construct. - Each top-level declaration can be preceded by a type declaration
comment delimited by
/*++
and--*/
: These are propagated to the C code preceding the C procedure corresponding to the validator of the source type.
Adding copyright notices to produced .c/.h files¶
If, along with some MyFile.3d
, in the same directory, you provide
a file MyFile.3d.copyright.txt
that contains syntactically correct
C comments (with //
or /* ... */
), then EverParse will prepend
MyFile.c
, MyFileWrapper.c
and their corresponding .h
with
these comments. You do not need to mention this file on the command
line.
These comments can contain the following special symbols:
EVERPARSEVERSION
, which EverParse will automatically replace with its version number;FILENAME
, which EverParse will automatically replace with the name of the.c
/.h
file being generated.EVERPARSEHASHES
, which EverParse will automatically replace with a hash of the contents of the corresponding .3d file for the purpose of--check_hashes inplace
or--check_inplace_hash
(see Hash checking).
Modular structure and files¶
A 3d specification is described in a collection of modules, each
stored in a file with a .3d
extension. The name of a module is
derived from its filename, i.e., the file A.3d
defines a module
named A
. A module can Derived
(in Derived.3d) can refer to
another module Base
(in Base.3d) by its name, allowing definitions
in Derived
to reuse the definitions that are exported in Base
.
For example, in module Base
we could define the following types:
export
typedef UINT32 ULONG;
export
typedef struct _Pair {
ULONG first;
ULONG second;
} Pair;
typedef struct _Mine {
UINT8 f;
UINT8 g;
} Mine;
Note, the export
qualifier indicate that these definitions may be
referenced from another module. Types that are not exproted (like
Mine
) are not visible from another module.
In Derived
we can use the type from Base
by referring to it
using a fully qualified name of the form <MODULE NAME>.<IDENTIFIER>
.
entrypoint
typedef struct _Triple {
Base::Pair pair;
Base::ULONG third;
} Triple;
3d also allows defining module abbreviations. For example, using
module B = Base
we introduce a shorter name for the module
Base
for use within the current module.
module B = Base
entrypoint
typedef struct _Quad {
B::Pair _12;
B::Pair _34;
} Quad;
A commented example is available in the EverParse repository.
Error handling¶
When a validator fails, EverParse supports invoking a user-provided callback with contextual information about the failure.
An error handling callback is a C procedure with the following signature:
typedef void (*ErrorHandler)(
const char *TypeName,
const char *FieldName,
const char *ErrorReason,
uint64_t ErrorCode,
uint8_t *Context,
uint32_t Length,
uint8_t *Base,
uint64_t StartPosition,
uint64_t EndPosition
);
Every EverParse validator is parameterized by:
- A function pointer, of type
ErrorHandler
- A context parameter,
uint8_t* Context
At the top-level, when calling into EverParse from an application, one
can instantiate both the ErrorHandler
with a function of one’s
choosing and the Context
argument with a pointer to some
application-specific context.
The ErrorHandler
expects
- The
Base
andContext
pointers to refer to live and disjoint pieces of memory.- For
Length
to be the length in bytes of valid memory pointed to byBase
and forStartPosition <= EndPosition <= Length
.
The ErrorHandler
can
- Read from all the pointers
- May only modify the memory reference by
Context
.
When validating a field f
in a type T
, in case the validator
fails, EverParse calls the user-provided ErrorHandler
, passing in
the following arguments:
- The
TypeName
argument is the name of the typeT
- The
FieldName
argument is name of the fieldf
- The
ErrorReason
andErrorCode
arguments are related and can be one of the following pairs:
- “generic error”,
EVERPARSE_ERROR_GENERIC
(1uL)- “not enough data”,
EVERPARSE_ERROR_NOT_ENOUGH_DATA
(2uL)- “impossible”,
EVERPARSE_ERROR_IMPOSSIBLE
(3uL)- “list size not multiple of element size”,
EVERPARSE_ERROR_LIST_SIZE_NOT_MULTIPLE
(4uL)- “action failed”,
EVERPARSE_ERROR_ACTION_FAILED
(5uL)- “constraint failed”,
EVERPARSE_ERROR_CONSTRAINT_FAILED
(6uL)- “unexpected padding”,
EVERPARSE_ERROR_UNEXPECTED_PADDING
(7uL)- “unspecified”, with the
ErrorCode > 7uL
- The
Context
argument is the user-providedContext
pointer- The
Length
argument is the length in bytes of the input buffer- The
Base
argument is a pointer to the base of the input buffer- The
StartPosition
argument is the offset fromBase
of the start of the fieldf
- The
EndPosition
argument is the offset fromBase
of the end of the fieldf
at which the validation failure occurred.
Following a validation failure at a given field, EverParse will invoke
the ErrorHandler
at each enclosing type as well. This allows a
caller to reconstruct a stack trace of a failing validation.
EverParse generates a default error handler that records just the deepest validation failure that occurred.
Fully worked examples: TCP Segment Headers¶
The classic IETF RFC 793 from 1981 introduces the TCP protocol, including the format of the header of TCP segments. The format has been extended slightly since then, to accommodate new options and flags—this Wikipedia page provides a good summary.
Reproduced below is an ASCII depiction of the format of TCP headers. In this section, we show how to specify this format in 3d. The full specification can be found here.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgment Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data | |N|C|E|U|A|P|R|S|F| |
| Offset| Rese|S|W|C|R|C|S|S|Y|I| Window |
| | rved| |R|E|G|K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Urgent Pointer |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Each -
in the diagram represents a single bit, with fioelds
separated by vertical bars |
.
The main subtle element of the specification is the handling of the
DataOffset
field. It is a 4-bit value representing an offset from
the beginning of the segment, in 32-bit increments, of the start of
the data
field. The Options
and Padding
fields are
optional, so the DataOffset
field is used to encode the size of
the Options
field. As such, DataOffset
is always at least
5
.
The Options
field itself is an array of tagged unions,
representing various kinds of options. Padding and the end of the
options array ensures that the data
fields is always begins at a
multiple of 32-bits from the start of the segment.
Additionally, semantic constraints restrict the values of the fields
depending on the values of some other fields. For example, the
Acknowledgement number can only be non-zero when the ACK
bit is
set.
To specify the type of a TCP header, we begin by defining some basic types.
typedef UINT16 PORT;
typedef UINT32 SEQ_NUMBER;
typedef UINT32 ACK_NUMBER;
typedef UINT16 WINDOW;
typedef UINT16 CHECK_SUM;
typedef UINT16 URGENT_PTR;
The source and destination port each occupy 16 bits and are
represented by a UINT16
; the sequence and acknowledgment number
fields are UINT32
etc.
Next, we define the various kind of options that are allowed. Every option begins with an option kind tag, an 8-bit value. Depending on the option kind, a variable number of bits of an option value can follow. The permitted option kinds are:
#define OPTION_KIND_END_OF_OPTION_LIST 0x00
#define OPTION_KIND_NO_OPERATION 0x01
#define OPTION_KIND_MAX_SEG_SIZE 0x02
#define OPTION_KIND_WINDOW_SCALE 0x03
#define OPTION_KIND_SACK_PERMITTED 0x04
#define OPTION_KIND_SACK 0x05
#define OPTION_KIND_TIMESTAMP 0x08
An OPTION
is parameterized by a boolean, MaxSegSizeAllowed
,
which constraints when the OPTION_KIND_MAX_SEG_SIZE
is allowed to
be present—it turns out, the SYN
bit in the header must be set
for this option to be allowed. The general shape of an OPTION
is
as below.
typedef struct _OPTION(Bool MaxSegSizeAllowed)
{
UINT8 OptionKind;
OPTION_PAYLOAD(OptionKind, MaxSegSizeAllowed) OptionPayload;
} OPTION;
We have an OptionKind
field followed by an OptionPayload
that
depends on the OptionKind
and the MaxSegSizeAllowed
flag.
Next, to define the OPTION_PAYLOAD
type, we use a casetype
, as
shown below.
casetype _OPTION_PAYLOAD(UINT8 OptionKind, Bool MaxSegSizeAllowed)
{
switch(OptionKind)
{
case OPTION_KIND_END_OF_OPTION_LIST:
unit EndOfList;
case OPTION_KIND_NO_OPERATION:
unit Noop;
case OPTION_KIND_MAX_SEG_SIZE:
MAX_SEG_SIZE_PAYLOAD(MaxSegSizeAllowed) MaxSegSizePayload;
case OPTION_KIND_WINDOW_SCALE:
WINDOW_SCALE_PAYLOAD WindowScalePayload;
case OPTION_KIND_SACK_PERMITTED:
UINT8 SackPermittedPayload;
case OPTION_KIND_SACK:
SELECTIVE_ACK_PAYLOAD SelectiveAckPayload;
case OPTION_KIND_TIMESTAMP:
TIMESTAMP_PAYLOAD TimestampPayload;
}
} OPTION_PAYLOAD;
In the first two cases of OptionKind
, no payload is expected. The
unit
type in 3d is an empty type—it consumes no space in the
message format.
For OPTION_KIND_MAX_SEG_SIZE
, we have the following payload—the
use of the where` constraint ensures that this case is present only
when `MaxSegSizeAllowed == true
. The payload is a length field (4
bytes) and a 2-byte MaxSegSize
value.
typedef struct _MAX_SEG_SIZE_PAYLOAD(Bool MaxSegSizeAllowed)
where MaxSegSizeAllowed
{
UINT8 Length
{
Length == 4
};
UINT16 MaxSegSize;
} MAX_SEG_SIZE_PAYLOAD;
The other cases are relatively straightforward, where
SELECTIVE_ACK_PAYLOAD
and TIMESTAMP_PAYLOAD
illustrate the use
of variable length arrays.
typedef struct _WINDOW_SCALE_PAYLOAD
{
UINT8 Length
{
Length == 3
};
UINT8 WindowScale;
} WINDOW_SCALE_PAYLOAD;
typedef struct _SELECTIVE_ACK_PAYLOAD
{
UINT8 Length
{
Length == 10 ||
Length == 18 ||
Length == 26 ||
Length == 34
};
UINT8 SelectiveAck[:byte-size (Length - 2)];
} SELECTIVE_ACK_PAYLOAD;
typedef struct _TIMESTAMP_PAYLOAD
{
UINT8 Length
{
Length == 10
};
UINT8 TimeStamp[:byte-size (Length - 2)];
} TIMESTAMP_PAYLOAD;
Finally, we can assemble our top-level TCP header type, as shown below. The specification of the options array is weaker than it could be. It currently permits an end-of-options-list option to appear anywhere in the Options list, rather than as just the last element. This can be improved by using a more advanced combinator from EverParse, however we leave it as is for simplicity of this example.
/*++
The top-level type of a TCP Header
Arguments:
* UINT32 SegmentLength, the size of the segment,
including both header and data, passed in by the caller
--*/
entrypoint
typedef struct _TCP_HEADER(UINT32 SegmentLength)
{
PORT SourcePort;
PORT DestinationPort;
SEQ_NUMBER SeqNumber;
ACK_NUMBER AckNumber;
UINT16 DataOffset:4
{ //DataOffset is in units of 32 bit words
sizeof(this) <= DataOffset * 4 && //DataOffset points after the static fields
DataOffset * 4 <= SegmentLength //and within the current segment
};
UINT16 Reserved:3
{
Reserved == 0 //Reserved bytes are unused
};
UINT16 NS:1;
UINT16 CWR:1;
UINT16 ECE:1;
UINT16 URG:1;
UINT16 ACK:1
{
AckNumber == 0 ||
ACK == 1 //AckNumber can only be set if the ACK flag is set
} ;
UINT16 PSH:1;
UINT16 RST:1;
UINT16 SYN:1;
UINT16 FIN:1;
WINDOW Window;
CHECK_SUM CheckSum;
URGENT_PTR UrgentPointer
{
UrgentPointer == 0 ||
URG == 1 //UrgentPointer can only be set if the URG flag is set
};
//The SYN=1 condition indicates when MAX_SEG_SIZE option can be received
//This is an array of options consuming
OPTION(SYN==1) Options[:byte-size (DataOffset * 4) - sizeof(this)];
UINT8 Data[SegmentLength - (DataOffset * 4)];
} TCP_HEADER;
The type is parameterized by SegmentLength
, the size in bytes
determined by the caller of the entire segment, including the header
and the data.
The first four fields, SourcePort
, DestinationPort
,
SeqNumber
and AckNumber
are straightforward.
The DataOffset
field is 4-bit value constrained to point beyond
the static fields of the header. Here sizeof(this)
is a 3d
compile-time constant referring to the size of the non-variable length
prefix of the current type, i.e., the sum of the length in bytes of
all the fields up to the Options
field. In this case, that number
is 20
. DataOffset
is also constrained to reference an offset
within the curent segment.
Next, we have 3 reserved bits, following by 1 bit each for the 9
flags. The Ack
flag is interesting, since its constraints states
that the AckNumber
can be non-zero only if the Ack
bit is set.
Then, we have a Window
and CheckSum
, both of which are
straightforward. Note, we do not specify the CheckSum
as part of
the format—that’s up to an application-specific check to
confirm. Alternatively, one could check this using a user-provided
action callback, though this is not yet supported.
The UrgentPointer
field is similar to AckNumber
in that it can
only be non-zero when the URG
flag is set.
Then, we have an Options
array, using the condition SYN==1
to
determine the MaxSegSizeAllowed
condition. The size in bytes of
the options array is variable and includes also the padding field, to
ensure 32-bit alignment. Note, this type is little too permissive, as
it will permit options arrays where the end-of-list option kind is not
necessarily only the last element.
Finally, we have the data field itself, whose byte size is bytes is the computed expression.
Generated code¶
Running the EverParse toolchain on the TCP segment header
specification produces a C procedure with the following signature, in TCPWrapper.h
:
BOOLEAN TcpCheckTcpHeader(uint32_t ___SegmentLength, uint8_t *base, uint32_t len);
This procedure is a validator for the TCP_HEADER
type. The caller
passes in three parameters:
__SegmentLength
, representing theSegmentLength
argument of theTCPHeader
type in the 3d specification.base
: a pointer to an array of byteslen
: a lower bound on the length of that array thatbase
points to.
TcpCheckTcpHeader
returns TRUE
if, and only if, the contents
of base
represent a valid TCPHeader
, while enjoying all the
guarantees of memory safety, arithmetic safety, double-fetch freedom,
not modifying any of the caller’s memory, not allocating any heap
data, and being provably functional correct.
Error handling¶
TCPWrapper.h
includes a default error handler to report the leaf
validation failure. It also expects a client module to supply an
implementation of
void TcpEverParseError(
const char *TypeName,
const char *FieldName,
const char *Reason,
uint64_t ErrorCode)
This can be instantiated with a procedure to, say, log an error.
Alternatively, the default error handler in TCPWrapper.h
can be
replaced by a custom error handler of your choosing.
Fully worked examples: ELF files¶
ELF (Executable and Linkable Format) is a common, standard file format for various kinds of binary files (object files, executables, shared libraries, and core dumps). The file format is described as C-structures in the elf.h file.
In this section we develop (parts of) a 3d specification for 64-bits ELF files and describe how it can be integrated in existing projects for validating potentially untrusted ELF files. A complete ELF specification can be found in the 3d test suite.
An ELF file consists of an ELF header, followed by a program header table and a section header table. Both the tables are optional and describe the rest of the ELF file. The ELF header specifies the offsets and the number of entries in each of the tables. One interesting aspect of validating ELF files is then to check that both the tables contain the specified number of entries and point to the valid parts of the rest of the ELF file.
The ELF header starts with a 16 byte array. The first four bytes of the array are fixed: 0x7f, followed by ‘E’, ‘L’, and ‘F’. Other bytes of the array specify the binary architecture (32-bits or 64-bits), endianness, ELF specification version, target OS, and ABI version of the file. The last 7 bytes of the array are padding bytes set to 0. To be able to constrain the individual bytes of this array, we specify in 3d as a struct.
typedef struct _E_IDENT
{
UCHAR ZERO { ZERO == 0x7f };
UCHAR ONE { ONE == 0x45 };
UCHAR TWO { TWO == 0x4c };
UCHAR THREE { THREE == 0x46 };
//This 3d spec applies to 64-bit only currently
ELFCLASS FOUR { FOUR == ELFCLASS64 };
ELFDATA FIVE;
//ELF specification version is always set to 1
UCHAR SIX { SIX == 1 };
ELFOSABI SEVEN;
//ABI version, always set to 0
ZeroByte EIGHT;
//padding, remaining 7 bytes are 0
ZeroByte NINE_FIFTEEN[E_IDENT_PADDING_SIZE];
} E_IDENT;
(The omitted definitions can be found in the full development.)
Following this 16 byte array, the ELF header specifies the file type,
file version, followed by fields of our interest: E_PHOFF
,
E_SHOFF
(offsets of the two tables), and E_PHNUM
, E_SHNUM
(number of entries in the two tables).
// ELF HEADER BEGIN
E_IDENT IDENT;
ELF_TYPE E_TYPE { E_TYPE != ET_NONE };
UINT16 E_MACHINE;
UINT32 E_VERSION { E_VERSION == 1 };
ADDRESS E_ENTRY;
//Program header table offset
OFFSET E_PHOFF;
//Section header table offset
OFFSET E_SHOFF;
UINT32 E_FLAGS;
UINT16 E_EHSIZE { E_EHSIZE == sizeof (this) };
UINT16 E_PHENTSIZE;
//Number of program header table entries
UINT16 E_PHNUM
{ (E_PHNUM == 0 && E_PHOFF == 0) || //no Program Header table
(0 < E_PHNUM && E_PHNUM < PN_XNUM &&
sizeof (this) == E_PHOFF && //Program Header table starts immediately after the ELF Header
E_PHENTSIZE == sizeof (PROGRAM_HEADER_TABLE_ENTRY)) };
UINT16 E_SHENTSIZE;
//Number of section header table entries
UINT16 E_SHNUM
{ (E_SHNUM == 0 && E_SHOFF == 0) || // no Section Header table
(0 < E_SHNUM && E_SHNUM < SHN_LORESERVE &&
E_SHENTSIZE == sizeof (SECTION_HEADER_TABLE_ENTRY)) };
//Section header table index of the section names table
UINT16 E_SHSTRNDX
{ (E_SHNUM == 0 && E_SHSTRNDX == SHN_UNDEF) ||
(0 < E_SHNUM && E_SHSTRNDX < E_SHNUM) };
// ELF HEADER END
The constraint on E_PHNUM
enforces that either the file has no
program header table (E_PHNUM == 0 && E_PHOFF == 0
) or the table
has non-zero number of entries and it starts immediately after the ELF
header (sizeof (this)
for the encapsulating struct type refers to
the ELF header shown here). We add similar constraints to E_SHNUM
but do not add any check for E_SHOFF
, since unlike the
program header table, the section header table does not have a fixed
offset.
The ELF header is followed by the two optional tables. We specify
these optional tables using casetype
. First, the program header
table:
casetype _PROGRAM_HEADER_TABLE_OPT (UINT16 PhNum,
OFFSET ElfFileSize)
{
switch (PhNum)
{
case 0:
unit Empty;
default:
PROGRAM_HEADER_TABLE_ENTRY(ElfFileSize) Tbl[:byte-size sizeof (PROGRAM_HEADER_TABLE_ENTRY) * PhNum]
}
} PROGRAM_HEADER_TABLE_OPT;
The type PROGRAM_HEADER_TABLE_OPT
is parameterized by
the number of program header table entries, as specified in the ELF
header, and the size of the ELF file; the latter allows us to check
that the segments pointed to by the program header table entries are in
the file range.
In case PhNum
is 0, the type is the empty unit
type. Otherwise, it is an PROGRAM_HEADER_TABLE_ENTRY
array of
size sizeof (PROGRAM_HEADER_TABLE_ENTRY) * PhNum
bytes where the type
PROGRAM_HEADER_TABLE_ENTRY
describes a segment:
typedef struct _PROGRAM_HEADER_TABLE_ENTRY (OFFSET ElfFileSize)
{
UINT32 P_TYPE;
UINT32 P_FLAGS { P_FLAGS <= 7 };
OFFSET P_OFFSET;
ADDRESS P_VADDR;
ADDRESS P_PADDR;
//The constraint checks that the segment is in the file range
UINT64 P_FILESZ { P_FILESZ < ElfFileSize &&
P_OFFSET <= ElfFileSize - P_FILESZ };
UINT64 P_MEMSZ;
UINT64 P_ALIGN;
} PROGRAM_HEADER_TABLE_ENTRY;
The specification of the section header table is also a casetype
:
casetype _SECTION_HEADER_TABLE_OPT (OFFSET PhTableEnd,
OFFSET ShOff,
UINT16 ShNum,
OFFSET ElfFileSize)
{
switch (ShNum)
{
case 0:
NO_SECTION_HEADER_TABLE(PhTableEnd, ElfFileSize) NoTbl;
default:
SECTION_HEADER_TABLE(PhTableEnd, ShOff, ShNum, ElfFileSize) Tbl;
}
} SECTION_HEADER_TABLE_OPT;
We parameterize this type by the offset where the program header table ends, the section header table offset, number of section header table entries, and the size of the file.
When number of entries is 0, the file does not have a section header
table, but we still need to check that the file contains enough bytes
after the program header table so that its total size is
ElfFileSize
. NO_SECTION_HEADER_TABLE
specifies such a type:
typedef struct _NO_SECTION_HEADER_TABLE (OFFSET PhTableEnd,
UINT64 ElfFileSize)
where (PhTableEnd <= ElfFileSize && ElfFileSize - PhTableEnd <= MAX_UINT32)
{
UINT8 Rest[:byte-size (UINT32) (ElfFileSize - PhTableEnd)];
} NO_SECTION_HEADER_TABLE;
The checks in the where
clause ensure safety of the arithmetic
operations.
In case the section header table is non-empty, we specify (a) the bytes between the end of the program header table and the beginning of the section header table, (b) the section header table, and (c) final check that end of the section header table is the end of the file.
typedef struct _SECTION_HEADER_TABLE (OFFSET PhTableEnd,
UINT64 ShOff,
UINT16 ShNum,
OFFSET ElfFileSize)
where (PhTableEnd <= ShOff && ShOff - PhTableEnd <= MAX_UINT32)
{
UINT8 PHTABLE_SHTABLE_GAP[(UINT32) (ShOff - PhTableEnd)];
SECTION_HEADER_TABLE_ENTRY(ShNum, ElfFileSize) SHTABLE[:byte-size sizeof (SECTION_HEADER_TABLE_ENTRY) * ShNum];
// Check that we have consumed all the bytes in the file
unit EndOfFile
{:on-success var x = field_pos; return (x == ElfFileSize); };
} SECTION_HEADER_TABLE;
The section header table, similar to the program header table, is an array of entries.
Finally, the top-level ELF format:
entrypoint
typedef struct _ELF (UINT64 ElfFileSize)
{
... //ELF header as above
//Optional Program Header table
PROGRAM_HEADER_TABLE_OPT (E_PHNUM,
ElfFileSize) PH_TABLE;
//Optional Section Header Table
//Recall that the first argument is the end of the program header table
SECTION_HEADER_TABLE_OPT ((E_PHNUM == 0) ? E_EHSIZE : E_PHOFF + (sizeof (PROGRAM_HEADER_TABLE_ENTRY) * E_PHNUM),
E_SHOFF,
E_SHNUM,
ElfFileSize) SH_TABLE;
} ELF;
Integrating ELF validator in existing code¶
To compile the 3d specification to C code, download the latest
EverParse release and compile
the 3d spec with the EverParse binary, e.g. for Windows:
everparse.cmd --batch ELF.3d
. This command generates
ELFWrapper.h
and ELFWrapper.c
files, with the top-level
validation function as follows:
BOOLEAN ElfCheckElf(uint64_t ___ElfFileSize, uint8_t *base, uint32_t len);
The actual validator implementation is generated in ELF.c
. To
integrate these validators into existing C code, drop in these
generated .c
and .h
files
in the development and invoke `ElfCheckElf
as necessary.