The Magic of zerocopy
(compared with scroll)
If you want to parse binary formats in Rust, you have a few crates to choose from apart from rolling your own.
Some popular contenders are `zerocopy` and `scroll`. I would like to take this chance to explain the difference between the two, which one you likely want to use in which situation, and why `zerocopy` truly is magical.
However, neither is perfect; there are some papercuts and ideas for improvement that I will explain at the end as well.
# What does zero-copy mean?
To start off, we assume that we are dealing with a byte slice, `&[u8]`. We can either read a complete file from disk, or just `mmap` it into our address space. The topic of incremental / streaming parsing that works with network streams is something else entirely that I do not want to touch now.
So our complete binary file content is available as a `&'data [u8]`, and we want to parse it into its logical format. As much as possible, we want to refer to data inside that buffer directly rather than copying things out.
`scroll` has partial support, as it allows parsing a `&'data str` which points directly into the original buffer without allocating and copying a new `String`. When parsing other data types, however, `scroll` tends to copy the contents out of the buffer, whereas `zerocopy` will give you a `&'data T` by default.
Both approaches have advantages and disadvantages that we will look at.
# Examples
Let's look at a small example of how to use both crates. In both cases we want to write, and then read, a simple nested struct to/from a buffer:
```rust
use std::io::Write;

use scroll::{IOwrite, Pread, Pwrite, SizeWith};
use zerocopy::{AsBytes, FromBytes, LayoutVerified};

#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq, AsBytes, FromBytes, Pread, Pwrite, IOwrite, SizeWith)]
struct MyNestedPodStruct {
    a: u32,
    b: u16,
    _pad: u16,
}

#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq, AsBytes, FromBytes, Pread, Pwrite, IOwrite, SizeWith)]
struct MyPodStruct {
    nested: MyNestedPodStruct,
    c: u64,
}
```
This is already a mouthful. My structs are `#[repr(C)]` so that I have full control over their memory layout. `zerocopy` also has the additional requirement that one has to be explicit about padding, whether in between members or at the end. A `#[repr(packed)]` annotation would avoid the need for that, at a cost that we will discuss soon.
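For illustration, here is a minimal sketch (my own, not from the original crate docs) of what a packed variant could look like. The explicit `_pad` field goes away and the struct shrinks from 8 to 6 bytes, but its fields may end up at unaligned addresses:

```rust
// Hypothetical packed variant of MyNestedPodStruct: #[repr(C, packed)] removes
// all padding, so no explicit _pad field is needed. The struct is now 6 bytes
// instead of 8, but reads of `a` may be unaligned and taking plain references
// to its fields is not allowed.
#[repr(C, packed)]
#[derive(Copy, Clone, AsBytes, FromBytes)]
struct MyPackedNestedStruct {
    a: u32,
    b: u16,
}
```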
`zerocopy` requires us to derive the `AsBytes` and `FromBytes` traits, which, as their names make clear, allow us to read a struct from raw bytes, or interpret it as raw bytes that we can write.
`scroll` on the other hand has a bunch of traits that we can derive: `Pread` and `Pwrite` to read and write respectively, `IOwrite` to be able to write a struct to a `std::io::Write` stream, and `SizeWith` for structs that have a fixed size that does not depend on any `Context` (more on that later).
Let's define a few instances of our struct that we want to write and then read back:
```rust
let structs: &[MyPodStruct] = &[
    MyPodStruct {
        nested: MyNestedPodStruct {
            a: 1,
            b: 1,
            _pad: 0,
        },
        c: 1,
    },
    MyPodStruct {
        nested: MyNestedPodStruct {
            a: 2,
            b: 2,
            _pad: 0,
        },
        c: 2,
    },
];
```
`zerocopy` lets us turn this whole slice into a `&[u8]`, which we can then copy around or write as we see fit:
```rust
let mut buf = Vec::new();
buf.write_all(structs.as_bytes())?;
```
For reading, there is a wide range of options. You can read/cast an exactly-sized buffer, a prefix, or a suffix, and you can specifically choose to read `unaligned` data. All these different methods are implemented as constructors of the `LayoutVerified` struct, which can then be turned into a reference or a slice.
Here is an example of how to get a single struct or the whole slice from our buffer:
```rust
let lv = LayoutVerified::<_, [MyPodStruct]>::new_slice(&buf[..]).unwrap();
let parsed_slice = lv.into_slice();
assert_eq!(structs, parsed_slice);

let (lv, _rest) = LayoutVerified::<_, MyPodStruct>::new_from_prefix(&buf[..]).unwrap();
let parsed_one = lv.into_ref();
assert_eq!(&structs[0], parsed_one);
```
One thing to note here is that we have to provide explicit type annotations, as for some reason the compiler is not able to infer them automatically.
As far as I know, `scroll` on the other hand does not allow directly writing either a slice or a reference. That is the reason why I derived `Copy`, and by extension `Clone`, for our structs above. Please reach out to me and prove me wrong here.
We thus write owned copies one by one:
```rust
let mut buf = Vec::new();
for s in structs {
    buf.iowrite(*s)?;
}
```
Parsing also does not work for a whole slice as far as I know (please prove me wrong), and as `scroll` is not zero-copy, we have to collect the parsed structs into a `Vec` manually:
```rust
let offset = &mut 0;
let mut parsed = Vec::new();
while *offset < buf.len() {
    parsed.push(buf.gread::<MyPodStruct>(offset).unwrap());
}
assert_eq!(structs, parsed);
```
# Why this matters
When parsing binary files, we want that parsing to be as fast as possible, and we also want to allocate and copy as little memory as possible.
The `zerocopy` crate truly is zero-copy. It does a fixed amount of pointer arithmetic (essentially an alignment check and a bounds check) to verify the layout of our buffer. I guess that's why its main type is called `LayoutVerified`.
`scroll` on the other hand parses each struct (in fact, each member) one by one and copies them into a `Vec` that needs to be allocated. It is thus a lot more expensive. You don't necessarily need to parse and collect everything, though: you can parse structs on demand, and if your structs have a fixed size (we derived `SizeWith`), you can skip ahead in the source buffer to do some random access, as sketched below.
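As a rough sketch of what that random access could look like, reusing `buf` and `structs` from the example above and relying on the fact that our `#[repr(C)]` struct with explicit padding serializes to exactly its `size_of`:

```rust
use scroll::Pread;

// MyPodStruct always occupies 16 bytes on disk (u32 + u16 + explicit u16 pad + u64),
// so the i-th record starts at offset i * 16 and can be parsed on its own.
let record_size = std::mem::size_of::<MyPodStruct>();
let index = 1;
let second: MyPodStruct = buf.pread(index * record_size).unwrap();
assert_eq!(second, structs[index]);
```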
# Endianness and other Context
Which approach is better is, as always, a matter of tradeoffs.
`zerocopy` is the better choice if all your raw data structures have a fixed size and endianness. `scroll` is the better choice if your data structures have a variable size, or if you want to parse files with dynamic endianness.
`scroll` calls this `Context`, and there is a complete example of how to create a custom parser that is aware of both endianness and the size of certain fields.
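As a small sketch of the runtime side of this (independent of that linked example): with `scroll`, the endianness is just a value that you pass as context, so it can be decided from data you only see at runtime:

```rust
use scroll::{Pread, BE, LE};

let bytes: &[u8] = &[0xde, 0xad, 0xbe, 0xef];

// The same bytes, interpreted with two different endiannesses.
let big: u32 = bytes.pread_with(0, BE).unwrap();
let little: u32 = bytes.pread_with(0, LE).unwrap();
assert_eq!(big, 0xdead_beef);
assert_eq!(little, 0xefbe_adde);

// The context can come from runtime data, e.g. a header flag.
let is_little_endian = false; // hypothetical flag read from a file header
let value: u32 = bytes.pread_with(0, if is_little_endian { LE } else { BE }).unwrap();
assert_eq!(value, big);
```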
While `scroll` supports endianness-aware parsing based on a runtime context, `zerocopy` is very different here. It supports types that are byteorder-aware, but their byteorder is fixed at compile time. A `zerocopy` `U64<LE>` is statically typed, and its `get` method is optimized at compile time to only read LE data.
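A tiny sketch to make that concrete, using the same `zerocopy::byteorder` imports as the ELF example below: the byte order is baked into the type, and `get` converts to a native-endian integer:

```rust
use zerocopy::byteorder::*;
use zerocopy::AsBytes;

// The in-memory representation is always little-endian, regardless of the host.
let x = U64::<LE>::new(0x1122_3344_5566_7788);
assert_eq!(x.get(), 0x1122_3344_5566_7788);
assert_eq!(x.as_bytes(), &[0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11]);
```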
# Making zero-copy context-aware
With that `zerocopy` limitation in mind, I thought it was a fun exercise to make it somehow handle formats of all kinds of endianness and field sizes at runtime.
As an example, I will choose the ELF header, which has differently sized fields for the 32-bit and 64-bit variants, as well as different endianness. The header is also self-describing, as it has two flags for bit-width and endianness, which can be read without knowing either since they are just a bunch of bytes. It looks like this:
```rust
#[repr(C)]
#[derive(FromBytes)]
pub struct ElfIdent {
    /// ELF Magic, must be `b"\x7fELF"`
    e_mag: [u8; 4],
    /// Field size flag: 1 = 32-bit variant, 2 = 64-bit.
    e_class: u8,
    /// Endianness flag: 1 = LE, 2 = BE.
    e_data: u8,
    e_version: u8,
    e_abi: u8,
    e_abiversion: u8,
    e_pad: [u8; 7],
}
```
Next, we can define different structures for each of the variants. As you might have guessed, this leads to a combinatorial explosion, as we have to define four different variants. Here are two of them to keep things simple:
```rust
use zerocopy::byteorder::*;

#[repr(C, align(8))]
#[derive(FromBytes)]
pub struct ElfHeader_L64 {
    e_ident: ElfIdent,
    e_type: U16<LE>,
    e_machine: U16<LE>,
    e_version: U32<LE>,
    e_entry: U64<LE>,
    e_phoff: U64<LE>,
    e_shoff: U64<LE>,
    e_flags: U32<LE>,
    e_ehsize: U16<LE>,
    e_phentsize: U16<LE>,
    e_phnum: U16<LE>,
    e_shentsize: U16<LE>,
    e_shnum: U16<LE>,
    e_shstrndx: U16<LE>,
}

#[repr(C, align(4))]
#[derive(FromBytes)]
pub struct ElfHeader_B32 {
    e_ident: ElfIdent,
    e_type: U16<BE>,
    e_machine: U16<BE>,
    e_version: U32<BE>,
    e_entry: U32<BE>,
    e_phoff: U32<BE>,
    e_shoff: U32<BE>,
    e_flags: U32<BE>,
    e_ehsize: U16<BE>,
    e_phentsize: U16<BE>,
    e_phnum: U16<BE>,
    e_shentsize: U16<BE>,
    e_shnum: U16<BE>,
    e_shstrndx: U16<BE>,
}

#[test]
fn test_struct_layout() {
    use core::mem;

    assert_eq!(mem::align_of::<ElfHeader_L64>(), 8);
    assert_eq!(mem::size_of::<ElfHeader_L64>(), 64);
    assert_eq!(mem::align_of::<ElfHeader_B32>(), 4);
    assert_eq!(mem::size_of::<ElfHeader_B32>(), 52);
}
```
Implementing this example, I was a bit surprised that I had to specify the alignment of my structures manually. It turns out the `zerocopy::U64` and similar types are unaligned, which means reading from them needs to use instructions that do unaligned loads. These might be a bit slower, but I guess this is a wash in the grand scheme of things.
A recommendation here would be to write tests that explicitly check the size and alignment of your structs, as done above. Very helpful. I wouldn't have caught this issue otherwise.
With these two variants defined, we can then create a context-aware wrapper around them, which chooses the variant at runtime depending on its input:
```rust
pub enum ElfHeader<'data> {
    L64(&'data ElfHeader_L64),
    B32(&'data ElfHeader_B32),
    // TODO: L32, B64
}

impl<'data> ElfHeader<'data> {
    pub fn parse(buf: &'data [u8]) -> Option<(Self, &'data [u8])> {
        let (e_ident, _rest) = LayoutVerified::<_, ElfIdent>::new_from_prefix(buf)?;
        if e_ident.e_mag != *b"\x7fELF" {
            return None;
        }
        match e_ident.e_class {
            // 32-bit
            1 => {
                match e_ident.e_data {
                    // LE
                    1 => todo!(),
                    // BE
                    2 => {
                        let (e_header, rest) =
                            LayoutVerified::<_, ElfHeader_B32>::new_from_prefix(buf)?;
                        Some((Self::B32(e_header.into_ref()), rest))
                    }
                    _ => None,
                }
            }
            // 64-bit
            2 => {
                match e_ident.e_data {
                    // LE
                    1 => {
                        let (e_header, rest) =
                            LayoutVerified::<_, ElfHeader_L64>::new_from_prefix(buf)?;
                        Some((Self::L64(e_header.into_ref()), rest))
                    }
                    // BE
                    2 => todo!(),
                    _ => None,
                }
            }
            _ => None,
        }
    }

    pub fn e_shoff(&self) -> u64 {
        match self {
            ElfHeader::L64(header) => header.e_shoff.get(),
            ElfHeader::B32(header) => header.e_shoff.get() as u64,
        }
    }
}
```
This wrapper can have accessors that check at runtime which variant the underlying data has and do the appropriate access in a typesafe manner. That wrapper is also lightweight and zero-copy. It only has the enum discriminant, plus a pointer to the same underlying data in all cases. So it is essentially a tagged pointer.
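Using the wrapper then looks something like this; a minimal sketch, and the path is just an example of a file you might feed it:

```rust
// Parse the header of some ELF file and use the typed accessor. On a typical
// x86_64 Linux system this binary is a little-endian 64-bit ELF, i.e. the L64 variant.
let data = std::fs::read("/bin/ls").expect("could not read file");
let (header, _rest) = ElfHeader::parse(&data).expect("not a valid ELF header");
println!("section headers start at offset {:#x}", header.e_shoff());
```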
# API Papercuts
Well, there you have it: a detailed explanation of `zerocopy` and `scroll`, the difference between the two, and how `zerocopy` can be extremely lightweight, as it only validates the correct size and alignment of things without doing any parsing at all. It only "parses" things when you start to access that data.
Both of these have their pros and cons, and both have different use cases and strengths. `zerocopy` is better if you have fixed-size structs and don't need to care about endianness, although it is possible to make that work with some effort, as shown above.
`scroll` makes these use cases trivial, at the cost of parsing everything ahead of time and copying things out into structs that are agnostic of the on-disk representation.
Unfortunately though, both these libraries are a bit hard to work with, and their APIs could use some streamlining.
The API surface of `scroll` is huge, as it supports a ton of features. But the main APIs that you interact with are very unintuitive and confusing. There is `pread` and `gread`. What's the difference between the two? I honestly can't tell you without looking at the docs, which I have to do constantly, as I simply can't remember it myself.
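For the record, here is the difference as far as I can tell, as a minimal sketch: `pread` takes the offset by value and leaves it alone, while `gread` takes a `&mut usize` and advances it past whatever was read:

```rust
use scroll::Pread;

let bytes: &[u8] = &[1, 0, 2, 0];

// pread: the offset is passed by value, so we do the bookkeeping ourselves.
let a: u16 = bytes.pread(0).unwrap();
let b: u16 = bytes.pread(2).unwrap();

// gread: the offset is passed by mutable reference and advanced after each read.
let offset = &mut 0;
let c: u16 = bytes.gread(offset).unwrap();
let d: u16 = bytes.gread(offset).unwrap();

assert_eq!((a, b), (c, d));
assert_eq!(*offset, 4);
```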
There are quite a few papercuts with `zerocopy` as well. First of all, why is `LayoutVerified` a concrete type to begin with? I can't think of a good use case for which you would actually want to keep that type around. You rather want to immediately turn it `into_ref` or `into_slice`. Free functions would serve that use case a lot better.
The API is also extremely repetitive, as we have seen in this example:
```rust
let mystructs = LayoutVerified::<_, [MyPodStruct]>::new_slice(buf)?.into_slice();
```
I repeat slice three times in this line: in the `[MyPodStruct]` type parameter, in `new_slice`, and in `into_slice`. Add to that the fact that these two functions only exist if the type parameter itself is a slice. Can I rather have a single free function instead of this?
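Something along these lines would already help. This is purely a hypothetical helper I am sketching here; `parse_slice` is not part of `zerocopy`:

```rust
// Hypothetical convenience wrapper around LayoutVerified; the name is made up.
fn parse_slice<T: FromBytes>(buf: &[u8]) -> Option<&[T]> {
    LayoutVerified::<_, [T]>::new_slice(buf).map(|lv| lv.into_slice())
}

// Usage would then shrink to:
// let mystructs: &[MyPodStruct] = parse_slice(&buf)?;
```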
Usage of the `Unaligned` APIs is a bit confusing, and I was surprised to see that the endian-aware types such as `U64` are unaligned as well.
The usage of a custom derive for `FromBytes` is interesting, as it validates safe usage. But it also means that it is impossible to derive if you have some foreign types, such as `Uuid`, which are `#[repr(C)]` but do not implement `FromBytes` themselves. By the way, `uuid` has unstable support for `zerocopy`, but it requires passing in custom `RUSTFLAGS`, which is inconvenient.
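One workaround I can think of (a sketch of my own, not something either crate prescribes) is to keep the raw bytes in the `zerocopy`-facing struct and only convert to the foreign type in an accessor:

```rust
use uuid::Uuid;
use zerocopy::FromBytes;

// Store the UUID as plain bytes so that FromBytes can be derived; `size` is
// just a made-up second field for illustration.
#[repr(C)]
#[derive(FromBytes)]
struct RawRecord {
    id: [u8; 16],
    size: u64,
}

impl RawRecord {
    fn id(&self) -> Uuid {
        Uuid::from_bytes(self.id)
    }
}
```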
# Variable-size and Compressed data
A use case that I have not explored in this post, which is rather trivial to handle in `scroll` but close to impossible in `zerocopy`, is truly variable-sized data, such as structures that embed length-prefixed or nul-terminated strings inline. Those are the devil.
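As a small sketch of why `scroll` handles this well, using its `StrCtx` string context: you read the length, hand it to the next read as context, and the resulting `&str` still borrows from the buffer:

```rust
use scroll::{ctx::StrCtx, Pread};

// A length-prefixed string: one length byte followed by that many bytes of UTF-8.
let bytes: &[u8] = &[5, b'h', b'e', b'l', b'l', b'o'];
let offset = &mut 0;

let len: u8 = bytes.gread(offset).unwrap();
let name: &str = bytes.gread_with(offset, StrCtx::Length(len as usize)).unwrap();

assert_eq!(name, "hello");
assert_eq!(*offset, 6);
```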
When picking tradeoffs, you can either have something that is simple and fast, or something that is compact and small. A compact format is almost certainly variable-sized, which means you can't use zero-copy patterns, and you lose the ability to do random access. A very clear example of this is delta compression, where you have to parse things in order.
# Watch this space
As a matter of fact, most debug formats use some clever tricks to represent source information very compactly. I want to explore some of these formats in a lot more detail in the future. More specifically, watch out for future blog posts about:
- DWARF line programs
- PDB line programs
- Portable PDB sequence points
- SourceMap VLQ mappings