# The Magic of zerocopy

## (compared with scroll)

— 12 min

If you want to parse binary formats in Rust, you have a few crates to choose from apart from rolling your own.

Some popular contenders are zerocopy and scroll.

I would like to take this chance to explain the difference between the two, which one you likely want to use in which situation, and why zerocopy truly is magical.

However, neither is perfect; there are some papercuts and ideas for improvement that I will explain at the end as well.

## What does zero-copy mean?

To start off, we assume that we are dealing with a byte slice, &[u8]. We can either read the complete file from disk, or just mmap it into our address space. Incremental / streaming parsing that works with network streams is a different topic entirely that I do not want to touch on now.

So our complete binary file content is available as a &'data [u8], and we want to parse it into its logical format. As much as possible, we want to refer to data inside that buffer directly rather than copying things out.

scroll has partial support, as it allows parsing a &'data str that points directly into the original buffer without allocating and copying a new String.

When parsing other data types, however, scroll tends to copy the contents out of the buffer, whereas zerocopy will give you a &'data T by default.

## Examples

Let's look at a small example of how to use both crates. In both cases we want to write, and then read, a simple nested struct to/from a buffer:

```rust
use scroll::{IOwrite, Pread, Pwrite, SizeWith};
use zerocopy::{AsBytes, FromBytes, LayoutVerified};

#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq, AsBytes, FromBytes, Pread, Pwrite, IOwrite, SizeWith)]
struct MyNestedPodStruct {
    a: u32,
    b: u16,
    /// Explicit padding, so the `AsBytes` derive accepts the layout.
    _pad: u16,
}

#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq, AsBytes, FromBytes, Pread, Pwrite, IOwrite, SizeWith)]
struct MyPodStruct {
    nested: MyNestedPodStruct,
    c: u64,
}
```

This is already a mouthful. My structs are #[repr(C)] so that I have full control over their memory layout. zerocopy has the additional requirement that one has to be explicit about padding, in between members or at the end, which is why there is an explicit _pad field. A #[repr(packed)] annotation would avoid the need for that, at a cost that we will discuss soon.

zerocopy requires us to derive the AsBytes and FromBytes traits, which, as their names make clear, allow us to read a struct from raw bytes, or interpret it as raw bytes that we can write.

scroll on the other hand has a bunch of traits we can derive: Pread and Pwrite to read and write respectively, IOwrite to write a struct to a std::io::Write stream, and SizeWith for structs that have a fixed size that does not depend on any Context; more on that later.

Let's define a few instances of our struct that we want to write and then read back:

```rust
let structs: &[MyPodStruct] = &[
    MyPodStruct {
        nested: MyNestedPodStruct { a: 1, b: 1, _pad: 0 },
        c: 1,
    },
    MyPodStruct {
        nested: MyNestedPodStruct { a: 2, b: 2, _pad: 0 },
        c: 2,
    },
];
```


zerocopy lets us turn this whole slice into a &[u8] which we can then copy around or write as we see fit:

```rust
let mut buf = Vec::new();
buf.write_all(structs.as_bytes())?;
```


For reading, there is a wide range of options. You can read/cast an exactly-sized buffer, a prefix, or a suffix. You can specifically choose to read unaligned. All these different methods are implemented as constructors of the LayoutVerified struct, which can then be turned into a reference or a slice.

Here is an example of how to get a single struct or the whole slice from our buffer:

```rust
let lv = LayoutVerified::<_, [MyPodStruct]>::new_slice(&buf[..]).unwrap();
let parsed_slice = lv.into_slice();
assert_eq!(structs, parsed_slice);

let (lv, _rest) = LayoutVerified::<_, MyPodStruct>::new_from_prefix(&buf[..]).unwrap();
let parsed_one = lv.into_ref();
assert_eq!(&structs[0], parsed_one);
```


One thing to note here is that we have to provide explicit type annotations, as the compiler is not able to infer the target type automatically.

As far as I know, scroll on the other hand does not allow directly writing either a slice or a reference. That is the reason why I derived Copy, and by extension Clone, for our structs above. Please reach out and prove me wrong here.

We thus write owned copies one by one here:

```rust
let mut buf = Vec::new();
for s in structs {
    buf.iowrite(*s)?;
}
```


Parsing a whole slice at once also does not work as far as I know (again, please prove me wrong), and as scroll is not zero-copy, we have to collect the parsed structs into a Vec manually:

```rust
let offset = &mut 0;
let mut parsed = Vec::new();
while *offset < buf.len() {
    // `gread` advances `offset` by the size of the parsed struct.
    parsed.push(buf.gread::<MyPodStruct>(offset)?);
}

assert_eq!(structs, parsed);
```


## Why this matters

When parsing binary files, we want that parsing to be as fast as possible, and we want to allocate and copy as little memory as possible.

The zerocopy crate truly is zero-copy. It does a fixed amount of pointer arithmetic (essentially an alignment check and a bounds check) to verify the layout of our buffer. I guess that's why its main type is called LayoutVerified.
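To make that concrete, here is a minimal stdlib-only sketch of what such a layout check amounts to. The function `cast_prefix` is a made-up name for illustration; zerocopy wraps this kind of check in a safe, derive-validated API:

```rust
/// Hypothetical helper illustrating a zero-copy "layout check":
/// verify bounds and alignment, then reinterpret the bytes in place.
/// This is only sound for types where every bit pattern is valid,
/// which is exactly what zerocopy's `FromBytes` derive guarantees.
fn cast_prefix<T>(buf: &[u8]) -> Option<&T> {
    let size = core::mem::size_of::<T>();
    let align = core::mem::align_of::<T>();
    if buf.len() < size || (buf.as_ptr() as usize) % align != 0 {
        return None;
    }
    // SAFETY: size and alignment were checked above.
    Some(unsafe { &*(buf.as_ptr() as *const T) })
}
```

No bytes are copied and nothing is allocated; the returned reference borrows the input buffer, just like zerocopy's &'data T.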

scroll on the other hand parses each struct (in fact, each member) one by one and copies them into a Vec that needs to be allocated. It is thus a lot more expensive. You don't necessarily need to parse and collect everything, though: you can parse structs on demand, and if your structs have a fixed size (we derived SizeWith), you can skip ahead in the source buffer to do some random access.

## Endianness and other Context

Which approach is better is, as always, a matter of tradeoffs. zerocopy is the better choice if all your raw data structures have a fixed size and endianness. scroll is the better choice if your data structures have a variable size, or if you want to parse files with dynamic endianness. scroll calls this Context, and there is a complete example of how to create a custom parser that is aware of both endianness and the size of certain fields.
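As an illustration of the idea (a stdlib sketch, not scroll's actual API), a runtime context can be as simple as an endianness value passed into every read:

```rust
/// A sketch of a runtime parsing "context": the byteorder is a value
/// inspected on every read, rather than a compile-time type parameter.
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
}

/// Reads a u32 out of `buf` at `offset`, decoding it according to the
/// endianness chosen at runtime.
fn read_u32_with(buf: &[u8], offset: usize, endian: Endian) -> Option<u32> {
    let bytes: [u8; 4] = buf.get(offset..offset + 4)?.try_into().ok()?;
    Some(match endian {
        Endian::Little => u32::from_le_bytes(bytes),
        Endian::Big => u32::from_be_bytes(bytes),
    })
}
```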

While scroll supports endianness-aware parsing based on a runtime context, zerocopy is very different here. It supports types that are byteorder-aware, but their byteorder is fixed at compile time. A zerocopy U64<LE> is statically typed, and its get method is optimized at compile time to only read LE data.
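The principle behind those byteorder-aware types can be sketched in a few lines of plain Rust. `U64Le` here is a stand-in for illustration, not zerocopy's actual implementation:

```rust
/// Sketch of a statically-typed byteorder wrapper: the bytes are stored
/// as-is (endian-agnostic, alignment 1), and `get` decodes them with an
/// endianness fixed at compile time, so it compiles down to a single
/// (byte-swapping, if needed) unaligned load.
#[derive(Clone, Copy)]
#[repr(transparent)]
struct U64Le([u8; 8]);

impl U64Le {
    fn get(self) -> u64 {
        u64::from_le_bytes(self.0)
    }
}
```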

## Making zero-copy context-aware

With that zerocopy limitation in mind, I thought it was a fun exercise to make it somehow handle formats of all kinds of endianness and field sizes at runtime.

As an example, I will choose the ELF header, which has differently sized fields for its 32-bit and 64-bit variants, as well as different endianness. The header is also self-describing, as it has two flags for bit-width and endianness, which can be read without knowing either, as the identification block is just a bunch of bytes. It looks like this:

```rust
#[repr(C)]
#[derive(FromBytes)]
pub struct ElfIdent {
    /// ELF Magic, must be b"\x7fELF"
    e_mag: [u8; 4],
    /// Field size flag: 1 = 32-bit variant, 2 = 64-bit.
    e_class: u8,
    /// Endianness flag: 1 = LE, 2 = BE.
    e_data: u8,

    e_version: u8,
    e_abi: u8,
    e_abiversion: u8,
    /// Padding that fills `e_ident` up to its full 16 bytes.
    _pad: [u8; 7],
}
```


Next, we can define different structures for each of the variants. As you might have guessed, this leads to combinatorial explosion, as we would have to define four different variants. Here are two of them to keep things simple:

```rust
use zerocopy::byteorder::*;

#[repr(C, align(8))]
#[derive(FromBytes)]
pub struct ElfHeaderL64 {
    e_ident: ElfIdent,
    e_type: U16<LE>,
    e_machine: U16<LE>,
    e_version: U32<LE>,
    e_entry: U64<LE>,
    e_phoff: U64<LE>,
    e_shoff: U64<LE>,
    e_flags: U32<LE>,
    e_ehsize: U16<LE>,
    e_phentsize: U16<LE>,
    e_phnum: U16<LE>,
    e_shentsize: U16<LE>,
    e_shnum: U16<LE>,
    e_shstrndx: U16<LE>,
}

#[repr(C, align(4))]
#[derive(FromBytes)]
pub struct ElfHeaderB32 {
    e_ident: ElfIdent,
    e_type: U16<BE>,
    e_machine: U16<BE>,
    e_version: U32<BE>,
    e_entry: U32<BE>,
    e_phoff: U32<BE>,
    e_shoff: U32<BE>,
    e_flags: U32<BE>,
    e_ehsize: U16<BE>,
    e_phentsize: U16<BE>,
    e_phnum: U16<BE>,
    e_shentsize: U16<BE>,
    e_shnum: U16<BE>,
    e_shstrndx: U16<BE>,
}

#[test]
fn test_struct_layout() {
    use core::mem;

    // The 64-bit ELF header is 64 bytes with alignment 8,
    // the 32-bit one is 52 bytes with alignment 4.
    assert_eq!(mem::size_of::<ElfHeaderL64>(), 64);
    assert_eq!(mem::align_of::<ElfHeaderL64>(), 8);
    assert_eq!(mem::size_of::<ElfHeaderB32>(), 52);
    assert_eq!(mem::align_of::<ElfHeaderB32>(), 4);
}
```


Implementing this example, I was a bit surprised that I had to specify the alignment of my structures manually. It turns out that zerocopy::U64 and similar types are unaligned, which means reads from them need to use instructions that do unaligned loads. Those might be a bit slower, but I guess it is a wash in the grand scheme of things.

A recommendation here would be to write tests that explicitly check the size and alignment of your structs. Very helpful. I wouldn’t have caught this issue otherwise.

With these two variants defined, we can then create a context-aware wrapper around them which chooses the variant at runtime depending on its input:

```rust
pub enum ElfHeader<'data> {
    L64(&'data ElfHeaderL64),
    B32(&'data ElfHeaderB32),
    // TODO: L32, B64
}

impl<'data> ElfHeader<'data> {
    pub fn parse(buf: &'data [u8]) -> Option<(Self, &'data [u8])> {
        let (e_ident, _rest) = LayoutVerified::<_, ElfIdent>::new_from_prefix(buf)?;
        if e_ident.e_mag != *b"\x7fELF" {
            return None;
        }

        match e_ident.e_class {
            // 32-bit
            1 => {
                match e_ident.e_data {
                    // LE
                    1 => todo!(),
                    // BE
                    2 => {
                        let (lv, rest) =
                            LayoutVerified::<_, ElfHeaderB32>::new_from_prefix(buf)?;
                        Some((ElfHeader::B32(lv.into_ref()), rest))
                    }
                    _ => None,
                }
            }

            // 64-bit
            2 => {
                match e_ident.e_data {
                    // LE
                    1 => {
                        let (lv, rest) =
                            LayoutVerified::<_, ElfHeaderL64>::new_from_prefix(buf)?;
                        Some((ElfHeader::L64(lv.into_ref()), rest))
                    }
                    // BE
                    2 => todo!(),
                    _ => None,
                }
            }
            _ => None,
        }
    }

    pub fn e_shoff(&self) -> u64 {
        match self {
            ElfHeader::L64(header) => header.e_shoff.get(),
            ElfHeader::B32(header) => header.e_shoff.get() as u64,
        }
    }
}
```


This wrapper can have accessors that check at runtime which variant the underlying data has and do the appropriate access in a typesafe manner. The wrapper is also lightweight and zero-copy: it only has the enum discriminant, plus a pointer to the same underlying data in all cases. So it is essentially a tagged pointer.

## API Papercuts

Well, there you have it: a detailed explanation of zerocopy and scroll, the difference between the two, and how zerocopy can be extremely lightweight, as it only validates the correct size and alignment of things without doing any parsing at all. It only "parses" things when you start to access the data.

Both of these have their pros and cons, and both have different use cases and strengths. zerocopy is better if you have fixed-size structs and don't need to care about endianness, although it is possible to make that work with some effort, as shown above.

scroll makes these use cases trivial, at the cost of parsing everything ahead of time and copying things out into endianness-agnostic structs.

Unfortunately though, both these libraries are a bit hard to work with, and their APIs could use some streamlining.

The API surface of scroll is huge, as it supports a ton of features. But the main APIs you interact with are very unintuitive and confusing. There is pread and gread. What's the difference between the two? I honestly can't tell you without looking at the docs, which I have to do constantly, as I simply can't remember it myself.

There are quite some papercuts with zerocopy as well. First off, why is LayoutVerified a concrete type to begin with? I can't think of a good use case in which you would actually want to keep that type around. You rather want to immediately turn it into_ref or into_slice. Free functions would serve that use case a lot better.

The API is also extremely repetitive, as we have seen in this example:

```rust
let mystructs = LayoutVerified::<_, [MyPodStruct]>::new_slice(buf)?.into_slice();
```


I repeat the slice three times in this one line: new_slice and into_slice, plus the fact that these two functions only exist if the type parameter itself is a slice. Could I rather have a single free function instead of this?
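For illustration, such a free function could look like the sketch below. `cast_slice` is a made-up name and a manual implementation, not zerocopy's API; the real thing would bound T by FromBytes instead of using unsafe directly:

```rust
/// Hypothetical one-call slice cast: check length and alignment once,
/// then reinterpret the whole buffer as a slice of T.
fn cast_slice<T>(buf: &[u8]) -> Option<&[T]> {
    let size = core::mem::size_of::<T>();
    if size == 0
        || buf.len() % size != 0
        || (buf.as_ptr() as usize) % core::mem::align_of::<T>() != 0
    {
        return None;
    }
    // SAFETY: bounds and alignment were checked above; this is only
    // sound for types where every bit pattern is valid.
    let len = buf.len() / size;
    Some(unsafe { core::slice::from_raw_parts(buf.as_ptr() as *const T, len) })
}
```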

Usage of the Unaligned APIs is a bit confusing, and I was surprised to see that the endian-aware types such as U64 are unaligned as well.

The usage of custom derive for FromBytes is interesting, as it validates safe usage. But it also means that it is impossible to derive for foreign types such as Uuid, which are #[repr(C)] but do not implement FromBytes themselves. uuid, by the way, has unstable support for zerocopy, but it requires passing in custom RUSTFLAGS, which is inconvenient.

## Variable-size and Compressed data

A use case that I have not explored in this post, which is rather trivial to handle in scroll but close to impossible in zerocopy, is truly variable-sized data, such as structures that embed length-prefixed or nul-terminated strings inline. Those are the devil.
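To see why, consider a toy length-prefixed string record (`read_record` is a made-up helper): you cannot locate record N without walking every record before it, because each length byte determines where the next record starts.

```rust
/// Parses one length-prefixed string record and returns it together
/// with the remaining buffer. The string itself is zero-copy, but
/// record offsets can only be discovered by parsing sequentially.
fn read_record(buf: &[u8]) -> Option<(&str, &[u8])> {
    let (&len, rest) = buf.split_first()?;
    if rest.len() < len as usize {
        return None;
    }
    let (s, rest) = rest.split_at(len as usize);
    Some((core::str::from_utf8(s).ok()?, rest))
}
```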

When picking tradeoffs, you can either have something that is simple and fast, or something that is compact and small. A compact format is almost certainly variable-sized, which means you can't use zerocopy patterns, and you lose the ability to do random access. A very clear example of this is delta compression, where you have to parse things in order.

## Watch this space

As a matter of fact, most debug formats use some clever tricks to represent source information very compactly. I want to explore some of these formats in a lot more detail in the future. More specifically, watch out for future blog posts about:

• DWARF line programs
• PDB line programs
• Portable PDB sequence points
• SourceMap VLQ mappings