The magic of AsRef
Both at work and personally, I think a lot about efficient parsers and data formats. Some time ago, I also wrote an article about writing a custom binary format and an associated parser. That exercise started something like this:
#[repr(C)]
struct Header {
    version: u32,
    num_a: u32,
    num_b: u32,
}

pub struct Format<'data> {
    buf: &'data [u8],
    header: &'data Header,
}

impl<'data> Format<'data> {
    pub fn parse(buf: &'data [u8]) -> Self {
        // TODO:
        // * actually verify the version
        // * ensure the buffer is actually valid
        Format {
            buf,
            header: unsafe { &*(buf.as_ptr() as *const Header) },
        }
    }
}
While this works perfectly fine, and the Format is truly zero-copy, it does have one major drawback. It has the lifetime parameter 'data, and is thus not 'static. I can’t capture it in an async move block and tokio::spawn it.
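To make the 'static requirement concrete, here is a minimal sketch. It uses std::thread::spawn as a stand-in for tokio::spawn, since both require whatever is moved into the task to be 'static. The types OwnedFormat and BorrowedFormat and both functions are hypothetical names made up for this illustration.

```rust
use std::thread;

/// Hypothetical owned stand-in for a parsed format: it holds its bytes itself.
struct OwnedFormat {
    buf: Vec<u8>,
}

/// Hypothetical borrowing variant, shaped like `Format<'data>` above.
struct BorrowedFormat<'data> {
    buf: &'data [u8],
}

/// Moves an owned value onto another thread and sums its bytes there.
/// `thread::spawn` requires the closure and everything it captures to be `'static`.
fn sum_on_thread(format: OwnedFormat) -> u32 {
    thread::spawn(move || format.buf.iter().map(|&b| u32::from(b)).sum::<u32>())
        .join()
        .expect("worker thread panicked")
}

/// The borrowing variant can only be used while the borrowed data is alive.
/// Moving a `BorrowedFormat<'data>` into `thread::spawn` is rejected by the
/// compiler unless `'data` happens to be `'static`.
fn sum_in_place(format: &BorrowedFormat<'_>) -> u32 {
    format.buf.iter().map(|&b| u32::from(b)).sum()
}
```

The owned variant can travel to another thread or task, while the borrowed one stays tied to the scope of the data it points into.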
Also, for reasons that I must admit I didn’t fully understand at first, boxed trait objects carry an implicit 'static bound by default. Thinking about this again, it becomes a bit more obvious to me. If I want to package up a callback function into a struct of mine that does not carry a lifetime itself, I have to use a Box<dyn Fn() + 'static> or an equivalent container.
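A small sketch of that callback situation, with made-up names (Callback, ScopedCallback and the two constructor functions are only for illustration). Writing Box<dyn Fn() -> u32> without a lifetime means the same as adding + 'static, so a struct without a lifetime parameter can only hold closures that own everything they capture.

```rust
/// `Box<dyn Fn() -> u32>` here is shorthand for `Box<dyn Fn() -> u32 + 'static>`.
struct Callback {
    f: Box<dyn Fn() -> u32>,
}

/// Opting into a shorter lifetime forces a lifetime parameter onto the struct.
struct ScopedCallback<'a> {
    f: Box<dyn Fn() -> u32 + 'a>,
}

fn owned_callback(values: Vec<u32>) -> Callback {
    // The closure owns `values`, borrows nothing, and is therefore `'static`.
    Callback {
        f: Box::new(move || values.iter().sum::<u32>()),
    }
}

fn scoped_callback(values: &[u32]) -> ScopedCallback<'_> {
    // The closure captures a reference, so it may only live as long as that borrow.
    ScopedCallback {
        f: Box::new(move || values.iter().sum::<u32>()),
    }
}
```

Trying to return a plain Callback from scoped_callback would not compile, because the captured reference is not 'static.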
Either way, for various reasons, we want to have fully “self-owned” types that are 'static, and our example Format above is not self-contained.
There are a couple of different approaches to this, but what I have found to be the go-to solution, and the one that probably offers the most flexibility to API users, is to use AsRef<T>, and in our specific case AsRef<[u8]>, so let’s try to use that.
Without further ado, here is the finished demo code, along with tests that ensure things work as intended, and that our final Format is indeed 'static.
We can use any kind of underlying buffer type, no matter if it’s an array, a Vec, a Cow or a memory-mapped file, as long as it implements AsRef<[u8]>.
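Before the full demo, a tiny sketch of what the AsRef<[u8]> bound buys us. The functions byte_sum and sums_for_all_buffer_kinds are hypothetical helpers for this illustration. A single generic function accepts arrays, vectors, Cows and plain slices, because all of them can be viewed as &[u8].

```rust
use std::borrow::Cow;

/// Sums the bytes of any buffer that can be viewed as `&[u8]`.
fn byte_sum<Buf: AsRef<[u8]>>(buf: Buf) -> u32 {
    buf.as_ref().iter().map(|&b| u32::from(b)).sum()
}

/// Calls `byte_sum` with four different buffer types holding the same bytes.
fn sums_for_all_buffer_kinds() -> [u32; 4] {
    let array: [u8; 3] = [1, 2, 3];
    let vec: Vec<u8> = array.to_vec();
    let cow: Cow<'static, [u8]> = Cow::Owned(vec.clone());
    let slice: &[u8] = &array;
    // The array is `Copy`, so passing it by value leaves `slice` valid.
    [byte_sum(array), byte_sum(vec), byte_sum(cow), byte_sum(slice)]
}
```

The caller decides whether to hand over ownership (array, Vec, owned Cow) or only a borrow (slice), and the generic code does not need to care.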
use core::{mem, ptr};

#[repr(C)]
#[derive(Clone, Copy)]
struct Header {
    version: u32,
    num_a: u32,
    num_b: u32,
}

pub struct Format<Buf> {
    buf: Buf,
    header: Header,
}

#[repr(C)]
#[derive(Debug, PartialEq, Eq)]
pub struct A(u32);

#[repr(C)]
#[derive(Debug, PartialEq, Eq)]
pub struct B(u32);
impl<Buf: AsRef<[u8]>> Format<Buf> {
    pub fn parse(buf: Buf) -> Self {
        // TODO:
        // * actually verify the version
        // * ensure the buffer is actually valid
        // A byte buffer gives no alignment guarantee, so copy the header out
        // with an unaligned read instead of dereferencing the cast pointer.
        let header = unsafe { ptr::read_unaligned(buf.as_ref().as_ptr() as *const Header) };
        Format { buf, header }
    }

    pub fn into_inner(self) -> Buf {
        self.buf
    }

    // NOTE: both getters assume the buffer is suitably aligned for `u32` and
    // long enough for the counts in the header. Neither is checked here.
    pub fn get_as(&self) -> &[A] {
        let a_start =
            unsafe { self.buf.as_ref().as_ptr().add(mem::size_of::<Header>()) as *const A };
        let a_slice = ptr::slice_from_raw_parts(a_start, self.header.num_a as usize);
        unsafe { &*a_slice }
    }

    pub fn get_bs(&self) -> &[B] {
        let b_start = unsafe {
            self.buf
                .as_ref()
                .as_ptr()
                .add(mem::size_of::<Header>())
                .add(mem::size_of::<A>() * self.header.num_a as usize) as *const B
        };
        let b_slice = ptr::slice_from_raw_parts(b_start, self.header.num_b as usize);
        unsafe { &*b_slice }
    }
}

#[test]
fn format_works() {
    use std::borrow::Cow;

    fn is_static<T: 'static>(_: &T) {}

    let array_buf: [u8; 24] = [
        // these are all little-endian:
        1, 0, 0, 0, // version
        1, 0, 0, 0, // num_a
        2, 0, 0, 0, // num_b
        3, 0, 0, 0, // a[0]
        4, 0, 0, 0, // b[0]
        5, 0, 0, 0, // b[1]
    ];

    let parsed: Format<[u8; 24]> = Format::parse(array_buf);
    is_static(&parsed);
    assert_eq!(parsed.get_as(), &[A(3)]);
    assert_eq!(parsed.get_bs(), &[B(4), B(5)]);

    let vec_buf = Vec::from(array_buf);
    let parsed: Format<Vec<_>> = Format::parse(vec_buf);
    is_static(&parsed);
    assert_eq!(parsed.get_as(), &[A(3)]);
    assert_eq!(parsed.get_bs(), &[B(4), B(5)]);

    let vec_buf = parsed.into_inner();
    let cow_buf: Cow<'static, [u8]> = Cow::Owned(vec_buf);
    let parsed: Format<Cow<_>> = Format::parse(cow_buf);
    is_static(&parsed);
    assert_eq!(parsed.get_as(), &[A(3)]);
    assert_eq!(parsed.get_bs(), &[B(4), B(5)]);

    let slice_buf: &[u8] = &array_buf;
    let parsed: Format<&[u8]> = Format::parse(slice_buf);
    // is_static(&parsed);
    // ^ this would fail with:
    // error[E0597]: `array_buf` does not live long enough
    //   --> playground/asref/src/lib.rs:89:28
    //    |
    // 89 |     let slice_buf: &[u8] = &array_buf;
    //    |                            ^^^^^^^^^^
    //    |                            |
    //    |                            borrowed value does not live long enough
    //    |                            cast requires that `array_buf` is borrowed for `'static`
    // ...
    // 94 | }
    //    | - `array_buf` dropped here while still borrowed
    assert_eq!(parsed.get_as(), &[A(3)]);
    assert_eq!(parsed.get_bs(), &[B(4), B(5)]);
}
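
The two TODO items in parse() are left open above. As a rough sketch of what they could look like, the following reads the header with safe code and checks both the version and the buffer length. Everything here is an assumption made for illustration: the names CheckedHeader, ParseError and parse_header, the supported version number 1, and the choice to decode the fields as little-endian (which matches the comment in the test data, whereas the pointer cast above uses the native byte order).

```rust
/// Size of the three `u32` header fields in bytes.
const HEADER_LEN: usize = 12;
/// Assumed for this sketch: the only version the parser accepts.
const SUPPORTED_VERSION: u32 = 1;

#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    TooShort { needed: usize, actual: usize },
    UnsupportedVersion(u32),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CheckedHeader {
    version: u32,
    num_a: u32,
    num_b: u32,
}

/// Reads and validates the header without any `unsafe` code.
fn parse_header(bytes: &[u8]) -> Result<CheckedHeader, ParseError> {
    let actual = bytes.len();
    if actual < HEADER_LEN {
        return Err(ParseError::TooShort { needed: HEADER_LEN, actual });
    }
    // Decode the i-th little-endian u32; the length check above keeps this in bounds.
    let field = |i: usize| {
        let mut raw = [0u8; 4];
        raw.copy_from_slice(&bytes[i * 4..i * 4 + 4]);
        u32::from_le_bytes(raw)
    };
    let header = CheckedHeader {
        version: field(0),
        num_a: field(1),
        num_b: field(2),
    };
    if header.version != SUPPORTED_VERSION {
        return Err(ParseError::UnsupportedVersion(header.version));
    }
    // Every A and B entry is 4 bytes; checked arithmetic guards against overflow.
    let needed = (header.num_a as usize)
        .checked_add(header.num_b as usize)
        .and_then(|n| n.checked_mul(4))
        .and_then(|n| n.checked_add(HEADER_LEN))
        .ok_or(ParseError::TooShort { needed: usize::MAX, actual })?;
    if actual < needed {
        return Err(ParseError::TooShort { needed, actual });
    }
    Ok(header)
}
```

With a check like this in front, get_as() and get_bs() would at least never read past the end of the buffer, though the alignment question for the typed slices would still need its own answer.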
The one shortcoming of this format, though, is that it is no longer fully zero-copy. The parse() method copies the header bytes. To avoid that, we would need better (and safe) ways to declare self-referential structs. But that is a topic for another post ;-)