Format Ossification
Before going into the details of my recent discovery, let's define the term Ossification, as a lot of people have likely never heard that word before.
Imagine you have an extensible format or protocol. As an example, take a list of elements of different types. The list is extensible: it can have an arbitrary number of elements, and over time the set of element types can also be extended.
This is great. But what happens if you never use this extensibility? Let's say that for years and years, your list has always had exactly one element, and that element has always been of one very specific type.
Well, users of that format or protocol will start relying on that very fact, and will assert this assumption in code, or worse, in hardware.
So your format is extensible in theory, but you can never extend it in practice because tools have come to rely on a very specific size and order.
That is called Ossification and is sadly a reality, especially in network protocols.
And as I found out recently, it is also a thing for the COFF/PE file format, the format of Windows executables and libraries.
My journey starts with a Sentry customer issue. We got a report about a processing error that complained about an invalid "image type", whatever that means. (An image here is a loaded library or executable.)
The image in question was indeed missing its `type` field, but it did have other fields that are normal for images in the Sentry protocol. The event also made it clear that it was coming from Windows.
With that information, I looked at the code in the sentry-native SDK that collects these images, and indeed found some early returns that would leave an image entry without a `type`. I fixed the issue by reordering the code so that we still get a `type` even if we can't find a CodeView record for the image.
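sentry-native itself is written in C, so the following is only a Python sketch of the reordering idea, not the actual patch. The function names and the `raw_image` input are hypothetical; the output keys mirror Sentry's debug image fields as I understand them, with `"pe"` assumed as the type for Windows images. The point is that everything that does not depend on CodeView gets filled in before any early return.

```python
from typing import Optional


def find_codeview(raw_image: dict) -> Optional[dict]:
    """Hypothetical stand-in for the CodeView lookup.

    Returns None when the image has no usable CodeView record, which is
    exactly the case that used to trigger the early return.
    """
    return raw_image.get("codeview")


def make_image_entry(raw_image: dict) -> dict:
    # Fill in everything that does not depend on CodeView first, so that
    # every entry leaving this function carries a "type".
    entry = {
        "type": "pe",
        "code_file": raw_image["path"],
        "image_addr": hex(raw_image["base"]),
        "image_size": raw_image["size"],
    }

    codeview = find_codeview(raw_image)
    if codeview is None:
        # This early return is now harmless: the entry is already valid,
        # it just lacks debug information.
        return entry

    entry["debug_id"] = codeview["debug_id"]
    entry["debug_file"] = codeview["pdb_name"]
    return entry
```

With this ordering, a missing CodeView record only means a missing `debug_id`, never a malformed image entry.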
A while later, while investigating how to link from a C# stack trace to the corresponding portable PDB, I stumbled across the `PEReader.ReadDebugDirectory` method.
This method returns an array of `DebugDirectoryEntry`, whereas the sentry-native code I had been looking at just two weeks earlier was reading only a single entry.
Fast forward to today, where I am again investigating a customer issue related to a `.dll` that does not seem to have a valid `debug_id` (which comes from the CodeView record mentioned above).
It took some time until the things I had seen clicked in my brain. What if our tools make wrong assumptions about the shape of a PE file and its Debug Directory entries? What if for years all PE files always had a single Debug Directory entry that happened to be the CodeView record? What if some new compiler version suddenly generates PE files that have more than one Debug Directory entry, and the CodeView record is not the first one anymore?
Well, a classic case of Ossification. Things are extensible in theory, but since that extensibility was never exercised for years, all the tools developed around this format came to expect things that are no longer true.
# How did this happen?
Well, the simple answer is that the available documentation around all this is quite lacking, to put it mildly.
The main documentation for `IMAGE_DATA_DIRECTORY` mentions a `Size` field that is described as:

> The size of the table, in bytes.
Okay, yeah, great. There is no documentation or example of what to do with this.
It is not at all obvious that this is supposed to be the size in bytes of an array, and that the resulting array has `total_size / sizeof(IMAGE_DEBUG_DIRECTORY)` elements.
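To make the array interpretation concrete, here is a minimal Python sketch that splits the raw bytes described by the debug data directory into `IMAGE_DEBUG_DIRECTORY` entries. It assumes the caller has already located those bytes in the file; the field layout follows `winnt.h`, where each entry is 28 bytes. `parse_debug_directory` and `DebugDirectoryEntry` are hypothetical names for illustration.

```python
import struct
from typing import List, NamedTuple

# IMAGE_DEBUG_DIRECTORY as laid out in winnt.h, little-endian, no padding:
# Characteristics, TimeDateStamp (DWORD), MajorVersion, MinorVersion (WORD),
# Type, SizeOfData, AddressOfRawData, PointerToRawData (DWORD).
DEBUG_DIRECTORY_FORMAT = "<IIHHIIII"
DEBUG_DIRECTORY_SIZE = struct.calcsize(DEBUG_DIRECTORY_FORMAT)  # 28 bytes


class DebugDirectoryEntry(NamedTuple):
    characteristics: int
    time_date_stamp: int
    major_version: int
    minor_version: int
    debug_type: int
    size_of_data: int
    address_of_raw_data: int
    pointer_to_raw_data: int


def parse_debug_directory(data: bytes) -> List[DebugDirectoryEntry]:
    """Interpret the `Size` bytes of the debug data directory as an array."""
    if len(data) % DEBUG_DIRECTORY_SIZE != 0:
        raise ValueError(
            f"debug directory size {len(data)} is not a multiple of {DEBUG_DIRECTORY_SIZE}"
        )
    # total_size / sizeof(IMAGE_DEBUG_DIRECTORY)
    count = len(data) // DEBUG_DIRECTORY_SIZE
    return [
        DebugDirectoryEntry(
            *struct.unpack_from(DEBUG_DIRECTORY_FORMAT, data, i * DEBUG_DIRECTORY_SIZE)
        )
        for i in range(count)
    ]
```

Rejecting a size that is not a multiple of 28 is a choice of this sketch; a more forgiving reader could simply ignore trailing bytes.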
The documentation for … is also quite outdated. The docs online describe the `Type` field only up to number …, whereas the `winnt.h` header has defines up to number 20, without any description.
If you happen to stumble upon the specification of the .NET/C# extension to PE/COFF, that document does indeed say this is an array:

> This directory consists of an array of debug directory entries whose location and size are indicated in the image optional header.
Hooray, big success! The doc also describes some of the `Type` values missing from the `winnt.h` header and the other documentation.
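Since new `Type` values keep appearing, a reader should treat an unknown value as something to skip rather than as an error. Here is a minimal Python sketch of a name lookup with that behavior. The table is a deliberately incomplete subset: most names follow the `winnt.h` defines, while the labels for 17 and 19 are my own shorthand for the entries the .NET extension spec describes.

```python
# Deliberately incomplete: new Type values keep getting added over time.
DEBUG_TYPE_NAMES = {
    1: "COFF",
    2: "CODEVIEW",
    3: "FPO",
    4: "MISC",
    12: "VC_FEATURE",
    13: "POGO",
    14: "ILTCG",
    16: "REPRO",
    17: "EMBEDDED_PORTABLE_PDB",  # described in the .NET PE/COFF extension spec
    19: "PDB_CHECKSUM",           # described in the .NET PE/COFF extension spec
    20: "EX_DLLCHARACTERISTICS",
}


def debug_type_name(value: int) -> str:
    # An unknown value is reported as such instead of raising: a reader that
    # fails on it is precisely what makes the format ossify.
    return DEBUG_TYPE_NAMES.get(value, f"unrecognized({value})")
```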
It also has a description of the CodeView record itself, which is lacking from the other Windows docs and from the …. In particular, this `RSDS` (PDB 7.0) CodeView format is read by a huge number of tools, but I can't find any official documentation for it anywhere.
The .NET extension spec linked above is the closest thing I could find.
The `MINIDUMP_MODULE` documentation also mentions a CodeView record, but it likewise lacks a description of how to interpret it.
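In the absence of official docs, here is a Python sketch of how the `RSDS` record is commonly parsed, assuming the layout tools widely agree on: the 4-byte `RSDS` magic, a 16-byte GUID, a little-endian 32-bit age, and a NUL-terminated PDB path (decoded as UTF-8 here, which is an assumption). The GUID and age are the pieces a `debug_id` is derived from. `parse_rsds` and `PdbInfo` are hypothetical names.

```python
import struct
import uuid
from typing import NamedTuple, Optional


class PdbInfo(NamedTuple):
    guid: uuid.UUID
    age: int
    pdb_name: str


def parse_rsds(record: bytes) -> Optional[PdbInfo]:
    """Parse a PDB 7.0 CodeView record, or return None if it is not one."""
    if len(record) < 24 or record[:4] != b"RSDS":
        # Some other CodeView flavor (e.g. the older "NB10"), or garbage.
        return None
    # The GUID is stored in the Windows mixed-endian layout, which is exactly
    # what uuid.UUID(bytes_le=...) expects.
    guid = uuid.UUID(bytes_le=record[4:20])
    (age,) = struct.unpack_from("<I", record, 20)
    # The path runs up to the first NUL byte (or the end of the record).
    name_bytes = record[24:].split(b"\0", 1)[0]
    return PdbInfo(guid, age, name_bytes.decode("utf-8", errors="replace"))
```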
So to summarize, the PE format has very incomplete or outright missing documentation. And the tools dealing with it are probably cargo-culting wrong assumptions from one implementation to the next.
# What now?
Well, we figured out that a PE file can have multiple Debug Directory entries, and any one of them can be the CodeView record we are looking for.
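The de-ossified lookup can be sketched in Python like this: walk every entry and pick the one whose `Type` is CodeView (2), instead of only reading entry 0. The sketch assumes it is handed the bytes of a PE file on disk plus the file offset and size of the debug directory (translating the directory's RVA to a file offset is left out). It reads the record via `PointerToRawData`, which is a file offset; for an image already mapped in memory, the RVA in `AddressOfRawData` would be the field to use instead. `find_codeview_record` is a hypothetical helper.

```python
import struct
from typing import Optional

IMAGE_DEBUG_TYPE_CODEVIEW = 2
ENTRY_FORMAT = "<IIHHIIII"  # IMAGE_DEBUG_DIRECTORY
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 28 bytes


def find_codeview_record(file_data: bytes, dir_offset: int, dir_size: int) -> Optional[bytes]:
    """Return the raw bytes of the first CodeView entry, wherever it sits.

    An ossified reader would only ever look at the entry at index 0.
    """
    for i in range(dir_size // ENTRY_SIZE):
        (_chars, _timestamp, _major, _minor, debug_type, size_of_data,
         _rva, file_offset) = struct.unpack_from(
            ENTRY_FORMAT, file_data, dir_offset + i * ENTRY_SIZE
        )
        if debug_type != IMAGE_DEBUG_TYPE_CODEVIEW:
            # POGO, REPRO, VC_FEATURE, or a type we don't know yet: skip it.
            continue
        record = file_data[file_offset:file_offset + size_of_data]
        if len(record) == size_of_data:  # ignore entries pointing past the end
            return record
    return None
```

The returned bytes can then be checked for the `RSDS` magic before extracting the GUID and age.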
Time to see which tool got this right, and fix the ones that got it wrong.
Here are PRs for
To my surprise, … actually got this right.
Surprising, because I was also looking at a customer minidump created by crashpad that was missing CodeView records for some of the minidump modules.
(Yes, the loaded executable code is called an *image* in PE and Sentry terminology, whereas minidumps call them *modules*. Confused yet?)
Looking at the customer `.dll` again, it became clear that it did have a Debug Directory entry, but it wasn't a CodeView one. Maybe if it had one, it would indeed be the first? Even so, the point here is to not make any assumptions.
So in the end I was chasing a ghost all along. But at least I learned a ton in the process, and de-ossified a bunch of tools along the way.
The specific customer issue boils down to "fix your build system", and that is the end of the story.