The R code to reproduce the results in this post is available from https://github.com/nanxstats/r-serialize-timemachine.
A mystery on serialize()
Serialization/deserialization is an important topic for exchanging data
efficiently at scale. In R, there is a native choice for this:
serialize()/unserialize() and their more convenient interface
saveRDS()/readRDS().
Yihui once asked why digest::digest() skips the first 14 bytes of R serialized data
instead of the first 17 bytes for the binary format,
since the three additional zero filler bytes always seem to be there.
Although there is an entire section in R Internals about serialization formats, I did not find any detailed technical explanations about the bytes in the header. So I decided to collect more empirical evidence to answer the question.
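For readers who have not used these functions, here is a minimal round trip in a current R session. It only assumes a writable temporary directory for the .rds file.

```r
x <- list(a = 1:3, b = "ABCDEF")

# serialize() to a NULL connection returns the bytes as a raw vector
bytes <- serialize(x, connection = NULL)
y <- unserialize(bytes)

# saveRDS()/readRDS() wrap the same machinery around a file
path <- tempfile(fileext = ".rds")
saveRDS(x, path)
z <- readRDS(path)
unlink(path)

identical(x, y) # TRUE
identical(x, z) # TRUE
```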
An unlikely solution
My first assumption was that comparing the same data serialized by different R versions, rather than different data serialized by the same R version, might give us more information. The non-data part of the header likely changes only when the R version changes, so holding the data fixed makes any minor variations more observable and thus more interpretable.
This solution then becomes a pure automation exercise. To maximize the number of R versions we can test, we need to choose the right platform.
- We should avoid compiling from source because it is almost impossible to reuse the original toolchains after so many years. Using compiled R binaries would be our best bet.
- To run all the previously compiled R binaries on a single, modern platform, we want Windows, because it probably has the best ABI compatibility among the common platforms.
It eventually took ~130 lines of R code to accomplish this automation. The project is available at https://github.com/nanxstats/r-serialize-timemachine. The serialization results are listed in the table below.
R Version | Hex value of serialized "ABCDEF" |
---|---|
1.9.1 | 58 0a 00 00 00 02 00 01 09 01 00 01 04 00 00 00 04 10 00 00 00 01 00 00 04 09 00 00 00 06 41 42 43 44 45 46 |
2.0.0 | 58 0a 00 00 00 02 00 02 00 00 00 01 04 00 00 00 04 10 00 00 00 01 00 00 04 09 00 00 00 06 41 42 43 44 45 46 |
2.0.1 | 58 0a 00 00 00 02 00 02 00 01 00 01 04 00 00 00 04 10 00 00 00 01 00 00 04 09 00 00 00 06 41 42 43 44 45 46 |
2.1.0 | 58 0a 00 00 00 02 00 02 01 00 00 01 04 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.1.1 | 58 0a 00 00 00 02 00 02 01 01 00 01 04 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.2.0 | 58 0a 00 00 00 02 00 02 02 00 00 01 04 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.2.1 | 58 0a 00 00 00 02 00 02 02 01 00 01 04 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.3.0 | 58 0a 00 00 00 02 00 02 03 00 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.3.1 | 58 0a 00 00 00 02 00 02 03 01 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.4.0 | 58 0a 00 00 00 02 00 02 04 00 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.4.1 | 58 0a 00 00 00 02 00 02 04 01 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.5.0 | 58 0a 00 00 00 02 00 02 05 00 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.5.1 | 58 0a 00 00 00 02 00 02 05 01 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.6.0 | 58 0a 00 00 00 02 00 02 06 00 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.6.1 | 58 0a 00 00 00 02 00 02 06 01 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.6.2 | 58 0a 00 00 00 02 00 02 06 02 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.7.0 | 58 0a 00 00 00 02 00 02 07 00 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.7.1 | 58 0a 00 00 00 02 00 02 07 01 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.7.2 | 58 0a 00 00 00 02 00 02 07 02 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.8.0 | 58 0a 00 00 00 02 00 02 08 00 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.8.1 | 58 0a 00 00 00 02 00 02 08 01 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.9.0 | 58 0a 00 00 00 02 00 02 09 00 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.9.1 | 58 0a 00 00 00 02 00 02 09 01 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.9.2 | 58 0a 00 00 00 02 00 02 09 02 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.10.0 | 58 0a 00 00 00 02 00 02 0a 00 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.10.1 | 58 0a 00 00 00 02 00 02 0a 01 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.11.0 | 58 0a 00 00 00 02 00 02 0b 00 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.11.1 | 58 0a 00 00 00 02 00 02 0b 01 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.12.0 | 58 0a 00 00 00 02 00 02 0c 00 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.12.1 | 58 0a 00 00 00 02 00 02 0c 01 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.12.2 | 58 0a 00 00 00 02 00 02 0c 02 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.13.0 | 58 0a 00 00 00 02 00 02 0d 00 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.13.1 | 58 0a 00 00 00 02 00 02 0d 01 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.13.2 | 58 0a 00 00 00 02 00 02 0d 02 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.14.0 | 58 0a 00 00 00 02 00 02 0e 00 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.14.1 | 58 0a 00 00 00 02 00 02 0e 01 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.14.2 | 58 0a 00 00 00 02 00 02 0e 02 00 02 03 00 00 00 00 10 00 00 00 01 00 00 00 09 00 00 00 06 41 42 43 44 45 46 |
2.15.0 | 58 0a 00 00 00 02 00 02 0f 00 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
2.15.1 | 58 0a 00 00 00 02 00 02 0f 01 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
2.15.2 | 58 0a 00 00 00 02 00 02 0f 02 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
2.15.3 | 58 0a 00 00 00 02 00 02 0f 03 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.0.0 | 58 0a 00 00 00 02 00 03 00 00 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.0.1 | 58 0a 00 00 00 02 00 03 00 01 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.0.2 | 58 0a 00 00 00 02 00 03 00 02 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.0.3 | 58 0a 00 00 00 02 00 03 00 03 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.1.0 | 58 0a 00 00 00 02 00 03 01 00 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.1.1 | 58 0a 00 00 00 02 00 03 01 01 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.1.2 | 58 0a 00 00 00 02 00 03 01 02 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.1.3 | 58 0a 00 00 00 02 00 03 01 03 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.2.0 | 58 0a 00 00 00 02 00 03 02 00 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.2.1 | 58 0a 00 00 00 02 00 03 02 01 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.2.2 | 58 0a 00 00 00 02 00 03 02 02 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.2.3 | 58 0a 00 00 00 02 00 03 02 03 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.2.4 | 58 0a 00 00 00 02 00 03 02 04 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.2.5 | 58 0a 00 00 00 02 00 03 02 05 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.3.0 | 58 0a 00 00 00 02 00 03 03 00 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.3.1 | 58 0a 00 00 00 02 00 03 03 01 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.3.2 | 58 0a 00 00 00 02 00 03 03 02 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.3.3 | 58 0a 00 00 00 02 00 03 03 03 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.4.0 | 58 0a 00 00 00 02 00 03 04 00 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.4.1 | 58 0a 00 00 00 02 00 03 04 01 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.4.2 | 58 0a 00 00 00 02 00 03 04 02 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.4.3 | 58 0a 00 00 00 02 00 03 04 03 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.4.4 | 58 0a 00 00 00 02 00 03 04 04 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.5.0 | 58 0a 00 00 00 02 00 03 05 00 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.5.1 | 58 0a 00 00 00 02 00 03 05 01 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.5.2 | 58 0a 00 00 00 02 00 03 05 02 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.5.3 | 58 0a 00 00 00 02 00 03 05 03 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.6.0 | 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.6.1 | 58 0a 00 00 00 03 00 03 06 01 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.6.2 | 58 0a 00 00 00 03 00 03 06 02 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
3.6.3 | 58 0a 00 00 00 03 00 03 06 03 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
4.0.0 | 58 0a 00 00 00 03 00 04 00 00 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
4.0.1 | 58 0a 00 00 00 03 00 04 00 01 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
4.0.2 | 58 0a 00 00 00 03 00 04 00 02 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
4.0.3 | 58 0a 00 00 00 03 00 04 00 03 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
4.0.4 | 58 0a 00 00 00 03 00 04 00 04 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
4.0.5 | 58 0a 00 00 00 03 00 04 00 05 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
4.1.0 | 58 0a 00 00 00 03 00 04 01 00 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
4.1.1 | 58 0a 00 00 00 03 00 04 01 01 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
4.1.2 | 58 0a 00 00 00 03 00 04 01 02 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
4.1.3 | 58 0a 00 00 00 03 00 04 01 03 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
4.2.0 | 58 0a 00 00 00 03 00 04 02 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46 |
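As a sketch (not the repository's actual script), the hypothetical helper rds_hex() below regenerates a row of this table in the current R session: it writes an object with saveRDS(compress = FALSE) and reads the file back as raw bytes. It assumes R >= 3.5.0, where saveRDS() has a version argument. The bytes encoding the writer's R version, and for format 3 the native encoding name, will reflect your own installation and platform rather than the table.

```r
# Hypothetical helper: hex dump of an uncompressed .rds file
rds_hex <- function(object, version = NULL) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path))
  # compress = FALSE keeps the serialization stream readable byte by byte
  saveRDS(object, file = path, compress = FALSE, version = version)
  bytes <- readBin(path, what = "raw", n = file.size(path))
  paste(as.character(bytes), collapse = " ")
}

rds_hex("ABCDEF", version = 2) # format 2, comparable to rows before R 3.6.0
rds_hex("ABCDEF")              # default format (3 in R >= 3.6.0)
```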
An evolving language
Through this small window, we can glimpse how R's infrastructure has evolved over nearly 20 years, starting from the earliest release we could test (R 1.9.1, released in 2004):
- The differences in the serialized data since R 3.6.0 are apparent. As you may remember, this is because serialization format version 3 became the default in R 3.6.0, although it had been available since R 3.5.0.
- There are notable differences in R 4.2.0, even though it still uses serialization format version 3: the native encoding name recorded after the header changes from CP1252 (43 50 31 32 35 32) to UTF-8 (55 54 46 2d 38). This is likely related to the switch to UCRT in R 4.2.0, which made UTF-8 the native encoding of R on Windows.
- serialize() return value. We cannot use serialize(connection = NULL) as our test payload directly, since it returned a character string instead of a raw vector until R 2.4.0. Therefore, we used the higher-level function saveRDS() as a proxy to get the outputs.
- saveRDS() compression option. For our purpose of cross-version comparison, we set saveRDS(compress = FALSE), because the default of compress was flipped to TRUE in R 2.4.0.
- saveRDS() was called .saveRDS() before R 2.13.0.
- Rscript.exe did not exist until R 2.5.0. Therefore, we used Rcmd.exe instead in the earlier versions.
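Two of the points above can be checked quickly in a modern R session (this sketch says nothing about the historical versions in the table). It confirms that serialize(connection = NULL) now returns a raw vector, that an uncompressed saveRDS() file holds exactly the same bytes, which is what makes saveRDS() a valid proxy, and that the default compress = TRUE produces a gzip stream instead. The helper name read_raw() is hypothetical.

```r
x <- "ABCDEF"

# serialize() to a NULL connection returns a raw vector in current R
in_memory <- serialize(x, connection = NULL)

# Hypothetical helper: read a whole file back as raw bytes
read_raw <- function(path) readBin(path, what = "raw", n = file.size(path))

plain <- tempfile(fileext = ".rds")
gz <- tempfile(fileext = ".rds")
saveRDS(x, plain, compress = FALSE)
saveRDS(x, gz) # compress = TRUE is the default

same_bytes <- identical(read_raw(plain), in_memory)
gz_magic <- read_raw(gz)[1:2]
unlink(c(plain, gz))

is.raw(in_memory) # TRUE
same_bytes        # TRUE: the uncompressed file is the serialize() stream
gz_magic          # 1f 8b, the gzip magic number
```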
I think these are all very positive language and tooling improvements—which
benefit all R developers every day!
The consistency and compatibility in other aspects are also amazingly high.
If we don’t remove each R version after it is extracted into dist/,
we can open it and run every app/bin/Rgui.exe on the latest
Windows 10 without issues.
A possible answer
Here is my answer to the original question of why the skip offset should be 14 instead of 17.
As the table above shows, the serialized data contain many 00 zero bytes used as fillers.
So naturally, it is critical to know how these filler bytes are used.
If we look into XDR, the upstream serialization format used by serialize(),
its specification, RFC 1832, offers an informative example
and some clues:
BASIC BLOCK SIZE
The representation of all items requires a multiple of four bytes (or 32 bits) of data. … If the n bytes needed to contain the data are not a multiple of four, then the n bytes are followed by enough (0 to 3) residual zero bytes, r, to make the total byte count a multiple of 4.
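A small R sketch of the two XDR rules at play. xdr_padding() is a hypothetical helper implementing the residual-byte rule quoted above. The writeBin() calls show that a 32-bit integer in big-endian (XDR) order starts with zero bytes when its value is small, which is one plausible origin of the zero runs in the table. The packing of an R version into a single integer follows the R_Version() macro in R's C headers.

```r
# Residual zero bytes needed to round n bytes up to a multiple of 4
xdr_padding <- function(n) (4 - n %% 4) %% 4
xdr_padding(0:8) # 0 3 2 1 0 3 2 1 0

# XDR stores a 32-bit integer in big-endian order, so small values
# start with zero bytes: 16 (STRSXP, a character vector) -> 00 00 00 10
writeBin(16L, raw(), size = 4, endian = "big")

# R packs version x.y.z into one integer as x * 65536 + y * 256 + z,
# so R 4.2.0 becomes 00 04 02 00
writeBin(as.integer(4 * 65536 + 2 * 256 + 0), raw(), size = 4, endian = "big")
```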
Using R 4.2.0 as an example, the serialized "ABCDEF" is:
58 0a
00 00 00 03
00 04 02 00
00 03 05 00
00 00 00 05
...
We can annotate it like this:
OFFSET HEX BYTES ASCII COMMENTS
------ --------- ----- --------
0 58 0a X\n -- X (XDR format) and line break
2 00 00 00 03 ...3 -- serialization format version = 3
6 00 04 02 00 .420 -- current R version = 4.2.0
10 00 03 05 00 .350 -- format available since 3.5.0
14 00 00 00 05 ...5 -- v3: native encoding name length (5, "UTF-8"); v2: data starts here
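To check this reading of the header mechanically, the hypothetical helpers below decode two rows copied from the table with readBin(), which can read directly from a raw vector. The sketch assumes the layout visible in the table: the 2-byte "X\n" marker, then three big-endian 32-bit integers (format version, writer R version, minimum reader R version). In format 3 only, the four bytes at offset 14 hold the length of the native encoding name (5 for "UTF-8" in the R 4.2.0 row, 6 for "CP1252" in the R 3.6.x rows), and the object itself starts after that name.

```r
# Hypothetical helpers for decoding the header of an XDR serialization stream
hex_to_raw <- function(hex) as.raw(strtoi(strsplit(hex, " ")[[1]], base = 16L))

parse_header <- function(bytes) {
  stopifnot(is.raw(bytes), identical(bytes[1:2], charToRaw("X\n")))
  # Three big-endian 32-bit integers follow the 2-byte "X\n" marker
  ints <- readBin(bytes[3:14], what = "integer", n = 3, size = 4, endian = "big")
  # Unpack x * 65536 + y * 256 + z into "x.y.z"
  as_version <- function(v) {
    sprintf("%d.%d.%d", v %/% 65536L, (v %/% 256L) %% 256L, v %% 256L)
  }
  encoding <- NA_character_
  data_offset <- 14 # zero-based, as in the OFFSET column above
  if (ints[1] == 3) {
    # Format 3 only: length of the native encoding name, then the name
    nelen <- readBin(bytes[15:18], what = "integer", n = 1, size = 4, endian = "big")
    encoding <- rawToChar(bytes[19:(18 + nelen)])
    data_offset <- 18 + nelen
  }
  list(format = ints[1], writer = as_version(ints[2]),
       min_reader = as_version(ints[3]), encoding = encoding,
       data_offset = data_offset)
}

# Rows copied from the table above
r353 <- "58 0a 00 00 00 02 00 03 05 03 00 02 03 00 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46"
r420 <- "58 0a 00 00 00 03 00 04 02 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 06 41 42 43 44 45 46"

str(parse_header(hex_to_raw(r353)))
str(parse_header(hex_to_raw(r420)))
```

For the R 3.5.3 row this reports format 2, writer 3.5.3, minimum reader 2.3.0, and a data offset of 14. For the R 4.2.0 row it reports format 3, writer 4.2.0, minimum reader 3.5.0, encoding UTF-8, and a data offset of 23. Passing serialize(x, NULL) instead of a table row decodes the header written by your own R session.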
This is a rough hypothesis, and I could be wrong. So, don’t be shy and leave a comment to add the correct explanation.