Building the Fastest JS (De)serializer

Kris Zyp
Doctor Evidence Development
8 min read · Jan 12, 2021

Virtually any exchange of data between computers, and any storage of structured data, involves serialization and deserialization, so the performance of these operations is crucial to overall application performance. In our clinical research applications, we store, search, and update massive amounts of structured data, and we have therefore invested in building the fastest serialization and deserialization techniques available with msgpackr and cbor-x, which provide significantly better performance than typical JSON serialization. In this post, I want to share a few of the techniques we used to achieve this performance.

The msgpackr package uses the MessagePack format, and cbor-x, of course, uses CBOR. These are binary formats for structured data, and both are highly conducive to high-performance serialization and deserialization. However, converting binary data to structured JavaScript objects and values comes with challenges, and effective performance optimization involves examining bottlenecks and hotspots and finding more optimal strategies. Here we will look at string conversion, record/object conversion, DataViews, buffer handling, code structure, string caching, and arena allocation.

String Conversion

The majority of data structures will inevitably contain many strings, which may come from property keys or values. Encoding and decoding strings is therefore critical to performance and is often a substantial bottleneck. One might initially expect that simply using NodeJS Buffer’s encoding and decoding functions, or TextEncoder and TextDecoder, would be the fastest option, but unfortunately these functions can actually be relatively slow, and there are often faster alternatives. Here is an overview of some of the string decoding options available (each shown in a short snippet after the list):

  • Buffer.toString This is the most convenient method for decoding strings from binary/buffer data in NodeJS. For default UTF-8, this actually uses utf8Slice.
  • TextDecoder.decode This is the standard JS API for decoding strings from binary data.
  • String.fromCharCode A JavaScript-based decoder can be written that feeds a sequence of Unicode character codes to fromCharCode to generate a string.
  • String.slice While this isn’t a direct decoding function, one valuable optimization technique is creating a single large decoded string from the source binary data, and then slicing out the individual strings needed for specific keys/values.
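
For concreteness, here is how each of these options looks in Node.js (a small illustrative snippet, not library code):

```javascript
// The four decoding options above, side by side (Node.js, UTF-8 data)
const buf = Buffer.from('hello world');

// 1. Buffer.toString (utf8Slice under the hood for UTF-8)
const a = buf.toString('utf8', 0, 5);                   // 'hello'

// 2. TextDecoder.decode
const b = new TextDecoder().decode(buf.subarray(0, 5)); // 'hello'

// 3. String.fromCharCode over the byte values (valid for ASCII bytes)
const c = String.fromCharCode(...buf.subarray(0, 5));   // 'hello'

// 4. Decode one large string once, then slice out the pieces needed
const big = buf.toString('utf8'); // one decode for the whole buffer
const d = big.slice(0, 5);        // 'hello'
```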

Now let’s look at how these perform. Here is a brief overview of the speed of these functions (in nanoseconds), decoding UTF-8 data containing only Latin characters, running on Node v15:

[Chart: decoding time in nanoseconds for each function, by string size]

As you can see, there are dramatic (and perhaps surprising) differences in performance across these functions. TextDecoder is terribly slow and should be avoided as much as possible (used only as necessary in the browser). Buffer.toString is faster, but still has significant overhead for smaller strings. This is due to the expense of crossing the JavaScript/C++ barrier, which necessitates the construction of a local “scope” that tracks JS values (which are frequently moved around by V8’s copying garbage collector) and provides deterministic pointers for the C++ code. However, once the overhead of crossing this barrier and constructing a string is paid, the C++ code is very efficient at per-byte translation.

For smaller strings, String’s fromCharCode is clearly much faster, although using it involves iterating through the bytes to translate them to Unicode character codes. For smaller strings (under about 32 bytes), this is typically still faster than Buffer.toString, and much faster for very small strings.
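
As a rough sketch of this switching strategy (the 32-byte threshold here is illustrative; the libraries tune such thresholds empirically):

```javascript
// Length-based switch between a JS decoder and Buffer.toString
function readString(buffer, start, length) {
  if (length < 32) {
    // Small string: decode in JS and build the string with fromCharCode
    const codes = [];
    const end = start + length;
    for (let i = start; i < end; i++) {
      const byte = buffer[i];
      if (byte < 0x80) {
        codes.push(byte); // ASCII byte: code point equals byte value
      } else {
        // Multi-byte UTF-8 sequence: fall back to the C++ path for simplicity
        return buffer.toString('utf8', start, end);
      }
    }
    return String.fromCharCode(...codes);
  }
  // Larger string: C++ decoding wins once the call overhead is amortized
  return buffer.toString('utf8', start, start + length);
}
```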

String Extraction

Switching between these two functions is one way to improve performance. But what if we could batch the C++ string decoding to minimize C++ call overhead and get the best of both worlds? This is what we have done with our “extraction” modules: the msgpackr-extract and cbor-extract packages do a single-pass scan of a MessagePack/CBOR binary document, extract all the strings at once, and return them in an array. With this technique, we minimize the number of C++ calls and take advantage of the speed of C++ to decode as much data as possible at once.

These extraction modules also leverage an additional technique. First, it is important to understand how V8 internally represents strings. There are actually two primary string representations: one-byte and two-byte. One-byte strings use one byte per character, and two-byte strings use two bytes per character. Two-byte strings are internally UTF-16 encoded, and this representation is needed whenever a string contains characters outside the Latin/ASCII character set. However, for strings containing only Latin characters, V8 tries very hard to use the one-byte encoding, which is of course much more memory efficient. In a typical JS application, the overwhelming majority of strings in memory will actually be one-byte strings.

The extraction modules take advantage of these representations. As the scanner passes over the binary data, when it encounters a string that requires the two-byte representation, it calls the necessary V8 function to create a two-byte string; when it encounters strings containing only Latin characters, it generates one-byte strings. But we can also take advantage of another property of one-byte strings: character and byte positions deterministically align. Because of this, we can actually combine all consecutive Latin strings that have no two-byte strings between them (intervening non-string CBOR/MessagePack bytes are fine) into single bulk strings, further reducing the overhead of string construction. Then, on the JavaScript side, we can exploit the byte/character alignment and simply call slice to get the individual strings from the bulk one-byte string returned by the extraction module. As you may have noted above, the slice function is by far the fastest, so this technique provides the best combination of minimal C++ calls and super-fast string retrieval. It also preserves V8’s optimization of using the one-byte representation for Latin-only strings (falling back to two-byte as needed). For larger data structures with many strings, this can provide a substantial performance improvement.
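
To illustrate the idea with a simplified, hypothetical interface (this is not the actual msgpackr-extract API): the C++ pass hands back one bulk latin-1 string covering several consecutive values plus the offsets of each value, and the JS side recovers the individual strings with slice:

```javascript
// For one-byte strings, byte positions and character positions align,
// so slice() recovers each value with no further decoding.
function splitBulkString(bulk, ranges) {
  return ranges.map(([start, end]) => bulk.slice(start, end));
}

const bulk = 'nameKrisZyproleauthor'; // four values packed together
const strings = splitBulkString(bulk, [[0, 4], [4, 11], [11, 15], [15, 21]]);
// => ['name', 'KrisZyp', 'role', 'author']
```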

Property Name Caching

The final technique for improving string decoding performance is caching strings. With this technique, we store decoded strings that we expect to be frequently accessed, and match them by their binary bytes. In msgpackr/cbor-x, this is only used for property names, only when record extensions are not in use, and only for smaller objects where string extraction is not applied. But for such smaller objects with frequently repeated structures, it provides a nice performance boost.
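
A simplified sketch of the idea (the real implementation’s hashing and bookkeeping differ): decoded property names are cached under a hash of their raw bytes, so repeated keys skip decoding entirely. The byte-by-byte verification below assumes one-byte (ASCII/Latin) keys, which is the common case; anything else simply falls through to a fresh decode.

```javascript
const keyCache = new Map();

function readKey(buffer, start, length) {
  // Cheap hash over the raw bytes (illustrative only)
  let hash = length;
  for (let i = start; i < start + length; i++)
    hash = (hash * 31 + buffer[i]) | 0;
  const cached = keyCache.get(hash);
  if (cached !== undefined && cached.length === length) {
    // Verify the cached string really matches these bytes (hash collisions)
    let match = true;
    for (let i = 0; i < length; i++) {
      if (cached.charCodeAt(i) !== buffer[start + i]) {
        match = false;
        break;
      }
    }
    if (match) return cached; // cache hit: no string decoding at all
  }
  const key = buffer.toString('utf8', start, start + length);
  keyCache.set(hash, key);
  return key;
}
```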

Encoding Strings

We have mainly discussed decoding strings, but encoding them is also important. For encoding, similar options are available: Buffer.write (which uses utf8Write for UTF-8 strings), and direct JavaScript translation of strings to bytes. Here again, performance depends on string length: large strings are faster with Buffer.write, while smaller strings are faster with JS. So we use the same switching technique, choosing the fastest function based on string length.
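
A minimal sketch of the encoding-side switch (the 64-character threshold is illustrative; `target` is a Buffer and `position` the write offset):

```javascript
function writeString(target, position, str) {
  if (str.length < 64) {
    for (let i = 0; i < str.length; i++) {
      const code = str.charCodeAt(i);
      if (code < 0x80) {
        target[position++] = code; // ASCII: one byte per character
      } else {
        // Non-ASCII: hand the rest of the string to the C++ writer
        return position + target.write(str.slice(i), position, 'utf8');
      }
    }
    return position;
  }
  // Large string: Buffer.write (utf8Write) wins once overhead is amortized
  return position + target.write(str, position, 'utf8');
}
```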

Object Creation

Another significant bottleneck in decoding is object creation. With single-pass decoding, creating an object means incrementally adding properties as they are read. V8 uses hidden classes, which means it needs to transition the object’s class for each property added, which is an expensive procedure. Fortunately, V8 has sophisticated slack tracking to help deal with this, but creating objects by incrementally adding properties is still costly and sub-optimal. Another possibility is two-pass object creation, but this requires constructing two arrays, one for property names and one for property values, which is overall slower than single-pass object creation.

This is where record extensions are extremely valuable. The record extensions used in msgpackr and cbor-x allow us to define an object or record structure once, and then reference that structure when encoding/decoding future objects with the same properties. This provides tremendous space savings, since we don’t need to repeatedly serialize the property names for every instance of a particular class or object structure. But it has tremendous performance implications as well.

When msgpackr/cbor-x encounters a reference to a record structure, rather than incrementally adding properties, it actually generates code (and caches it) that constructs the object with object-literal syntax. An object literal specifies all the properties at once (they are all known from the record structure definition), and the property values are read directly into the literal. V8 can compile an object literal such that the final target class is created immediately, with no transitions between hidden classes. This is typically about twice as fast as creating objects by incrementally adding properties, and so it can significantly improve the performance of decoding objects.
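
An illustrative sketch of generating such a cached reader (the code actually generated by msgpackr/cbor-x is more involved):

```javascript
// The generated body is a single object literal, so V8 builds the final
// hidden class in one step. Contrast the slow path:
//   const o = {}; o.x = read(); o.y = read(); ...
// which transitions hidden classes at each step.
function makeStructureReader(keys) {
  const literal = keys.map(k => JSON.stringify(k) + ': read()').join(', ');
  // e.g. for ['x','y','z']: return { "x": read(), "y": read(), "z": read() }
  return new Function('read', 'return { ' + literal + ' };');
}

// `read` would be the decoder's "read next value" function:
const readPoint = makeStructureReader(['x', 'y', 'z']);
// const point = readPoint(readNextValue);
```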

Record structures also benefit encoding, simply because less time needs to be spent writing property names.

Buffer Management

Another common bottleneck for serialization libraries is buffer management. Creating new buffers (a NodeJS Buffer, an ArrayBuffer, or the ArrayBufferView classes) is relatively expensive, so the msgpackr and cbor-x libraries try to avoid creating instances of these as much as possible.

During encoding, extra buffer allocations can be avoided by simply retaining the position in the current buffer whenever an encoding finishes, and reusing the same buffer, from the last position, for the next encoding. Buffers can then be (more cheaply) sliced from the parent buffer and returned. The encoder starts with a default buffer size of 8KB (larger buffers may be created as needed), so eight 1KB objects can be encoded with the same buffer instance before a new one needs to be created.
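
A sketch of this reuse pattern (writeValue is a hypothetical stand-in for the encoder internals; sizes are illustrative):

```javascript
// Keep writing into one shared buffer and return cheap views instead of
// allocating a buffer per encode.
let sharedBuffer = Buffer.allocUnsafeSlow(8192);
let position = 0;

function encode(value) {
  if (position > sharedBuffer.length - 1024) {
    // Not enough headroom left: start a fresh buffer
    sharedBuffer = Buffer.allocUnsafeSlow(8192);
    position = 0;
  }
  const start = position;
  position = writeValue(sharedBuffer, position, value);
  // subarray shares memory with the parent buffer: no copy is made
  return sharedBuffer.subarray(start, position);
}
```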

With decoding, the buffers are generally provided to the decoder. However, the decoder is designed to support the reuse of buffers. In lmdb-store, we reuse buffers repeatedly, copying database contents into the same buffer for each read. The decoder is well-optimized to handle this case and can read from the same buffer without requiring any new buffer instances for most database reads.

DataViews

Another important part of encoding and decoding data is handling numbers. While JSON encodes numbers as base-10 characters, MessagePack and CBOR use machine-friendly number encodings: 8-, 16-, 32-, and 64-bit signed and unsigned integers, as well as floats. These are computationally much faster to encode and decode than base-10 conversions. And in this case, JavaScript provides a native conversion interface that is both easy and fast!

The DataView class provides this conversion mechanism. A DataView is attached to the binary/buffer data, and numbers can then be quickly encoded and decoded by position. This API aligns nicely with both the CBOR and MessagePack formats, so it is fast and efficient.

The most expensive part of using a DataView is instantiating it. Consequently, when a DataView is created, it is cached on the buffer, so if the buffer is reused, the DataView can be reused as well.
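
A minimal sketch of this memoization, attaching the view as an ad hoc property on the buffer (the real implementation may store it differently):

```javascript
function getDataView(buffer) {
  let view = buffer.dataView;
  if (view === undefined) {
    // One DataView spanning the buffer, memoized on the buffer itself
    view = buffer.dataView =
      new DataView(buffer.buffer, buffer.byteOffset, buffer.byteLength);
  }
  return view;
}

// e.g. reading a big-endian 32-bit float, as MessagePack/CBOR store them:
// const value = getDataView(buffer).getFloat32(position);
```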

Built for Speed

This covers the main techniques used to achieve exceptional performance. Beyond these, the libraries are coded in a very direct, straightforward style that eschews unnecessary abstractions, extra objects, and indirection.

All of these algorithms and techniques come together to provide ultra-fast serialization and deserialization. At the time of this writing, msgpackr and cbor-x are faster (usually significantly faster) than any other known JavaScript serialization library, including native JSON and libraries for other formats. Let’s take a look at some benchmarks:

Benchmarks

Our benchmark uses a data structure with nested arrays and objects and a good mix of strings and numbers (both integers and floats) that is about 7.5KB in JSON form. Here are the results, in operations per second, for both standard encoding and encoding with record extensions. This benchmark is for msgpackr, but cbor-x has very similar performance:

[Benchmark chart: operations per second for msgpackr, with and without record extensions]

This next benchmark shows performance with smaller data, this time a small ten-property object with no nesting. The performance scales well to this size too, still easily outperforming the alternatives (for brevity, comparing only to native JSON, the fastest competitor):

[Benchmark chart: operations per second for msgpackr versus native JSON on a small object]

Again, this demonstrates the culmination of these techniques and the tremendous performance advantages they provide. These libraries are just plain fast, and with full extensions, often vastly faster than even native JSON.

Future Work

We are also investigating the use of V8’s new fast API calls, and leveraging external strings with slicing, as means of further improving speed. We hope to continue to ensure that these libraries provide the fastest serialization and deserialization capabilities possible.
