LMDB in Node

Kris Zyp
Doctor Evidence Development
7 min read · Dec 4, 2019

We switched from LevelDB to Lightning DB (LMDB) over a year ago for our caching server, and we have been thoroughly impressed with the performance, scalability, efficiency, integrity, and concurrency of LMDB. Simply put, LMDB is probably the most efficient key-value store available, it supports multi-process concurrency, and it has a robust, crash-proof design. We were initially interested in LMDB because it provides multi-process support, which is the most robust mechanism for parallel execution in Node.js (worker threads are available, but have many limitations) and something LevelDB does not offer, but we have been equally impressed with the performance benefits and design it provides.

However, LMDB has a number of sharp edges to deal with. As we worked through these issues, we put together a simple library, lmdb-store, to make it easier to use LMDB in Node.js with modern, idiomatic JavaScript, and to really leverage its performance. Under the hood, lmdb-store uses the node-lmdb package, which is a fantastic package, as the main JS bridge to LMDB.

lmdb-store is promise- and iterator-based as well, providing a clean interface that works elegantly with async/await and queries that can be consumed with for-of loops.

Database Size Management

With most databases, you don’t need to know the potential size of a table or store before using it. With LMDB, however, a single memory map is created for each database, which requires specifying a fixed size when the database is created. Manually setting this is cumbersome, especially if you are managing multiple databases.

lmdb-store solves this by monitoring for any write attempt that fails due to running out of space in the fixed-size map; it then automatically resizes the database and reruns the attempted operation. In this way, lmdb-store effectively provides the dynamic database sizing one would expect from a database.
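As an illustration, the resize-and-retry pattern can be sketched in isolation. The `db` object, the doubling policy, and the helper name below are our own assumptions for the sketch, not lmdb-store's actual internals; `MDB_MAP_FULL` is the LMDB error code for a full memory map:

```javascript
// Hypothetical sketch of resize-and-retry: if a write fails because the
// fixed-size memory map is full, grow the map and rerun the operation.
function makeAutoResizingWriter(db, initialMapSize) {
  let mapSize = initialMapSize;
  return function write(op) {
    for (;;) {
      try {
        return db.write(op);
      } catch (error) {
        // Only handle the "map full" failure; rethrow anything else.
        if (error.code !== 'MDB_MAP_FULL') throw error;
        mapSize *= 2;       // grow the fixed-size memory map...
        db.resize(mapSize); // ...then loop around and rerun the write
      }
    }
  };
}
```

The caller never sees the failure; the write simply completes once the map is large enough.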

Commit Performance and Batching

LMDB is capable of remarkable levels of performance; on its own, millions of writes per second can be performed if they are batched. However, there is a large performance bottleneck that is easy to hit. If you do a single write, or just a few writes, in a transaction and then commit the transaction in sync mode, the writes themselves can easily be performed in a few microseconds, but waiting for the synchronous I/O operation to complete (waiting for the disk flush to be confirmed) can take multiple milliseconds. If you sequentially perform individual writes in their own transactions, waiting for commit flushes to finish can easily make performance 100–1000x slower than the write operations themselves!

One way to mitigate this performance issue is to disable synchronous flushing on commits. However, synchronous flushing is actually a critical part of how LMDB achieves crash-proof sequencing of writes, and disabling it can lead to database corruption if a machine crashes. Based on our tests, corruption after a crash is actually fairly common without synchronous flushing.

Fortunately, there is a much better solution: timer/queue-based batched transactions. By queuing all write actions and executing them as a batch in a single transaction once a certain amount of time has passed, we can perform write actions asynchronously, and still use commits with disk sync/flush for data integrity. And this yields remarkably scalable performance. The faster you execute write operations, the more write operations are batched together into each transaction commit, and the faster LMDB runs. The slow part of transactions, the disk sync, simply becomes a timer-based background operation (in a separate thread, amortized to constant time), and the (incredibly small) cost of the write operations stays linear with the write requests, allowing us to maintain LMDB’s tremendous performance characteristics as our application load increases.
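The queuing approach can be sketched as a tiny standalone batcher. The names and structure here are illustrative only; lmdb-store's real implementation commits on a separate thread and handles far more cases:

```javascript
// Minimal sketch of queue-based write batching: every put made in the
// same event turn is queued, and one commit handles the whole batch.
function makeBatchingWriter(commitBatch) {
  let queue = null;
  return function put(key, value) {
    return new Promise((resolve) => {
      if (!queue) {
        queue = [];
        // Schedule a single commit for everything queued this turn.
        setImmediate(() => {
          const batch = queue;
          queue = null;
          commitBatch(batch);                // one transaction, one disk flush
          batch.forEach((op) => op.resolve()); // notify every queued writer
        });
      }
      queue.push({ key, value, resolve });
    });
  };
}
```

The cost of the flush is paid once per batch rather than once per write, which is where the scalability comes from.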

Usage

Let’s take a look at how to get started with lmdb-store. First, we require or import the open function.

const { open } = require('lmdb-store');
// or
import { open } from 'lmdb-store';

And then we use open to create a store, indicating that we will be using versioning:

let inventoryStore = open('inventory', {
  useVersions: true
});

And then we begin interacting with it as a key-value store:

let shoeCount = inventoryStore.get('shoes'); // synchronous
// asynchronous:
let promiseForShoeUpdate = inventoryStore.put('shoes', 3);
promiseForShoeUpdate.then(() => {
  // if you need to be notified of when the write completes
});

And because lmdb-store uses Node.js’s timer event queue for batching, it is easy to combine multiple writes in a single transaction. Any time consecutive writes occur in the same event turn, they will automatically be batched together in the same transaction (as long as they are part of the same database).
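For example, a group of related updates can simply be issued together and awaited as one (assuming an inventoryStore opened as above; the restock function is our own illustration):

```javascript
// Consecutive writes in the same event turn share one transaction,
// so related updates can be issued together and awaited as a group:
async function restock(inventoryStore) {
  await Promise.all([
    inventoryStore.put('shoes', 10),
    inventoryStore.put('socks', 25),
    inventoryStore.put('laces', 50),
  ]); // all three are committed in a single batched transaction
}
```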

Note that lmdb-store uses binary values (and optionally keys), so you can use your preferred serialization format. You can set up functions to serialize and store all your values as JSON, or, for better performance and efficiency, we would recommend using dpack.
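As a sketch, a binary store could be wrapped with JSON serialization like this (the withJsonValues helper is our own illustration, not part of lmdb-store's API):

```javascript
// Wrap a binary key-value store so values are transparently stored
// as JSON-encoded buffers and parsed back on read.
function withJsonValues(store) {
  return {
    get(key) {
      const buffer = store.get(key);
      return buffer === undefined ? undefined : JSON.parse(buffer.toString('utf8'));
    },
    put(key, value) {
      return store.put(key, Buffer.from(JSON.stringify(value), 'utf8'));
    },
  };
}
```

The same shape works for any encoder; swapping JSON.stringify/JSON.parse for a binary format like dpack is a two-line change.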

Atomic Transactions

With traditional databases, transactions are typically described as a set of synchronous read and write operations, followed by a synchronous commit that is performed atomically. This ensures that write operations that are conditional on, or dependent on, data read during the transaction can safely assume that the data that was read won’t be changed prior to the commit: any writes are atomically conditional on the state of the data that was read. With asynchronous writes, we can still do atomic operations, but we need a slightly different approach than sequential steps in a transaction. Fortunately, lmdb-store provides support for atomic conditional write operations.

To perform an atomic conditional write, instead of writing an imperative set of steps in a transaction, we include a “conditional version” in our write operation. The write operation checks the provided condition inside the asynchronously processed transaction, before performing the write. This means the condition can be checked alongside the writes, which are performed and committed in a separate thread, asynchronously from the main thread.

Let’s consider an example of a classic use case for transactions: tracking inventory for sales. If inventory drops to zero, we need to abort/reject the transaction; otherwise we want to atomically decrement the inventory count. To do this, we still begin by first getting the current inventory, and then updating it, conditional on the entry still having the version we read. If another process or thread updates the entry before we write the updated value, the write will fail, and we can respond appropriately:

import { open, getLastVersion } from 'lmdb-store';

function sellShoe() {
  let shoeCount = inventoryStore.get('shoes');
  if (shoeCount == 0)
    throw new OutOfStockError();
  let version = getLastVersion();
  // update shoe count, conditional on it still being the same version
  inventoryStore.put('shoes', shoeCount - 1, version + 1, version);
}

But what happens if the write fails? Let’s handle that case by awaiting the result of the conditional put (the new version is the third argument, and the required existing version is the fourth) and retrying when the condition fails, writing this with an async function and the await operator to make it easier to follow:

async function sellShoe() {
  let result;
  do {
    let shoeCount = inventoryStore.get('shoes');
    if (shoeCount == 0)
      throw new OutOfStockError();
    let version = getLastVersion();
    // update shoe count, conditional on the version we read
    result = await inventoryStore.put('shoes', shoeCount - 1, version + 1, version);
    // if result is 0, it completed successfully and we are done;
    // otherwise the condition failed: the entry must have been changed
    // by another process, so we try again
  } while (result !== 0);
}

In this way, we can still perform actions atomically, but without slow, blocking, synchronous transactions: atomic conditional writes run asynchronously, taking advantage of LMDB’s tremendous read/write performance within transactions as well as idiomatic asynchronous JavaScript.

Iterable Queries

Querying can also be challenging with the direct LMDB APIs. LMDB provides a low-level cursor API for iterating through entries in the stores, based on a key range. However, when you want to do a database query for a range, cursors require a lot of effort to handle properly: the cursor state must be initialized, the cursor moved on each iteration, and everything cleaned up after the query. Idiomatic JavaScript, on the other hand, has moved towards modeling collections of data with iterables, which are perfectly suited for these types of queries with their on-demand, progressive loading design.
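To make the contrast concrete, here is a simplified, standalone sketch of wrapping a cursor-style API in a JavaScript iterable (the cursor method names are illustrative, not node-lmdb's exact API):

```javascript
// A generator turns cursor bookkeeping into a plain iterable: the cursor
// is opened lazily, advanced once per iteration, and always cleaned up.
function* rangeIterator(openCursor) {
  const cursor = openCursor();
  try {
    for (let entry = cursor.first(); entry !== null; entry = cursor.next()) {
      yield entry; // entries are produced on demand, not preloaded
    }
  } finally {
    cursor.close(); // runs even if the consumer breaks out of the loop early
  }
}
```

The finally block is the key detail: for-of calls the iterator's return() on early exit, so the cursor is released no matter how the loop ends.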

To use the iterable queries, we use the lmdb-store getRange method. Here we create a query for everything in our store, and iterate through it:

let allOfInventory = inventoryStore.getRange({});
for (let { key, value } of allOfInventory) {
  // do something with each key and/or value
}

And just like that, we can iterate through all the items in our store without having to think about creating and releasing cursors. And this is efficient: the iterator does not load all the items from the store into memory at once; it progressively moves the cursor through the store as the loop iterates, ensuring that we don’t use excessive memory (it does load multiple entries at a time in batches for performance, but this is capped to keep memory consumption minimal).

We can also specify start, end, limit, and other parameters to the query range, so we can query for specific ranges of keys in a store.
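For instance, a hypothetical helper could scan a bounded key range with a result cap, using the start, end, and limit options mentioned above (the helper itself and its threshold are our own illustration):

```javascript
// List keys in [startKey, endKey) whose value is below a restock
// threshold, scanning at most 100 entries via the range options.
function lowStockKeys(store, startKey, endKey) {
  const low = [];
  for (const { key, value } of store.getRange({ start: startKey, end: endKey, limit: 100 })) {
    if (value < 5) low.push(key); // assumed threshold for "low stock"
  }
  return low;
}
```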

Efficiency vs Performance

Database performance is often assessed with benchmarks measuring database operations per unit of wall clock time. This is a good metric if your database only ever has to complete a single series of operations for a single user. In the real world, however, a server typically handles requests from hundreds of different users, and many threads compete for resources. For real-world application usage, a far more practical measure is how much CPU time a given number of database operations requires, rather than wall clock time. And this is where LMDB really pulls ahead of other databases and key-value stores: it demonstrates superior speed and, even more so, efficiency, which translates to better real-world performance on servers with numerous threads and processes all running together.

With lmdb-store, it is easy to get started and take full advantage of the power of LMDB, using idiomatic modern JavaScript with iterators and promises for clean, elegant iterable queries and await-style asynchronous writes that can scale to massive database sizes and speeds with minimal effort.
