Skip to content
Ben Alex edited this page May 20, 2020 · 1 revision

Introduction

LMDB generally has no structural understanding of the data being persisted. They’re just arbitrary arrays of bytes, and it’s up to you to encode them in whatever way you see fit. However because LMDB is an ordered key store, there are some important factors to consider when deciding how to structure your keys:

  1. LMDB needs to "understand" them well enough to store and retrieve them in the intended order
  2. Significant space and performance optimisations occur if keys are inserted in order
  3. Space optimisations occur if using numeric keys with native CPU Endianness and size (64-bit)

This page discusses these issues so you are better-placed to design your keys.

Simple Keys

Every entry in an LMDB database table is identified by its immutable key. A key need only be distinct within the scope of its Dbi, meaning different Dbis can have identical keys. Once inserted, a key cannot be changed (of course you can delete the entry and insert a new one with a different key).

A key is limited in length, as reported by Env.getMaxKeySize(). This key length limit defaults to 511 and is set when the LMDB native library is compiled. All LMDB libraries bundled in the LmdbJava JAR are compiled with default settings, meaning the 511 byte limit applies. You can compile and use your own LMDB library if desired. The LmdbJava Library class defines certain system properties that govern which LMDB native library is used.

LMDB internally maintains a B+ Tree that reflects the order of your keys. To decide where a key is placed relative to other keys in the Dbi, by default LMDB will simply inspect the raw bytes that were presented. For example if you had a small database which used a single byte for a key, the value 'a' would sort before 'b'. Likewise '1' would appear before '2'.

Most keys are more complex. Let's say you were using short strings as your key. For simple details like a name, "Catrina" would appear before "Sally". However "11" would appear before "2", which is probably undesirable. To resolve that you’d need to zero-pad the strings, so "02" would be a better choice so it appears before "11". Let’s ignore if you need negative values (one technique is to add a fixed offset to every number so the smallest-expected negative value is stored as "0", or more likely zero-padded like "00000000").

It is usually highly undesirable to encode numbers as decimal strings, zero-padded or otherwise. After all, the support for negative values is very error-prone, and once you reach "99999999", you are now equal to the 8 byte storage cost of a 64-bit binary number. A 64-bit binary number can store 18,446,744,073,709,552,000 different values (and with proper support for negative values as well).

Numeric Keys

For space efficiency we should always encode numeric keys in binary form. But to encode a number into a binary form, we need to consider which Endianness to use. If Little Endian is used to encode a key in the default LMDB configuration, larger numbers will be sorted in the incorrect order.

There are four ways to resolve this Endianness issue:

  1. Never use multi-byte numbers
  2. Explicitly encode numbers using Big Endian
  3. Advise LMDB you are using native integers
  4. Use a custom comparator

Avoiding the use of multi-byte numeric keys is unrealistic in the majority of cases. Nevertheless it is mentioned for completeness.

Explicitly encoding numbers using Big Endian is easy. Most buffers allow you to nominate the byte order at a buffer level (eg ByteBuffer.order(..)) or method level (eg MutableDirectBuffer.putInt(int, int, ByteOrder)). Once the number has been encoded, the buffer’s bytes will reflect an order that LMDB will correctly sort in its default configuration.

LMDB has special support for "native integers". These must be the same length as size_t and the same byte order as the CPU. In practical terms that means a Java long, as LmdbJava only supports 64-bit machines. You can therefore encode a long directly into an LMDB key using your machine’s native byte order (which is most likely Little Endian). To use this approach, create the Dbi with DbiFlags.MDB_INTEGERKEY. Note the resulting database can only be used on a machine of the same Endianness. However the resulting database will be more space efficient, as the keys are fixed-length.

Finally, you can use a custom comparator. These are specified when using Env.openDbi(String, Comparator). The custom comparator will be invoked whenever LMDB needs to determine where to place the key relative to other keys in the Dbi. You can therefore decode the presented keys in whatever manner you wish and control the resulting order. Be aware that there is an overhead for LMDB to call from C back into Java to invoke the Comparator.

Multi-Part Keys

Let’s imagine we needed to capture details about a series of web sites. The simplest key would consist of the web address string component and a "last refresh" timestamp component. Given we’ve already discussed numbers at some length above, we’ll encode the timestamp component as a Big Endian representation of the number of milliseconds since the epoch.

Because our key consists of two components, we need to decide how to separate them. It’s typical to either use a separator character (eg a pipe, comma etc) or record each component at a fixed offset. In our example a fixed offset would be problematic, as you would need to either put the web address string first and decide up-front the maximum length of any possible web address, or put the timestamp first and lose the ability to efficiently iterate the refresh times of a specific web address. A separator would be better in this case, assuming the usual workload is to iterate by web address (not timestamp).

Another alternative would be to model the data differently. We could maintain a "web address" Dbi which is keyed on the web address string component and the value reflects a UUID or some other convenient long identifier for that web address. Then the "visits by address" Dbi uses a fixed offsets key consisting of the long "web address" ID (first 8 bytes) then the visit timestamp (last 8 bytes). Maintaining these multi-Dbi relationships is safe, as Txns (transactions) are atomic. By using a key in this manner you can quickly seek to the first visit to a web address and move the cursor forward, or even to the last visit (by setting the ID correctly but using Long.MAX_VALUE for the timestamp then iterating backwards). The downside is the first seek to obtain the long ID for a given web address. The benefit is having shorter, fixed-length keys in the larger "web address" table.

If you required the ability to review which sites were refreshed over a particular period, you could extend the alternative approach to also include a "visits by time" Dbi. Here you’d have a fixed-offsets key consisting of visit timestamp (first 8 bytes) and web address ID (last 8 bytes). The value would be the same as in the "visits by address" Dbi, meaning there’s no need to perform another seek for the value. If conserving space was more important than retrieval speed, you could instead model "visits by time" to have the visit timestamp as the key and the value could be the "web address" IDs visited in that millisecond.

Conclusion

LMDB offers you substantial flexibility in how you structure your keys. However you need to be aware of the key length limit, Endianness when encoding multi-byte numbers, and decide how to separate components of multi-part keys. Each Dbi accepts DbiFlags which can optimise for native integer keys (along with many other settings such as allowing duplicate keys etc) and you can also specify a custom Comparator if required.

Deciding how to structure your keys (and Dbis) is much easier if you know the expected access patterns. It’s optimal to seek to a given key and then move a Cursor forward (or backwards) from there without needing to fetch records from other Dbis. Ideally you would also use keys that permit you to insert the data into LMDB in that same order, as this significantly improves insertion speed and disk space consumption.

Of course these are "ideal" situations and application requirements often prevent them. Just bear in mind that one of the most compelling features of LMDB is its ability to efficiently iterate forward or backwards, and key design is the way you achieve that.

Clone this wiki locally