Value Serialisation and Compression
Every "entry" stored in LMDB consists of a key and a value.
The Keys page discusses key-specific considerations such as maximum length, Endianness, multi-part composition and custom comparators. It’s generally necessary to encode keys in a relatively manual manner given their importance. While it may be possible to encode a key in a careful, space-efficient manner (eg using smaller numeric fields, maintaining separate lookup tables etc), it is usually infeasible to use general-purpose compression algorithms on keys. This is because keys are typically quite short (limiting the frequency of duplicate bytes), and the result of compression would render key ordering meaningless.
Values are an entirely different matter. They are mutable, they have no practical size limit, you can usually use a mainstream serialisation library to conveniently represent (and version) the data, and given values are not used for sorting, you can usually apply a general-purpose compression algorithm.
Because LMDB does not require structural understanding of the values it is storing, you are free to encode values in whatever way you prefer.
For simple situations you can just write a string or number directly into the value buffer and store that. For more complex situations you probably want a way to conveniently distinguish fields, handle repetition, and still be able to read old values after their structure has evolved.
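To make the simple case concrete, here is a minimal sketch of writing a UTF-8 string straight into a direct buffer and storing it with LmdbJava (the environment path, map size and buffer sizes are illustrative only):

```java
import java.io.File;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.lmdbjava.Dbi;
import org.lmdbjava.DbiFlags;
import org.lmdbjava.Env;
import org.lmdbjava.Txn;

public final class SimpleValueExample {
  public static void main(final String[] args) {
    final File path = new File("/tmp/lmdb-example"); // illustrative path
    path.mkdirs();
    try (Env<ByteBuffer> env = Env.create().setMapSize(10_485_760).setMaxDbs(1).open(path)) {
      final Dbi<ByteBuffer> db = env.openDbi("values", DbiFlags.MDB_CREATE);

      // Write the key and value directly into direct ByteBuffers.
      final ByteBuffer key = ByteBuffer.allocateDirect(env.getMaxKeySize());
      key.put("greeting".getBytes(StandardCharsets.UTF_8)).flip();
      final ByteBuffer val = ByteBuffer.allocateDirect(64);
      val.put("hello world".getBytes(StandardCharsets.UTF_8)).flip();

      db.put(key, val); // uses an internal write transaction

      // Read the value back within a read transaction.
      try (Txn<ByteBuffer> txn = env.txnRead()) {
        final ByteBuffer found = db.get(txn, key);
        System.out.println(StandardCharsets.UTF_8.decode(found));
      }
    }
  }
}
```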
The pragmatic way to achieve this is through a mainstream serialisation format, and there are countless options available for Java. You probably have experience with many, with popular formats currently including JSON, XML and Protocol Buffers. Major differentiators include serialisation and deserialisation speed, size of the resulting serialised message, whether a schema is required (or even supported), compatibility with other languages, what effort is required to bind the serialised data to Java types, and whether any related tooling easily integrates into your broader build ecosystem (eg does it require a native program to be installed on every machine).
From an LMDB perspective it makes no difference which serialisation format you use, but given that many LmdbJava applications are focused on low latency, it would be remiss not to mention that most Java serialisation libraries are not designed for low latency applications. In particular, most libraries incur considerable object allocation and subsequent garbage collection (GC) costs. If evaluating libraries, be sure to run a benchmark that is reasonably representative of your workload so that the impact of allocation and deallocation is clear. A good place to start is the JVM Serializers Wiki.
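As a hedged illustration of the mainstream-library route (assuming Jackson for JSON and a hypothetical Trade value type), serialising a value is typically a one-liner, with the caveat noted above that every call allocates intermediate objects and byte arrays:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public final class JsonValueCodec {
  // Hypothetical value type; public fields keep the example short.
  public static final class Trade {
    public long tradeId;
    public int quantity;
    public String symbol;
  }

  private static final ObjectMapper MAPPER = new ObjectMapper();

  // Serialise to bytes that can then be copied into an LMDB value buffer.
  static byte[] encode(final Trade trade) throws Exception {
    return MAPPER.writeValueAsBytes(trade);
  }

  // Deserialise bytes read back from LMDB. Note the allocations on every call.
  static Trade decode(final byte[] bytes) throws Exception {
    return MAPPER.readValue(bytes, Trade.class);
  }
}
```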
For a specific recommendation, Simple Binary Encoding (SBE) is heavily used in applications by one of the current LmdbJava maintainers. SBE uses an XML file to define long-term evolvable message schemas, and the resulting messages generally reflect the way an aligned C struct would be laid out in memory. This means deserialisation is effectively free, as there’s no intermediate format to decode. Its tooling is all Java-based, meaning it works fine from Maven or Gradle without native dependencies. A key differentiator is that it uses the flyweight pattern for message encoding and decoding, meaning there are no additional allocation and GC costs for each decoded message. The downsides include the relatively limited number of languages it officially supports (currently Java, C++, C# and Go), the use of fixed-length offsets in encoded messages (so there are a lot of empty - but easily compressible - bytes in most messages), and the requirement that messages be laid out and interacted with in a stream-friendly manner (ie repeating elements at the end of a message). Nevertheless it’s an exceptionally low latency, production-grade option.
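To illustrate the flyweight idea, the following is a hand-rolled sketch (not actual SBE-generated code) using Agrona's UnsafeBuffer, which SBE itself builds upon. The field layout is hypothetical; the point is that the codec simply reads and writes primitive fields at fixed offsets within a buffer, so "decoding" a message is just wrapping its bytes:

```java
import java.nio.ByteBuffer;
import org.agrona.MutableDirectBuffer;
import org.agrona.concurrent.UnsafeBuffer;

// Hypothetical fixed layout: | tradeId : int64 | quantity : int32 | price : int64 |
final class TradeFlyweight {
  private static final int TRADE_ID_OFFSET = 0;
  private static final int QUANTITY_OFFSET = 8;
  private static final int PRICE_OFFSET = 12;
  static final int ENCODED_LENGTH = 20;

  private MutableDirectBuffer buffer;
  private int offset;

  // Point the flyweight at some bytes; nothing is copied and no object graph is built.
  TradeFlyweight wrap(final MutableDirectBuffer buffer, final int offset) {
    this.buffer = buffer;
    this.offset = offset;
    return this;
  }

  long tradeId()             { return buffer.getLong(offset + TRADE_ID_OFFSET); }
  void tradeId(final long v) { buffer.putLong(offset + TRADE_ID_OFFSET, v); }
  int quantity()             { return buffer.getInt(offset + QUANTITY_OFFSET); }
  void quantity(final int v) { buffer.putInt(offset + QUANTITY_OFFSET, v); }
  long price()               { return buffer.getLong(offset + PRICE_OFFSET); }
  void price(final long v)   { buffer.putLong(offset + PRICE_OFFSET, v); }

  public static void main(final String[] args) {
    final MutableDirectBuffer buf =
        new UnsafeBuffer(ByteBuffer.allocateDirect(ENCODED_LENGTH));
    final TradeFlyweight trade = new TradeFlyweight().wrap(buf, 0);
    trade.tradeId(42L);
    trade.quantity(100);
    trade.price(999_950L);
    // "Deserialisation" is just wrapping the same bytes again and reading fields:
    System.out.println(trade.wrap(buf, 0).tradeId());
  }
}
```

With SBE the XML schema generates equivalent (and versionable) encoder and decoder classes, and the wrapped buffer can be backed by the bytes returned from an LMDB read, so decoding a stored value involves no per-message allocation.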
Some serialisation libraries (and indeed key-value stores) offer an inbuilt lossless compression option. While this is convenient, often the provided compression algorithms will not be optimal for your application. That is because the optimal algorithm for your specific application will depend on a wide range of factors, including:
- Overall database size
- Probability of entries being in RAM versus storage
- Storage configuration (SSD, NVMe, magnetic, RAID, remote block store etc)
- Compressibility of the actual data (empty bytes, repeating bytes)
- Size of individual values and layout options (merging into larger blocks might assist)
- Ratio of reads to writes (compression is usually far slower than decompression)
Given the optimal compression approach is heavily workload-dependent, it is difficult to make specific recommendations. So please take the following as simple suggestions, as only you know your workload characteristics.
What we have found generally works is to start with the simplest thing that could possibly work, which is no compression at all. Populate your database with some real-world data. To reduce the impact of free space in pages, insert the data sequentially and use MDB_APPEND (as discussed on the Storage Efficiency page). Then take the database file and compress it using standalone (command line) compression tools. This will give you a ballpark estimate of how much your data could compress without considerable effort (such as laying out the data in a materially different way). You may find early on that the data simply cannot be effectively and easily compressed. If instead the compression ratios are attractive, copy the compressed database files to a memory-based file system (usually /tmp is backed by memory on Linux) and run each decompression tool several times over to determine the maximum effective decompression throughput for your data (in MB/s or GB/s).
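If you would rather take this measurement from Java (which is where the decompression will ultimately run), a rough harness along the following lines gives a comparable MB/s figure. It is only a sketch: lz4-java is assumed as the library, and the sample file and iteration count are illustrative.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;

public final class DecompressionThroughput {
  public static void main(final String[] args) throws Exception {
    // A representative chunk of your real data, passed as the first argument.
    final byte[] original = Files.readAllBytes(Paths.get(args[0]));

    // Compress once in memory so the measurement below covers decompression only.
    final LZ4Factory factory = LZ4Factory.fastestInstance();
    final LZ4Compressor compressor = factory.fastCompressor();
    final byte[] compressed = compressor.compress(original);

    final LZ4FastDecompressor decompressor = factory.fastDecompressor();
    final byte[] restored = new byte[original.length];

    // Decompress repeatedly and report effective throughput.
    final int iterations = 50;
    final long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      decompressor.decompress(compressed, 0, restored, 0, original.length);
    }
    final double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
    final double mbPerSec = (original.length / 1_048_576.0) * iterations / seconds;
    System.out.printf("LZ4 decompression: %.0f MB/s (ratio %.2fx)%n",
        mbPerSec, (double) original.length / compressed.length);
  }
}
```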
The next step is to quantify how quickly your storage subsystem can deliver compressed data. Use a tool like dd or bonnie++ to measure the sequential read performance of the storage subsystem. Be careful to perform tests of sufficient size that operating system caching does not distort the results.
By this stage you know the compression ratios provided by different algorithms for your data, the decompression speed (in MB/s or GB/s) for in-memory decompression of your data, and the effective read throughput of your storage subsystem (in MB/s or GB/s). Because storage almost always imposes more engineering constraints than compute workloads (eg latency, recurring cost even while data is not being accessed, backup and disaster recovery, replication, cluster scheduling flexibility etc), it is usual to ensure that whatever storage you provision is always utilised at its maximum space efficiency and throughput. In practical terms that means selecting the compression algorithm that offered the highest compression ratio while still remaining faster than the storage subsystem. For example, if your storage sustains 2 GB/s of sequential reads, an algorithm achieving a 3x ratio at 4 GB/s of decompression is a better fit than one achieving a 2x ratio at 10 GB/s.
It’s worth emphasising that the above approach is intended to quickly and easily discover which specific compression algorithm is potentially feasible for your database. Naturally compressing individual values will not achieve the same results as compressing the entire database, but it still offers a reasonable ballpark and indicates what could be achieved if the database layout is modified (eg making LMDB values contain more messages). You would still need to ensure the availability and performance of the algorithm from Java and consider access patterns (eg if random reads dominate, and the database is stored on slow storage like magnetic disks or remote block stores, the storage latency will be such that a more space-efficient but less time-efficient algorithm is probably better).
For a specific recommendation, LZ4 is generally a good option and its decompression performance exceeds even the newest NVMe drives.
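A hedged sketch of applying LZ4 per value with lz4-java is shown below. The helper names are illustrative; the original length is prefixed to the compressed bytes so the value can be restored after a get:

```java
import java.nio.ByteBuffer;
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;

final class Lz4Values {
  private static final LZ4Factory FACTORY = LZ4Factory.fastestInstance();

  // Build a direct buffer suitable for Dbi.put(): [original length][compressed bytes].
  static ByteBuffer toValue(final byte[] serialised) {
    final LZ4Compressor compressor = FACTORY.fastCompressor();
    final byte[] compressed = new byte[compressor.maxCompressedLength(serialised.length)];
    final int len =
        compressor.compress(serialised, 0, serialised.length, compressed, 0, compressed.length);
    final ByteBuffer value = ByteBuffer.allocateDirect(Integer.BYTES + len);
    value.putInt(serialised.length).put(compressed, 0, len);
    value.flip();
    return value;
  }

  // Inverse: recover the original bytes from a buffer returned by Dbi.get().
  // If the buffer is LMDB-owned, call this within the read transaction.
  static byte[] fromValue(final ByteBuffer value) {
    final int originalLength = value.getInt();
    final byte[] compressed = new byte[value.remaining()];
    value.get(compressed);
    final byte[] original = new byte[originalLength];
    FACTORY.fastDecompressor().decompress(compressed, 0, original, 0, originalLength);
    return original;
  }
}
```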
While LMDB allows you to encode values in any way that you wish, using a mainstream serialisation format and compression algorithm for values is highly recommended. Careful selection of each will make your application code more readable and maintainable, while requiring less overall storage and operating faster. SBE and LZ4 are recommended for consideration.