MongoDB Deployment Tips on NetApp All-Flash FAS Systems – Part 4


Hey All,

Logwriter here! How are you?

Evaluating storage efficiency isn’t an easy task. In my opinion, the hardest part is choosing a data set that best represents a real system.

For example, I wouldn’t say that measuring storage savings from a YCSB (Yahoo! Cloud Serving Benchmark) data set is the best thing to do, unless your data actually looks like a YCSB data set.

The Data Set

For my testing I’ve decided to use a public data set from Reddit, containing public comments from May and June 2016. It is a single collection with 131,079,747 documents. Figure 1 shows a sample document and its fields.


Figure 1. Sample document

The data was ingested into MongoDB using the mongoimport command.
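For reference, a minimal sketch of such an import; the host, database, collection, and file names below are illustrative placeholders, not the exact ones from my run:

# Placeholders for host, database, collection, and file names.
mongoimport --host mongo1.example.com:27017 \
            --db reddit --collection comments \
            --file reddit_comments_2016-05.json

By default mongoimport expects one JSON document per line, which is how the Reddit comment dumps are distributed.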

MongoDB Cluster Topology

I have a MongoDB cluster consisting of a single Replica Set. My Replica Set has three members: one Primary, one Secondary, and one Arbiter.

The Primary holds a copy of the database and handles all read and write requests to it.

The Secondary also holds a copy of the database, which the Primary keeps up to date through replication.

The Arbiter holds no data; it is a member that only helps elect a new Primary if the current Primary fails.
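As a sketch, a Replica Set like this could be initiated from the first member; the host names below are placeholders:

# Host names are placeholders; run against the member that should become Primary.
mongo --host mongo1.example.com:27017 --eval '
  rs.initiate({
    _id: "rs0",
    members: [
      { _id: 0, host: "mongo1.example.com:27017" },                   // Primary (initially)
      { _id: 1, host: "mongo2.example.com:27017" },                   // Secondary
      { _id: 2, host: "mongo3.example.com:27017", arbiterOnly: true } // Arbiter
    ]
  })'

This assumes each mongod was started with the --replSet rs0 option; once the set is initiated, rs.status() shows which member holds each role.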


Figure 2. MongoDB Cluster Topology

WiredTiger (WT) Compression

WiredTiger (WT) is the default storage engine as of MongoDB 3.2. It is completely different from MMAPv1: it delivers document-level concurrency and stores data more efficiently than MMAPv1 because it supports compression.

You can choose between two compression libraries: snappy or zlib. The default is snappy.

Snappy is a data compression/decompression library written by Google; it favors speed over maximum compression.

Zlib is a data compression/decompression library written by Jean-loup Gailly and Mark Adler; it typically achieves better compression ratios than snappy at a higher CPU cost.
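As a sketch of how the compressor is chosen (the database and collection names here are just examples), the default for new collections can be set when starting mongod, or overridden per collection through a WiredTiger configuration string:

# Set the default block compressor for new collections (snappy is the default):
mongod --storageEngine wiredTiger --wiredTigerCollectionBlockCompressor zlib

# Or override it for a single collection from the mongo shell:
mongo reddit --eval 'db.createCollection("comments",
  { storageEngine: { wiredTiger: { configString: "block_compressor=zlib" } } })'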

All-Flash FAS (AFF) Storage Efficiency

ONTAP 9.0 data-reduction technologies, including inline compression, inline deduplication, and inline data compaction, can provide significant space savings. Savings can be further increased by using NetApp Snapshot® and NetApp FlexClone® technologies.

ONTAP 9.0 introduced a new storage efficiency technique known as “inline data compaction”: it takes more than one logical block and, when possible, stores them together in a single physical 4KB block.

Let’s say your NetApp AFF running ONTAP 9.0 gets the following write requests:


Figure 3. Example of Write Requests to the storage system.

How will these write requests be handled by inline compression and inline compaction? Take a look at Figure 4.


Figure 4. ONTAP 9.0 inline compression and inline compaction.

In this example, after the requests are broken into WAFL blocks (4KB blocks), 11 blocks would be needed to store the data. Applying inline adaptive compression brings that down to 8 blocks, and inline data compaction finally packs the data into 4 physical disk blocks.

Which Storage Efficiency Technology Should I Use in My Environment?

I hate to answer a question like that with “It depends,” but unfortunately this is the real answer. Let me walk you through the process that led me to conclude that “it depends.”

I’ve done 24 different tests using the Reddit public data set.

WiredTiger has a parameter, leaf_page_max, that controls the maximum on-disk page size. The larger the value, the better your read performance for a sequential workload; the smaller the value, the better your read performance for a random workload. By default, leaf_page_max is set to 32KB.

With all efficiencies turned off (in both MongoDB and ONTAP), the Reddit data set used 171GB of space.

Keeping leaf_page_max at its default, with compression turned on in MongoDB and efficiency turned off in ONTAP, the Reddit data set used 88GB.


Figure 5. WiredTiger Compression ON and ONTAP Efficiency OFF

WiredTiger compression is an on-disk feature: data is kept compressed on disk, but not in memory. So you spend CPU time compressing the data before sending it to disk, and when you read data from disk to populate the cache, you must decompress each block before it becomes available in the cache.

ONTAP was built to provide storage efficiency without impacting the performance of your database. So, if you want to let ONTAP do the space savings for you, you need to change leaf_page_max from 32KB to 4KB.
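A minimal sketch of how that can be done at collection-creation time; the database and collection names are placeholders, and block_compressor=none reflects the “WiredTiger compression off” scenario tested here:

# leaf_page_max must be set when the collection is created; names are placeholders.
mongo reddit --eval 'db.createCollection("comments",
  { storageEngine: { wiredTiger: { configString: "leaf_page_max=4KB,block_compressor=none" } } })'

The effective settings can be checked afterwards in the creationString reported by db.comments.stats().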

With that setting, the Reddit data set used 130GB. The savings are somewhat smaller than with MongoDB’s own compression (roughly a 24% reduction versus 49%), but your application will experience consistent and predictable low latency.


Figure 6. WiredTiger Compression OFF and ONTAP Efficiency ON

Please let me know if you have any questions, and see you in the next post!

Thanks for reading!

Logwriter
