Initial Performance Impressions of Nimble Storage

Recently I added a Nimble CS240G to my vSphere production storage environment.


The Nimble is affordable and packs some serious features and performance into a 3U enclosure.

The CS240G has 24TB of raw capacity, made up of twelve 2TB 7200rpm Western Digital RE4 enterprise SATA drives, plus four 160GB Intel SSDs (Intel 320).


It has redundant controllers, one active, one on standby (mirrored). There are dual 10GbE ports on each controller as well as dual 1GbE.

You can choose to have data and management share one network or split them out. In my case I went with a redundant management setup on the two 1GbE ports on one VLAN and iSCSI data on the two 10GbE ports on another VLAN.

Nimble does smart things around its ingestion of data. The general process is:

  • Inbound writes are cached in NVRAM and mirrored to the standby controller; write acknowledgement is given to the host at this point (latency measured in microseconds).
  • The write IOs are compressed inline in memory and a 4.5MB stripe is built (from possibly many thousands of small IOs).
  • This large stripe is then written sequentially to free space on the NL-SAS disks, at a cost to the array of only around 11 IOps. This is far more efficient than writing random small-block data to the same type of disks.
  • The stripe is analysed by Nimble’s caching algorithm and, if required, is also written sequentially to the SSDs for future random read cache purposes.
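
To make the coalescing idea concrete, here is a rough Python sketch of the write path described above. It is only an illustration under my own assumptions, not Nimble's implementation: the `WritePathSketch` class, its method names and the use of zlib for compression are all hypothetical, and NVRAM mirroring and the SSD cache step are not modelled. Only the 4.5MB stripe size comes from the description above.

```python
import zlib
from collections import deque

STRIPE_BYTES = int(4.5 * 1024 * 1024)  # 4.5MB stripe size, from the description above


class WritePathSketch:
    """Hypothetical model of the ingest path; not Nimble's actual code."""

    def __init__(self, stripe_bytes=STRIPE_BYTES):
        self.stripe_bytes = stripe_bytes
        self.nvram = deque()        # stand-in for NVRAM (mirroring is not modelled)
        self.pending = bytearray()  # compressed data for the stripe being built
        self.stripes = []           # each entry stands for one sequential disk write

    def write(self, block: bytes) -> str:
        """Stage a small write and acknowledge it before any compression."""
        self.nvram.append(block)
        return "ack"

    def drain(self) -> None:
        """Compress staged writes; emit a stripe once at least stripe_bytes builds up."""
        while self.nvram:
            self.pending += zlib.compress(self.nvram.popleft())
            if len(self.pending) >= self.stripe_bytes:
                self._emit()

    def flush(self) -> None:
        """Write out a final, partially filled stripe."""
        if self.pending:
            self._emit()

    def _emit(self) -> None:
        self.stripes.append(bytes(self.pending))
        self.pending.clear()


# Usage: 2,000 small 4K writes are folded into far fewer large sequential writes.
path = WritePathSketch(stripe_bytes=64 * 1024)  # small stripe to keep the demo quick
block = bytes(range(256)) * 16                  # 4K of sample data
acks = [path.write(block) for _ in range(2000)]
path.drain()
path.flush()
print(len(acks), "writes ->", len(path.stripes), "stripe writes")
```

The point of the sketch is the ordering: the host gets its acknowledgement as soon as the write is staged, while compression and the large sequential write happen afterwards, so the disks see a few big writes instead of thousands of small random ones.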

For a more in-depth look at the tech, Nimble has plenty of resources here:

In summary, Nimble uses the NL-SAS disks for large-block sequential writes and the SSDs for random read cache, and compresses data inline with no performance loss.

To get a solid idea of how the Nimble performed, I fired up a Windows 2008 R2 VM (2 vCPU/4GB RAM) on vSphere 5.1 backed by a 200GB thin provisioned volume from the Nimble over 10GbE iSCSI.

Using the familiar IOMeter, I did a couple of quick benchmarks:

All were done with one worker process, and each test ran for 5 minutes.

80/20 Write/Read, 75% Random, 4K Workload using a 16GB test file (32000000 sectors) and 32 outstanding I/Os per target

24,600 IOps, averaging 1.3 ms latency and 95MB/s

50/50 Write/Read, 50% Random, 8K Workload using a 16GB test file (32000000 sectors) and 32 outstanding I/Os per target

19,434 IOps, averaging 1.6 ms latency and 152MB/s

50/50 Write/Read, 50% Random, 32K Workload using a 16GB test file (32000000 sectors) and 32 outstanding I/Os per target

9,370 IOps, averaging 3.6 ms latency and 289MB/s
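
As a sanity check, the three results can be cross-checked against each other with a few lines of Python. This is my own back-of-the-envelope check, not something IOMeter reports: it assumes the queue of 32 outstanding I/Os stayed full for the whole run (so Little's law applies), that latency is in milliseconds, and that the MB/s figures are in binary units. The `cross_check` helper is a hypothetical name.

```python
def cross_check(block_kib, iops, outstanding):
    """Return (throughput in MiB/s, average latency in ms) implied by an IOps figure."""
    throughput_mib_s = iops * block_kib / 1024
    # Little's law: average latency = I/Os in flight / completion rate
    latency_ms = outstanding / iops * 1000
    return throughput_mib_s, latency_ms


# (block size in KiB, measured IOps) for the three runs above
runs = [(4, 24600), (8, 19434), (32, 9370)]

for block_kib, iops in runs:
    mib_s, lat_ms = cross_check(block_kib, iops, outstanding=32)
    print(f"{block_kib}K: {mib_s:.1f} MiB/s, {lat_ms:.2f} ms")
```

This gives roughly 96.1 MiB/s at 1.30 ms, 151.8 MiB/s at 1.65 ms and 292.8 MiB/s at 3.42 ms, which is close to what IOMeter reported, so the three sets of numbers are at least internally consistent.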

Initial thought: impressive throughput while keeping latency incredibly low.

That’s all I have had time to play with at this point. If anyone has suggestions for benchmarks or other questions, I am quite happy to take a look.

3 thoughts on “Initial Performance Impressions of Nimble Storage”

  1. Hi,

    It appears you are testing with a highly compressible dataset consisting of zeros. What you need to do is grab a large database file, or a large vmdk or vhd which has real production data in it, rename it to iobw.tst, save it at the root of the LUN and run your tests again. If you have multiple datasets, grab those files, tar them up, rename the output file to iobw.tst and run your tests again. I’m sure you will get different results, more closely aligned with a real-world scenario.

    In order to run a meaningful test, you also need to consider the size of the active dataset. This will vary per application. For some apps the active dataset can be small, 5-6% of the entire dataset; for others, like Exchange, it can be very large, 50-60% or even more.

    Testing these scenarios will allow you to get a better idea as to the capability of this system based on *your* environment.

    FYI… I work for NetApp

    1. NOTE: I work for Nimble Storage

      We actually welcome these tests as they further validate our claims. We can absolutely deliver on the promise of both high performance AND high capacity utilization (PLUS the most granular and cost-effective data protection in the industry). The reality is that Nimble does a phenomenal job at compressing data without the performance degradation that has been associated with it on traditional storage architectures. CASL is a fundamentally new approach to storage (albeit combining several well-known concepts). And because CASL is built as a variable block filesystem, we can tune each volume to the specific data set (i.e. Exchange logs versus Exchange database) to get the best of compression and performance for each and every app. Case in point, check out our 40,000 mailbox ESRP report: 40,000 mailboxes on a single 3U Nimble Storage array. I hope you’d agree that’s quite impressive given the results of our competitors. And you’re right, working sets can be different based on applications. That’s why we offer a varying cache size. Cool thing is you can get a larger cache upfront if you need it, or you can seamlessly upgrade to it later on with absolutely zero downtime.

      And if you still have doubts, check out the Cisco/VMware published VDI reference architecture for 1,000 desktops. This was done on one of our smallest arrays, so just imagine what Nimble can do with the real heavy hitters.
