Recently I added a Nimble CS240G to my vSphere production storage environment.

The Nimble is affordable and packs some serious features and performance into a 3U enclosure.
The CS240G has 24TB of raw capacity, made up of twelve 2TB 7200rpm Western Digital RE4 enterprise SATA drives, plus four 160GB Intel 320 SSDs.

It has redundant controllers, one active, one on standby (mirrored). There are dual 10GbE ports on each controller as well as dual 1GbE.
You can choose to have data and management share one network or split them out. In my case I went with redundant management on the two 1GbE ports on one VLAN and iSCSI data on the two 10GbE ports on another VLAN.
Nimble does smart things around its ingestion of data. The general process is:
- Inbound writes are cached in NVRAM and mirrored to the standby controller. The write is acknowledged to the host at this point (latency measured in microseconds).
- The write IOs are compressed inline in memory and a 512KB stripe is built (from possibly many thousands of small IOs).
- This large stripe is then written sequentially to free space on the NL-SAS disks, which is far more efficient than writing random small-block data to the same type of disks.
- The stripe is analysed by Nimble’s caching algorithm and, if required, is also written sequentially to the SSDs to serve as a random read cache.
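To make the write path above a bit more concrete, here is a toy Python sketch of the coalescing idea: small writes are acknowledged as soon as they are buffered, compressed, and packed into 512KB stripes that go out as single large writes. This is only my illustration and not Nimble's actual implementation. The WriteCoalescer class and its fields are made up, zlib stands in for whatever compression the array really uses, and NVRAM mirroring and the SSD caching step are not modelled at all.

```python
import os
import zlib

STRIPE_SIZE = 512 * 1024  # 512KB stripe, as described in the list above


class WriteCoalescer:
    """Toy model (hypothetical): buffer small writes, compress, flush full stripes."""

    def __init__(self, stripe_size=STRIPE_SIZE):
        self.stripe_size = stripe_size
        self.pending = bytearray()  # compressed data waiting to fill a stripe
        self.stripes = []           # stands in for sequential writes to disk
        self.acked = 0              # writes acknowledged to the host

    def write(self, data: bytes) -> None:
        # The host gets its acknowledgement once the write is buffered
        # (mirroring to a standby controller is not modelled here).
        self.acked += 1
        # Compress inline and append to the stripe being built.
        self.pending += zlib.compress(data)
        # Flush every full stripe as one large sequential write.
        while len(self.pending) >= self.stripe_size:
            self.stripes.append(bytes(self.pending[:self.stripe_size]))
            del self.pending[:self.stripe_size]


coalescer = WriteCoalescer()
for _ in range(1000):
    coalescer.write(os.urandom(4096))  # random (incompressible) 4K writes

print(coalescer.acked, "host writes ->", len(coalescer.stripes), "stripe writes")
```

Even with incompressible data, a thousand small host writes collapse into a handful of large stripe writes, which is the pattern spinning disks handle best.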
For a more in-depth look at the tech, Nimble has plenty of resources here:
http://www.nimblestorage.com/resources/datasheets.php
http://www.nimblestorage.com/resources/videos-demos.php
In summary, Nimble uses the NL-SAS disks for large-block sequential writes and the SSDs as a random read cache, and compresses data inline with no performance loss.
To get a solid idea of how the Nimble performed, I fired up a Windows 2008 R2 VM (2 vCPU/4GB RAM) on vSphere 5.1 backed by a 200GB thin provisioned volume from the Nimble over 10GbE iSCSI.
Using the familiar IOMeter, I did a couple of quick benchmarks:
All tests used one worker process and ran for 5 minutes.
80/20 Write/Read, 75% Random, 4K Workload using a 16GB test file (32000000 sectors) and 32 outstanding I/Os per target
24,600 IOPS, averaging 1.3 ms latency, and 95MB/s
50/50 Write/Read, 50% Random, 8K Workload using a 16GB test file (32000000 sectors) and 32 outstanding I/Os per target
19,434 IOPS, averaging 1.6 ms latency, and 152MB/s
50/50 Write/Read, 50% Random, 32K Workload using a 16GB test file (32000000 sectors) and 32 outstanding I/Os per target
9,370 IOPS, averaging 3.6 ms latency, and 289MB/s
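As a quick sanity check on the numbers above, throughput should be roughly IOPS times block size, and with 32 outstanding I/Os the average latency should be roughly 32 divided by IOPS (Little's Law). The short Python sketch below works that through. It assumes IOMeter is reporting binary megabytes and 512-byte sectors, which I have not verified for this build, and the function names are just mine.

```python
# (label, IOPS, block size in KiB, reported MB/s, reported latency in ms)
results = [
    ("4K", 24600, 4, 95, 1.3),
    ("8K", 19434, 8, 152, 1.6),
    ("32K", 9370, 32, 289, 3.6),
]
OUTSTANDING_IOS = 32  # outstanding I/Os per target in every test


def throughput_mib_s(iops, block_kib):
    """Throughput implied by IOPS x block size (assumes binary megabytes)."""
    return iops * block_kib / 1024


def littles_law_latency_ms(iops, outstanding=OUTSTANDING_IOS):
    """Average latency implied by Little's Law: outstanding I/Os / IOPS."""
    return outstanding / iops * 1000


for label, iops, block_kib, reported_mb_s, reported_ms in results:
    print(
        f"{label}: {throughput_mib_s(iops, block_kib):.1f} MiB/s computed "
        f"vs {reported_mb_s} reported, "
        f"{littles_law_latency_ms(iops):.2f} ms computed vs {reported_ms} reported"
    )

# Test file size, assuming 512-byte sectors (an assumption, not checked).
test_file_gib = 32_000_000 * 512 / 2**30
print(f"test file: {test_file_gib:.2f} GiB")
```

The computed figures land within about 5% of what IOMeter reported, so the IOPS, latency and throughput numbers are at least consistent with each other.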
My initial thought: impressive throughput while keeping latency incredibly low.
That’s all I have had time to play with at this point. If anyone has suggestions for benchmarks, or any other questions, I am quite happy to take a look.