Jumbo frames performance with 10GbE iSCSI

This is yet another voice to add to the opinion that jumbo frames are worth the effort.

This is especially true in a 10GbE iSCSI environment doing a hardware refresh of hosts, storage and switches or a pure greenfield deployment.

The ease of configuring jumbo frames across your environment is helped by a couple of things:

Most 10GbE switches ship with an MTU of 9214 or 9216 out of the box, so are ready for host and storage to be tuned to match. The Arista 7150 I have tested are configured out of the box this way.

Changing vSphere virtual switches and VMkernel interfaces to a large MTU size is now very easy via the vSphere client and Web Client.

On with the testing.

Hardware I tested with:

vSphere host: Cisco UCSC-C220-M3S, 16-core, 196GB Memory, Intel X520 Dual Port 10Gb SFP+ Adapter. This was running ESXi 5.1 Update 1.

Switch: Arista 7150S-24 (24 SFP+ ports)

Storage: Nimble Storage CS260G (10GbE iSCSI)

This was not meant to be an all encompassing test across all known workloads. Just some synthetic sequential benchmarks to see what difference existed between a default MTU of 1500 bytes vs 9000 bytes for iSCSI.

I set up a VM with Windows Server 2012, 4 x vCPU, 8GB RAM, 100GB system disk and a 100GB data disk (for iometer). This was on a 1TB iSCSI VMFS5 datastore presented from the Nimble array.

Using the good old IOMeter, I setup a basic sequential read test of a 2GB file with 32 outstanding I/O’s and then ran 4K, 8K and 32K tests for 5 minutes (once with ESXi vSwitches/VMkernel interfaces and Nimble ‘s data interfaces at 1500 MTU and then once with them set to 9000 MTU). The Arista was set to an MTU of 9214 across all ports at all times (the default)

jumbo frames

The results showed a consistent 10%+ performance increase in both throughput and bandwidth.

It would be difficult not to recommend jumbo frames as a default configuration for new 10GbE iSCSI or NAS environments.

For more great in-depth reading on jumbo frames and vSphere, make sure to check out Michael Wesbter’s (@vcdxnz001) posts on his blog:




iSCSI configuration on Extreme Networks switches

This is a production configuration I have used with 1GbE and 10GbE Extreme Networks switches for iSCSI storage, specifically a Nimble Storage CS260G with 1GbE management interfaces and 10GbE iSCSI data interfaces.

10GbE iSCSI is on one stacked pair of Extreme x670’s and the 1GbE management interfaces are on an x460 stack.


This is the ExtremeOS configuration I have for both networks.

#iSCSI data ports (10GbE) Extreme Networks x670 (stacked pair)

configure ports 1:1, 1:2, 2:1, 2:2 display-string “Nimble01-Prod-iSCSI”
enable flow-control tx-pause ports 1:1, 1:2, 2:1, 2:2
enable flow-control rx-pause ports 1:1, 1:2, 2:1, 2:2
configure vlan “iSCSI_Data” tag 140

configure vlan “iSCSI_Data” ipaddress
configure vlan “iSCSI_Data” add ports 1:1,1:2,2:1,2:2

#Mgmt ports (1GbE) Extreme Networks x460 (stacked pair)

configure ports 1:1,1:2,2:1,2:2 display-string “Nimble01-Prod-Mgmt”
configure vlan “Storage_Mgmt” tag 130
configure vlan “Storage_Mgmt” ipaddress
enable ipforwarding vlan “Storage_Mgmt”
configure vlan “Storage_Mgmt” add ports 1:1,1:2,2:1,2:2

I have no spanning tree or rate-limiting (also known as broadcast storm control) configured on any iSCSI or management ports. These are not recommended for iSCSI performance reasons.

The important architectural points are:

  • iSCSI and management have dedicated VLAN’s for isolation, performance and security/policy etc .
  • The switch ports carrying iSCSI data have send/receive flow control enabled, for performance consistency.
  • Switches carrying management and iSCSI data are stacked for redundancy.



What makes up a Nimble Storage array?

I have been asked a couple of times about what is inside a Nimble Storage array in terms of hardware.

People have assumed that there may have been proprietary components in a high performance array, but Nimble’s architecture is completely leveraged off a commodity chassis, off the shelf drives and multi-core Intel CPU’s.

Most of the hardware is best illustrated with some photos:

First up – the chassis. Nimble uses a SuperMicro unit with hot swap controllers, power supplies and fans: http://www.supermicro.com/products/nfo/sbb.cfm


This is the front view of the chassis (in this case a Nimble CS260G). It has twelve 3TB Seagate Enterprise NL-SAS drives (model: Constellation ES.2)

It also has four 300GB Intel 320 SSD’s for read cache purposes.

WP_20130511_010The rear of the chassis, showing the redundant power modules and controllers (with PCIe 10GbE adapters) and dual 1GbE NICs.

Looking at the controller itself:


It’s a Supermicro board (X8DTS-F), with 12GB Kingston ECC DDR3 memory and with what appears to be an Intel 55xx Nehalem family CPU (four core/8 threads). On the far back side are the PCIe slots. Far left is an LSI SAS adapter.


In the top PCIe slot, sits the NVRAM card (for fast write caching on inbound I/O). It appears to be an NVvault 1GB DDR2 module, that is built by Netlist:

WP_20130511_007In the bottom PCIe slot sits the 10GbE dual port network adapter, which from the model number would appear to be the PE210G2SPi9 component from Silicom:

That about wraps up the commodity components inside a Nimble array.

What is really apparent, it that cost effective, reliable, high performance/capacity commodity disks, multi-core CPU’s, standard chassis have enabled array vendors like Nimble (and others) to concentrate on engineering their software architecture around data services, file systems, user interfaces and seamless array metrics collection enabling support and data mining opportunities.


Renaming vSphere Datastore device display names

In vCenter, storage devices or volumes/LUNs present themselves with a unique identifier usually prefixed with eui. or naa.

These are known as Network Address Authority Identifiers and Extended Unique Identifiers. (part of FC and iSCSI naming format standards for LUNs)


However, when looking at them in vSphere and via the esxcli, these unique identifiers are less than useful for identifying which VMFS datastores you are dealing with when looking at paths selection etc.

For that reason, I usually rename each of these via the vSphere client as I add them.


This also benefits when filtering NMP devices from the esxcli

If you have connected to the command line of an ESXi host, you can grep for your renamed display name on a storage device


esxcli storage nmp device list | grep -A 9 VMFSProd

3You can see that it picks up the Device Display Name set via the vSphere client as well as returning the 9 rows of available information about that NMP device

Optimising vSphere path selection for Nimble Storage

THIS POST IS NOW OBSOLETE. Use Nimble Connection Manager instead

This post is mostly for documentation or a quick how-to for changing the default path selection policy (PSP) so it is more suitable for Nimble arrays.

For Nimble Storage by default, ESXi 5.1 uses the ALUA SATP with a PSP of most recently used (MRU)

Nimble recommends that this be altered to round robin.

From the CLI of each ESXi host (or from the vMA):

esxcli storage nmp satp set -s VMW_SATP_ALUA -P VMW_PSP_RR

After this point any volume presented from the Nimble array and configured as a VMFS datastore will appear with round robin load balancing and I/O active on all paths.


If you had previously added volumes before changing the default PSP, you can change all existing Nimble volumes to round robin with PowerCLI

#Get all hosts and store in a variable I have called $esxihosts

 $esxihosts = get-vmhost

#Gather all SCSI luns presented to hosts where the vendor string in the LUN description equals “Nimble” and set the multi-pathing policy

Get-ScsiLun -VmHost $esxihosts | where {$_.Vendor -eq "Nimble"} | Set-ScsiLun -MultipathPolicy "roundrobin"

Nimble also make a recommendation around altering the IOps per path setting for round robin. This by default is 1000 IOps before changing data paths.

The official recommendation it to set it to zero (0) for each volume on each host.

Nimble support confirms that this ignores IOps per path and bases path selection on queue depth instead.

So from esxcli on each host, do the following which will search for the string Nimble on all storage devices presented to the host and then loop through and set IOps to zero

for i in `esxcli storage nmp device list | grep Nimble` ; do esxcli storage nmp psp roundrobin deviceconfig set --type "iops" --iops=0 --device=$i; done

I haven’t tested or implemented this as of yet.

I believe Nimbles internal support community has a single powercli script to cover everything here.

It would not be much work to take the per host esxcli stuff into powercli with get-esxcli and aggregate it all together.

Storage Efficiency with Nimble Compression

In my last post I talked about performance of the Nimble Storage platform I was implementing.

In a contended virtualised environment, storage is often the performance sore point or Achilles heal due to insufficient IOps and poor latency of the storage design.

The other major consideration is how efficient the storage platform is at managing the available capacity and what can be done to “do more with less”

With the Nimble platform I am using – this problem is addressed by using:

  • In-line compression on all workloads

Nimble uses a variable block compression algorithm they say can yield a 30-70% saving without altering performance or latency.

  • Thin provisioning

Ubiquitous across most array vendors, variations in block size can result in differences in efficiency.

  • VAAI UNMAP for space reclaimation

Nimble supports the VAAI primitive UNMAP for block reclaimation. This is done from the esxcli of a vSphere host as per this VMware KB:


For testing I’ll concentrate on the compression and how it fares for a SQL server in a vSphere 5.1 environment.

I took a test SQL server running Windows 2008/SQL 2008 that had disk utilisation around 88%  (130GB of data on 149GB of available disk)


I then performed a storage vMotion migration from the original VMFS datastore on an HP LeftHand volume to a 200GB thin provisioned VMFS datastore presented from the Nimble array.

The result was a 2.25 x saving in space (compressed from 130GB to 56GB)


Other VMs such as file/print, remote desktop hosts, infrastructure VMs in the test environment have anywhere between 1.7x to 2.0x compression levels.

In addition to the savings on primary volumes, compression is also applied to snapshots.

Who can say no to close to double the effective capacity on all virtualised workloads with no extra datacenter footprint?

Initial Performance Impressions of Nimble Storage

Recently I added a Nimble CS240G to my vSphere production storage environment.


The Nimble is affordable and packs some serious features and performance in a 3u enclosure.

The CS240G has 24TB of raw capacity, comprised of twelve 2TB 7200rpm Western Digital enterprise RE4 SATA drives and four Intel 160GB SSDs (Intel 320)


It has redundant controllers, one active, one on standby (mirrored). There are dual 10GbE ports on each controller as well as dual 1GbE.

You can choose to have the data and management sharing one network or split it out. In my case I went with a redundant management setup on the two 1GbE ports on one VLAN and iSCSI data on the two 10GbE ports on another VLAN.

Nimble does smart things around its ingestion of data. The general process is:

  • Inbound writes, cached in NVRAM and mirrored to the standby controller, write acknowledgement is given to the host at this point (latency measured in micro seconds)
  • The write IO’s are compressed inline in memory and a 4.5MB stripe is built (from possibly many thousands of small IO’s)
  • This large stripe is then written sequentially to free space on on the NL-SAS disks with only an impact to the array of around 11 IOps. Far more efficient than writing random small block data to the same type of disks.
  • The stripe is analysed by Nimbles caching algorithm and if required is also written sequentially to the SSD’s for future random read cache purposes.

For more in-depth look at the tech. Nimble has plenty of resources here:

In summary, Nimble uses the NL-SAS for large block sequential writes, SSD for random read cache and compresses data in-line at no performance loss.

To get a solid idea of how the Nimble performed, I fired up a Windows 2008 R2 VM (2 vCPU/4GB RAM) on vSphere 5.1 backed by a 200GB thin provisioned volume from the Nimble over 10GbE iSCSI.

Using the familiar IOMeter, I did a couple of quick benchmarks:

All done with one worker process, tests run for 5 minutes.

80/20 Write/Read, 75% Random, 4K Workload using a 16GB test file (32000000 sectors) and 32 outstanding I/Os per target

24,600 IOps, averaging 1.3 m/s latency, and 95MB/s

50/50 Write/Read, 50% Random, 8K Workload using a 16GB test file (32000000 sectors) and 32 outstanding I/Os per target

19,434 IOps, averaging 1.6 m/s latency and 152MB/s

50/50 Write/Read, 50% Random, 32K Workload using a 16GB test file (32000000 sectors) and 32 outstanding I/Os per target

9,370 IOps, averaging  3.6 m/s latency and 289MB/s

Initial thought is – Impressive throughput while keeping latency incredibly low

That’s all I have had time to play with at this point, if anyone has any suggestions for benchmarks or other questions, I am quite happy to take a look.