More P4000 G2 IOMeter Benchmarking (with vSphere RoundRobin tweak)

Here we go with some more IOMeter madness, this time using a much-debated vSphere round-robin tweak.

A debate has been going on in the blogosphere in recent weeks about the IOPS-per-path tweak that can be changed from the ESX command line.

The debate is simple: does it work?

By default, with round-robin path selection, 1,000 I/Os go down the first path and then the next 1,000 down the next path.

From the commandline of the ESX host we can set the round robin parameters:

esxcli nmp roundrobin setconfig --device LUNID --iops 3 --type iops

Where LUNID is your volume identifier in vSphere, e.g. naa.6000eb39539d74ca0000000000000067

This setting effectively means that the path switches after every 3 I/Os instead of every 1,000.
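
For anyone wanting to sanity-check before and after, the same classic esxcli namespace can list devices and report the current round-robin parameters. A minimal sketch, assuming ESX 4.x syntax and reusing the example volume identifier from above (substitute your own naa ID):

# Find the naa identifiers of your volumes
esxcli nmp device list

# Make sure the volume is on the round-robin path selection policy
esxcli nmp device setpolicy --device naa.6000eb39539d74ca0000000000000067 --psp VMW_PSP_RR

# Confirm the current parameters (default is 1000 I/Os per path)
esxcli nmp roundrobin getconfig --device naa.6000eb39539d74ca0000000000000067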

It appears to greatly help throughput-intensive workloads, i.e. large-block sequential reads (restores and file archives), where the iSCSI load balancing of the HP P4000 nodes (ALB) can take effect.

It appears that small-block transfers (4K, 8K) are not really helped by the change, so I am only looking at larger block sizes (32K and 256K).

The Setup
Three-node P4500 G2 cluster, with twelve 450GB 15K SAS drives in each node. All running SAN/iQ 8.5. Node NICs using ALB.

Windows Server 2003 R2 VM guest (single vCPU, 2GB RAM) with a virtual disk on a 50GB volume @ Network RAID-10 (2-way replica)

ESX 4.0 Update 1, with two VMkernel ports on a single vSwitch (two Intel PRO/1000 NICs) bound to the software iSCSI adapter, using Round Robin multipath selection
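
For reference, the VMkernel port binding on ESX 4.0 is also done from the service console. A minimal sketch, assuming vmk1 and vmk2 are the two iSCSI VMkernel ports and vmhba33 is the software iSCSI adapter (your port and adapter names may differ):

# Bind both VMkernel ports to the software iSCSI adapter
esxcli swiscsi nic add -n vmk1 -d vmhba33
esxcli swiscsi nic add -n vmk2 -d vmhba33

# Verify that both uplinks are bound
esxcli swiscsi nic list -d vmhba33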

IOMeter Settings

One worker
64 Outstanding requests (mid-heavy load)
8,000,000 sectors (4GB file)
5 Minute test runs

The Results

(100% Sequential, 100% Read, 32K Transfer size)

Default IOPS setting: 110 MB/s sustained with 3,500 IOPS
Tweaked IOPS setting: 163 MB/s sustained with 5,200 IOPS

(100% Sequential, 100% Read, 256K Transfer size)

Default IOPS setting: 113 MB/s sustained with 449 IOPS
Tweaked IOPS setting: 195 MB/s sustained with 781 IOPS
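
As a sanity check, the throughput and IOPS figures line up with the transfer sizes: 5,200 IOPS × 32KB ≈ 162 MB/s and 781 IOPS × 256KB ≈ 195 MB/s, so the counters are self-consistent.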

The NMP tweak does indeed seem to make a difference for larger-block read operations from iSCSI volumes, where the array can do its outbound load-balancing magic: roughly a 48% throughput gain at 32K and 73% at 256K in these runs.
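
If the tweak works for you, applying it volume by volume gets tedious. A hypothetical one-liner from the service console, assuming all the P4000 volumes share the naa.6000eb prefix seen above and that device IDs print unindented in the device list output:

# Apply the 3-IOPS round-robin setting to every P4000 volume
for dev in $(esxcli nmp device list | grep ^naa.6000eb); do
  esxcli nmp roundrobin setconfig --device $dev --iops 3 --type iops
done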

I would be interested to see if others can get similar results on the HP/LeftHand arrays, as well as on other iSCSI arrays (EqualLogic, EMC, NetApp).

The real question is: does this help in the real world, or only under IOMeter “simulation” conditions?