HP P4000 G2 IOMeter Performance

Some quick benchmarks on a three-node P4500 G2 cluster with twelve 450GB 15K SAS drives in each node, all running SAN/iQ 8.5.

The Setup:
Windows Server 2003 R2 VM guest (single vCPU, 2GB RAM) with a virtual disk on a 50GB volume at Network RAID-10 (2-way replica)

ESX 4.0 Update 1, with two VMkernel ports on a single vSwitch (two Intel PRO/1000 NICs) bound to the software iSCSI adapter, using Round Robin multipath selection

More info on the vSphere iSCSI config can be found on Chad Sakac's blog or in the PDF on HP's P4000 site.

IOMeter Settings

One worker
64 Outstanding requests (mid-heavy load)
8,000,000 sectors (4GB test file)
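
The sector count maps to the test file size by simple multiplication; a minimal Python sketch (IOmeter sizes its test file in 512-byte sectors):

    # IOmeter's test file size is expressed in 512-byte sectors.
    sectors = 8_000_000
    size_bytes = sectors * 512
    print(f"{size_bytes:,} bytes = {size_bytes / 1e9:.2f} GB = {size_bytes / 2**30:.2f} GiB")
    # 4,096,000,000 bytes = 4.10 GB = 3.81 GiB, i.e. the "4GB file" above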

The Results

Max Read: (100% Sequential, 100% Read, 64K Transfer size)
156 MB/s sustained

Max Write: (100% Sequential, 100% Write, 64K Transfer size)
105 MB/s sustained

Max IOPS: (100% Sequential, 100% Read, 512-byte Transfer size)
31785 IOPS

Harsh Test IOPS: (50/50 Random/Sequential, 50/50 Read/Write, 8k Transfer size)
3400 IOPS and 34MB/s

Some new “real world” tests
(50/50 Random/Sequential, 50/50 Read/Write, 4k Transfer size)
3200 IOPS and 23MB/s

(50/50 Random/Sequential, 50/50 Read/Write, 32k Transfer size)
2700 IOPS and 78MB/s
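
For a fixed transfer size, the MB/s and IOPS figures are tied together: throughput is roughly IOPS multiplied by transfer size. A minimal Python sketch converting the sequential throughput numbers above into the IOPS they imply (the reported values are rounded averages, so this is only approximate):

    # Rough throughput <-> IOPS conversion for a fixed transfer size.
    def implied_iops(mb_per_s, transfer_kb):
        return mb_per_s * 1_000_000 / (transfer_kb * 1024)

    print(round(implied_iops(156, 64)))  # Max Read @ 64K: ~2380 IOPS
    print(round(implied_iops(105, 64)))  # Max Write @ 64K: ~1602 IOPS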

Overall:

Excellent throughput and IO for three 1Gb iSCSI nodes on a software initiator, with replicated, fault-tolerant volumes.

And remember, IOPS scale linearly with each extra node added. 10GbE NICs are also available should you really want to get that throughput flying.
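
As a back-of-envelope sketch of what that scaling claim implies, assuming the harsh-test figure above spreads evenly across the three nodes and that the linearity really holds (both are assumptions, not measurements):

    measured_iops, measured_nodes = 3400, 3   # harsh-test result from the three-node cluster above
    per_node = measured_iops / measured_nodes

    def estimated_cluster_iops(nodes):
        # Naive linear-scaling estimate; real results depend on workload, volumes and network.
        return per_node * nodes

    print(round(estimated_cluster_iops(4)))  # ~4533 IOPS estimated for four nodes
    print(round(estimated_cluster_iops(6)))  # ~6800 IOPS estimated for six nodes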

28 thoughts on “HP P4000 G2 IOMeter Performance”

  1. Thanks for posting this data. It always strikes me as odd that, when one discusses storage, particularly in a virtual and thus shared infrastructure, one would execute tests such as 100% reads or writes when these workloads are rarely observed.

    The block sizes also seem an odd choice. 64 KB or 512 bytes: why such values? VMs tend to run with 4KB I/Os, as NTFS & EXT3 format VMDKs larger than 2GB in 4KB blocks. Maybe 8KB, as this is the default block size for Exchange 2003, SQL 2005, and Oracle 11g. Heck, even Exchange 2010 runs at 32KB.

    I do endorse the “harsh test” as being much closer to real world.

    I realize my comments are critical. Please understand I am advocating for data that is useful in most cases and not in edge use cases. Again, thanks for sharing the data.

    1. Thanks for your comment, Vaughn from NetApp.

      Why odd? They are just basic throughput and IOPS tests to show what you can get out of the gear, i.e. edge-case benchmarking.

      However, I have added a couple more “real world” tests at 4k and 32k.

  2. For round robin, did you change the “iops” setting as well?

    This array has 1GB of cache, doesn't it?

    What if you were to add more VMkernel ports?

    From esxtop, did you see anything interesting?

    Only more questions🙂

    Thanks for the info though. Always nice to see numbers like these.

    1. Thanks for commenting Duncan🙂

      Don’t think I used the NMP IOPS tweak for this test – as it was done in parts at different times.

      The P4500 G2s have an HP Smart Array P410 on board for low-level array control – that only has 512MB of battery-backed cache.
      But as they are just a standard Intel Nehalem box, they have 4GB of 1333MHz DDR3 ECC RAM as well, which is used as cache.

      I don't have any more NICs to spare for iSCSI interfaces 🙂

      I only looked at the software iSCSI HBA in esxtop, not the split between VMkernel NICs (if that's even possible?).

      Thanks

  3. Question: how long did the test run for? Interested to see how big a part the cache played in the results.

  4. Out of interest, can you try setting up an access pattern as below? I use this as my standard “run sheet” when gathering a storage performance baseline.

    Number of outstanding I/O’s = 32
    Custom Access Pattern (File Server) as below:
    % of Access Specification   Transfer Size   % Reads   % Random
    10%                         512 bytes       80        100
    5%                          1K              80        100
    5%                          2K              80        100
    50%                         4K              80        100
    2%                          8K              80        100
    8%                          16K             80        100
    10%                         32K             80        100
    10%                         64K             80        100
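
    For reference, a minimal Python sketch, assuming each percentage above simply weights its transfer size, that works out the blended average request size of this pattern:

    # (weight %, transfer size in bytes) for the File Server pattern above
    pattern = [
        (10, 512), (5, 1024), (5, 2048), (50, 4096),
        (2, 8192), (8, 16384), (10, 32768), (10, 65536),
    ]
    avg_bytes = sum(weight / 100 * size for weight, size in pattern)
    print(round(avg_bytes / 1024, 1))  # ~13.2 KB blended average request size
    # Every slice is 80% read / 100% random, so the overall mix is 80/20 read/write, fully random.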

  5. Great post btw, I have been wanting to see what these HP boxes can do for some time now. Was there any noticeable performance increase if you turned off Network RAID-10?

    1. Ran tests at 4k, 8k, and 32k 50/50 R/W (at both Network RAID-0 and Network RAID-10) – no appreciable difference in terms of IOPS or throughput.

  6. Excellent post Barrie. When you say sustained, do you mean MBps from your results? The reason I am asking is because I did the same read test with my two-node configuration and I recorded 664MBps. The difference between our configurations is I have 4 VMkernel ports and ALB configured. I wouldn't think the results would vary that much, considering you have 12 more drives in yours than I do.

  7. I stand corrected Barrie. I re-ran my tests and I am in line with your numbers. I need to compare my original tests with my new ones to see where the discrepancy is.

    Do you use any “ramp up time” in your tests?

  8. Very interesting test. What does the environment look like? Are you running multiple VMs when running this test?
    Or is this VM running alone on this SAN?

  9. Hi Barrie,

    We have a multi-site HP P4500 G2 (4 shelves).
    This is configured with 4 x HP ProCurve 2910al switches for iSCSI (jumbo frames/flow control enabled on the LeftHand, switch, and ESXi host adapter).

    Now when I test (50/50 Random/Sequential, 50/50 Read/Write, 32k Transfer size) I get good IOPS, ~3150, and 97MB/s.

    But when I look at my switch port statistics I see many dropped TX packets from the switch to the LeftHand.
    No dropped packets from esxi host to switch.

    I’ve tested with 4K and that was not a problem. 8K generates a few drops.
    But at 32K I get the following:

    total bytes: 1,946,667,947
    total frames: 1,251,205
    errors RX: 0
    drops TX: 1,605

    Do you get the same drops when simulating with IOmeter?

    1. Sorry, no dropped packets here.
      Would suggest making sure flow control is enabled correctly all the way through.
      I don't recommend jumbo frames with 1GbE – it does not make any difference in my testing or environment – use flow control only.

      Other than that, maybe check with the switch vendor for any known issues and firmware updates. I'm not familiar with the 2910 ProCurves. Got Extreme Networks Summit X460s here – great for iSCSI performance.

      1. Hi Barrie,

        I overlooked a setting on one of my switches.
        Flow control was not enabled on one of the uplinks connecting the 4 iSCSI switches. I probably didn't press apply when changing flow control to “enable” 🙂

  10. Hi Barrie,
    What RAID level do you use within each node? You don't mention it. I use a P4300 with 8 x 15K SAS disks in RAID 6.
    I find that performance is not very good. Do you think the gain would be significant if I moved each node to RAID 5 or 10?
    Thanks

    1. I use the default of RAID-5 on each node. This offers the best mix of capacity and performance.

      I believe RAID-6 is only recommended when using a single node. I would highly recommend two-plus nodes to take advantage of Network RAID for greater performance and resiliency.

  11. Hi Barrie,

    Awesome blog, thanks for sharing with us all. It's good to see how other setups are performing out there.

    I have a question myself: I downloaded your IOMeter profile and tried it on my setup and could not get anywhere near the numbers you got.

    My setup: 2 x DL380 G6 with 48GB RAM, 2 x 1Gb NICs dedicated to iSCSI, 2 x 2510-24G switches for iSCSI only, and 2 x P4500 running SAN/iQ 9.5. I have other switches for VM LAN traffic which I won't go into.

    I set up the SAN with RAID 5 locally and Network RAID 10 for replication. I then set up a VM on the SAN, presented a new disk, and ran the IOMeter profile on the new disk.

    I have set up ESXi 5 with NMP Round Robin (IOPS=1 setting) running via 2 x 1Gb NICs, and here are the results I am getting:

    (50/50 Random/Sequential, 50/50 Read/Write, 4k Transfer size) – Nodes set up as RAID 5
    1403 IOPS and 5.48MB/s

    (50/50 Random/Sequential, 50/50 Read/Write, 32k Transfer size) – Nodes set up as RAID 5
    1022 IOPS and 31MB/s

    (50/50 Random/Sequential, 50/50 Read/Write, 4k Transfer size) – Nodes set up as RAID 10
    2860 IOPS and 11.17MB/s

    (50/50 Random/Sequential, 50/50 Read/Write, 32k Transfer size) – Nodes set up as RAID 10
    2084 IOPS and 61.13MB/s

    Any ideas what I could be doing wrong? I have followed the iSCSI setup docs to a T and I have read a lot of white papers from HP about the setup, but I am puzzled as to why I can't even get close to your setup. I have nothing running on the SAN at present as it is still to go into production.

    Thanks

    1. Here’s what I got today:

      Max Read: (100% Sequential, 100% Read, 64K Transfer size)
      118 MB/s

      Max Write: (100% Sequential, 100% Write, 64K Transfer size)
      1777 IOPS & 111 MB/s

      Max IOPS (100% Sequential, 100% Read, 512-byte Transfer size)
      20156 IOPS

      (50/50 Random/Sequential, 50/50 Read/Write, 4k Transfer size)
      4143 IOPS and 16.19MB/s

      (50/50 Random/Sequential, 50/50 Read/Write, 32k Transfer size)
      2965 IOPS and 92.69MB/s

      4-node P4500 w/ 450GB SAS using RAID 10 on the nodes & Network RAID-10. HP ProCurve 2810-48G x2 for the iSCSI network w/ flow control on. No jumbo frames.

      VM running on XenServer 5.6 SP2.

      I'm right in line with Barrie on everything except a slightly lower max read test and a 10k-lower max IOPS test. I wonder what causes that. Node RAID level?

    2. Hello,

      We have 12 LeftHand nodes. Four clusters, long story.

      These are our primary VMware SAN. We have about 350 VM’s running on these nodes. Currently 8 nodes are SATA or SAS MDL and 4 nodes / 1 cluster is SAS.

      I don't know if performance was ever great, especially on our SATA nodes, but it was acceptable and cost-effective, especially for Network RAID 10. Overall we were very happy.

      We recently started adding a substantially write-heavy workload (VDI-ish) and we are running into IOPS issues. The simple answer is we need more disks.

      We started doing some due diligence and started talking to other storage vendors as well. One interesting topic came out of this that is relevant to this thread.

      When you do a read I/O, regardless of the underlying RAID, RAID 10 or RAID 5, a read is only one I/O from one disk somewhere in the cluster. When you do a write to a RAID 10 set, the write costs 2 I/Os, one for each side of the RAID 10 mirror. When you write to a RAID 5 array, the write I/O cost is the number of disks in the array.

      So we are familiar with the older 12-drive SATA nodes and the newer 8-drive SAS nodes. I am not sure of the layout of the 12-drive SAS nodes. The newer 8-drive SAS nodes have one RAID controller, so these arrays in RAID 5 have a write penalty of 8 I/Os.

      If the volume you are writing to is Network RAID 10, then that write has to happen on two nodes in the cluster, so the write penalty is doubled. That would be 16 I/Os per write for the 8-drive SAS nodes.

      Back to the 12-drive SATA nodes: those had two RAID controllers per node, so a RAID 5 array in those nodes has a write cost of 6 I/Os, or 12 I/Os for Network RAID 10.

      We sent this information to HP/LeftHand earlier this week for confirmation and we have not received an answer yet, but we believe our understanding is correct.

      With that being said, any write-intensive workload should have a substantial I/O penalty when using RAID 5 and Network RAID 10, especially on the 8-drive nodes. I would expect IOMeter to show substantially higher IOPS for RAID 10 when the write ratio is fairly high. When the write ratio is lower, my guess is the RAID 5 / RAID 10 difference will be negligible. It would be interesting to see at what % of writes it makes sense to switch to RAID 10.
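
      To put rough numbers on that reasoning, here is a minimal Python sketch; the write penalty and the Network RAID copy count are plug-in assumptions, and the spindle count and per-disk IOPS in the example are illustrative rather than measured:

      def effective_host_iops(raw_disk_iops, read_fraction, write_penalty, network_copies=2):
          # Reads cost one back-end I/O; each host write costs write_penalty back-end I/Os
          # on every node holding a Network RAID copy. Controller cache and write coalescing
          # are ignored, so real arrays should do better than this estimate.
          cost_per_host_io = read_fraction + (1 - read_fraction) * write_penalty * network_copies
          return raw_disk_iops / cost_per_host_io

      # Illustrative example: 16 x 15K SAS spindles at an assumed ~175 IOPS each, 50% reads,
      # Network RAID-10 (2 copies). Write penalty of 2 for RAID 10 vs 4 for the classic RAID 5
      # small-write case (or higher, if the per-array penalty described above is correct).
      raw = 16 * 175
      print(round(effective_host_iops(raw, 0.5, write_penalty=2)))  # RAID 10 estimate: ~1120
      print(round(effective_host_iops(raw, 0.5, write_penalty=4)))  # RAID 5 estimate:  ~622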

      We have two more P4500 G2 8-drive Starter SAS nodes en route. Maybe I will do some performance testing on these when they arrive.

      Kevin

  12. We finally got a chance to run some tests on a new P4500 G2 SAS Starter SAN (two nodes, eight 15K SAS drives per node). We ran the IOmeter workload with the nodes configured as RAID 5, then again as RAID 1+0.

    Our test setup was a Windows 2008 HP DL385 with 2 Gigabit NICs configured for MPIO using the HP DSM. At the time of our testing, we were running SAN/iQ 9.5.

    Our IOmeter workload was configured similar to the original tests:

    – Disks were left unformatted and the physical drive was selected
    – 1 worker thread
    – 8,000,000 sectors
    – 64 outstanding I/Os

    For each test the access specifications were set up as follows:
    – 100% access
    – Request size 4K
    – 100% Random

    Tests were run for 45 minutes with a 30-second ramp-up time. I also did the right-click “Run as administrator” when running the tests in Windows 2008. Apparently you can get inaccurate results without doing this.

    The results were as we suspected, but far from what HP engineers advertised to us. I am curious what the real world implications are, but we have proven with one of our workloads that RAID 1+0 is substantially faster. Now IOmeter appears to be backing this up.

    4K, 70% Read, RAID 5 — 2792 IOPS, 10.91 MB/s, 18.24 Avg I/O, 5.66% cpu
    4k, 70% Read, RAID 10 — 6827 IOPS, 26.67 MB/s, 7.94 Avg I/O, 11.59% cpu

    4K, 50% Read, RAID 5 — 1952 IOPS, 7.63 MB/s, 38.21 Avg I/O, 4.36% cpu
    4K, 50% Read, RAID 10 — 5178 IOPS, 20.23 MB/s, 9.7 Avg I/O, 8.35% cpu

    4K, 15% Read, RAID 5 — 1437 IOPS, 5.61 MB/s, 88.42 Avg I/O, 2.67% cpu
    4K, 15% Read, RAID 10 — 3703 IOPS, 14.47 MB/s, 13.34 Avg I/O, 5.82% cpu
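
    A quick Python sketch pulling the RAID 10 vs RAID 5 ratio out of those numbers, which holds at roughly 2.4-2.7x across the read mixes tested:

    # (read %, RAID 5 IOPS, RAID 10 IOPS) taken from the results above
    results = [(70, 2792, 6827), (50, 1952, 5178), (15, 1437, 3703)]
    for read_pct, raid5, raid10 in results:
        print(f"{read_pct}% read: RAID 10 delivered {raid10 / raid5:.2f}x the IOPS of RAID 5")
    # Prints roughly 2.45x, 2.65x and 2.58x respectively.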
