Very slow READ performance with MD3200i


I'm reposting a thread here from another forum regarding extremely slow read performance with an MD3200i, in the hope of getting some insights. I apologize for the length, but I want to be thorough.

Network Info:

In our network, there are two SANs, an MD3000i (15 disk 15k SAS) and an MD3200i (12 disk 10k SAS). They're both connected to the same two switches, which are dedicated to the SANs.

We have 4 physical servers, each with 2 dedicated Broadcom iSCSI TOE NICs. MPIO is enabled.

We're running Windows 2008 R2 with Hyper-V on the physical servers, and we have a bunch of VMs on the SANs.

The MD3200i was installed some months ago, and a few VMs were created on it - but the bulk of them still remain on the MD3000i. We are not using jumbo frames at this point.

The problem:

A few days ago I noticed very slow I/O performance on the VMs on the MD3200i. Read speeds were as slow as 2MB/s while copying files off the VMs stored on the MD3200i.

I started investigating so I made a new 1GB LUN on the array and attached it to one of the physical servers. Same ~2MB/s read speed, but the write speed seemed ok.

I took the following benchmark at that point:

Note how the read speed drops with 128KB block sizes but seems fine with 32KB blocks.

 

After reading further forums, I found that some people disabled MPIO and that fixed the problem for them. So I proceeded by disabling MPIO and reran the test:

The read speed is slightly improved without MPIO, but now the slowdown occurs at the 32KB block size.

 

Here is the exact same test in the same conditions using our other MD3000i array:


So I started from scratch:

I started by reconfiguring my iSCSI host ports from scratch in the controller:

 

Controller 0

Port 0: 192.168.133.110 (Switch 1)

Port 1: 192.168.131.110 (Switch 1)

Port 2: 192.168.132.110 (Switch 2)

Port 3: 192.168.130.110 (Switch 2)

 

Controller 1:

Port 0: 192.168.133.111 (Switch 1)

Port 1: 192.168.131.111 (Switch 1)

Port 2: 192.168.132.111 (Switch 2)

Port 3: 192.168.130.111 (Switch 2)
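For anyone comparing their own port layout against the one above, here is a minimal Python sketch that groups the listed addresses by subnet. It assumes a /24 mask (the post does not state the mask), and the two properties it checks (each subnet holds one port per controller and stays on one switch) are simply what the layout above appears to follow, not a documented Dell requirement.

```python
import ipaddress
from collections import defaultdict

# Host-port addresses exactly as listed in the post.
# The /24 prefix used below is an assumption; the mask is not given.
PORTS = {
    ("Controller 0", 0): ("192.168.133.110", "Switch 1"),
    ("Controller 0", 1): ("192.168.131.110", "Switch 1"),
    ("Controller 0", 2): ("192.168.132.110", "Switch 2"),
    ("Controller 0", 3): ("192.168.130.110", "Switch 2"),
    ("Controller 1", 0): ("192.168.133.111", "Switch 1"),
    ("Controller 1", 1): ("192.168.131.111", "Switch 1"),
    ("Controller 1", 2): ("192.168.132.111", "Switch 2"),
    ("Controller 1", 3): ("192.168.130.111", "Switch 2"),
}


def subnets_by_port(ports, prefix=24):
    """Group (controller, port, switch) tuples by the subnet they fall in."""
    groups = defaultdict(list)
    for (controller, port), (addr, switch) in ports.items():
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        groups[net].append((controller, port, switch))
    return dict(groups)


def check_layout(ports, prefix=24):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    for net, members in subnets_by_port(ports, prefix).items():
        controllers = [m[0] for m in members]
        switches = {m[2] for m in members}
        if len(set(controllers)) != len(controllers):
            problems.append(f"{net}: one controller has two ports in this subnet")
        if len(switches) != 1:
            problems.append(f"{net}: subnet spans more than one switch")
    return problems


if __name__ == "__main__":
    for net, members in sorted(subnets_by_port(PORTS).items()):
        print(net, members)
    print("problems:", check_layout(PORTS) or "none")
```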

 

After reconfiguring the IPs however, even though I could ping them, the iSCSI initiator would throw errors and was unable to connect.


I don't have physical access to the SAN (it's colocated) to reboot it. So I had to use the SMCLI to reset both controllers on the SAN:

 

smcli.exe -n san2 -c "reset controller [0];"

smcli.exe -n san2 -c "reset controller [1];"

 

After both controllers were reset, the iSCSI Initiators were able to connect again fine.

Strangely, the read speed jumped up a bit:

However, now I can't seem to get it past 25 MB/s or so sequential read speed. The write speed has never been an issue.

 

I also ran an IOMeter test on LUNs from both arrays with the following results: (Max Throughput - 100% Read)

MD3200i - 699 IOPS / 21 MB/s

MD3000i - 5519 IOPS / 172 MB/s
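As a sanity check on those two results, dividing throughput by IOPS gives the average transfer size each array was handling. The short Python sketch below does that arithmetic, assuming 1 MB = 1024 KB. Both arrays work out to roughly 31 to 32 KB per I/O, so the test pattern was the same on both and the MD3200i is simply completing about one eighth as many reads per second.

```python
def implied_io_size_kb(iops, mb_per_s):
    """Average transfer size in KB, assuming 1 MB = 1024 KB."""
    return mb_per_s * 1024 / iops


# (IOPS, MB/s) pairs taken from the IOMeter results in the post.
RESULTS = {
    "MD3200i": (699, 21),
    "MD3000i": (5519, 172),
}

if __name__ == "__main__":
    for name, (iops, mbps) in RESULTS.items():
        print(f"{name}: {implied_io_size_kb(iops, mbps):.1f} KB per I/O")
    ratio = RESULTS["MD3000i"][1] / RESULTS["MD3200i"][1]
    print(f"MD3000i / MD3200i throughput ratio: {ratio:.1f}x")
```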

 

Here are a few other resources showing more or less the same problem:

http://en.community.dell.com/support-forums/storage/f/1216/t/19351578.aspx

http://www.experts-exchange.com/Hardware/Servers/Q_26719712.html

http://www.passmark.com/forum/showthread.php?t=2773

And the thread that this post is based on: http://communities.vmware.com/thread/304340?start=0&tstart=0


Any ideas on how to get this fixed? We'd open a support ticket but from our experience, that won't help us much at all, so I'd rather ask here and see if someone has any ideas.

 

All Replies
  • Bizarre one... You would expect writes to have the issue before reads.

    I only have experience with the MD3000i....

     

  • I'm getting the same issues with my MD3200i. I can write files over the link at nearly 500MB/s. Copying the same file back gives me less than 5 MB/sec. I've tried all sorts of settings, such as disabling the TCP Offload Engine, removing the 2nd iSCSI link (thinking it was an MPIO issue), and playing with jumbo frames... I just can't get anything to work.

    I found this thread, where they were playing with block sizes, but I'm not sure what the gentleman's solution was exactly.

    http://groups.google.com/group/open-iscsi/browse_thread/thread/37741fb3b3eca1e4?pli=1

    I'm really curious as to the solution as well.

  • I ran the same tool you did after creating a new 2 GB lun. These are the results I'm seeing

  • Man, that is messed up..    Wacked as they say.......

    We are just now signing a deal to move away from Dell storage. Our MD3000i's are aging and are now well beyond end of life as far as the Dell sales channel is concerned, so the decision to move is looking better and better for us. :) It's a shame; the MD3000i's really were great for us overall, and we can't afford any Dell offerings "better" than the PowerVaults.

    It sounds like some sort of issue in their cache programming, the kind of thing that only gets fixed with a firmware update.

    The fact that your writes are as expected really rules out any sort of config issue as the cause, as far as I am concerned.

     

  • I lowered the Jumbo Frames size, as a test, to 2000 and noticed considerable improvement in speed. I tried multiple settings to try to find a sweet spot. 7000+ definitely takes a performance hit. I settled around 6500 right now.

    This is with a Jumbo Frame setting of 6500

  • Interesting.  I would assume you've got the MD3200i Jumbo Frame setting at 9000?  Were these changes being done on the 3200i or the NIC?

    Also, apologies if I missed it in a previous post, but what switches are you using here?  I've seen varying results from different switches (good example, the PowerConnect 5324 is NOT rated for iSCSI traffic, due to rather small buffers), and I'd like to know the entire environment.

    It makes some sense that larger I/O sizes are benefitting from jumbo frames - since those will be more efficiently transferring data, whereas the smaller I/O sizes would be just fine with the fragmentation that would occur.
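To put rough numbers on that, here is a small Python sketch that counts how many Ethernet frames a single I/O needs at different MTUs. It is only an approximation: it assumes 40 bytes of IPv4 + TCP headers per frame with no options, and it ignores iSCSI PDU headers and Ethernet framing overhead.

```python
import math

# Assumption: 20 bytes IPv4 + 20 bytes TCP per frame, no options.
# iSCSI PDU headers and Ethernet framing are ignored for simplicity.
TCP_IP_HEADERS = 40


def frames_per_io(io_bytes, mtu):
    """Number of frames needed to carry one I/O of io_bytes at a given MTU."""
    payload = mtu - TCP_IP_HEADERS
    return math.ceil(io_bytes / payload)


if __name__ == "__main__":
    for io_kb in (4, 32, 128):
        counts = {mtu: frames_per_io(io_kb * 1024, mtu) for mtu in (1500, 6500, 9000)}
        print(f"{io_kb:>3} KB I/O -> frames by MTU: {counts}")
```

With these assumptions a 128 KB read takes 90 frames at an MTU of 1500 but only 15 at 9000, while a 4 KB read needs 3 frames versus 1, which is why the larger block sizes have the most to gain (or lose) from the jumbo frame setting.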

    When trying to use larger than 7000 byte frames, have you considered using wireshark to see if you're getting a lot of packet retransmits?  That's usually what we've seen when things start to drag.

    Btw, to speak to an early comment about engaging support - we really *would* appreciate it if you did contact us.  I know that performance issues can be a bear to work, in that they take time, a lot of data collection has to be done, and there's quite a bit of trial and error involved.  But if there's a problem discovered in the firmware code, configuration, etc. that needs to be addressed - support really does need to know if there are problems that we need to bring up with engineering/development.

    If you've got a service tag for the system, I can take a look at the history and see if there's anything of note.  If you've got an open case with support on this, please let me know and I can also look into that.

    Regards,

    Andrew

  • I did contact Dell Enterprise support myself (SR# 832508837). I've been working with Andrew B from Dell.

    I changed the jumbo frame size at both ends: the server NICs and the 3200i NICs. Right now, I'm using an MTU of 6500.

    I am not using a switch. I direct connect the 2 servers to the 3200i.

    I did let one of the engineers I originally worked with to spec out this system know about the issue. He was communicating with the product group as well.

  • Note: in our case (the original post) we are NOT using Jumbo Frames. However, we have an older MD3000i also not using jumbo frames that works properly right beside it.

    We'd open a support case, but is there any way we can do so online? We'd rather not sit on hold and then spend a couple of hours explaining the problem to a technician on the line; we have a relatively small team and our time is valuable. (The last time we called support, for a backup issue regarding CSVs, none of the technicians we talked to understood what a CSV is, so unfortunately our patience is wearing pretty thin.)

    Thanks.

  • Did you also get the High Performance Tier option?

    What is your Cache Block size on the 3200i? (found under Storage Array -> Change->Cache Settings) I'm reading that for sequential workloads, use 32K, and for Random, use 16K. For mixed, try starting at 16K. I believe this can be changed on the fly (not 100% sure, but that's what the HPT white paper says). I think the default is 4KB.

    That might get you some improvements in performance.

  • We struggled with this same problem for a while. It turned out all that was needed was this on all hosts:

    netsh int tcp set global chimney=disabled

    netsh int tcp set global rss=disabled


    After that we could read from the MD3200i at the same rate we could write to it, i.e. circa 100 MB/s across a gigabit network. No fancy switch config, no jumbo frames, no updated Broadcom drivers, just those two lines.
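If you need to roll this out to several hosts, a thin wrapper like the Python sketch below may help. The function name and the dry-run default are my own illustration, not part of any tool; it only runs the two netsh commands quoted above, and only when `dry_run=False` in an elevated prompt on a Windows host.

```python
import subprocess

# The two commands from the post, split into argument lists.
NETSH_COMMANDS = [
    ["netsh", "int", "tcp", "set", "global", "chimney=disabled"],
    ["netsh", "int", "tcp", "set", "global", "rss=disabled"],
]


def apply_offload_settings(dry_run=True):
    """Run (or, by default, just list) the netsh commands.

    Hypothetical helper for illustration. With dry_run=True nothing is
    executed; the command lines are only returned for review.
    """
    handled = []
    for cmd in NETSH_COMMANDS:
        if not dry_run:
            # check=True raises CalledProcessError if netsh reports failure.
            subprocess.run(cmd, check=True)
        handled.append(" ".join(cmd))
    return handled


if __name__ == "__main__":
    for line in apply_offload_settings(dry_run=True):
        print("would run:", line)
```

Afterwards, `netsh int tcp show global` should list the current state of both settings so you can confirm the change took.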

     

    The problem is simply that Broadcom 5709C NICs don't really support TOE in any shape or form...

     

    Hope this helps a few people sidestep the blood, sweat and tears we went through to get here.

     

    cheers

    c.

     

     


  • Hopefully that helps someone, but it isn't the case here.

    We're using Intel NICs, and we have Broadcom hardware iSCSI in a couple other servers and all exhibit the same problem. It's definitely not the NICs or settings, and just read is affected. Write is fine.

     

  • Once you entered these netsh commands, did the server need a reboot, or did the NICs need disabling and re-enabling, for them to take effect?  Thanks.

  • I had the same problem! ~2MB/s read speed, while write speed was okay.

    The solution for us was to disable Flow Control on the switch.

    I hope this is helpful for you.
    Thomas

  • Hi, Thomas

    My MD3200i has the same issue.

    Following your advice, I have disabled flow control on the switch ports where the MD3200i is connected.

    But I have not seen any performance difference.

    Should I disable flow control on all ports, including those where the ESXi hosts are connected?

    Thank you.

  • We struggled with this same issue for a couple of months. For us the switch was the culprit, and fixing it required some very unintuitive configuration changes. It was a PowerConnect 5524 used only as an iSCSI switch (no handoffs to other networks).

    Switch Commands:

    (config mode)

    no iscsi enable

    no spanning-tree

    copy run start

    Then upgrade the firmware to the latest version and bounce the switch.

    It went from 2 MB/s to ~120 MB/s.