Best practice for vSphere 5

Hi all,

Consider the following environment:

     1) 4 physical Dell R710 servers, each with 16 Gigabit NICs, on which I will install vSphere 5 Standard.

     2) 2 stackable switches for SAN storage

     3) 2 Dell EqualLogic PS4000XV SAN arrays (dual controller)

     4) 2 switches for virtual machine traffic

For networking, I plan to create the following vSwitches on each physical server:

     1. vSwitch0 - used for iSCSI storage

         6 NICs teamed with the IP-hash teaming policy, multipathing to the iSCSI storage, with VMware Round Robin as the storage load-balancing policy

         (VMware suggests using 2 NICs per IP storage target; I am not sure about this.)

     2. vSwitch1 - used for virtual machine

         6 NICs Teamed for virtual machine traffic, with IP-hash policy

     3. vSwitch2 - for management

         2 NICs Teamed

     4. vSwitch3 - vMotion

          2 NICs teamed

Could you kindly give me some suggestions?

All Replies
  • Here's a document www.equallogic.com/.../DownloadAsset.aspx  that covers how to configure iSCSI with EQL arrays.

    It's highly unlikely that you will need 6x GbE NICs for iSCSI with your configuration.  Two for ESX iSCSI use and two more for guest iSCSI initiators is best.  The guest NICs matter if you want to take advantage of ASM/ME for SQL or Exchange: directly connecting to the array from inside the Windows VM is the only way to accomplish that.

    If you are going to use VMware's Round Robin, you need to change the 'iops' value from 1000 to 3.   Otherwise you will not get the maximum benefit from multiple NICs.
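
    To see why the 'iops' value matters, here is a toy model of Round Robin path switching. It is only a sketch: the function name rr_distribution and the burst size of 12 IOs are made up for illustration, and it ignores queueing and latency entirely. It simply counts how many IOs of a short burst land on each path for a given switching limit.

    ```shell
    # Toy model of Round Robin path switching (illustration only, not a benchmark).
    # Usage: rr_distribution IOS PATHS LIMIT
    #   IOS   - number of IOs in the burst
    #   PATHS - number of active paths
    #   LIMIT - IOs sent down one path before switching (the 'iops' value)
    rr_distribution() {
        ios=$1; paths=$2; limit=$3
        awk -v ios="$ios" -v paths="$paths" -v limit="$limit" 'BEGIN {
            path = 0; sent = 0
            for (i = 0; i < ios; i++) {
                count[path]++
                sent++
                if (sent == limit) {          # limit reached: move to the next path
                    sent = 0
                    path = (path + 1) % paths
                }
            }
            for (p = 0; p < paths; p++)
                printf "path%d=%d%s", p, count[p] + 0, (p < paths - 1 ? " " : "\n")
        }'
    }

    rr_distribution 12 4 1000   # default limit: the whole burst stays on path0
    rr_distribution 12 4 3      # limit of 3: the burst is spread over all 4 paths
    ```

    With the default of 1000, a short burst never leaves the first path, so the extra NICs sit idle; with 3, the same burst is shared evenly.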

    Solution Title

    HOWTO: Change IOPs value // Round Robin for MPIO in ESXi v5.x

    Solution Details

    Set the default policy for EQL devices to Round Robin, so that newly discovered volumes are set to Round Robin:

    #esxcli storage nmp satp set --default-psp=VMW_PSP_RR --satp=VMW_SATP_EQL

    **These new volumes will still need to have the IOPs value changed.

    To gather a list of devices use:

    #esxcli storage nmp device list

    You'll need the naa.<number> that corresponds to each EQL volume in that list. That is the "device number" used in the next command.
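
    Picking the identifiers out by eye gets tedious with many volumes. A possible helper is sketched below; eql_naa_ids is a hypothetical name, and it assumes the "Device Display Name: EQLOGIC iSCSI Disk (naa....)" line format shown in this thread. It reads the device list on standard input, so it can be tried on saved output without touching a host.

    ```shell
    # Reads `esxcli storage nmp device list` output on stdin and prints the
    # naa identifier of every device whose display name contains EQLOGIC.
    # Assumes the display-name line ends with the identifier in parentheses.
    eql_naa_ids() {
        sed -n '/Device Display Name:.*EQLOGIC/s/.*(\(naa\.[0-9a-fA-F]*\)).*/\1/p'
    }

    # Sample input in the format shown in this thread. On a host you would run:
    #   esxcli storage nmp device list | eql_naa_ids
    printf '%s\n' \
        'naa.6090a098703e5059e3e2e483c401f002' \
        '   Device Display Name: EQLOGIC iSCSI Disk (naa.6090a098703e5059e3e2e483c401f002)' \
        '   Storage Array Type: VMW_SATP_EQL' | eql_naa_ids
    ```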

    Existing volumes can be changed to Round Robin

    #esxcli storage nmp device set -d naa.6090a098703e30ced7dcc413d201303e --psp=VMW_PSP_RR

    You can set how many IOs are sent down one path before switching to the next. This is akin to rr_min_io under Linux.

    NOTE: This will only work if the policy has been changed to Round Robin ahead of time.

    The "naa.XXXXXXXXXXXXX" is the MPIO device name.

    You can get a list of devices with:

    #esxcli storage nmp device list

    naa.6090a098703e5059e3e2e483c401f002

    Device Display Name: EQLOGIC iSCSI Disk (naa.6090a098703e5059e3e2e483c401f002)

    Storage Array Type: VMW_SATP_EQL

    Storage Array Type Device Config: SATP VMW_SATP_EQL does not support device configuration.

    Path Selection Policy: VMW_PSP_RR

    Path Selection Policy Device Config: {policy=iops,iops=3,bytes=10485760,useANO=0;lastPathIndex=3: NumIOsPending=0,numBytesPending=0}

    Path Selection Policy Device Custom Config:

    Working Paths: vmhba36:C0:T1:L0, vmhba36:C1:T1:L0, vmhba36:C2:T1:L0, vmhba36:C3:T1:L0

    This also lets you confirm that the path policy is "VMW_PSP_RR" (VMware Path Selection Policy, Round Robin). Note that in this example the IOPs value has already been set to '3'.

    #esxcli storage nmp psp roundrobin deviceconfig set -d naa.6090a098703e30ced7dcc413d201303e -I 3 -t iops

    #esxcli storage nmp psp roundrobin deviceconfig get -d naa.6090a098703e30ced7dcc413d201303e

    Byte Limit: 10485760

    Device: naa.6090a098703e30ced7dcc413d201303e

    IOOperation Limit: 3

    Limit Type: Iops

    Use Active Unoptimized Paths: false

    Lastly you need to disable 'DelayedAck' and 'LRO'

    Configuring Delayed Ack in ESX 4.0 or 4.1

    To implement this workaround in ESX 4.0 or 4.1, use the vSphere Client to disable delayed ACK.

    Disabling Delayed Ack in ESX 4.x or 5.0

    1. Log in to the vSphere Client and select the host.

    2. Navigate to the Configuration tab.

    3. Select Storage Adapters.

    4. Select the iSCSI vmhba to be modified.

    5. Click Properties.

    6. Modify the delayed Ack setting using the option that best matches your site's needs, as follows:

    Modify the delayed Ack setting on a discovery address (recommended).

    A. On a discovery address, select the Dynamic Discovery tab.

    B. Select the Server Address tab.

    C. Click Settings.

    D. Click Advanced.

    Modify the delayed Ack setting on a specific target.

    A. Select the Static Discovery tab.

    B. Select the target.

    C. Click Settings.

    D. Click Advanced.

    Modify the delayed Ack setting globally.

    A. Select the General tab.

    B. Click Advanced.

    (Note: if setting globally you can also use vmkiscsi-tool

    # vmkiscsi-tool vmhba41 -W -a delayed_ack=0)

    7. In the Advanced Settings dialog box, scroll down to the delayed Ack setting.

    8. Uncheck Inherit From parent. (Does not apply for Global modification of delayed Ack)

    9. Uncheck DelayedAck.

    10. Reboot the ESX host.

    Solution Title

    HOWTO: Disable LRO in ESX v4/v5

    Solution Details

    Within VMware, the following command will query the current LRO value.

    # esxcfg-advcfg -g /Net/TcpipDefLROEnabled

    To set the LRO value to zero (disabled):

    # esxcfg-advcfg -s 0 /Net/TcpipDefLROEnabled

    A server reboot is required.
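
    If you want to check the LRO setting from a script, something like the following could work. It is a sketch: lro_state is a hypothetical helper, and it assumes the query prints a line of the form "Value of TcpipDefLROEnabled is <n>", so verify that against your own host's output first.

    ```shell
    # Reads the output of `esxcfg-advcfg -g /Net/TcpipDefLROEnabled` on stdin
    # and prints "enabled", "disabled", or "unknown".
    # Assumed output format: "Value of TcpipDefLROEnabled is <n>"
    lro_state() {
        awk '/TcpipDefLROEnabled/ { print ($NF == 0 ? "disabled" : "enabled"); found = 1 }
             END { if (!found) print "unknown" }'
    }

    # Sample lines stand in for the real command here. On a host you would run:
    #   esxcfg-advcfg -g /Net/TcpipDefLROEnabled | lro_state
    echo "Value of TcpipDefLROEnabled is 1" | lro_state
    echo "Value of TcpipDefLROEnabled is 0" | lro_state
    ```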

    Also make sure your Dell switches (6224s?) have current firmware, that flow control is enabled on the ports, that spanning tree portfast is set, and that you are NOT using the default VLAN.   Put all the ports on another VLAN, 11 for example.   Jumbo frames don't work well on the default VLAN.

    That should give you a great start.

    Regards,

    -don

  • With 16 NICs and a single PS4000, my recommendation would be (per host):

    - 2 NICs for iSCSI (the PS4000 only has 2 active iSCSI ports, so why 'waste' more than 2 NICs for iSCSI unless you plan to add an additional PS-array some time soon, or you want to do guest-attached-iSCSI (2 more NICs, but in 2 separate vSwitches))

    - 2 NICs for management (vSwitch0 typically)

    - 2 NICs for vMotion (with 2 VMkernel ports too, for more bandwidth; a single VMkernel port would use the 2 NICs only for failover)

    - if you have Enterprise or Enterprise Plus, and want to use FT, 2 NICs reserved for this

    - rest for VM LAN access. In a flat network, you could split them up some. If you have a VLAN'd network and a need to have different VMs access different VLANs, you'd set up the vSwitch with multiple VLANs (multiple Virtual Machine Port Groups) and your physical switch should be set up to accommodate this.
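
    The arithmetic behind "rest for VM LAN access" can be sketched as follows. The function name vm_lan_nics and its yes/no arguments are invented for illustration; the counts of 2 NICs per role are the ones from the list above.

    ```shell
    # Usage: vm_lan_nics TOTAL USE_FT USE_GUEST_ISCSI
    # Prints how many NICs remain for VM LAN access after reserving
    # 2 each for host iSCSI, management and vMotion, plus 2 each for
    # FT and guest-attached iSCSI when those arguments are "yes".
    vm_lan_nics() {
        total=$1; use_ft=$2; use_guest_iscsi=$3
        reserved=$((2 + 2 + 2))
        if [ "$use_ft" = yes ]; then reserved=$((reserved + 2)); fi
        if [ "$use_guest_iscsi" = yes ]; then reserved=$((reserved + 2)); fi
        echo $((total - reserved))
    }

    vm_lan_nics 16 no no     # prints 10
    vm_lan_nics 16 yes yes   # prints 6
    ```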

    Check TR1074 (may have been linked by Don) on a proper vSphere5 setup with an iSCSI heartbeat.

  • One of my favorite series of posts is from a gentleman by the name of Ken Cline (who now works for VMware).  He put out a series of posts called, "The Great vSwitch Debate"  (You know it is a good post when you have to read it several times over to let it all soak in).  In part 7 kensvirtualreality.wordpress.com/.../the-great-vswitch-debate-part-7  he outlines configurations for various numbers of physical NICs.  While it was written in the middle of 2009, many of the practices still hold up very well today.  Take a look.

    If you are working with PowerConnect 6224 switches, you may see a mix of conflicting configuration information.  It will drive you nuts, and ultimately things won't work correctly.  Go to itforme.wordpress.com/.../reworking-my-powerconnect-6200-switches-for-my-iscsi-san where I outline the EXACT steps for building up the 6224's correctly.  

  • Thank you very much.

    Regarding the 2 NICs for iSCSI: if I have two PS4000s, should I set aside 4 NICs for iSCSI? And if these 2 PS4000s aren't in the same group, how many NICs does each host need?

    I find that the limit for iSCSI is on the host side, even with two PS4000s, regardless of whether they are in the same group or not.

    ESX servers are rarely about hundreds of MB/sec.   They're about IOs/sec and latency.   You can monitor the network usage and add NICs later if needed.  

    -don

  • Hi Don,

    Thanks for advising and supporting the EqualLogic community.

    Why do you configure iSCSI with RR instead of using MEM for vSphere?

    Would using MEM set a better RR policy? Or do you prefer RR for other reasons?

    Regards,

  • Hello Afurster,

    I don't prefer RR at all. However, not everyone has an Enterprise or Enterprise+ license, which VMware requires for PSP support.  MEM will definitely produce better results, especially in multi-member pools.  However, with more than 2x NICs it can also increase the connection count, so that too must be factored in, depending on your environment.

    My comment was "*IF* you are going to use RR, then adjust the IOPs to get the best performance."  The default of 1000 IOPs per path won't do that.

    Regards,

    -don

  • Hi Don,

    Thanks for your explanation! I forgot that PSP is only for Enterprise(+)... And yes, we have multi-member pools (which also benefit from load balancing), and 2 iSCSI NICs per host. The total number of connections is indeed something to keep in mind (max 1024 per pool).
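
    A back-of-the-envelope check against that limit might look like the sketch below. It rests on an assumption that is not stated in this thread: without MEM, each iSCSI NIC on each host opens one session to each volume. The function name and the example figures (4 hosts, 40 volumes) are hypothetical; only the 1024-per-pool figure and the 2 NICs per host come from the discussion.

    ```shell
    # Rough estimate of iSCSI connections into one pool WITHOUT MEM.
    # Assumption (not from the thread): every iSCSI NIC on every host opens
    # one session to every volume.
    estimate_connections() {
        hosts=$1; nics=$2; volumes=$3
        echo $((hosts * nics * volumes))
    }

    POOL_LIMIT=1024   # per-pool maximum quoted in the thread

    # Example values only: 4 hosts, 2 iSCSI NICs each, 40 volumes.
    total=$(estimate_connections 4 2 40)
    if [ "$total" -le "$POOL_LIMIT" ]; then
        echo "$total connections: within the $POOL_LIMIT limit"
    else
        echo "$total connections: over the $POOL_LIMIT limit"
    fi
    ```

    Treat the result as an estimate only and compare it with the connection count the group actually reports.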

    Do you advise changing the IOPs value for MEM as well, or does that apply only to RR because of MEM's more intelligent techniques?

  • Hi Don,

    (I can't edit my previous post, so I'm creating a new reply.)

    I read another post of yours on this community site (reply in: iSCSI PS6100 Error - iSCSI login to target from initiator failed) and you said: "Lastly, if you are not using the EQL MPIO driver for ESXi v5, MEM 1.1,  then you should change the IOPs value in the VMware Round Robin from 1000 IOs between switching to next path, to 3.  Otherwise you won't get full benefit out of multiple interfaces."

    So it answers my question about the IOPs value for MEM ;).

    Do you have any other performance or optimization tips for vSphere and iSCSI? I just found the white paper "Latency-Sensitive Workloads in vSphere Virtual Machines", which also describes some interesting tuning tips.

    Regards,

    Arjen

  • Will Urban wrote this nice post on a few things that are of value with regards to your question.

    en.community.dell.com/.../data-drives-in-vmware.aspx   Dell also puts out a few other nice documents.

  • Re: MEM.  No you don't have to change that.  That's handled automatically by MEM.  In fact it uses a more efficient algorithm than just switching paths in rotation.  

    Also, if you are using Raw Device Mapped (RDM) disks in your VMs, make sure the pathing for those is changed to RR and IOPs 3 as well.  (MEM would set these connections up properly as well.)

    Other common tweaks are to disable DelayedACK (it's in the iSCSI initiator properties) and to disable Large Receive Offload (LRO).

    Within VMware, the following command will query the current LRO value.

    # esxcfg-advcfg -g /Net/TcpipDefLROEnabled

    To set the LRO value to zero (disabled):

    # esxcfg-advcfg -s 0 /Net/TcpipDefLROEnabled

    NOTE: a server reboot is required.

    Info on changing LRO in the Guest network.

    docwiki.cisco.com/.../Disable_LRO

    ESXi v5.0 should also have the Login Timeout changed to 60 seconds (the default is 5).  However, you have to be at build 514841 or later.

    See VMware KB 2007680 kb.vmware.com/.../2007680

    One more common enhancement that often gets missed is creating a virtual SCSI adapter for each VMDK (or RDM) in each VM (up to 4 controllers).  If you look in the VM settings, each VMDK/RDM shows a "Virtual node".

    It will say 0:0 for the first drive, then 0:1, 0:2, etc.   Shut down the VM, then change the other VMDKs/RDMs from 0:1 to 1:0, from 0:2 to 2:0, and so on.   This tells ESX to create a new SCSI controller for each disk.  It really helps with Exchange/SQL/SharePoint: things that have databases and logs, or file servers that have multiple VMDKs/RDMs.
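
    For a VM with more than a handful of disks, the renumbering above can be planned ahead. The sketch below only prints a suggested layout and changes nothing; plan_scsi_nodes is a hypothetical name, and skipping unit 7 assumes that unit is reserved for the controller itself.

    ```shell
    # Prints a suggested virtual node (controller:unit) for each of N disks,
    # spreading them round-robin across up to 4 virtual SCSI controllers.
    # Planning aid only: it does not modify any VM.
    plan_scsi_nodes() {
        disks=$1
        i=0
        while [ "$i" -lt "$disks" ]; do
            controller=$((i % 4))
            unit=$((i / 4))
            # Assumption: unit 7 is reserved for the controller, so skip it.
            if [ "$unit" -ge 7 ]; then unit=$((unit + 1)); fi
            echo "disk$i -> $controller:$unit"
            i=$((i + 1))
        done
    }

    plan_scsi_nodes 3
    ```

    For three disks this prints 0:0, 1:0 and 2:0, matching the manual change described above; a fifth disk would wrap around to 0:1.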

    *************************************************************************************

    *** Script to change EQL volumes to RR and set the IOPs to 3 in ESXi v5.0. ****

    *************************************************************************************

    This is a script you can run to set all EQL volumes to Round Robin and set the IOPs value to 3.  (Datastores and RDMs)

    #esxcli storage nmp satp set --default-psp=VMW_PSP_RR --satp=VMW_SATP_EQL ; for i in `esxcli storage nmp device list | grep EQLOGIC|awk '{print $7}'|sed 's/(//g'|sed 's/)//g'` ; do esxcli storage nmp device set -d $i --psp=VMW_PSP_RR ; esxcli storage nmp psp roundrobin deviceconfig set -d $i -I 3 -t iops ; done

    After you run the script you should verify that the changes took effect.

    #esxcli storage nmp device list
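
    Reading through the full device list by hand is error-prone on hosts with many volumes. One way to automate the check is sketched below. The helper name eql_not_tuned is made up, and it assumes the listing format shown earlier in this thread, with each device identifier starting at the left margin and its properties indented beneath it. It prints only the EQL devices that are not yet on Round Robin with iops=3.

    ```shell
    # Reads `esxcli storage nmp device list` output on stdin and prints every
    # EQLOGIC device that is NOT on VMW_PSP_RR with iops=3.
    # Assumes device identifiers start in column 1 and properties are indented.
    eql_not_tuned() {
        awk '
            function report() {
                if (dev != "" && eql && !(rr && iops3)) print dev
            }
            /^[^ \t]/ { report(); dev = $1; eql = 0; rr = 0; iops3 = 0 }
            /Device Display Name:.*EQLOGIC/                     { eql = 1 }
            /Path Selection Policy: VMW_PSP_RR/                 { rr = 1 }
            /Path Selection Policy Device Config:.*[{,]iops=3,/ { iops3 = 1 }
            END { report() }
        '
    }

    # On a host you would run:
    #   esxcli storage nmp device list | eql_not_tuned
    ```

    Empty output means every EQL volume picked up both settings.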

    Regards,

    -don

  • Good summary of considerations Don.

    I thought the changing of the SCSI adapter for each VMDK was one of the more interesting tidbits I pulled from Will's post.  I looked for some additional technical information that supports/explains the reasoning behind that in more detail, but couldn't find any.

    The additional adapters allow the OS to be more multithreaded when doing I/O.  Each disk has a negotiated Command Tag Queue depth, so more drives == more outstanding IO.  However, with "SCSI/SAS" drives, only one "disk" can talk to the controller at a time.  If you hang multiple disks (VMDKs/RDMs) off one controller, they have to share the controller resource.  If you have one controller for each drive, more IO can be processed at once by the OS.   Even in the "old" IDE days you got the best performance if each disk/CD-ROM was connected to its own IDE/SATA/SAS controller.   You may notice that VMware lays it out like SCSI, with 15 targets per controller.

    -don

  • Well put.  Thanks Don.

  • You are very welcome.  

    Take care.

    -don