On Tuesday we finally had the chance to set up our PS6100, which had been sitting in a box for several months. I connected it to the SAN, logged in via the serial interface, and successfully added it as a member of our group (we already have a PS6000). Everything went as expected: it ran through the RAID verification, started moving volumes over automatically, and so on. I configured the management interface and placed it on our production network so we could connect to it from our workstations, and that all seemed fine. Later that night I updated both members to firmware v6.0.2, which also went as expected.

Then the fun began. Random things started dropping off our network, almost like a network loop or a broadcast storm. I quickly shut down the port that the management card was connected to, but things escalated from there. After an hour or two, and after resetting various network switches, everything quieted down. Later that night I re-enabled the port for the management NIC, observed no problems for an hour or two, and went to bed. The next morning I woke up to hundreds of alerts that random devices were going offline again, so I shut down the management interface port again and removed it.

I just got around to looking at the port stats on the switch and see that multicast traffic was being generated. Why? Is it possible the management interface of the PS6100 flooded the network with multicast traffic? I noticed the Data Center Bridging feature after all of this happened. Would having that enabled have caused the issue? I'm a little apprehensive about plugging this thing back in to manage it.

Here's the output from a show interface command on our Cisco 2960 switch that it was connected to. Note the output drops and multicast traffic:
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 565803
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     47774 packets input, 33031106 bytes, 0 no buffer
     Received 1908 broadcasts (1893 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
DCB is only applicable to 10GbE environments.
Is Dedicated Mgmt configured on both arrays or just the new one? It has to be configured on all members in a group or on none.
What is the interconnect bandwidth on your 2960s? Are they directly connected, or do they go through a core switch? When additional members are added to a group, the load on the network increases greatly. The 2960 is not ideal for iSCSI SAN use.
Dedicated management was configured only on the new member; we were using all 4 interfaces on the old member for iSCSI.
I have since removed the old member from the group in order to relocate it to another site and set up replication between the two devices.
The 2960 I was referring to is the switch that the management interface of the new member was attached to; no iSCSI traffic is carried over this switch. We have a separate, isolated network of two switches handling the iSCSI traffic between the members and servers.
So maybe not having dedicated management configured across both members created the issue?
What kind of switches are you using for iSCSI?
I knew that question was coming. We are using two Cisco 2960G gigabit switches interconnected via an EtherChannel link that spans 4 ports on each switch.
When we bought the first member we were working with a Dell partner vendor. Their sales support was pretty lousy, and they did not explain to us at the time that the 2960G did not support the flow control requirements that the SAN needed. They gave us the option of Dell switches, but we told them we preferred Cisco 2960s since we had experience with them. One would think that their sales engineers would have raised a red flag before allowing us to go down that route.
Regardless, we have been up and running on that configuration since 2011 without issue. I'm trying to get funding to replace the Cisco switches with a couple of Dell 6xxx series switches. Hopefully I can get that accomplished soon.
Everything has been stable since we removed the management NICs from our production network. Now that the group is just a single member, I am considering reconfiguring the management interface settings and trying it out again.
I'm not aware of any flowcontrol issues with the 2960G specifically. There is an IOS issue fixed in recent builds where flowcontrol wouldn't always be properly negotiated.
Array / server ports should be set for flowcontrol receive desired and spanning-tree portfast. The trunked ports should have flowcontrol off, since the 2960 can't send flow-control pause frames, only receive them. Maybe that's what they were referring to?
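For anyone following along, here is a rough sketch of what those port settings could look like on a 2960. The interface numbers and ranges are placeholders made up for illustration, not the actual port layout from this thread, so adjust them to match your switch.

```
! Array and server ports (hypothetical range Gi0/1 - 8)
interface range GigabitEthernet0/1 - 8
 flowcontrol receive desired
 spanning-tree portfast
!
! Inter-switch EtherChannel member ports (hypothetical range Gi0/21 - 24)
interface range GigabitEthernet0/21 - 24
 flowcontrol receive off
```

Running show flowcontrol interface against one of those ports afterwards should show what was actually negotiated on the link.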
Flowcontrol is important with iSCSI: the traffic pattern is bursty, and without proper buffering on the switch during a flowcontrol pause, retransmission rates go up and hurt performance. Limited buffering is one of the weaknesses of the 2960G series.
Re: Mgmt port. Just make sure you properly enable dedicated management per the admin guide instructions. Just putting the last network port onto a different subnet isn't enough.
Yes we had the settings (i.e. flowcontrol receive desired and spanning tree portfast) applied to ports for the array and servers. The trunk ports do not have flowcontrol enabled.
We re-enabled the management interface yesterday morning and things have been fine ever since. I'm not sure whether not applying the management settings across both arrays caused the original issue, but it's not a factor now since we deleted the original member and can connect via the dedicated management port on the new array.
On a related note: if we get around to building a new server and installing SAN HQ, does it need to be connected to the SAN network in order to interpret data on the array, or can it do so using the management interface? I guess if it needs to be attached to the SAN network, I could just plug a second NIC into our production network for alerts and management.
Thanks for all of the information, greatly appreciated!