I'm trying to configure an MD3220i to go into production soon, but I've hit an issue and I'm not sure whether it is hardware related or configuration related. I've spent many days trying various changes without success. I hope one of you might be able to help :o)
I've connected the array as per Dell recommendations: all eight ports are connected to a Dell PowerConnect 5324 switch that was configured to handle iSCSI traffic alone, again per the recommendations. Six NICs of a Dell PowerEdge R710 are connected to the switch. I've configured 3 LUNs on the array, and only 2 of them are mapped to the ESX 6 host running on the R710. I can see on the ESX host that the 3 ports connected to the RAID controller in slot 1 reset and lose connection roughly every 13 to 15 minutes. As a result the LUNs all degrade, then return to normal, and the cycle repeats.
From the usage point of view, any access to the LUNs seems to burden the ESX host heavily, and it fails to respond for a couple of minutes. Browsing the datastores on the array also takes a long time.
When I look at the array events, the RAID controller in slot 1 seems to reset itself roughly every 15 minutes, which drops all of its NIC connections, degrades the paths on the ESX host, and the cycle continues. But there are no actionable errors.
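For reference, the path flapping described above can be watched from the ESX side with a couple of standard esxcli commands; this is just a diagnostic sketch, re-run while the controller resets:

```shell
# List every storage path with its state (active/dead); the three paths
# through the slot-1 controller should show up as dead during a reset.
esxcli storage core path list | grep -E "Runtime Name|State"

# Per-device view: shows each LUN, its current path selection policy,
# and which of its paths are currently working.
esxcli storage nmp device list
```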
In order to rule out cable and switch issues, I used a different set of network cables and connected the ESX host directly to the six ports on the back of the MD3220i. But the problem remains :o(
I've attached the support data with this post. I had to remove the trace buffers from the archive, as the original archive is 1.4 MB, which is well over the 1 MB attachment limit.
Can someone confirm whether this is a RAID controller fault or a configuration fault?
I am going to send you a private message. Can I get you to email me the full support bundle so that I can review it? I can't unzip the one you uploaded; it reports that it is incomplete. Also, have you changed your ESX timeout settings to longer than 30 seconds? And is your multipathing for iSCSI set to Round Robin?
Please let us know if you have any other questions.
DELL-Sam L | Dell Social Outreach Services - Enterprise
I've sent the complete support bundle in reply to your email.
I haven't touched the ESX timeout settings and the multipathing setting is MRU. Do you think these could cause the issues I'm seeing?
I reviewed both support bundles that you sent over. I am not seeing anything other than informational messages; there are no errors or warnings reported. When we see this, it normally points to something in your setup not being correct.
You stated that you are using MRU for your multipathing. You will want to change that to Round Robin, as that is the best practice for MD3xxx systems. I would also adjust your timeout settings and see whether you are still getting the same issue.
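For anyone following along, the two changes above can be made from the ESXi shell along these lines; the device identifier and vmhba name below are placeholders you would replace with your own values:

```shell
# Switch one MD32xxi LUN from MRU to Round Robin. Get the naa identifier
# from "esxcli storage nmp device list" first.
esxcli storage nmp device set --device naa.6842b2bXXXXXXXXXXXXXXXXXXXXXXXX --psp VMW_PSP_RR

# Make Round Robin the default for everything claimed by the LSI SATP
# (the SATP that claims MD3xxx LUNs), so newly mapped LUNs pick it up too.
esxcli storage nmp satp set --satp VMW_SATP_LSI --default-psp VMW_PSP_RR

# Raise the iSCSI login timeout on the software iSCSI adapter
# (vmhba33 is a placeholder for your adapter name).
esxcli iscsi adapter param set --adapter vmhba33 --key LoginTimeout --value 60
</imports>
```

These are config-only commands against live hardware, so treat them as a sketch and confirm the device and adapter names on your own host before running them.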
I did see quite a few warnings before I cleared them all. All of them were about "Virtual Disk Not On Preferred Path".
I've increased the timeout to 30 seconds, which didn't make any difference. I haven't tried Round Robin multipathing yet.
On a different note, another technician suggested using a 255.255.255.224 subnet as opposed to the 255.255.255.0 subnet I'm using. While I was trying to change the iSCSI port settings using MDSM, the array became unresponsive from time to time. The RAID controller in slot 1 responds to ping requests, but MDSM is unable to establish communication with it at all; it always connects using the IP address of the controller in slot 0. Also, the amber battery LED on RAID controller 1 briefly comes on when it resets. I've attached a couple of screenshots. Is this normal?
The alert on the system is due to a virtual disk not being on the preferred path. When virtual disks are created, they are assigned to one of the two controllers as a preferred communication path. If that path is not available – due to a network issue, controller reset, etc. – the array will communicate via the alternate controller. In some cases the communication does not shift back to the primary path when it becomes available again. To correct this you just need to manually redistribute the virtual disks back to their preferred paths.
To redistribute virtual disks:
1. Open Modular Disk Storage Manager (MDSM)
2. Click on the Support tab
3. Select Manage RAID Controller Modules
4. Click on Redistribute Virtual Disks
You will get an alert telling you that this will disrupt communications if you do not have the multipath drivers installed. You can ignore this message and proceed: if you still have access to a virtual disk that is not on its preferred path, then the multipath drivers are installed.
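The same redistribution can also be done from the command line with the MD-series SMcli utility, if you have it installed; the array name below is a placeholder for your own:

```shell
# Move all virtual disks back to their preferred RAID controller modules.
# "MD3220i_Array" is the name the array was registered under in MDSM.
smcli -n MD3220i_Array -c "reset storageArray virtualDiskDistribution;"
```

This is handy if the problem recurs and you want to script the fix rather than click through MDSM each time.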
I would try testing using RR as that should resolve your issue.
Are you using in-band management or out-of-band management? If you are using the management ports on the controllers, then you are using out-of-band management.