I recently purchased another PC6248 switch and opted to stack it with my preexisting PC6248. I then have LAGs running to two PC3448 and two PC3448P switches. I have several VLANs that have routing interfaces on the stack of 6248s.
Recently, I began seeing excessive packet loss in video calls between the data center with the configuration above and another data center. After eliminating the other possibilities, I turned my sights on the stack of 6248s and can see that when I ping endpoints that are local to my client but on another VLAN, ping times are very large and pings are frequently lost. I'm seeing 20-30% loss on UDP streams that traverse my stack.
I've had my share of problems with the 62xx switches and have seen countless posts from people experiencing the same problem; however, the usual suspects don't seem to be the cause in my case.
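For anyone who wants to put a number on loss like this, here is a minimal Python sketch. It assumes you have a test stream whose datagrams carry sequence numbers starting at 0; the function name and its inputs are made up for illustration and are not part of any Dell tooling.

```python
def loss_percent(received_seqs, sent_count):
    """Return the percentage of datagrams that never arrived.

    received_seqs: sequence numbers seen by the receiver (hypothetical input)
    sent_count:    number of datagrams sent, numbered 0..sent_count-1
    """
    if sent_count <= 0:
        raise ValueError("sent_count must be positive")
    # Duplicates and out-of-range numbers must not count as deliveries.
    delivered = {s for s in received_seqs if 0 <= s < sent_count}
    return 100.0 * (sent_count - len(delivered)) / sent_count


# Example: 10 datagrams sent, sequence numbers 3, 6 and 8 never arrived.
print(loss_percent([0, 1, 2, 4, 5, 7, 9], 10))  # prints 30.0
```

Running the same stream once through the stack and once around it gives two figures that can be compared directly.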
I'm running the latest firmware on my stack.
I'm running RSTP on my network, with all client ports set to PortFast. The LAGs to my 3448 switches are also set to PortFast.
Any help is greatly appreciated!
I would look at doing a traceroute along with your ping and see where the traffic is going exactly. Then you can go in and add a manual cost to a specific path to force the traffic in the right direction. It is possible that you have a loop even though you have RSTP set up.
You can also look at your routing tables to see where your switch is learning the path and verify that the route is learned through the correct switch and LAG (Port Channel).
Hope you find this helpful.
Not sure how a traceroute would have anything to do with RSTP. If RSTP was having issues, traceroute wouldn't show me that if the IPs aren't changing.. regardless, I haven't had a topology change in many days, so I don't see RSTP flapping as contributing to the problem.
Traffic is going in the right direction, but I'm going to try removing the core switch as the default router for my test equipment and just use my GRE routers as the default route. In that case, it should remove the core switch from the equation (as it pertains to routing).
I'll post the results.. Thanks for your comments!
When you stacked the new PC6248 with the existing, did you make sure that the cabling of the two switches in the stack are as follows?
Unit 1/XG1 connects to Unit 2/XG2
Unit 1/XG2 connects to Unit 2/XG1
The connection must be in a crossed fashion or the stack will not operate properly.
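To make the crossed layout above concrete, here is a small Python sketch that checks a list of recorded cable runs against it. The data layout (pairs of `(unit, port)` tuples) and the function name are assumptions for illustration only; the switch does not export cabling in this form.

```python
def is_crossed(cables):
    """Return True if a two-unit stack is cabled XG1-to-XG2 in both directions.

    cables: iterable of cable runs, each a pair of (unit, port) endpoints
            (hypothetical format, written down by hand from the rack)
    """
    expected = {
        frozenset({(1, "XG1"), (2, "XG2")}),
        frozenset({(1, "XG2"), (2, "XG1")}),
    }
    # frozenset makes each cable direction-independent: A-B equals B-A.
    return {frozenset(cable) for cable in cables} == expected


crossed = [((1, "XG1"), (2, "XG2")), ((2, "XG1"), (1, "XG2"))]
straight = [((1, "XG1"), (2, "XG1")), ((1, "XG2"), (2, "XG2"))]
print(is_crossed(crossed))   # prints True
print(is_crossed(straight))  # prints False
```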
I know you had mentioned this already but just wanted to make sure that all of your network switches have "spanning-tree portfast" enabled on all non-uplink ports? I had a very similar issue with PowerConnect 5348 and 5448s causing major issues when "spanning-tree portfast" was not enabled. I'm quite old school and found by manually setting portfast as opposed to using RSTP has worked quite well for me.
Thanks for the tip. Upon checking the stack, I'm not set up in a crossed fashion as you specified. I had a tough time getting the switches to recognize that they were stacked and had altered my stack wiring several times to try to fix it.
I do have portfast enabled across the board for client switches, and thankfully, I'm not seeing topology changes unexpectedly.. I'll try the stack change and update the posting. Thanks again!
Here is a document from Dell on stacking PowerConnect 6200-series switches.
I've learned the hard way by connecting the stacking cables incorrectly (especially with more than 2 switches per stack) and you can definitely see errors and also very slow "copy running-config startup-config" times.
In some circumstances, the XG-ports on the back of the PowerConnect switches are set to Ethernet mode which will prevent you from stacking your switches. Changing the stack-port mode from Ethernet to Stacking is what you will need to do if you have issues.
So I swapped the stacking cables so they are daisy chained now. I initially saw ping times come back down to normal for pings to the management interface on the core switch, but I'm still having strange throughput issues. Dell support has been "working" on this for over a week and they still don't have a test bed set up to mirror my current core environment (only two switches), so I'm continuing to try different things on my own. Today, the LAG to one of my client switches stopped working after I disabled a single link while troubleshooting very slow throughput to the Internet on my client switches. I couldn't get the link to light back up to save my life. I verified spanning tree wasn't blocking the port and finally opted to reboot the switch.
For a setup as simple as mine, it seems like adding in the stack created all kinds of problems for me.. I'm fighting the urge to just scrap the stack and go back to a LAG between my core switches, but I really don't want to.
Are you experiencing high ping latencies only between client endpoints connected to the stack, or also when pinging client endpoints on switches uplinked to the PowerConnect 6248 stack? If you're noticing high ping latencies to uplinked switches, check whether there is a negotiation issue on the uplink port(s). Sometimes there are link negotiation issues with speed and duplex. You may want to try hardcoding the speed to 1000 Mbps and full duplex on both ends, if possible, and see if this works better.
In regards to your stack, did you verify after upgrading/uploading the latest firmware that the switch is actually running the latest boot code and firmware? Run the command "show bootvar" and make sure that the current-active image is the latest firmware on both switches in the stack. Also, run the command "show stack-port" and make sure Unit 1 & 2 XG1 & XG2 ports are configured as "Stack" and the link status is "Link Up" with a Link Speed of 12Gb/s.
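The stack-port criteria above (mode "Stack", status "Link Up", speed 12Gb/s) can be written down as a simple check. This is only a sketch: it assumes the "show stack-port" output has already been copied by hand into dictionaries, and the key names (`unit`, `port`, `mode`, `link`, `speed_gbps`) are invented for illustration, not the switch's actual column names.

```python
def stack_port_problems(rows):
    """Return a list of human-readable problems; empty means all checks pass.

    rows: list of dicts, one per stacking port (hypothetical, hand-entered)
    """
    problems = []
    for row in rows:
        name = f"unit {row['unit']} {row['port']}"
        if row["mode"] != "Stack":
            problems.append(f"{name}: mode is {row['mode']}, expected Stack")
        if row["link"] != "Link Up":
            problems.append(f"{name}: link status is {row['link']}")
        if row["speed_gbps"] != 12:
            problems.append(f"{name}: speed is {row['speed_gbps']} Gb/s, expected 12")
    return problems


rows = [
    {"unit": 1, "port": "XG1", "mode": "Stack", "link": "Link Up", "speed_gbps": 12},
    {"unit": 1, "port": "XG2", "mode": "Ethernet", "link": "Link Up", "speed_gbps": 12},
]
for problem in stack_port_problems(rows):
    print(problem)  # prints: unit 1 XG2: mode is Ethernet, expected Stack
```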
When I run show version, I see both units running image2 for next-active and current-active. Image 2 is 220.127.116.11 on both switches. However, when I run show boot-version, it shows unit1 running 18.104.22.168, and unit 2 running 22.214.171.124...so it appears that the boot code is off?... I'm surprised the tech didn't pick up on this. but I have to imagine this is not good.
Regarding the Stack-Port command results, I'm link up on xg1 and xg2 on both switches, and link is at 12Gb/s for each one.
I will hardcode my speed and duplex. As an FYI, I didn't see any framing errors on the uplinks in the current autonegotiated state, but I'm sure it can't hurt to hardcode it since I won't be changing the configuration.
I'll need to remediate the boot code issue asap, can't believe the tech didn't pick this up in the pages of logs I sent over, unless he thought it wasn't a problem (which I doubt).
The bootcode may or may not fix the issue, but I would definitely make sure the bootcode is the same as the firmware to be on the safe side. I know for sure if you were to do a major firmware release update (e.g. v2 to v3) you have to update the bootcode or the firmware will not boot at all.
Hope this works for you. I would also do a reboot of the stack if you are able to after both boot codes are at the same level.
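As a rough illustration of the consistency check suggested above, the following Python sketch flags units whose boot code differs from the other units or from their own firmware. The input dictionary is something you would fill in by hand from the switch output; the key names and the version strings in the example are placeholders, not real release numbers.

```python
def version_mismatches(units):
    """Return a list of version inconsistencies across stack members.

    units: dict mapping unit number -> {"boot": str, "firmware": str}
           (hypothetical structure, filled in manually)
    """
    issues = []
    boots = {info["boot"] for info in units.values()}
    if len(boots) > 1:
        issues.append("boot code differs between units: " + ", ".join(sorted(boots)))
    for unit, info in sorted(units.items()):
        if info["boot"] != info["firmware"]:
            issues.append(
                f"unit {unit}: boot code {info['boot']} != firmware {info['firmware']}"
            )
    return issues


# Placeholder versions "x.2" and "x.1" stand in for real release strings.
stack = {
    1: {"boot": "x.2", "firmware": "x.2"},
    2: {"boot": "x.1", "firmware": "x.2"},
}
for issue in version_mismatches(stack):
    print(issue)
```

An empty result only says the versions line up; as noted above, that may or may not be related to the throughput problem.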
Bootcode update didn't fix the issue for me. I found another test case that is easier to implement, yet stranger in behavior.
If I have a client connected through coreswitch2 in the stack and run a speedtest to the Internet, download maxes out at 10 Mbps and upload at 0.4 Mbps. If I connect the client to coreswitch1 on the stack and run the same speedtest, I see 50 Mbps download and 10 Mbps upload (the correct answer).
Checked stack counters and ethernet counters after all tests, and the counters are green - no framing issues, collisions, etc.
Interestingly, if I test file transfer between a client on coreswitch2 and coreswitch1, speed seems to be fine..
One of the strangest issues I've seen in 10 years in IT.
The issue doesn't seem to be tied to hardware.. I failed the management unit over to coreswitch2, and now coreswitch1 is the one showing the slow download speeds to the Internet... I've had it with this stack.. Getting ready to go back to separately managed switches.. What a waste of my time.