10-19-2018 04:38 PM - edited 03-08-2019 04:25 PM
I have a bit of an odd situation. At my DR site, I used to have two 3560 switches that were port-channeled together. I recently swapped those switches out for a stack of two 3750X switches and copied the original 3560 configuration onto them.
I have a 1 Gb WAN connection from my main site that I use mostly for my SAN replication, and with the old 3560 switches I was able to max out that circuit and push almost a full 1 Gb of bandwidth. After the swap to the new 3750X switches, I can't get that port to pass more than 200 Mb. Like I said, these switches have the same config on them, so there's nothing new there. There is also no QoS. I've checked the ports for errors and there are none, and they are negotiated properly at 1000-full. I'm out of ideas on things to check and would greatly appreciate any guidance on what else I could look at.
Thanks!
drcore01-3750x#sh int gi1/0/48
GigabitEthernet1/0/48 is up, line protocol is up (connected)
  Hardware is Gigabit Ethernet, address is 6c20.564d.4ab0 (bia 6c20.564d.4ab0)
  Description: cox 1gb metroE
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 2/255, rxload 49/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:01, output 00:00:00, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 142
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 195645000 bits/sec, 17395 packets/sec
  5 minute output rate 11115000 bits/sec, 12417 packets/sec
     317861817 packets input, 444836613365 bytes, 0 no buffer
     Received 164662 broadcasts (161104 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 161104 multicast, 0 pause input
     0 input packets with dribble condition detected
     237971247 packets output, 40530223909 bytes, 0 underruns
     0 output errors, 0 collisions, 1 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out
drcore01-3750x#sh run int gi1/0/48
Building configuration...

Current configuration : 190 bytes
!
interface GigabitEthernet1/0/48
 description cox 1gb metroE
 switchport trunk allowed vlan 1,15,501,521,920,980
 switchport trunk encapsulation dot1q
 switchport mode trunk
end
Switch Ports Model              SW Version   SW Image
------ ----- -----              ----------   ----------
*    1 54    WS-C3750X-48      15.2(4)E6    C3750E-UNIVERSALK9-M
     2 54    WS-C3750X-48      15.2(4)E6    C3750E-UNIVERSALK9-M
10-24-2018 12:18 PM
Hello,
--> Gi1/0/48 Root FWD 4 128.48 P2p
This means that the root switch is on the other side. Can you issue the same command on the HQ switch?
10-24-2018 12:22 PM
Here you go. Thanks!
SDED01-3750#sh spanning-tree vlan 1

VLAN0001
  Spanning tree enabled protocol ieee
  Root ID    Priority    32769
             Address     0022.0ca9.8900
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32769  (priority 32768 sys-id-ext 1)
             Address     0022.0ca9.8900
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time  300 sec

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/28            Desg FWD 4         128.28   P2p

SDED01-3750#sh spanning-tree vlan 980

VLAN0980
  Spanning tree enabled protocol ieee
  Root ID    Priority    33748
             Address     0022.0ca9.8900
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    33748  (priority 32768 sys-id-ext 980)
             Address     0022.0ca9.8900
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time  300 sec

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/28            Desg FWD 4         128.28   P2p

SDED01-3750#sh run int gi1/0/28
Building configuration...

Current configuration : 240 bytes
!
interface GigabitEthernet1/0/28
 description cox 1gb metroE net
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 1,15,501,521,920,980
 switchport mode trunk
 switchport nonegotiate
 speed nonegotiate
end
10-24-2018 12:45 PM
I guess you don't know whether the old 3560 at the DR site was the root or not...?
Either way, judging from the graphs you posted earlier, you have a lot more outgoing than incoming traffic. Which of the two switches is more central to the network, the 3750 at the DR site or the one at the HQ site?
10-24-2018 12:52 PM
Yeah, I couldn't say now that it's not connected anymore.
The outgoing traffic is heavy because it's all SAN replication traffic from HQ to the DR site. I guess you could say the 3750G at HQ is the more central switch, because that's our main datacenter and all the remote sites come back to it. The remote sites can reach DR through the metroE, but there's really nothing for them there because everything is hosted at HQ.
Thanks
10-24-2018 01:00 PM
In that case it makes sense that the HQ switch is the root...
I wonder if the problem is the SAN traffic. Is it possible, for the purpose of testing, to send a 'regular' file of considerable size across the link and see how long that takes?
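To know what a healthy result from that test should look like, a quick back-of-the-envelope calculation helps (a sketch in Python; the 10 GB file size is hypothetical, the rates are the nominal 1 Gb/s circuit versus the ~200 Mb/s observed after the swap):

```python
# Expected transfer times for a large test file at the nominal circuit
# rate vs. the throughput observed after the switch swap.

def transfer_seconds(size_gb: float, rate_mbps: float) -> float:
    """Seconds to move size_gb gigabytes at rate_mbps megabits per second."""
    bits = size_gb * 8 * 1000**3          # decimal GB -> bits
    return bits / (rate_mbps * 1000**2)   # Mb/s -> bits/s

size = 10  # hypothetical 10 GB test file
print(f"at 1000 Mb/s: {transfer_seconds(size, 1000):.0f} s")  # -> 80 s
print(f"at  200 Mb/s: {transfer_seconds(size, 200):.0f} s")   # -> 400 s
```

If the file transfer lands near the slower figure, the bottleneck is not specific to the SAN replication traffic.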
10-24-2018 02:55 PM
We did do that once already, but I can do it again. I'll let you know.
Thanks!
10-25-2018 10:35 AM
Hello,
Just to be sure that MTU is not a problem somewhere in the path, I would send a few pings to the HQ site server with the DF bit set, using different sizes, to check at what size packets get fragmented, e.g.:
ping -f -l 1472 192.168.1.1
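The 1472 in that ping comes from subtracting the packet headers from the path MTU; a small sketch of the arithmetic (header sizes assume IPv4 without options):

```python
# Why 1472 is the largest unfragmented ping payload on a 1500-byte MTU
# path: the ICMP payload rides inside an IPv4 packet, so both the IP
# and ICMP echo headers must be subtracted from the MTU.

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request/reply header

def max_ping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in a single packet at this MTU."""
    return mtu - IP_HEADER - ICMP_HEADER

print(max_ping_payload(1500))  # -> 1472
```

If pings at 1472 fragment (or are dropped with DF set), something in the path has an MTU below 1500.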
10-26-2018 02:28 PM
So, it appears to be a hardware problem with the switches. We put Wireshark on one of the machines at the DR location, and even a simple speed test to the internet shows a ton of TCP retransmits, but only on the upload; the download is much better. We also sniffed packets while running iperf tests between machines on the same switch, and between machines on the two switches in the stack. All tests showed multiple retransmits which dramatically affected performance. Disabling global mls qos actually improved the upload to the internet: uploads were going about 2 MB/s, and after QoS was off they increased to about 29 MB/s.
I'm not quite sure what's going on, but at this point we have decided to put the old switches back in until we get new ones, probably 9300-48s. The only thing I can think of is that there's some kind of hardware problem, because we tried three different IOS versions on them, none of which made any difference.
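For anyone wanting to reproduce this kind of test without iperf handy, a rough stand-in is to push a fixed number of bytes through a TCP socket and time it (a sketch; it runs against loopback here purely to illustrate the method, so it measures the host stack, not the network; pointing the client side at a host across the link is what would actually exercise the circuit):

```python
# Minimal throughput probe in the spirit of iperf: stream bytes over a
# TCP connection and compute Mb/s from the elapsed time. Loopback only;
# for a real test the server would run on a host across the link.
import socket
import threading
import time

PAYLOAD = b"x" * 65536
TOTAL_BYTES = 16 * 1024 * 1024  # 16 MiB test transfer


def drain(listener: socket.socket) -> None:
    """Accept one connection and read until the sender closes it."""
    conn, _ = listener.accept()
    with conn:
        while conn.recv(65536):
            pass


def measure_throughput() -> float:
    """Return measured throughput in Mb/s for a local TCP transfer."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    receiver = threading.Thread(target=drain, args=(listener,))
    receiver.start()

    start = time.perf_counter()
    with socket.create_connection(listener.getsockname()) as client:
        sent = 0
        while sent < TOTAL_BYTES:
            client.sendall(PAYLOAD)
            sent += len(PAYLOAD)
    elapsed = time.perf_counter() - start

    receiver.join()
    listener.close()
    return (TOTAL_BYTES * 8) / (elapsed * 1e6)


print(f"{measure_throughput():.0f} Mb/s on loopback")
```

Comparing the same-switch and cross-stack numbers, as done above, is what separates a stack/hardware problem from a circuit problem.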
Thanks everyone for your help.
10-26-2018 02:34 PM
Hello,
how old are these switches? I would in any case get with TAC (if you have a service contract) and see what they say...
10-26-2018 02:38 PM
I actually just bought them refurbished, so no SmartNet. :( The manufacture date is 2012, so they're about 6 years old. They should still work fine, as I'm running Cisco stuff older than dirt (those 3560s that were out there were much older than that!).
I've had good luck with refurbished gear in the past, just not this time. You win some, you lose some, I guess.
10-26-2018 02:39 PM
You can still get better ones on the market; most refurb vendors will support replacements if you are a loyal customer.
10-26-2018 02:45 PM
Any new ones I buy (9300) are going to be legit new ones.
Thanks