Need some help with Network Loop - urgent please if possible !!

Sheikh Islam
Level 1

Hi,

We have a large network with 50-odd edge switches (C3750X and C3560) and two C6509 core switches. We have STP set up. Each of our cabinets (a switch or a stack of switches) has 2 links to each core.

A couple of days ago we had a broadcast storm that drove a couple of switches to 99% CPU usage and the core switches to 50% CPU usage. While we were investigating and trying to find the problem, our backbone switch, a stack of four 3750X that carries all comms between servers, SAN, etc., went up to 99% CPU usage and most of the network became unresponsive.

We tested the uplinks on this stack and then restarted the stack, with no improvement. At this point the core switches went up to 99% and became unresponsive as well. We had the core switches restarted, after which the systems were accessible again. We then started fault finding, turning off one cabinet at a time to pinpoint the source of the ARP requests that were flooding the core.

By turning off links we then found the stack of four 3750X switches that was causing this issue. We turned the links back on one at a time and found that having both links up makes the issue happen again. We have since swapped all cables and SFPs on this stack and on the core and tested again, with the same results: having these 2 links up at the same time causes our backbone switch to go up to 99% CPU usage and the core switches to go up to 50% CPU usage. Eventually the cores become unresponsive if left like this.

Now, at the beginning of troubleshooting we had unplugged 1 link from the serverbackbone switch to the pri-core switch. We did not plug it back in until yesterday. When I did, the CPU usage of the serverbackbone switch went to 99% and we saw ARP requests arriving with the MAC address of the root bridge. We had ARP debugging and terminal monitor enabled on the serverbackbone to view this. Here are some logs:

SERVERBACKBONE#sh proc cpu sorted
CPU utilization for five seconds: 98%/6%; one minute: 98%; five minutes: 99%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
 234   246760662   652863904        377 25.32% 25.68% 25.57%   0 HULC DAI Process
 158   358036994  1223826022        292 23.24% 23.19% 23.35%   0 Hulc LED Process
 243   368036878   823215038        447 15.86% 15.53% 15.44%   0 IP Host Track Pr
  80  3706752493   627459056       5907  4.64%  4.45%  4.55%   0 RedEarth Tx Mana
  79  1243780293  1046208607       1188  4.16%  3.88%  3.96%   0 RedEarth I2C dri
 122   801262125    49086849      16323  2.40%  2.13%  2.15%   0 hpm counter proc
 100    62847439  1249279689         50  1.76%  2.16%  2.19%   0 HLFM address lea
 188   129231690   399861229        323  1.12%  0.76%  0.71%   0 Auth Manager
   4   943455916    41214178      22891  0.96%  0.84%  0.80%   0 Check heaps
 227   260103995   610366929        426  0.80%  1.09%  1.13%   0 Spanning Tree
 236   129781056   489282578        265  0.80%  0.58%  0.56%   0 HRPC ip device t
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
   1     1142277    15160342         75  0.80%  0.27%  0.25%   0 Chunk Manager
 150    11656436   241612154         48  0.64%  0.58%  0.58%   0 Hulc Storm Contr
  38    43247367     2617923      16519  0.64%  0.41%  0.12%   0 crypto sw pk pro
  89   219361151   628845123        348  0.64%  0.73%  0.78%   0 hrpc <- response
 118    89165092   983004879         90  0.48%  0.64%  0.59%   0 hpm main process
 239    51429378    48581317       1058  0.48%  0.46%  0.47%   0 PI MATM Aging Pr
 298    36570618   107544262        340  0.48%  0.17%  0.19%   0 Marvell wk-a Pow
 172   188017913     9765876      19252  0.48%  0.42%  0.42%   0 HQM Stack Proces
 173   167841714    58364536       2875  0.32%  0.33%  0.32%   0 HRPC qos request
 240    16193895   500821238         32  0.32%  0.34%  0.30%   0 UDLD
 297    47265954   135448215        348  0.16%  0.16%  0.15%   0 Inline Power 
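
For anyone reading along, here is a minimal Python sketch for pulling the busiest processes out of a pasted `sh proc cpu sorted` capture like the one above. The function name `busy_processes` and the 5% default threshold are assumptions made for illustration; the sample rows are copied from the output above.

```python
import re

# One process row: PID, Runtime, Invoked, uSecs, 5Sec%, 1Min%, 5Min%, TTY, name.
ROW_RE = re.compile(
    r"^\s*(?P<pid>\d+)\s+\d+\s+\d+\s+\d+\s+"
    r"(?P<five_sec>[\d.]+)%\s+[\d.]+%\s+[\d.]+%\s+\d+\s+(?P<name>.+?)\s*$"
)

def busy_processes(output, threshold=5.0):
    """Return (name, 5-second CPU %) pairs at or above threshold, busiest first."""
    rows = []
    for line in output.splitlines():
        m = ROW_RE.match(line)  # header and summary lines do not match
        if m and float(m["five_sec"]) >= threshold:
            rows.append((m["name"], float(m["five_sec"])))
    return sorted(rows, key=lambda row: row[1], reverse=True)

sample = """\
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
 234   246760662   652863904        377 25.32% 25.68% 25.57%   0 HULC DAI Process
 158   358036994  1223826022        292 23.24% 23.19% 23.35%   0 Hulc LED Process
 227   260103995   610366929        426  0.80%  1.09%  1.13%   0 Spanning Tree
"""

for name, pct in busy_processes(sample):
    print(f"{pct:6.2f}%  {name}")
```

Running the same function on captures taken before and after a link is re-enabled would show which processes pick up the extra load.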

 

I checked the logs on our SolarWinds monitor, and this is what started around that time:

 

22/05/2015 09:38:54  hh_cab06_sw1  Warning  991325: Host c8f9.f958.c000 in vlan 910 is flapping between port Gi3/1/1 and port Gi1/1/1
22/05/2015 09:38:52  hh_cab06_sw1  Warning  991324: Host c8f9.f958.c000 in vlan 900 is flapping between port Gi1/1/1 and port Gi3/1/1
22/05/2015 09:38:52  hh_cab06_sw1  Warning  991323: Host c8f9.f958.c000 in vlan 900 is flapping between port Gi1/1/1 and port Gi3/1/1
22/05/2015 09:38:51  hh_cab06_sw1  Warning  991321: Host c8f9.f958.c000 in vlan 900 is flapping between port Gi1/1/1 and port Gi3/1/1
22/05/2015 09:38:49  hh_cab06_sw1  Warning  991316: Host c8f9.f958.c000 in vlan 910 is flapping between port Gi1/1/1 and port Gi3/1/1
22/05/2015 09:38:49  hh_cab06_sw1  Warning  991317: Host c8f9.f958.c000 in vlan 902 is flapping between port Gi3/1/1 and port Gi1/1/1
22/05/2015 09:38:48  hh_cab23b_sw1  Warning  22743: Host 00c0.b767.6b64 in vlan 902 is flapping between port Fa0/24 and port Gi0/1
22/05/2015 09:38:46  hh_cab23_sw1  Warning  187822: Host 00c0.b767.6b64 in vlan 902 is flapping between port Gi1/1/1 and port Gi1/1/4
22/05/2015 09:38:46  hh_cab23b_sw1  Warning  22742: Host 00c0.b767.6b64 in vlan 902 is flapping between port Gi0/1 and port Fa0/24
22/05/2015 09:38:46  hh_cab23_sw2  Warning  16432: Host 00c0.b767.6b64 in vlan 902 is flapping between port Gi0/4 and port Gi0/2
22/05/2015 09:38:45  hh_cab06_sw1  Warning  991313: Host c8f9.f958.c000 in vlan 900 is flapping between port Gi1/1/1 and port Gi3/1/1
22/05/2015 09:38:45  hh_cab06_sw1  Warning  991312: Host 00c0.b767.6b64 in vlan 902 is flapping between port Gi1/1/1 and port Gi3/1/1
22/05/2015 09:38:43  hh_cab06_sw1  Warning  991309: Host c8f9.f958.c000 in vlan 900 is flapping between port Gi1/1/1 and port Gi3/1/1
22/05/2015 09:38:43  hh_cab06_sw1  Warning  991308: Host c8f9.f958.c000 in vlan 900 is flapping between port Gi1/1/1 and port Gi3/1/1
22/05/2015 09:38:42  hh_cab06_sw1  Warning  991306: Host c8f9.f958.c000 in vlan 900 is flapping between port Gi1/1/1 and port Gi3/1/1
22/05/2015 09:38:35  hh_cab06_sw1  Warning  991302: Host c8f9.f958.c000 in vlan 910 is flapping between port Gi3/1/1 and port Gi1/1/1
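
The flap messages above can be tallied per switch, host and port pair to see where a MAC address is bouncing most often. This is a minimal sketch, assuming the log export keeps the "Host ... is flapping between port ... and port ..." wording shown above. `summarize_flaps` is a hypothetical helper name, and the regex tolerates the timestamp, hostname and severity being either run together or separated by spaces.

```python
import re
from collections import Counter

FLAP_RE = re.compile(
    r"\d{2}:\d{2}:\d{2}\s*(?P<switch>\S+?)\s*Warning\s*\d+: "
    r"Host (?P<mac>\S+) in vlan (?P<vlan>\d+) is flapping "
    r"between port (?P<a>\S+) and port (?P<b>\S+)"
)

def summarize_flaps(lines):
    """Count flap messages per (switch, MAC, port pair), ignoring port order."""
    counts = Counter()
    for line in lines:
        m = FLAP_RE.search(line)
        if not m:
            continue
        ports = tuple(sorted((m["a"], m["b"])))
        counts[(m["switch"], m["mac"], ports)] += 1
    return counts

# Three lines copied from the log above.
sample = [
    "22/05/2015 09:38:54hh_cab06_sw1Warning991325: Host c8f9.f958.c000 in vlan 910 "
    "is flapping between port Gi3/1/1 and port Gi1/1/1",
    "22/05/2015 09:38:52hh_cab06_sw1Warning991324: Host c8f9.f958.c000 in vlan 900 "
    "is flapping between port Gi1/1/1 and port Gi3/1/1",
    "22/05/2015 09:38:48hh_cab23b_sw1Warning22743: Host 00c0.b767.6b64 in vlan 902 "
    "is flapping between port Fa0/24 and port Gi0/1",
]

for (switch, mac, ports), n in summarize_flaps(sample).most_common():
    print(f"{n:3d}  {switch}  {mac}  {ports[0]} <-> {ports[1]}")
```

A MAC that flaps between a switch's two uplinks on several VLANs at once, as c8f9.f958.c000 does on hh_cab06_sw1 here, is consistent with frames circulating through both cores, though the tally alone does not prove where the loop is.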

 

 

So, I have been trying to find the root cause of this problem. We do need to turn the secondary links back on for the cabinets where it started: CAB06 and serverbackbone, where we first saw the CPU usage going high.

 

Any ideas, guys?

 

Regards,

 

Sheikh

 

13 Replies

Markus Benz
Level 1

Hi Sheikh,

Could you provide a layout of your configuration?

This sounds a bit like a loop.

Regards,
Markus

Hi Markus,

Here is what it looks like, basically.

 

Cheers,

 

Sheikh

Are Core Switch 1 and 2 in a VSS configuration, or are they individual switches?

If they are in VSS mode, the uplinks from Cab06 must be configured as a trunk.
 

I saw you have a lot of spanning-tree manipulations. What is the reason for them?
For this environment you should only make sure Core 1 and 2 are root; the rest can stay at the defaults, from my point of view.
You should only tune spanning tree if you understand 100% what you're doing and why, otherwise you may easily break the network with it.

Regards,
Markus

Hi Markus,

 

Core1 is the root and Core2 is the standby. They are individual switches. Removing STP will probably require downtime, which we can't afford at the moment. I would need a change control for that.

 

It is most likely an STP issue, but the question is how to be sure and how to find the root cause/loop.

 

Regards,

 

Sheikh

OK, that's good.

Do you run Rapid PVST+?
Please check the status for all VLANs involved.

Also check spanning tree on the other switches to make sure there is no inconsistency.
 

I would go back to a basic config and make sure everything works as expected.
Standard Rapid PVST+ config with Core 1 and 2 as root.

If that works fine, you can start adding spanning-tree configurations step by step.

Or you troubleshoot the current environment. But as far as I understand this is a production environment and you cannot just re-create the problem and troubleshoot, correct?

Regards,
Markus

ok, I think I understand the topology.

The uplinks are ether-channels, correct? So you have 4 uplinks in total, 2 to each core switch?

Are you 100% sure that Core 1 and Core 2 are elected Spanning Tree Root?
Could you please verify this? (Run show spanning-tree; you should see "This bridge is the root".)
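
If the `show spanning-tree` output from a core is saved to a text file, a short script can report per VLAN whether that marker line appears. This is a rough sketch, assuming each VLAN section starts with a header line such as `VLAN0900`. The function name `root_status_by_vlan` is hypothetical, and the sample text below is an abbreviated, made-up illustration (addresses and priorities are placeholders), not output from this network.

```python
import re

def root_status_by_vlan(output):
    """Map VLAN id -> True if its section contains 'This bridge is the root'."""
    status = {}
    current = None
    for line in output.splitlines():
        header = re.match(r"^VLAN(\d+)\s*$", line)
        if header:
            current = int(header.group(1))
            status[current] = False
        elif current is not None and "This bridge is the root" in line:
            status[current] = True
    return status

# Illustrative input only: placeholder values, heavily abbreviated.
sample = """\
VLAN0900
  Spanning tree enabled protocol rstp
  Root ID    Priority    25476
             Address     0000.0000.0001
             This bridge is the root
VLAN0902
  Spanning tree enabled protocol rstp
  Root ID    Priority    25478
             Address     0000.0000.0001
             Cost        4
"""

for vlan, is_root in sorted(root_status_by_vlan(sample).items()):
    print(f"VLAN {vlan}: {'root' if is_root else 'NOT root'}")
```

Any VLAN reported as not root on the switch that is supposed to be root is worth checking by hand on the device itself.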

If you're not 100% sure about your spanning-tree config, root guard, etc., please remove all of it and check whether it works or not.

I am almost certain it is a spanning-tree issue.

Regards,
Markus

devils_advocate
Level 7

Can you provide the config for the ports from both the 3750 stack side (i.e. the ones you narrowed the problem down to) and the switch they connect to?

Hi,

 

Here are the relevant configs:

 

This is the switch that gets hammered when the link is turned on at cab06:

SERVERBACKBONE#sh cdp neigh
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
                  D - Remote, C - CVTA, M - Two-port Mac Relay

Device ID        Local Intrfce     Holdtme    Capability  Platform  Port ID
THHCORE2.hilldomain.thh.nhs.uk
                 Gig 3/1/2         158             R S I  WS-C6509- Gig 9/4
THHCORE2.hilldomain.thh.nhs.uk
                 Gig 3/1/1         165             R S I  WS-C6509- Gig 9/3
SERVERBACKBONE#sh run int gi 3/1/1
Building configuration...

Current configuration : 329 bytes
!
interface GigabitEthernet3/1/1
 description THHCORE2
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 42,43,53-57,200,300,607,620,900,902,905,910
 switchport trunk allowed vlan add 930-933,935,936,940,950-953,990,995,996
 switchport mode trunk
 spanning-tree guard loop
 channel-group 2 mode desirable
end

SERVERBACKBONE#sh run int gi 3/1/2
Building configuration...

Current configuration : 329 bytes
!
interface GigabitEthernet3/1/2
 description THHCORE2
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 42,43,53-57,200,300,607,620,900,902,905,910
 switchport trunk allowed vlan add 930-933,935,936,940,950-953,990,995,996
 switchport mode trunk
 spanning-tree guard loop
 channel-group 2 mode desirable
end

SERVERBACKBONE#sh run int gi 1/1/1
Building configuration...

Current configuration : 339 bytes
!
interface GigabitEthernet1/1/1
 description THHCORE1
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 42,43,53-57,200,300,607,620,900,902,905,910
 switchport trunk allowed vlan add 930-933,935,936,940,950-953,990,995,996
 switchport mode trunk
 shutdown
 spanning-tree guard loop
 channel-group 1 mode desirable
end

SERVERBACKBONE#sh run int gi 1/1/2
Building configuration...

Current configuration : 339 bytes
!
interface GigabitEthernet1/1/2
 description THHCORE1
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 42,43,53-57,200,300,607,620,900,902,905,910
 switchport trunk allowed vlan add 930-933,935,936,940,950-953,990,995,996
 switchport mode trunk
 shutdown
 spanning-tree guard loop
 channel-group 1 mode desirable
end

SERVERBACKBONE#
SERVERBACKBONE#
SERVERBACKBONE#
SERVERBACKBONE#
SERVERBACKBONE#sh run int po1
Building configuration...

Current configuration : 287 bytes
!
interface Port-channel1
 description THHCORE1 ETHERCHANNEL
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 42,43,53-57,200,300,607,620,900,902,905,910
 switchport trunk allowed vlan add 930-933,935,936,940,950-953,990,995,996
 switchport mode trunk
 shutdown
end

SERVERBACKBONE#sh run int po2
Building configuration...

Current configuration : 301 bytes
!
interface Port-channel2
 description THHCORE2 ETHERCHANNEL
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 42,43,53-57,200,300,607,620,900,902,905,910
 switchport trunk allowed vlan add 930-933,935,936,940,950-953,990,995,996
 switchport mode trunk
 spanning-tree cost 200
end

SERVERBACKBONE#

 

This is the cab whose links we turned off to stabilise the network:

 

HH_CAB06_SW1#sh cdp neigh
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
                  D - Remote, C - CVTA, M - Two-port Mac Relay

Device ID        Local Intrfce     Holdtme    Capability  Platform  Port ID
THHCORE2.hilldomain.thh.nhs.uk
                 Gig 1/1/1         154             R S I  WS-C6509- Gig 7/7
HH_CAB06_SW1#sh run int gi 1/1/1
Building configuration...

Current configuration : 176 bytes
!
interface GigabitEthernet1/1/1
 description THHCORE2
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 106,144,900,902,910,996
 switchport mode trunk
end

HH_CAB06_SW1#sh run int gi 1/1/2
Building configuration...

Current configuration : 180 bytes
!
interface GigabitEthernet1/1/2
 description UPLINK
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 106,900,902,910,996
 switchport mode trunk
 shutdown
end

HH_CAB06_SW1#sh run int gi 3/1/1
Building configuration...

Current configuration : 186 bytes
!
interface GigabitEthernet3/1/1
 description THHCORE1
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 106,144,900,902,910,996
 switchport mode trunk
 shutdown
end

HH_CAB06_SW1#sh run int gi 3/1/2
Building configuration...

Current configuration : 180 bytes
!
interface GigabitEthernet3/1/2
 description UPLINK
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 106,900,902,910,996
 switchport mode trunk
 shutdown
end

HH_CAB06_SW1#

These are the links on the core switches:

 

THHCORE1#sh run int po2
Building configuration...

Current configuration : 321 bytes
!
interface Port-channel2
 description SERVERBACKBONE ETHERCHANNEL
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 42,43,53-57,200,300,607,620,900,902,905,910
 switchport trunk allowed vlan add 930-933,935,936,940,950-953,990,995,996
 switchport mode trunk
 spanning-tree guard root
end

THHCORE1#sh run int gi 3/8
Building configuration...

Current configuration : 716 bytes
!
interface GigabitEthernet3/8
 description ##Connects to S06SW1##
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 106,144,900,902,910,996
 wrr-queue bandwidth 30 70
 wrr-queue queue-limit 40 30
 wrr-queue random-detect min-threshold 1 40 80
 wrr-queue random-detect min-threshold 2 70 80
 wrr-queue random-detect max-threshold 1 80 100
 wrr-queue random-detect max-threshold 2 80 100
 wrr-queue cos-map 1 1 1
 wrr-queue cos-map 1 2 0
 wrr-queue cos-map 2 1 2 3 4
 wrr-queue cos-map 2 2 6 7
 mls qos trust dscp
 storm-control broadcast level 50.00
 storm-control multicast level 50.00
 storm-control action trap
 spanning-tree guard root
 service-policy input SCANNER
end

THHCORE1#sh int status


Gi3/8        ##Connects to S06S notconnect   1            full   1000 1000BaseSX

Gi9/2        SERVERBACKBONE     notconnect   1            full   1000 1000BaseSX
Gi9/3        SERVERBACKBONE     notconnect   1            full   1000 1000BaseSX



Po2          SERVERBACKBONE ETH notconnect   1            auto   auto

THHCORE1#

THHCORE2#sh int status


Gi3/6        ##Connects to S06S notconnect   1            full   1000 1000BaseSX
Gi7/7        HH_CAB06_SW1 - 1/1 connected    trunk        full   1000 1000BaseSX

Gi9/3        SERVERBACKBONE     connected    trunk        full   1000 1000BaseSX
Gi9/4        SERVERBACKBONE     connected    trunk        full   1000 1000BaseSX

Po2          SERVERBACKBONE ETH connected    trunk      a-full a-1000

THHCORE2# sh run int po2
Building configuration...

Current configuration : 341 bytes
!
interface Port-channel2
 description SERVERBACKBONE ETHERCHANNEL
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 42,43,53-57,200,300,607,620,900,902,905,910
 switchport trunk allowed vlan add 930-933,935,936,940,950-953,990,995,996
 switchport mode trunk
 mls qos trust dscp
 spanning-tree guard root
end

 

THHCORE2# sh run int gi 9/3
Building configuration...

Current configuration : 339 bytes
!
interface GigabitEthernet9/3
 description SERVERBACKBONE
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 42,43,53-57,200,300,607,620,900,902,905,910
 switchport trunk allowed vlan add 930-933,935,936,940,950-953,990,995,996
 switchport mode trunk
 mls qos trust dscp
 channel-group 2 mode desirable
end

THHCORE2# sh run int gi 9/4
Building configuration...

Current configuration : 339 bytes
!
interface GigabitEthernet9/4
 description SERVERBACKBONE
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 42,43,53-57,200,300,607,620,900,902,905,910
 switchport trunk allowed vlan add 930-933,935,936,940,950-953,990,995,996
 switchport mode trunk
 mls qos trust dscp
 channel-group 2 mode desirable
end

THHCORE2#sh run int gi 7/7
Building configuration...

Current configuration : 684 bytes
!
interface GigabitEthernet7/7
 description HH_CAB06_SW1 - 1/1/1
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 106,144,900,902,910,996
 switchport mode trunk
 wrr-queue bandwidth 30 70
 wrr-queue queue-limit 40 30
 wrr-queue random-detect min-threshold 1 40 80
 wrr-queue random-detect min-threshold 2 70 80
 wrr-queue random-detect max-threshold 1 80 100
 wrr-queue random-detect max-threshold 2 80 100
 wrr-queue cos-map 1 1 1
 wrr-queue cos-map 1 2 0
 wrr-queue cos-map 2 1 2 3 4
 wrr-queue cos-map 2 2 6 7
 mls qos trust dscp
 storm-control broadcast level 50.00
 storm-control multicast level 50.00
 service-policy input SCANNER
end

 

 

 

Let me know if you need anything more.

 

Many thanks for your help.

 

Regards,

 

Sheikh

It is a bit difficult to understand all of it without a picture.

But it is possible that you're missing an EtherChannel configuration on HH_CAB06_SW1.
If you have more than one physical link between two devices or a VSS cluster, you need to aggregate them into a port channel.

As far as I can see your uplinks are not part of a port-channel.
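
To check this across many pasted `sh run int` blocks, a short script can list trunk interfaces that carry no `channel-group` line. This is a minimal sketch: `parse_interfaces` and `unbundled_trunks` are hypothetical helper names, and the sample is abridged from the HH_CAB06_SW1 and SERVERBACKBONE configs posted above. An unbundled trunk is not wrong in itself; it only matters when a second link goes to the same neighbour.

```python
def parse_interfaces(config):
    """Split pasted 'show run' text into {interface name: [config lines]}."""
    blocks, current = {}, None
    for raw in config.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            current = line.split(None, 1)[1]
            blocks[current] = []
        elif line == "end":
            current = None  # ignore prompts and banners between blocks
        elif current is not None and line not in ("", "!"):
            blocks[current].append(line)
    return blocks

def unbundled_trunks(config):
    """Names of physical trunk interfaces that are not in a port channel."""
    return [
        name
        for name, lines in parse_interfaces(config).items()
        if not name.startswith("Port-channel")
        and "switchport mode trunk" in lines
        and not any(l.startswith("channel-group ") for l in lines)
    ]

sample = """\
interface GigabitEthernet1/1/1
 description THHCORE2
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 106,144,900,902,910,996
 switchport mode trunk
end
interface GigabitEthernet3/1/1
 description THHCORE2
 switchport trunk encapsulation dot1q
 switchport mode trunk
 spanning-tree guard loop
 channel-group 2 mode desirable
end
"""

print(unbundled_trunks(sample))
```

Cross-checking the result against `sh cdp neigh` shows whether two unbundled trunks actually land on the same neighbour.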

I think a topology diagram is needed, I can't work it out from the above configuration.

Are your Core switches a VSS pair or individual units connected together via an Etherchannel?

Hi,

 

They are individual C6509s with a fibre EtherChannel between them.

 

Cheers,

 

Sheikh

 

Hi Gents,

There is a port channel between the serverbackbone and the core switches. But the links between cab06 and the core switches are not port channels. They are single fibre links to each core, the same as for every other cabinet in our organization.

Also, the links between the cores and the serverbackbone are fibre.

I agree with you guys that it may be a loop. How can we confirm this, please?

 

Regards,

Sheikh

 
