Cisco 7600 high SP CPU utilisation

Antonio_1_2 · ‎09-29-2012

Hello everyone,

Here is a brief description of the problem:

On a Cisco 7600 with a SUP720 BXL, CPU utilization suddenly increased from 35% to 98%.

This utilisation happened, and still happens, on the SP CPU (not the RP), and it is interrupt based.

router-sp#sh proc cpu

CPU utilization for five seconds: 99%/82%; one minute: 98%; five minutes: 98%

I also collected the output of "show ibc":

router-sp#show ibc

Interface information:

Interface IBC0/0(idb 0x44E47588)

Hardware is Mistral IBC (revision 5)

5 minute rx rate 24000 bits/sec, 44 packets/sec

5 minute tx rate 68000 bits/sec, 122 packets/sec

1186057298 packets input, 80461312123 bytes

1179948920 broadcasts received

3254930405 packets output, 224120381448 bytes

85350832 broadcasts sent

0 Inband input packet drops

0 Bridge Packet loopback drops

0 Packets CEF Switched, 0 Packets Fast Switched

0 Packets SLB Switched, 0 Packets CWAN Switched

IBC resets = 2; last at 00:28:58.792 CET Wed Nov 30 2011
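As a rough cross-check, the counters above can be turned into average packet sizes and a broadcast share. This is only a sketch: the figures are copied from the "show ibc" output, while the variable and function names are illustrative and not part of any Cisco tool.

```python
# Figures copied from the "show ibc" output above; all names are illustrative.
rx_bps, rx_pps = 24_000, 44      # 5 minute rx rate
tx_bps, tx_pps = 68_000, 122     # 5 minute tx rate
packets_in, bytes_in = 1_186_057_298, 80_461_312_123
broadcasts_in = 1_179_948_920


def avg_bytes(bits_per_sec, packets_per_sec):
    """Average packet size in bytes implied by a bit rate and a packet rate."""
    return bits_per_sec / 8 / packets_per_sec


rx_avg = avg_bytes(rx_bps, rx_pps)
tx_avg = avg_bytes(tx_bps, tx_pps)
lifetime_avg = bytes_in / packets_in          # since the counters were last cleared
broadcast_share = broadcasts_in / packets_in  # share of input counted as broadcast

print(f"5-minute average rx packet size: {rx_avg:.1f} bytes")
print(f"5-minute average tx packet size: {tx_avg:.1f} bytes")
print(f"lifetime average input packet size: {lifetime_avg:.1f} bytes")
print(f"input packets counted as broadcasts: {broadcast_share:.2%}")
```

The averages come out just under 70 bytes per packet, which is at least consistent with small control frames such as BPDUs, and nearly all input packets were counted as broadcasts. The packet rates themselves remain low.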

Using SPAN, I captured the packets that were punted to the SP CPU.

Statistics showed that 96.82% of all packets were STP BPDUs; CDP packets came second with a 0.43% share.

The capture shows that 70% of all STP BPDUs come from 2 source MAC addresses.

Since this layer 2 network consists of a few hundred switches (Cisco 7600, 6500, 2960, 3560 and 3750 series, plus Linksys), it is very hard to trace these 2 MAC addresses.

questions:

-----------------

1) Is there a way to trace these two MAC addresses/switches other than logging in to every switch in the network?

The problem is that these 2 MAC addresses don't appear in the CAM or ARP tables on the network. Also, as far as I know, a big problem is that the 2960 and 3560 series use a special MAC address for the BID (bridge identifier), which can be seen only with "show version" (and this MAC is also the source MAC address for BPDUs).

2) Regarding the output of "show ibc", the traffic going to and from the SP CPU shouldn't be significant enough to cause 98% CPU utilisation.

5 minute rx rate 24000 bits/sec, 44 packets/sec

5 minute tx rate 68000 bits/sec, 122 packets/sec

That leads me to the conclusion that this could be some kind of processing loop on the CPU and that a reload could help. Has anyone seen something like this? Am I mistaken, or can 122 BPDUs per second really overwhelm the CPU?

3) In general, has anyone experienced high SP CPU utilisation on a SUP720, and what was the usual cause?

(And if anyone can recommend what else I should look for, please do.)

Thanks in advance,

A.

Raju Sekharan · ‎10-04-2012

The CPU is high due to interrupts.

Could you attach the following outputs:

1. Show tech

2. Show platform hardware capacity forwarding

3. Show mls stat

4. show ip traffic

Antonio_1_2 · ‎10-05-2012

Hi,

I attached output 1 from the SP and outputs 2, 3 and 4 from the RP.

Thank you

A

Raju Sekharan · ‎10-05-2012

Thank you

Can you get me the show tech from the RP?

Also get me the following outputs from the RP:

1. show mls rate

2. Show mls rate usage

Then log in to the SP:

1. Debug netdr capture rx

wait for 2 minutes

2. undebug all

3. term len 0

4. show netdr captured-packets

Send me the output of "show netdr captured-packets "

Antonio_1_2 · ‎10-09-2012

Hi,

thank you for your reply.

The requested outputs are in the attachment.

The only problem is that I can't post the running configuration on the forum.

Thank you and regards,

A.

Raju Sekharan · ‎10-09-2012

I checked the 4096 packets in the netdr capture. 3685 of them were STP packets, and most of those are coming from

00.15.2B.0D.62.A1 and 00.15.63.05.17.0D.

You need to find out where these MACs are coming from and whether there is a loop.

Antonio_1_2 · ‎10-09-2012

Hi

Thank you very much. The problem is that there are a few hundred switches in my network,

and those two MAC addresses are bridge identifiers.

I can't find them in the CAM tables, and the only way I can think of is to log in to every switch and issue "show version".

Is there a better/easier way to find these addresses?

Regards

A.

Raju Sekharan · ‎10-10-2012

Let us check whether there are any spanning-tree issues.

Can you get me

1. Show spanning-tree detail

Antonio_1_2 · ‎10-11-2012

Hi,

I attached output for show spanning-tree detail.

Thank you,

regards,

A.

Raju Sekharan · ‎10-11-2012

There are a lot of topology changes for many VLANs. Below are a few:

VLAN0005 is executing the rstp compatible Spanning Tree protocol

Bridge Identifier has priority 32768, sysid 5, address 0015.c630.8e00

Configured hello time 2, max age 20, forward delay 15, tranmsit hold-count 6

Current root has priority 0, address 0015.63f3.4180

Root port is 769 (TenGigabitEthernet7/1), cost of root path is 2

Topology change flag set, detected flag not set

Number of topology changes 309093 last change occurred 00:00:21 ago

from TenGigabitEthernet7/1

VLAN0013 is executing the rstp compatible Spanning Tree protocol

Bridge Identifier has priority 32768, sysid 13, address 0015.c630.8e00

Configured hello time 2, max age 20, forward delay 15, tranmsit hold-count 6

Current root has priority 0, address 0015.63f3.4180

Root port is 769 (TenGigabitEthernet7/1), cost of root path is 2

Topology change flag set, detected flag not set

Number of topology changes 309077 last change occurred 00:00:24 ago

from TenGigabitEthernet7/1

VLAN0014 is executing the rstp compatible Spanning Tree protocol

Bridge Identifier has priority 32768, sysid 14, address 0015.c630.8e00

Configured hello time 2, max age 20, forward delay 15, tranmsit hold-count 6

Current root has priority 0, address 0015.63f3.4180

Root port is 769 (TenGigabitEthernet7/1), cost of root path is 2

Topology change flag set, detected flag not set

Number of topology changes 309113 last change occurred 00:00:24 ago

from TenGigabitEthernet7/1

Run the command below several times and check whether there are continuous spanning-tree changes and whether they always come from Ten 7/1.

1. show spanning-tree detail | inc compatible Spanning Tree protocol|last change|from

If the last change always comes from Ten 7/1, check what is connected to Ten 7/1 and run the same command there.
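If the output is saved to a file, the same check can be scripted. The following is a minimal Python sketch, assuming the "show spanning-tree detail" text has the layout shown above; the function name and the SAMPLE string are illustrative, and the sample only repeats the three VLANs quoted in this post.

```python
import re

# Lines taken from the "show spanning-tree detail" excerpts above.
SAMPLE = """\
 VLAN0005 is executing the rstp compatible Spanning Tree protocol
  Number of topology changes 309093 last change occurred 00:00:21 ago
          from TenGigabitEthernet7/1
 VLAN0013 is executing the rstp compatible Spanning Tree protocol
  Number of topology changes 309077 last change occurred 00:00:24 ago
          from TenGigabitEthernet7/1
 VLAN0014 is executing the rstp compatible Spanning Tree protocol
  Number of topology changes 309113 last change occurred 00:00:24 ago
          from TenGigabitEthernet7/1
"""


def topology_changes(output):
    """Return one (vlan, changes, last_change, from_interface) tuple per VLAN."""
    rows = []
    vlan = changes = last = None
    for raw in output.splitlines():
        line = raw.strip()
        m = re.match(r"(VLAN\d+) is executing", line)
        if m:
            # A new VLAN section starts; forget the previous counters.
            vlan, changes, last = m.group(1), None, None
            continue
        m = re.match(
            r"Number of topology changes (\d+) last change occurred (\S+) ago", line
        )
        if m:
            changes, last = int(m.group(1)), m.group(2)
            continue
        m = re.match(r"from (\S+)", line)
        if m and vlan is not None and changes is not None:
            rows.append((vlan, changes, last, m.group(1)))
            changes = None
    return rows


for vlan, changes, last, port in topology_changes(SAMPLE):
    print(f"{vlan}: {changes} changes, last {last} ago from {port}")
```

Running it on two captures taken a minute apart and comparing the counters shows which VLANs are still changing and on which port.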

Thank you

Raju

Giuseppe Larosa · ‎10-11-2012

Hello Antonio,

There are several VLANs with more than 300000 topology changes, with the last change less than a minute ago.

Example:

VLAN0005 is executing the rstp compatible Spanning Tree protocol

Bridge Identifier has priority 32768, sysid 5, address 0015.c630.8e00

Configured hello time 2, max age 20, forward delay 15, tranmsit hold-count 6

Current root has priority 0, address 0015.63f3.4180

Root port is 769 (TenGigabitEthernet7/1), cost of root path is 2

Topology change flag set, detected flag not set

Number of topology changes 309093 last change occurred 00:00:21 ago

>>> from TenGigabitEthernet7/1

VLAN0011 is executing the rstp compatible Spanning Tree protocol

Bridge Identifier has priority 32768, sysid 11, address 0015.c630.8e00

Configured hello time 2, max age 20, forward delay 15, tranmsit hold-count 6

Current root has priority 0, address 0015.63f3.4180

Root port is 769 (TenGigabitEthernet7/1), cost of root path is 2

Topology change flag set, detected flag not set

Number of topology changes 309079 last change occurred 00:00:24 ago

>>> from TenGigabitEthernet7/1

Interface Te7/1 points to the root bridge for these VLANs.

You should move to the root bridge and use the same command to identify the interface that received the TCN BPDU; repeating this hop by hop, you can find the device(s) causing it.

I would focus on one VLAN at a time, then repeat the search for the other VLANs to see whether the resulting device is the same.

You will likely find the same device(s) causing TCs in multiple VLANs.
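The hop-by-hop search can be sketched in Python as follows. This is only an illustration under stated assumptions: the switch names, ports and both dictionaries are hypothetical (only "Te7/1 points to the root bridge" comes from this thread), and in practice the data would have to be collected from each switch, for example from "show spanning-tree detail" and "show cdp neighbors".

```python
def trace_tc_source(start, last_tc_port, neighbor, max_hops=32):
    """Follow the port that last reported a topology change, switch by switch.

    last_tc_port: switch name -> port on which the last TC was received
    neighbor:     (switch name, port) -> switch connected to that port
    Returns the list of switches visited, ending at the last known hop.
    """
    path = [start]
    current = start
    for _ in range(max_hops):
        port = last_tc_port.get(current)
        nxt = neighbor.get((current, port))
        if nxt is None or nxt in path:
            # No known switch behind that port, or we are going in circles.
            break
        path.append(nxt)
        current = nxt
    return path


# Hypothetical data for one VLAN; only 7600 Te7/1 -> root is from the thread.
last_tc_port = {"7600": "Te7/1", "root": "Gi1/2", "access-1": "Fa0/5"}
neighbor = {("7600", "Te7/1"): "root", ("root", "Gi1/2"): "access-1"}

print(" -> ".join(trace_tc_source("7600", last_tc_port, neighbor)))
```

In this made-up example the search stops at access-1, where port Fa0/5 has no switch behind it and would be the place to look for a flapping link or host port.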

Hope to help

Giuseppe

Antonio_1_2 · ‎10-12-2012

Hi,

thank you both for your answers,

but occasional TCNs shouldn't be the reason for such high CPU usage on a SUP720 BXL SP, should they?

It's a relatively large layer 2 network (a mixture of MST, PVST+ and CST), and from time to time I do trace the origins of TCN BPDUs (usually because of unknown unicast flooding problems).

Furthermore, even in quiet periods (when there is no TCN for 5-10 minutes) the SP CPU doesn't drop below 90%.

All other switches (2960, 6500 series) in the network also propagate those TCNs, but their CPU utilisation is normal (below 40%).

I traced the last ten switches that originated TCNs, and none of them had the two MAC addresses found in the earlier debugging.

Regards,

A

Raju Sekharan · ‎10-12-2012

It is possible that your netdr capture was taken during these topology flaps, so we got only STP packets in the capture.

Collect the netdr capture when STP is stable and send it to me. We will analyze it.

Also, could you please send me the following output:

1. Show mls cef exception status

Antonio_1_2 · ‎10-15-2012

Hi,

I captured packets while STP was stable. I also attached outputs which confirm that there were no STP topology changes during the capture.

But it seems that again there are only STP BPDU packets.

Here is the requested show output as well:

1) router#show mls cef exception status
Current IPv4 FIB exception state = FALSE
Current IPv6 FIB exception state = FALSE
Current MPLS FIB exception state = FALSE

regards,

A.

Giuseppe Larosa · ‎10-12-2012

Hello Antonio,

>> I traced last ten switches that originated TCNs and none of these had the two MAC addresses found in the debugging earlier.

there are some strange facts in your issue:

a) analysis of packet capture of traffic punted to SP cpu shows a lot of STP messages sourced by two specific MAC addresses

00.15.2B.0D.62.A1 -----> 0015.2B0D.62A1 in IOS CLI format

00.15.63.05.17.0D -----> 0015.6305.170D in IOS CLI format
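For searching many switches, the conversion can be scripted. Here is a minimal Python sketch; the function name is illustrative. It emits lowercase because IOS prints MAC addresses in lowercase (as in 0015.c630.8e00 in the spanning-tree output), which matters when filtering output with "| include".

```python
import re


def to_ios_mac(mac):
    """Convert a MAC address in any common notation to IOS xxxx.xxxx.xxxx form."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)  # drop '.', ':' and '-' separators
    if len(digits) != 12:
        raise ValueError(f"expected 12 hex digits, got {len(digits)}: {mac!r}")
    digits = digits.lower()
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


for mac in ("00.15.2B.0D.62.A1", "00.15.63.05.17.0D"):
    print(mac, "->", to_ios_mac(mac))
```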

STP messages sourced by these two MAC addresses should originate from adjacent switches, as STP BPDUs do not travel through the network but are generated hop by hop by each switch.

You say that the affected node's CAM table does not contain entries for the two MAC addresses listed above.

This is strange; they should be there.

There should be no need to search the whole network for those two MAC addresses: they should belong to switch ports of one or two adjacent switches.

Now, unless the switch does not perform MAC address learning on the source MAC addresses of STP BPDUs, the only reason for the MAC addresses to be missing from the CAM table is that the CAM table is full.

So I would check how many MAC addresses are in the CAM table.

b) the analysis of show spanning-tree detail shows that many (not all) VLANs have experienced a high number of STP topology changes (more than 300000), and that for each affected VLAN, such as VLANs 5, 11 and 13, the port on which the topology change was received is TenGigabitEthernet7/1, which is also the root port pointing to the root bridge.

The command shows the root bridge ID and the bridge ID of the designated port, but not the designated port's MAC address:

Port 769 (TenGigabitEthernet7/1) of VLAN0013 is root forwarding

Port path cost 2, Port priority 128, Port Identifier 128.769.

Designated root has priority 0, address 0015.63f3.4180

Designated bridge has priority 0, address 0015.63f3.4180

Designated port id is 128.2, designated path cost 0

Timers: message age 16, forward delay 0, hold 0

Number of transitions to forwarding state: 1

Link type is point-to-point by default, Peer is STP

BPDU: sent 5, received 7398335

Here, the output does not tell us the source MAC address of these 7398335 received BPDUs.

Hope to help

Giuseppe