
Ask the Expert: Troubleshooting High CPU in Catalyst Switches

Monica Lluis
Level 9
 

This session provides an opportunity to learn and ask questions about how to troubleshoot high CPU issues on Cisco Catalyst switches running the Cisco IOS architecture.

 

Ask questions from Monday, January 25 to Friday, February 5, 2016

Featured Experts

Naveen Venkateshaiah is a customer support engineer in High-Touch Technical Services (HTTS). He is an expert on Routing, LAN Switching, and Data Center products. His areas of expertise include Cisco Catalyst 3000, 4000, and 6500 Series Switches, and Cisco Nexus 7000, Nexus 5000, Nexus 3000, Nexus 2000, UCS, and MDS SAN Switches. He has over 8 years of industry experience working with large enterprise and service provider networks. Venkateshaiah holds CCNA, CCNP, CCDP-ARCH, AWLANFE, and LCSAWLAN certifications. He is currently working toward a CCIE in Data Center.

 

Abhishek Soni is a customer support engineer in High-Touch Technical Services (HTTS). He is an expert on Routing, LAN Switching, and Data Center products. His areas of expertise include Cisco Catalyst 3000, 4000, and 6500 Series Switches, and Cisco Nexus 7000. He has over 8 years of industry experience working with large enterprise and service provider networks. Soni holds CCNA and CCNP certifications. He is currently working toward a CCIE in Routing and Switching.

 

Find other Ask the Expert events at https://supportforums.cisco.com/expert-corner/events.

** Ratings Encourage Participation! **
Please be sure to rate the Answers to Questions

 


 

I hope you and your loved ones are safe and healthy.
Monica Lluis
Community Manager Lead
63 Replies

Hello Naveen,

Kindly find the attached.

Best Regards,

Mishaal Ali Thabet

Hi Mishaal,

Thanks for the logs. From the netdr capture, we see that packets from the source below are being punted to the CPU continuously.

------- dump of incoming inband packet -------
interface Vl55, routine draco2_process_rx_packet_inline
dbus info: src_vlan 0x37(55), src_indx 0x9(9), len 0x62(98)
  bpdu 0, index_dir 0, flood 0, dont_lrn 0, dest_indx 0x380(896)
  60020400 00370000 00090000 62000000 00110030 8E0FF7FC 00000000 03800000
mistral hdr: req_token 0x0(0), src_index 0x9(9), rx_offset 0x76(118)
  requeue 0, obl_pkt 0, vlan 0x37(55)
destmac 00.1D.E6.18.78.00, srcmac 00.22.A1.10.47.55, protocol 0800
protocol ip: version 0x04, hlen 0x05, tos 0x00, totlen 80, identifier 8754
  df 0, mf 0, fo 0, ttl 124, src 195.94.31.92, dst 84.235.58.20
    udp src 56349, dst 31777 len 60 checksum 0x0

L2
===
Source mac    0022.A110.4755   2358
Dest mac      001D.E618.7800   4096
Protocol      0800             4096
Interface     Vl55             2358
Source vlan   0x37(55)         2358
Source index  0x2E(46)         1549
Dest index    0x380(896)       2367

L3
===
IPv4 source   195.94.31.92     1498
IPv4 dest     195.94.31.106     996

Action plan:
==========

++ From a MAC vendor lookup, the MAC address 0022.A110.4755 belongs to a Huawei device.

++ The above are the top talkers; the source MAC 0022.A110.4755 (the Huawei device) is learned on VLAN 55.

++ Trace the MAC address, find the interface it is connected to, and shut that interface (as sketched below).

++ We need to check why this device is sending so many packets.

++ Once you shut the source, check whether CPU utilization is reduced.
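
A minimal sketch of the trace (standard Catalyst commands; on older IOS releases the MAC table command is "show mac-address-table", and the interface names below are only placeholders):

show mac address-table address 0022.a110.4755    ! shows the VLAN and port where this MAC is learned
show cdp neighbors gi2/1 detail                  ! if that port leads to another switch, repeat the lookup there

! once the edge port is identified (Gi2/1 here is hypothetical):
interface GigabitEthernet2/1
 shutdown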

Regards,

Naveen Venkateshaiah

Dear Naveen,

This link belongs to VoIP services, and I am wondering whether this is normal behavior for VoIP. I am afraid I cannot shut down the service, because it is live and very important to us.

+ If this is normal behavior, how many sessions can the 6500 handle? Is there a limit?

+ If these ports pass more traffic than my gateways can handle, will this interrupt load occur? (For example, if these ports send around 2 Gbps of traffic while I have only 2x STM-1 as the gateway toward their final destination.) Why is the interrupt process happening in the first place? Is there any document describing what causes the interrupt process to go high?

Best Regards,

Mishaal Ali Thabet

Hi Mishaal,

If we look at the netdr capture, out of 4k packets more than 2k are from source MAC 00.22.A1.10.47.55, and most of the destinations (about 2k packets) are in 84.235.x.x. We need to check where this destination is and what the MLS configuration for it looks like.

These are the destination IPv4 addresses, sorted by packet count:

Dest IPv4 Address     Number of Packets
195.94.31.106                       996
195.94.31.107                       725
84.235.58.4                         336
84.235.60.231                       306
84.235.58.20                        241
84.235.44.244                       209
84.235.60.215                       204

This traffic is L2 traffic and should be forwarded in hardware, so I am not expecting to see it here. We need the full configuration, and then show mls cef, in order to understand the hardware programming.

For these destinations the packets are going to the CPU instead of being hardware switched. The following link gives an overview of the common reasons why packets are software switched rather than hardware switched.

 https://supportforums.cisco.com/document/59926/troubleshooting-high-cpu-6500-sup720

Let me know if you have any further questions. Also, capture the output of these commands for the above destination IPs (an example with one of the destinations filled in follows the list):

show run

show ip cef

show mls cef <destination ip> detail

show mls cef adjacency entry

show ip route <destination ip>

show mls cef lookup <destination network> detail

show ip cef <destination network>
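
For example, for one of the top destinations seen in the capture (84.235.58.20, taken from the table above; any of the other destinations can be substituted):

show ip route 84.235.58.20
show ip cef 84.235.58.20
show mls cef 84.235.58.20 detail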


Regards,
Naveen Venkateshaiah

Hello Naveen,

Kindly find the attached.

Please also see the details below.

I was running BGP on that switch, but because it has only a 3B Sup I was accepting just 100K entries. I then suspected that could be causing the problem, so I stopped the BGP peering and moved to a 0.0.0.0/0 route toward the gateways.

My gateway connectivity is around 4x STM-1.

Most of the time I see traffic in/out from the above-mentioned x.107 and x.106 on that switch, trying to go outside via the 0.0.0.0/0 route at rates higher than the STM-1; sometimes it reaches around 1.2 Gbps. Does this have anything to do with the high CPU?

I would also like to know whether the following is normal when I use monitoring software on the interfaces, whether those facing my network (related to the above IPs) or the STM-1:

- per-second hits bigger than the STM-1 rate, sometimes reaching 800 Mbps;

- traffic on my local port for those services going beyond 1 Gbps.

The monitoring software shows that at second 1 the hits reached 500 Mbps, at seconds 2 and 3 there are no hits, at second 4 there are hits of around 900 Mbps, and so on. Is this considered normal? If not, could these spikes be the real effect of the high CPU, meaning I need to upgrade my gateway links to STM-4 or STM-16?

Hi Mishaal,

As we know, the high CPU is due to interrupts, and all traffic coming in on VLAN 55 and VLAN 4 is getting punted to the CPU.


++ We need to check the following things:
++ What is the destination IP? Does it belong to the router? Is the packet destined to the router?
++ Is any feature applied on the interface that is forcing traffic to be process switched?
++ Is all traffic coming in on that VLAN being process switched, or only a few packets?
++ Is traffic punted all the time, or only when it exceeds a particular rate?
++ Is any CoPP applied?
++ Since when has the CPU been high? An MRTG graph of CPU and interface utilization may help.
++ An ELAM capture can also help here.
++ We need to perform extensive WebEx troubleshooting here. Can you open an SR so we can investigate further? (A few commands that help with these checks are sketched below.)
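
As a starting point, this is a minimal set of standard IOS commands that can be collected on the 6500 before the WebEx (exact output varies by release; "show ibc" is specific to the supervisor inband channel):

show processes cpu sorted        ! how much CPU is at interrupt level vs. per process
show processes cpu history       ! when the high CPU started and whether it is constant
show policy-map control-plane    ! whether CoPP is applied and what it is dropping
show ibc                         ! inband controller statistics, i.e. the traffic actually reaching the CPU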


Common scenarios in which traffic gets punted to the CPU (see the command sketch after this list):


1. TTL=1
2. Destination not present in the routing table
3. Packet destined to the router itself
4. A feature applied on the ingress or egress interface causing the packet to be process switched
5. A field set in the packet that is not supported in hardware and requires process switching
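
For the TTL=1 case in particular, the Catalyst 6500 has hardware rate limiters that cap how many such packets reach the CPU. A quick check plus an illustrative configuration line (the 15/10 values are only an example, not a recommendation):

show mls rate-limit                      ! lists the hardware rate limiters and whether they are enabled
mls rate-limit all ttl-failure 15 10     ! global config: limit TTL-failure punts to 15 pps with a burst of 10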

Regards,

Naveen Venkateshaiah

Hello Naveen,

Thank you for the extensive work you did with me.

++ What is the destination IP? Does it belong to the router? Is the packet destined to the router?

The 0.0.0.0/0 route points to IPs within VLAN 4 that belong to the gateway routers (195.94.11.41/42/21/22). All destinations go via those routers except 195.94.x.x, which is our local range.

++ Is any feature applied on the interface that is forcing traffic to be process switched?

ACLs on the VLAN interfaces; some ACLs have logging.

++ Is all traffic coming in on that VLAN being process switched, or only a few packets?

Not sure; could you please guide me on how to verify this?


++ Is traffic punted all the time, or only when it exceeds a particular rate?

I haven't implemented any punt policy or monitoring. Can you guide me on how to get these statistics? The interrupt process is always above 20-28%.

++ Is any CoPP applied?

ACL


++ Since when has the CPU been high? An MRTG graph of CPU and interface utilization may help.

Well, it has been like this for a long time, maybe 3-4 years, I am not sure. We only noticed the effect with the rise in VoIP sessions over the last 6 months.

++ An ELAM capture can also help here.

I have not enabled "service internal" on my 6500. Is there any impact from that command if I use it?

++ We need to perform extensive WebEx troubleshooting here. Can you open an SR so we can investigate further?

Sure, I will open one, but could you please tell me what an SR is and how I can open one :)

Kindly see below the relevant configuration that is enabled:

ip flow-cache timeout active 5 // NetFlow is not enabled on any interface
no ip bootp server
ip domain-name xxxxxxxx

ip host fw xxxxxxxxxx
ip name-server xxxxxxxxxxx

ipv6 mfib hardware-switching replication-mode ingress
mls ip multicast flow-stat-timer 9
mls flow ip full
no mls flow ipv6
mls nde sender
mls rate-limit all ttl-failure 15 10
no mls acl tcam share-global
mls cef error action reset
mls cef maximum-routes ip 220

redundancy
 mode sso
 main-cpu
  auto-sync running-config
spanning-tree mode pvst
spanning-tree extend system-id
diagnostic cns publish cisco.cns.device.diag_results
diagnostic cns subscribe cisco.cns.device.diag_commands
fabric required
fabric buffer-reserve queue
!
vlan internal allocation policy ascending
vlan access-log ratelimit 2000

ip flow-export source Loopback0
ip flow-export destination xxxxxxxxxx  // NetFlow is not enabled on any interface
ip flow-export destination xxxxxxxxxx  // NetFlow is not enabled on any interface
no ip http server

Hi Mishaal,

I went through the questions and replies and understand your concern. Naveen advised opening an SR, which means opening a case (Service Request) with Cisco so that live troubleshooting can be done on the box.

By looking at the Netdr capture, we are not able to conclude why the packets are getting process switched rather than being HW switched.

I hope you understand.

Please let me know if you have any question. We will be more than happy to assist you.

Regards,

Abhishek Soni

Dear Naveen & Abhishek,

Thank you, this was a very helpful and beneficial topic.

Regarding the Service Request, I will do my best to open one.

Best Regards,

Mishaal Ali Thabet

No problem Mishaal. Thanks for raising interesting questions.

Best Regards,

Abhishek Soni

I suspect that "#remote login switch" takes you to a TTY line into the other shell on the Catalyst motherboard. A few years ago, when I looked up documentation about converting between Catalyst hybrid mode and IOS mode, it said that the auxiliary line for a PSTN connection through the console port is lost after the conversion.

But it will be hard to find that documentation about hybrid mode now.

Diburaj K P
Level 1

Hi Naveen / Abhishek,

We have a WS-C3750G-48PS in our organization, and we are constantly observing high CPU due to the Hulc LED process. It is always around 15-20%.

Can you please let me know what the issue could be?

Regards

Diburaj

Hi Diburaj,

The Hulc LED process covers a number of functions, including link status monitoring, management interface handling, and so on, and its consuming around 10-15% of CPU cycles can be expected on this platform.

Basically, the "Hulc LED" process performs the following tasks:

- Checks link status on every port
- If the switch supports PoE, checks whether a powered device (PD) is detected
- Checks the status of the transceivers
- Updates fan status
- Sets the main LED and port LEDs
- Updates both power supplies and the RPS
- Checks the system temperature status

CPU utilization of around 10-15% for the Hulc LED process is seen on Cisco Catalyst 3750 and 3560 switches, even with no ports connected, and show processes cpu then shows higher than expected CPU utilization.

This is a minor issue and does not affect the hardware forwarding performance of the switch.
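
If you want to confirm that the load is coming from this process rather than from interrupt-level switching, something along these lines can be checked (standard commands; exact process names and output vary by release):

show processes cpu sorted | include Hulc    ! per-process CPU share of the Hulc LED process
show processes cpu history                  ! confirms the load is flat at 15-20% rather than spiking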

 Below is the known Bug for your reference.

https://tools.cisco.com/bugsearch/bug/CSCsi78581/?reffering_site=dumpcr

Please do not hesitate to contact me in case you have any queries.

Regards,
 
Naveen Venkateshaiah.

Alex Pfeil
Level 7

I was wondering whether it is possible to use broadcast/multicast/unicast storm control for troubleshooting high CPU utilization. I was thinking about using SNMP traps to determine whether there is a storm, but I was not sure whether there is a specific percentage to start at, or whether that is even a good idea. I appreciate your response in advance.

Thanks,

Alex

Thanks Alex for raising question.

A traffic storm can cause the CPU to go high, for example an ARP broadcast storm.

You can proactively suppress the broadcast traffic and also configure traffic storm control to generate an SNMP trap when a storm is detected on the port (a small configuration sketch follows the link below).

Following link provides a detailed explanation:

http://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/12-2SX/configuration/guide/book/storm.html
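
As a rough sketch of what that can look like on an access interface (the interface name and the 20% level are only illustrative, and availability of the trap action depends on platform and software release, so please check the guide above for your platform):

interface GigabitEthernet1/1
 ! suppress broadcast traffic above 20% of the interface bandwidth
 storm-control broadcast level 20.00
 ! generate an SNMP trap instead of shutting the port when a storm is detected
 storm-control action trap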

Please let me know if you have any further question.

Best Regards,

Abhishek Soni
