01-19-2016 03:04 PM - edited 03-08-2019 03:27 AM
This session will provide an opportunity to learn and ask questions about how to troubleshoot CPU issues in the Cisco Catalyst Switches IOS architecture.
Ask questions from Monday, January 25 to Friday, February 5, 2016
Featured Experts
Naveen Venkateshaiah is a customer support engineer in High-Touch Technical Services (HTTS). He is an expert on Routing, LAN Switching and Data Center products. His areas of expertise include Cisco Catalyst 3000, 4000, 6500, and Cisco Nexus 7000, Nexus 5000, Nexus 3000, Nexus 2000, UCS, and MDS SAN Switches. He has over 8 years of industry experience working with large enterprise and Service Provider networks. Venkateshaiah holds CCNA, CCNP, CCDP-ARCH, AWLANFE, and LCSAWLAN certifications. He is currently working to obtain a CCIE in Data Center.
Abhishek Soni is a customer support engineer in High-Touch Technical Services (HTTS). He is an expert on Routing, LAN Switching and Data Center products. His areas of expertise include Cisco Catalyst 3000, 4000, 6500, and Cisco Nexus 7000. He has over 8 years of industry experience working with large enterprise and Service Provider networks. Soni holds a CCNA and CCNP Certification. He is currently working to obtain a CCIE in routing and switching.
Find other events at https://supportforums.cisco.com/expert-corner/events.
** Ratings Encourage Participation! **
Please be sure to rate the Answers to Questions
02-01-2016 05:10 AM
02-01-2016 07:43 AM
Hi Mishaal,
Thanks for the logs. From the netdr capture, we see that packets from the source below are being punted to the CPU continuously.
------- dump of incoming inband packet -------
interface Vl55, routine draco2_process_rx_packet_inline
dbus info: src_vlan 0x37(55), src_indx 0x9(9), len 0x62(98)
bpdu 0, index_dir 0, flood 0, dont_lrn 0, dest_indx 0x380(896)
60020400 00370000 00090000 62000000 00110030 8E0FF7FC 00000000 03800000
mistral hdr: req_token 0x0(0), src_index 0x9(9), rx_offset 0x76(118)
requeue 0, obl_pkt 0, vlan 0x37(55)
destmac 00.1D.E6.18.78.00, srcmac 00.22.A1.10.47.55, protocol 0800
protocol ip: version 0x04, hlen 0x05, tos 0x00, totlen 80, identifier 8754
df 0, mf 0, fo 0, ttl 124, src 195.94.31.92, dst 84.235.58.20
udp src 56349, dst 31777 len 60 checksum 0x0
L2
===
Source mac 0022.A110.4755 2358
Dest mac 001D.E618.7800 4096
Protocol 0800 4096
Interface Vl55 2358
Source vlan 0x37(55) 2358
Source index 0x2E(46) 1549
Dest index 0x380(896) 2367
L3
==
ipv4 source 195.94.31.92 1498
ipv4 dest 195.94.31.106 996
Action plan:
==========
++ According to a MAC vendor lookup, MAC address 0022.A110.4755 belongs to a Huawei device.
++ The entries above are the top talkers. Source MAC 0022.A110.4755 is learned on VLAN 55.
++ Trace the MAC address to the interface it is connected to, and shut that interface.
++ We need to check why this device is sending so many packets.
++ Once you shut the source, check whether CPU utilization drops.
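For anyone who wants to reproduce a top-talker summary like the one above offline, here is a minimal Python sketch. It assumes the netdr output has been saved as plain text and that each packet has a "destmac ..., srcmac ..., protocol" line and a "src a.b.c.d, dst a.b.c.d" line as in the dump shown. The function name top_talkers is hypothetical and is not a Cisco tool.

```python
import re
from collections import Counter

# Line shapes taken from the netdr dump shown above (assumed to be stable).
MAC_LINE = re.compile(r"srcmac\s+([0-9A-Fa-f.]+),")
IP_LINE = re.compile(r"src\s+(\d+\.\d+\.\d+\.\d+),\s+dst\s+(\d+\.\d+\.\d+\.\d+)")


def top_talkers(dump_text):
    """Count source MACs, source IPs and destination IPs in a netdr text dump."""
    src_macs, src_ips, dst_ips = Counter(), Counter(), Counter()
    for line in dump_text.splitlines():
        mac = MAC_LINE.search(line)
        if mac:
            src_macs[mac.group(1).upper()] += 1
        ip = IP_LINE.search(line)
        if ip:
            src_ips[ip.group(1)] += 1
            dst_ips[ip.group(2)] += 1
    return src_macs, src_ips, dst_ips


if __name__ == "__main__":
    sample = (
        "destmac 00.1D.E6.18.78.00, srcmac 00.22.A1.10.47.55, protocol 0800\n"
        "df 0, mf 0, fo 0, ttl 124, src 195.94.31.92, dst 84.235.58.20\n"
        "udp src 56349, dst 31777 len 60 checksum 0x0\n"
    )
    macs, sources, destinations = top_talkers(sample)
    print(macs.most_common(5))
    print(sources.most_common(5))
    print(destinations.most_common(5))
```

The UDP port line is deliberately not matched as an IP line, because its "src" and "dst" values are port numbers and not dotted addresses.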
Regards,
Naveen Venkateshaiah
02-02-2016 02:25 AM
Dear Naveen,
This link belongs to our VoIP services, and I am wondering whether this is normal behavior for VoIP. I am afraid I can't shut down the service because it is live and very important to us.
+ If this is normal behavior, how many sessions can the 6500 handle? Is there a limit?
+ If these ports pass more traffic than my gateways can handle, will this interrupt load occur? (For example, these ports send around 2 Gbps of traffic while I have only 2 STM-1 links as the gateway to their final destination.) Why is the interrupt load happening in the first place? Is there any document explaining what causes the interrupt CPU to go high?
Best Regards,
Mishaal Ali Thabet
02-02-2016 08:35 AM
Hi Mishaal,
If we look at the netdr capture, out of 4k packets more than 2k are from srcmac 00.22.A1.10.47.55,
and most of the destinations are in 84.235.x.x (2k packets). We need to check where this destination is and what the mls config for it looks like.
These are the destination IPv4 addresses, sorted by number of packets:
Dest IPv4 Address Number of Packets
195.94.31.106 996
195.94.31.107 725
84.235.58.4 336
84.235.60.231 306
84.235.58.20 241
84.235.44.244 209
84.235.60.215 204
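To see how the listed destinations group by network, the counts can be summed per prefix. The sketch below is only an illustration: it uses the seven rows from the table above (the full capture is larger), and the /16 grouping is an assumption chosen to match the "84.235.x.x" and "195.94.x.x" notation used in this thread.

```python
import ipaddress
from collections import Counter

# Per-destination packet counts copied from the table above.
DEST_COUNTS = {
    "195.94.31.106": 996,
    "195.94.31.107": 725,
    "84.235.58.4": 336,
    "84.235.60.231": 306,
    "84.235.58.20": 241,
    "84.235.44.244": 209,
    "84.235.60.215": 204,
}


def packets_per_prefix(dest_counts, prefix_len=16):
    """Sum packet counts per covering network of the given prefix length."""
    totals = Counter()
    for addr, count in dest_counts.items():
        # strict=False lets us pass a host address and get its network back.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        totals[str(net)] += count
    return totals


if __name__ == "__main__":
    for prefix, count in packets_per_prefix(DEST_COUNTS).most_common():
        print(f"{prefix:18} {count}")
```

For the rows listed, this gives 1721 packets towards 195.94.0.0/16 and 1296 towards 84.235.0.0/16.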
This is L2 traffic and it should be forwarded in hardware, so I would not expect to see it here. We need the full config and then show mls cef in order to understand the hardware programming.
For these destinations the packets are going to the CPU and are not being hardware switched. The following link gives an idea of the common reasons why packets are software switched rather than hardware switched.
https://supportforums.cisco.com/document/59926/troubleshooting-high-cpu-6500-sup720
Let me know if you have any further questions. Please also capture these commands for the destination IPs above.
show run
show ip cef
sh mls cef <destination ip> detail
sh mls cef adjacency entry
show ip route <destination ips>
show mls cef lookup destination_ip_network detail
show ip cef destination_network
Regards,
Naveen Venkateshaiah
02-03-2016 07:49 AM
Hello Naveen,
Kindly find the attached.
and find the below
I was running BGP on that switch, but because it is only a 3B sup I was accepting only 100K entries, and I suspected that could be causing the problem. I stopped the BGP neighborship and moved to a 0.0.0.0/0 route to the gateways.
My gateway connection is around 4 x STM-1.
Most of the time, the traffic in/out of the above-mentioned x.107 and x.106 on that switch, trying to go outside via the 0.0.0.0/0 route, is higher than an STM-1. Sometimes it reaches around 1.2 Gbps. Does this have anything to do with the high CPU?
I would also like to know whether the following is normal. When I use monitoring software to monitor the interfaces, either the ones facing my network (related to the above IPs) or the STM-1 links, I see:
- per-second hits larger than an STM-1, which can reach 800 Mbps.
- traffic on my local port for that service going beyond 1 Gbps.
The monitoring software shows that at second 1 the hits reach 500 Mbps, at seconds 2 and 3 there are no hits, at second 4 there is a hit of around 900 Mbps, and so on. Is this considered normal? If not, could these points be the real cause of the high CPU, and do I need to upgrade my gateway links to STM-4 and STM-16?
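The bursts described above can be compared with the gateway capacity using simple arithmetic. The sketch below is a rough illustration, assuming the SDH line rate of 155.52 Mbps per STM-1 (usable payload is somewhat lower) and using the four per-second readings quoted above as sample data. The helper names are hypothetical. It only compares offered load with link capacity and says nothing about why packets reach the CPU.

```python
STM1_MBPS = 155.52  # SDH STM-1 line rate; usable payload is somewhat lower


def link_capacity_mbps(stm_level, links=1):
    """Aggregate line rate in Mbps for `links` circuits of STM-<stm_level>."""
    return STM1_MBPS * stm_level * links


def summarize(samples_mbps, capacity_mbps):
    """Return (average, peak, number of samples above capacity)."""
    average = sum(samples_mbps) / len(samples_mbps)
    peak = max(samples_mbps)
    over = sum(1 for s in samples_mbps if s > capacity_mbps)
    return average, peak, over


if __name__ == "__main__":
    samples = [500, 0, 0, 900]            # Mbps at seconds 1-4, from the post
    capacity = link_capacity_mbps(1, 4)   # 4 x STM-1
    average, peak, over = summarize(samples, capacity)
    print(f"capacity {capacity:.2f} Mbps, average {average:.1f} Mbps, "
          f"peak {peak} Mbps, seconds over capacity: {over}")
```

With these four readings the average stays below 4 x STM-1 while the peak exceeds it, which is the usual signature of bursty traffic.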
02-04-2016 04:59 AM
Hi Mishaal,
As we know, the high CPU is due to interrupts, caused by traffic arriving on VLAN 55 and VLAN 4 that is being punted to the CPU.
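The interrupt share can be read from the first line of show processes cpu, where the five-second figure is printed as total%/interrupt%. Below is a minimal Python sketch that splits that line into total, interrupt and process load. The sample values in the usage block are made up for illustration, and parse_cpu_header is a hypothetical helper name.

```python
import re

# Header line format of "show processes cpu":
#   CPU utilization for five seconds: <total>%/<interrupt>%; one minute: N%; five minutes: N%
HEADER = re.compile(
    r"CPU utilization for five seconds:\s*(\d+)%/(\d+)%;"
    r"\s*one minute:\s*(\d+)%;\s*five minutes:\s*(\d+)%"
)


def parse_cpu_header(line):
    """Split the header line into total, interrupt and process percentages."""
    match = HEADER.search(line)
    if match is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt, one_min, five_min = map(int, match.groups())
    return {
        "total_5s": total,
        "interrupt_5s": interrupt,
        "process_5s": total - interrupt,  # remainder is process-level load
        "one_min": one_min,
        "five_min": five_min,
    }


if __name__ == "__main__":
    # Illustrative values only, not taken from the switch in this thread.
    sample = "CPU utilization for five seconds: 28%/20%; one minute: 27%; five minutes: 26%"
    print(parse_cpu_header(sample))
```

A large second number relative to the first points at punted traffic, while a small one points at a specific process.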
++ We need to check the following:
++ What is the destination IP? Does it belong to the router? Is the packet destined to the router?
++ Is any feature applied on the interface that forces traffic to be process switched?
++ Is all traffic coming in on that VLAN being process switched, or only a few packets?
++ Is traffic punted all the time, or only when it exceeds a particular rate?
++ Is any CoPP applied?
++ Since when has the CPU been high? MRTG graphs of CPU and interface utilization may help.
++ An ELAM capture can also help here.
++ We need to do extensive WebEx troubleshooting here. Can you open an SR so we can investigate this further?
Common scenarios in which traffic gets punted to the CPU:
1. TTL=1
2. Destination not in routing table
3. Packet destined to router
4. A feature applied on ingress or egress interface causing packet to be process switched
5. A field set on the packet which is not supported in hardware and requires process switching.
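The first three scenarios in the list can be checked against a captured packet with a few lines of Python. This is a simplified sketch under stated assumptions: punt_reasons is a hypothetical helper, the routing table is modelled as a plain list of prefixes, and the router addresses in the usage block are documentation addresses, not ones from this thread. Scenarios 4 and 5 depend on platform configuration and are not modelled.

```python
import ipaddress


def punt_reasons(ttl, dst_ip, router_ips, routes):
    """Return the punt scenarios (1-3 from the list above) that a packet matches.

    ttl        -- IP TTL of the received packet
    dst_ip     -- destination address as a string
    router_ips -- set of addresses owned by the router itself
    routes     -- iterable of prefixes present in the routing table
    """
    reasons = []
    dst = ipaddress.ip_address(dst_ip)
    if ttl <= 1:
        reasons.append("TTL=1")
    if not any(dst in ipaddress.ip_network(prefix) for prefix in routes):
        reasons.append("destination not in routing table")
    if dst_ip in router_ips:
        reasons.append("packet destined to router")
    return reasons


if __name__ == "__main__":
    # Packet fields from the netdr dump earlier in the thread; the router
    # address set below is hypothetical.
    print(punt_reasons(ttl=124, dst_ip="84.235.58.20",
                       router_ips={"192.0.2.1"}, routes=["0.0.0.0/0"]))
```

With a default route present, the captured packet (TTL 124, destination 84.235.58.20) matches none of the three checks, so a feature or unsupported field (scenarios 4 and 5) would be the next thing to look at.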
Regards,
Naveen Venkateshaiah
02-04-2016 07:53 AM
Hello Naveen,
Thank you for the extensive work you did with me.
++ What is the destination IP? Does it belong to the router? Is the packet destined to the router?
The 0.0.0.0/0 route points to IPs within VLAN 4 that belong to the gateway routers, 195.94.11.41/42/21/22. All destinations go via those routers except 195.94.x.x, which is our local range.
++ Is any feature applied on the interface that forces traffic to be process switched?
ACLs on the VLAN, some of them with logging.
++ Is all traffic coming in on that VLAN being process switched, or only a few packets?
Not sure. Could you please guide me on how to verify this?
++ Is traffic punted all the time, or only when it exceeds a particular rate?
I haven't implemented any punt policy or monitoring. Can you guide me on how to get these statistics? The interrupt CPU is always above 20-28%.
++ Is any CoPP applied?
ACL
++ Since when has the CPU been high? MRTG graphs of CPU and interface utilization may help.
Well, it has been high for a long time, 3 or 4 years, I am not sure. We only noticed the effect when the number of VoIP sessions rose over the last 6 months.
++ An ELAM capture can also help here.
I have not enabled "service internal" on my 6500. Is there any impact if I use that command?
++ We need to do extensive WebEx troubleshooting here. Can you open an SR so we can investigate this further?
Sure, I will open one, but could you please tell me what an SR is and how I can open one? :)
Kindly find below the commands that are enabled:
ip flow-cache timeout active 5 // net flow are not enabled in any interface
no ip bootp server
ip domain-name xxxxxxxx
ip host fw xxxxxxxxxx
ip name-server xxxxxxxxxxx
ipv6 mfib hardware-switching replication-mode ingress
mls ip multicast flow-stat-timer 9
mls flow ip full
no mls flow ipv6
mls nde sender
mls rate-limit all ttl-failure 15 10
no mls acl tcam share-global
mls cef error action reset
mls cef maximum-routes ip 220
redundancy
mode sso
main-cpu
auto-sync running-config
spanning-tree mode pvst
spanning-tree extend system-id
diagnostic cns publish cisco.cns.device.diag_results
diagnostic cns subscribe cisco.cns.device.diag_commands
fabric required
fabric buffer-reserve queue
!
vlan internal allocation policy ascending
vlan access-log ratelimit 2000
ip flow-export source Loopback0
ip flow-export destination xxxxxxxxxx // net flow not enabled in any interface
ip flow-export destination xxxxxxxxxx //// net flow not enabled in any interface
no ip http server
02-05-2016 07:11 AM
Hi Mishaal,
I went through the questions and replies and understand your concern. Naveen advised opening an SR, which means opening a case (Service Request) with Cisco so that we can troubleshoot live on the box.
From the netdr capture alone, we are not able to conclude why the packets are being process switched rather than hardware switched.
I hope you understand.
Please let me know if you have any questions. We will be more than happy to assist you.
Regards,
Abhishek Soni
02-05-2016 07:55 AM
Dear Naveen & Abhishek,
Thank you, this was a very helpful and beneficial topic.
Regarding the Service Request, I will do my best to open one.
Best Regards,
Mishaal Ali Thabet
02-05-2016 08:27 AM
No problem Mishaal. Thanks for raising interesting questions.
Best Regards,
Abhishek Soni
02-02-2016 04:25 PM
I suspect that the "#remote login switch" command takes you to a tty line on the other shell of the Catalyst motherboard. A few years ago, when I looked up documentation about converting between Catalyst hybrid mode and IOS mode, it said that the auxiliary line for PSTN connection through the console port is lost after the conversion.
However, it will be hard to find that documentation about hybrid mode now.
02-01-2016 12:07 AM
HI Naveen / Abhishek
We have a WS-C3750G-48PS in our organization, and we always observe high CPU due to the Hulc LED process. It is constantly around 15-20%.
Can you please let me know what the issue could be?
Regards
Diburaj
02-01-2016 01:48 AM
Hi Diburaj,
The Hulc LED process covers a number of functions, including link status monitoring and management interface handling. CPU usage of around 10-15% for this process can be expected on this platform.
Basically the "Hulc LED" process does following tasks:
- Check Link status on every port
- If the switch supports POE, it checks to see if there is a Power Device (PD) detected
- Check the status of the transceiver
- Update Fan status
- Set Main LED and ports LEDs
- Update both Power Supplies and RPS
- Check on system temperature status
CPU utilization of around 10-15% for the Hulc LED process is seen on Cisco Catalyst 3750 and 3560 switches, typically even with no ports connected. In that case, show process cpu shows higher than expected CPU utilization.
This is a minor issue and does not affect the hardware forwarding performance of the switch.
Below is the known bug for your reference.
https://tools.cisco.com/bugsearch/bug/CSCsi78581/?reffering_site=dumpcr
Please do not hesitate to contact me in case you have any queries.
Regards,
Naveen Venkateshaiah.
02-03-2016 06:35 AM
I was wondering if it is possible to use broadcast/multicast/unicast storm control for troubleshooting high CPU utilization? I was thinking about using SNMP traps to determine if there is a storm, but I was not sure if there is a specific percentage to start at or if that is even a good idea. I appreciate your response in advance.
Thanks,
Alex
02-03-2016 07:50 AM
Thanks, Alex, for raising the question.
A traffic storm, for example an ARP broadcast storm, can cause the CPU to go high.
You can proactively suppress broadcasts and also configure traffic storm control to generate an SNMP trap when a storm is detected on the port.
Following link provides a detailed explanation:
http://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/12-2SX/configuration/guide/book/storm.html
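On the question of which percentage to start at: storm control levels are expressed as a percentage of port bandwidth, so one possible starting point is to baseline normal broadcast traffic on the port and add headroom. The sketch below only shows that arithmetic. The baseline figure, the headroom factor of 2 and the function name suggest_level are all assumptions for illustration, not Cisco recommendations.

```python
def suggest_level(peak_baseline_bps, link_bps, headroom=2.0):
    """Return a starting storm-control level as a percentage of link bandwidth.

    peak_baseline_bps -- highest broadcast rate seen during normal operation
    link_bps          -- port speed in bits per second
    headroom          -- multiplier above the baseline (assumed, tune per site)
    """
    if link_bps <= 0:
        raise ValueError("link_bps must be positive")
    level = peak_baseline_bps * headroom / link_bps * 100
    # A level above 100% would never trigger, so clamp it.
    return round(min(level, 100.0), 2)


if __name__ == "__main__":
    # Hypothetical baseline: 2 Mbps of broadcast on a 1 Gbps port.
    print(suggest_level(2_000_000, 1_000_000_000))
```

Whatever value comes out should be validated against real traffic before it is used to suppress or trap, since a threshold that is too low will flag normal bursts.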
Please let me know if you have any further questions.
Best Regards,
Abhishek Soni