10-22-2015 06:04 AM - edited 03-08-2019 02:20 AM
Dear Community members,
I need your advice on troubleshooting a high CPU usage problem on a Catalyst 3750X L3 switch, which is the core switch in my network. The problem is causing high latency and data loss, which makes my network slow and unstable.
So, the evidence. This is the output of the "show proc cpu sorted" command:
#show proc cpu sorted
CPU utilization for five seconds: 98%/28%; one minute: 99%; five minutes: 99%
PID Runtime(ms)  Invoked uSecs   5Sec   1Min   5Min TTY Process
214    41003600 13944778  2940 24.15% 25.21% 25.35%   0 IP Input
169    34067372  3928624  8671 16.31% 14.55% 14.37%   0 Hulc LED Process
232    20500311  2892288  7087  8.95%  8.43%  8.43%   0 Spanning Tree
125    19816534  4237944  4675  5.59%  7.24%  7.74%   0 hpm main process
 12     3744640  2199415  1702  2.87%  2.56%  2.45%   0 ARP Input
 85     1858596   630601  2947  1.27%  0.95%  0.89%   0 RedEarth Tx Mana
129     2207888   176060 12540  0.95%  1.04%  1.03%   0 hpm counter proc
212     2836209   795249  3566  0.79%  1.46%  1.32%   0 IP ARP Adjacency
 53      137483     4075 33738  0.79%  0.09%  0.06%   0 Per-minute Jobs
245      608854   166106  3665  0.79%  0.64%  0.54%   0 PI MATM Aging Pr
121       68235    11682  5841  0.63%  0.07%  0.01%   0 Strider Tcam Mem
 91     1099121    68040 16154  0.47%  0.52%  0.48%   0 Adjust Regions
340      471153   649411   725  0.31%  0.27%  0.23%   0 VLAN Manager
 84      686262   849745   807  0.31%  0.34%  0.36%   0 RedEarth I2C dri
 51      278373  3518118    79  0.31%  0.30%  0.28%   0 Net Input
170      398886   133716  2983  0.15%  0.14%  0.15%   0 HL3U bkgrd proce
As you can see, the CPU is completely utilized and the top process is IP Input. From searching online I learned that the idea is to use interrupt-level switching (fast switching, CEF, among others) instead of process-level switching, so that forwarding puts no load on the CPU. Checking this...
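In the "CPU utilization" header, the first percentage is the total load and the second is the part spent at interrupt level, so the difference is what scheduled processes such as IP Input consume. A minimal Python sketch of that arithmetic, using the header quoted above (the helper name `split_cpu_load` is only illustrative, not a Cisco tool):

```python
import re

def split_cpu_load(header: str) -> dict:
    """Split an IOS 'CPU utilization' header into total, interrupt and process load."""
    match = re.search(r"five seconds:\s*(\d+)%/(\d+)%", header)
    if match is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt = int(match.group(1)), int(match.group(2))
    # Process-level load is whatever remains after interrupt-level work.
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}

header = "CPU utilization for five seconds: 98%/28%; one minute: 99%; five minutes: 99%"
load = split_cpu_load(header)
print(load)
```

For the output in this thread, most of the load is at process level, which fits IP Input being the top process.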
#show cef interface brief
Interface                 IP-Address      Status Switching
Vlan1                     10.100.10.1     up     no dCEF
FastEthernet0             unassigned      down   no dCEF
GigabitEthernet1/0/1      unassigned      up     CEF
GigabitEthernet1/0/2      unassigned      down   CEF
GigabitEthernet1/0/3      unassigned      up     CEF
GigabitEthernet1/0/4      unassigned      up     CEF
GigabitEthernet1/0/5      unassigned      up     CEF
GigabitEthernet1/0/6      unassigned      up     CEF
GigabitEthernet1/0/7      unassigned      up     CEF
GigabitEthernet1/0/8      unassigned      down   CEF
GigabitEthernet1/0/9      unassigned      up     CEF
GigabitEthernet1/0/10     unassigned      down   CEF
GigabitEthernet1/0/11     unassigned      up     CEF
GigabitEthernet1/0/12     unassigned      up     CEF
GigabitEthernet1/0/13     unassigned      up     CEF
GigabitEthernet1/0/14     unassigned      up     CEF
GigabitEthernet1/0/15     unassigned      up     CEF
GigabitEthernet1/0/16     unassigned      up     CEF
GigabitEthernet1/0/17     unassigned      up     CEF
GigabitEthernet1/0/18     unassigned      up     CEF
GigabitEthernet1/0/19     unassigned      up     CEF
GigabitEthernet1/0/20     unassigned      up     CEF
GigabitEthernet1/0/21     unassigned      up     CEF
GigabitEthernet1/0/22     unassigned      up     CEF
GigabitEthernet1/0/23     unassigned      up     CEF
GigabitEthernet1/0/24     unassigned      up     CEF
GigabitEthernet1/1/1      unassigned      down   CEF
GigabitEthernet1/1/2      unassigned      down   CEF
GigabitEthernet1/1/3      unassigned      down   CEF
GigabitEthernet1/1/4      unassigned      down   CEF
TenGigabitEthernet1/1/1   unassigned      down   CEF
TenGigabitEthernet1/1/2   unassigned      down   CEF
Null0                     unassigned      up     no CEF
Vlan2                     X.X.171.126     up     CEF
Vlan3                     unassigned      up     CEF
Vlan4                     unassigned      up     CEF
Vlan6                     X.X.165.65      up     CEF
Vlan7                     X.X.166.190     up     CEF
Vlan8                     X.X.152.2       up     CEF
Vlan9                     X.X.166.62      up     CEF
Vlan10                    unassigned      up     CEF
Vlan12                    X.X.161.126     up     CEF
Vlan13                    X.X.171.254     up     CEF
Vlan14                    X.X.167.254     up     CEF
Vlan15                    X.X.166.65      up     CEF
Vlan16                    unassigned      up     CEF
Vlan17                    X.X.167.62      up     CEF
Vlan19                    X.X.161.190     up     CEF
Vlan21                    X.X.167.126     up     CEF
Vlan22                    X.X.171.158     up     CEF
Vlan27                    X.X.162.226     up     CEF
Vlan28                    X.X.162.242     up     CEF
Vlan30                    10.201.6.67     up     CEF
Vlan60                    192.168.60.1    up     CEF
Vlan61                    unassigned      up     CEF
Vlan100                   192.168.100.1   up     CEF
Vlan200                   unassigned      up     CEF
Vlan201                   unassigned      up     CEF
Vlan240                   X.X.168.253     up     no dCEF
Vlan250                   X.X.171.190     up     CEF
Vlan251                   192.168.101.1   up     CEF
Vlan252                   192.168.102.1   up     CEF
Vlan253                   192.168.103.30  up     CEF
Vlan260                   X.X.162.190     up     CEF
Vlan280                   unassigned      up     CEF
Vlan281                   unassigned      down   CEF
Vlan298                   unassigned      down   CEF
Vlan301                   unassigned      down   CEF
StackPort1                unassigned      down   CEF
Virtual1                  unassigned      up     -
Virtual2                  unassigned      up     -
I can see that CEF is enabled on almost every interface, but look at the output of "show interface switching" (cropped for readability):
#show interface switching
Vlan1
Throttle count 0
Drops RP 1139 SP 0
SPD Flushes Fast 0 SSE 0
SPD Aggress Fast 0
SPD Priority Inputs 0 Drops 0
Protocol IP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 339540 75121303 57097 4611829
Cache misses 0 - - -
Fast 0 0 11 1053
Auton/SSE 0 0 0 0
Protocol ARP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 1596666 95800784 72413 4344780
Cache misses 0 - - -
Fast 0 0 0 0
Auton/SSE 0 0 0 0
NOTE: all counts are cumulative and reset only after a reload.
Vlan2
Protocol IP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 2816137 201283125 10400 1470747
Cache misses 0 - - -
Fast 1205 183277 10 2542
Auton/SSE 0 0 0 0
Protocol ARP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 132743 7964580 86383 5182980
Cache misses 0 - - -
Fast 0 0 0 0
Auton/SSE 0 0 0 0
NOTE: all counts are cumulative and reset only after a reload.
Vlan3
Protocol ARP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 68260 4095600 0 0
Cache misses 0 - - -
Fast 0 0 0 0
Auton/SSE 0 0 0 0
Protocol Other
Switching path Pkts In Chars In Pkts Out Chars Out
Process 2 120 0 0
Cache misses 0 - - -
Fast 0 0 0 0
Auton/SSE 0 0 0 0
NOTE: all counts are cumulative and reset only after a reload.
Vlan4
All statistics for this interface are zero.
Vlan6
Throttle count 0
Drops RP 9 SP 0
SPD Flushes Fast 0 SSE 0
SPD Aggress Fast 0
SPD Priority Inputs 0 Drops 0
Protocol IP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 5140914 356471282 23015 2993665
Cache misses 0 - - -
Fast 428 39212 6 1135
Auton/SSE 0 0 0 0
Protocol ARP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 13803 828180 16550 993000
Cache misses 0 - - -
Fast 0 0 0 0
Auton/SSE 0 0 0 0
NOTE: all counts are cumulative and reset only after a reload.
Vlan7
Protocol IP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 151934 14734706 7539 1527065
Cache misses 0 - - -
Fast 156 14866 6 1315
Auton/SSE 0 0 0 0
Protocol ARP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 10696 641760 9251 555060
Cache misses 0 - - -
Fast 0 0 0 0
Auton/SSE 0 0 0 0
NOTE: all counts are cumulative and reset only after a reload.
Vlan8
Throttle count 0
Drops RP 53150 SP 0
SPD Flushes Fast 0 SSE 0
SPD Aggress Fast 0
SPD Priority Inputs 0 Drops 0
Protocol IP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 76358777 5367493319 759413 66953170
Cache misses 0 - - -
Fast 110151 10095563 74 13479
Auton/SSE 0 0 0 0
Protocol ARP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 779028 46741694 459708 27582480
Cache misses 0 - - -
Fast 0 0 0 0
Auton/SSE 0 0 0 0
Protocol Other
Switching path Pkts In Chars In Pkts Out Chars Out
Process 127 7620 0 0
Cache misses 0 - - -
Fast 0 0 0 0
Auton/SSE 0 0 0 0
NOTE: all counts are cumulative and reset only after a reload.
... I notice that almost all the switching is done at process level, and I don't know how to work around this.
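To put numbers on that observation, here is a small Python sketch that computes, per SVI, the share of inbound IP packets counted on the Process path versus the Fast path. The counters are copied from the output above; the helper `process_share` is a hypothetical name. One caveat, stated as an assumption: on a hardware-forwarding platform like this one, these software counters most likely cover only packets that actually reached the CPU, so a high share here does not by itself prove that all traffic is process-switched.

```python
def process_share(process_pkts: int, fast_pkts: int) -> float:
    """Fraction of CPU-visible inbound packets that took the process path."""
    seen = process_pkts + fast_pkts
    return process_pkts / seen if seen else 0.0

# Inbound IP packet counters (Process, Fast) copied from the output above.
ip_in = {
    "Vlan1": (339540, 0),
    "Vlan2": (2816137, 1205),
    "Vlan6": (5140914, 428),
    "Vlan7": (151934, 156),
    "Vlan8": (76358777, 110151),
}

# List the SVIs with the most process-switched inbound packets first.
ranked = sorted(ip_in.items(), key=lambda kv: kv[1][0], reverse=True)
for name, (proc, fast) in ranked:
    print(f"{name:6} process-switched in: {proc:>10}  share: {process_share(proc, fast):.2%}")
```

Sorting by the absolute Process count is a deliberate choice: the SVI that sends the most packets to the CPU is usually the first place to look, and in this sample that is Vlan8.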
To put this into context: this switch connects, directly or indirectly, almost 190 other devices, including switches (2960s, 2950s, 3560, and 3750) and access points. It is the default gateway for many VLANs and connects directly to our WAN access.
The switch is running IOS 12.2(58)SE2. I have read in other posts (like this one: https://supportforums.cisco.com/discussion/11628666/cisco-3750x-24se-12258se2-cpu-utilization-high) that downgrading to 12.2(55)SE8 is advised for stability reasons, but that post is 3 years old, so I'd like to know whether this is still a valid solution or whether another IOS version is preferable for working around this problem.
From the "show proc cpu" output I can see two other processes consuming CPU, Hulc LED Process and Spanning Tree, but I'd like to troubleshoot IP Input first, because it is the top one.
Please advise on where I should look to solve this problem.
Thanks very much.
10-22-2015 07:08 AM
Hello
Can you also post the output of "show platform tcam utilization"?
Is it possible that you've exceeded your TCAM limits, or that you are using an incorrect SDM template on your switch ("show sdm prefer")?
It's possible that you have a lot of traffic that needs to be process switched (check what traffic is destined to the switch itself and if you see any excessive communication - e.g. ARP). Check your routing (especially static routes) configuration.
Best would be if you could share the entire running-config.
Also this link might be helpful: http://www.cisco.com/c/en/us/support/docs/switches/catalyst-3750-series-switches/68461-high-cpu-utilization-cat3750.html
Best regards,
Martin
10-22-2015 08:59 AM
Hello Martin. Sorry ... forgot to post tcam utilization. I checked that, and there is no limit problem.
#show platform tcam utilization
CAM Utilization for ASIC# 0 Max Used
Masks/Values Masks/values
Unicast mac addresses: 6364/6364 2212/2212
IPv4 IGMP groups + multicast routes: 1120/1120 1/1
IPv4 unicast directly-connected routes: 6144/6144 772/772
IPv4 unicast indirectly-connected routes: 2048/2048 180/180
IPv4 policy based routing aces: 452/452 12/12
IPv4 qos aces: 512/512 21/21
IPv4 security aces: 964/964 36/36
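The "no limit problem" reading can be checked by dividing each Used value by its Max value. A minimal Python sketch with the figures copied from the output above (the `utilization` helper is only illustrative):

```python
# (max, used) value pairs copied from "show platform tcam utilization" above.
tcam = {
    "Unicast mac addresses": (6364, 2212),
    "IPv4 IGMP groups + multicast routes": (1120, 1),
    "IPv4 unicast directly-connected routes": (6144, 772),
    "IPv4 unicast indirectly-connected routes": (2048, 180),
    "IPv4 policy based routing aces": (452, 12),
    "IPv4 qos aces": (512, 21),
    "IPv4 security aces": (964, 36),
}

def utilization(max_entries: int, used: int) -> float:
    """Fraction of a TCAM region that is in use."""
    return used / max_entries

for region, (max_entries, used) in tcam.items():
    print(f"{region:42} {utilization(max_entries, used):6.1%}")

# The fullest region is the one to watch first.
busiest = max(tcam, key=lambda region: utilization(*tcam[region]))
print("busiest region:", busiest)
```

In this sample the fullest region is the unicast MAC table at roughly a third of its capacity, which is consistent with the TCAM not being exhausted.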
Running "show sdm prefer" shows me that the SDM template is set to "desktop default":
#show sdm prefer
The current template is "desktop default" template.
The selected template optimizes the resources in
the switch to support this level of features for
8 routed interfaces and 1024 VLANs.
number of unicast mac addresses: 6K
number of IPv4 IGMP groups + multicast routes: 1K
number of IPv4 unicast routes: 8K
number of directly-connected IPv4 hosts: 6K
number of indirect IPv4 routes: 2K
number of IPv4 policy based routing aces: 0
number of IPv4/MAC qos aces: 0.5K
number of IPv4/MAC security aces: 1K
I had already checked that link for troubleshooting high CPU on the Cat3750 and paid attention to the static route ARP problem, but it covers the case where a static route is configured toward a broadcast interface, which is not my case. Anyway, this is my running config (some sensitive data is omitted or modified):
#sh run
Building configuration...
Current configuration : 11031 bytes
!
! Last configuration change at 16:55:22 UTC Wed Oct 21 2015
!
version 12.2
no service pad
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
service sequence-numbers
!
!
boot-start-marker
boot-end-marker
!
!
no aaa new-model
clock timezone UTC -3 0
switch 1 provision ws-c3750x-24s
system mtu routing 1500
ip routing
!
!
mls qos
!
crypto pki trustpoint TP-self-signed-298484864
enrollment selfsigned
subject-name cn=IOS-Self-Signed-Certificate-298484864
revocation-check none
rsakeypair TP-self-signed-298484864
!
!
crypto pki certificate chain TP-self-signed-298484864
certificate self-signed 01
/* omited */
quit
spanning-tree mode pvst
spanning-tree extend system-id
!
!
!
!
vlan internal allocation policy ascending
!
!
!
!
!
!
!
interface FastEthernet0
no ip address
no ip route-cache cef
no ip route-cache
shutdown
!
interface GigabitEthernet1/0/1
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 1-19,21-4094
switchport mode trunk
!
interface GigabitEthernet1/0/2
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 1-19,21-280,282-4094
switchport mode trunk
shutdown
!
interface GigabitEthernet1/0/3
switchport trunk encapsulation dot1q
switchport mode trunk
!
interface GigabitEthernet1/0/4
switchport trunk encapsulation dot1q
switchport mode trunk
!
interface GigabitEthernet1/0/5
switchport trunk encapsulation dot1q
switchport mode trunk
!
interface GigabitEthernet1/0/6
switchport trunk encapsulation dot1q
switchport mode trunk
!
interface GigabitEthernet1/0/7
switchport trunk encapsulation dot1q
switchport mode trunk
!
interface GigabitEthernet1/0/8
switchport mode access
shutdown
!
interface GigabitEthernet1/0/9
switchport access vlan 28
switchport mode access
spanning-tree portfast
!
interface GigabitEthernet1/0/10
!
interface GigabitEthernet1/0/11
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 1,16,200-203
switchport mode trunk
!
interface GigabitEthernet1/0/12
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 1,16,200-203
switchport mode trunk
!
interface GigabitEthernet1/0/13
switchport trunk encapsulation dot1q
switchport mode trunk
!
interface GigabitEthernet1/0/14
switchport trunk encapsulation dot1q
switchport mode trunk
!
interface GigabitEthernet1/0/15
switchport trunk encapsulation dot1q
switchport mode trunk
!
interface GigabitEthernet1/0/16
switchport trunk encapsulation dot1q
switchport mode trunk
!
interface GigabitEthernet1/0/17
switchport trunk encapsulation dot1q
switchport mode trunk
!
interface GigabitEthernet1/0/18
switchport trunk encapsulation dot1q
switchport mode trunk
!
interface GigabitEthernet1/0/19
switchport access vlan 3
switchport mode access
!
interface GigabitEthernet1/0/20
switchport access vlan 211
switchport trunk encapsulation dot1q
switchport mode access
speed 100
!
interface GigabitEthernet1/0/21
switchport access vlan 3
switchport trunk encapsulation dot1q
switchport mode access
speed 100
!
interface GigabitEthernet1/0/22
switchport access vlan 230
switchport trunk encapsulation dot1q
switchport mode access
!
interface GigabitEthernet1/0/23
switchport trunk encapsulation dot1q
switchport mode trunk
!
interface GigabitEthernet1/0/24
switchport trunk encapsulation dot1q
switchport trunk native vlan 8
switchport mode trunk
duplex full
!
interface GigabitEthernet1/1/1
shutdown
!
interface GigabitEthernet1/1/2
shutdown
!
interface GigabitEthernet1/1/3
shutdown
!
interface GigabitEthernet1/1/4
shutdown
!
interface TenGigabitEthernet1/1/1
shutdown
!
interface TenGigabitEthernet1/1/2
shutdown
!
interface Vlan1
ip address 10.100.10.1 255.255.255.0
no ip route-cache cef
no ip route-cache
no ip mroute-cache
!
interface Vlan2
ip address X.X.171.126 255.255.255.128
!
interface Vlan3
no ip address
!
interface Vlan4
no ip address
!
interface Vlan6
ip address X.X.165.65 255.255.255.192
!
interface Vlan7
ip address X.X.166.190 255.255.255.224
!
interface Vlan8
ip address X.X.161.254 255.255.255.224 secondary
ip address X.X.163.2 255.255.255.0 secondary
ip address X.X.152.2 255.255.255.0
!
interface Vlan9
ip address X.X.166.62 255.255.255.224
!
interface Vlan10
no ip address
!
interface Vlan12
ip address X.X.161.126 255.255.255.128
!
interface Vlan13
description *** Red Pabellon C ***
ip address X.X.171.254 255.255.255.192
!
interface Vlan14
ip address X.X.167.254 255.255.255.240
!
interface Vlan15
ip address X.X.166.65 255.255.255.192
!
interface Vlan16
no ip address
!
interface Vlan17
ip address X.X.167.62 255.255.255.192
!
interface Vlan19
ip address X.X.161.190 255.255.255.192
!
interface Vlan21
ip address X.X.167.126 255.255.255.192
!
interface Vlan22
ip address X.X.171.158 255.255.255.224
!
interface Vlan27
ip address X.X.162.226 255.255.255.252
!
interface Vlan28
description *** Red Borde Filtrada ***
ip address X.X.162.242 255.255.255.240
!
interface Vlan30
ip address 10.201.6.67 255.255.255.248
!
interface Vlan60
ip address 192.168.60.1 255.255.255.0
!
interface Vlan61
no ip address
!
interface Vlan100
ip address 192.168.100.1 255.255.255.0
!
interface Vlan200
no ip address
!
interface Vlan201
no ip address
!
interface Vlan240
ip address X.X.168.253 255.255.255.0
no ip route-cache cef
no ip route-cache
!
interface Vlan250
ip address X.X.171.190 255.255.255.240
!
interface Vlan251
ip address 192.168.101.1 255.255.255.0
!
interface Vlan252
ip address 192.168.102.1 255.255.255.0
!
interface Vlan253
ip address 192.168.103.30 255.255.255.224
!
interface Vlan260
ip address X.X.162.190 255.255.255.192
!
interface Vlan280
description *** VDI PB ***
no ip address
!
interface Vlan281
description *** VDI AyF ***
no ip address
!
interface Vlan298
no ip address
!
interface Vlan301
description *** Equipos de Borde ***
no ip address
!
router rip
redistribute connected
network 10.0.0.0
network X.X.0.0
no auto-summary
!
ip default-gateway 10.100.10.1
!
ip http server
ip http secure-server
!
ip route 0.0.0.0 0.0.0.0 X.X.162.225
ip route 10.83.14.0 255.255.255.192 X.X.162.225
ip route 10.201.6.0 255.255.255.0 X.X.162.225
ip route X.X.160.0 255.255.255.128 X.X.162.225
ip route X.X.160.128 255.255.255.240 X.X.162.225
ip route X.X.160.144 255.255.255.240 X.X.162.225
ip route X.X.160.160 255.255.255.224 X.X.162.225
ip route X.X.160.192 255.255.255.192 X.X.162.225
ip route X.X.161.192 255.255.255.224 X.X.162.225
ip route X.X.162.0 255.255.255.128 X.X.162.225
ip route X.X.162.128 255.255.255.192 X.X.162.225
ip route X.X.162.232 255.255.255.248 X.X.162.225
ip route X.X.162.240 255.255.255.240 X.X.162.241
ip route X.X.165.0 255.255.255.192 X.X.162.225
ip route X.X.165.128 255.255.255.128 X.X.162.225
ip route X.X.166.0 255.255.255.224 X.X.162.225
ip route X.X.166.128 255.255.255.224 X.X.162.225
ip route X.X.166.192 255.255.255.192 X.X.162.225
ip route X.X.167.128 255.255.255.192 X.X.162.225
ip route X.X.168.0 255.255.255.0 X.X.162.225
ip route X.X.169.0 255.255.255.0 X.X.162.225
ip route X.X.170.0 255.255.255.0 X.X.162.225
ip route X.X.171.160 255.255.255.240 X.X.162.225
ip route X.X.172.0 255.255.255.128 X.X.162.225
ip route X.X.172.128 255.255.255.240 X.X.162.225
ip route X.X.172.144 255.255.255.240 X.X.162.225
ip route X.X.172.160 255.255.255.224 X.X.162.225
ip route X.X.172.192 255.255.255.240 X.X.162.225
ip route X.X.172.208 255.255.255.240 X.X.162.225
ip route X.X.172.224 255.255.255.224 X.X.162.225
ip route 172.29.57.0 255.255.255.0 X.X.162.225
ip route 192.168.61.0 255.255.255.0 192.168.60.3
ip route 192.168.62.0 255.255.255.0 192.168.60.3
!
ip sla enable reaction-alerts
logging esm config
logging X.X.161.237
access-list 10 permit 10.83.14.18
access-list 120 deny ip host 10.83.14.18 any
access-list 120 permit tcp host X.X.168.46 any eq www
access-list 120 deny ip any any
!
snmp-server community <comunity> RO
snmp-server host X.X.162.152 <comunity>
!
!
!
!
monitor session 1 source interface Gi1/0/24
monitor session 1 destination interface Gi1/0/10
ntp server X.X.162.152
end
10-22-2015 02:25 PM
Hello Dago,
Last time I experienced a similar issue it was due to a loop somewhere in the LAN.
A couple of questions: Is the CPU at 99% most of the time? ('show proc cpu history')
Are there any specific logs on your core and access switches?
Here are some ideas:
- Try to check on which interface(s) the high traffic is coming in.
- I know that it is your central core switch, but I would advise shutting the interfaces going to each access switch, one at a time, and seeing whether the CPU decreases. If not, re-enable it and test another interface.
Thank you.
Karim
10-23-2015 07:09 AM
Hello krahmani323,
#sh proc cpu history
999999999999999999999999999999999999999999999999999999999999
999999999999999999999999999999999999999999999999999999999999
100 **********************************************************
90 **********************************************************
80 **********************************************************
70 **********************************************************
60 **********************************************************
50 **********************************************************
40 **********************************************************
30 **********************************************************
20 **********************************************************
10 **********************************************************
0....5....1....1....2....2....3....3....4....4....5....5....6
0 5 0 5 0 5 0 5 0 5 0
CPU% per second (last 60 seconds)
1 11 1 1 11 1 1 11 11 11
999090099099999099900999099999999099999900999999999900999009
999090099099999099900999099999999099999900999999999900999009
100 ##########################################################
90 ##########################################################
80 ##########################################################
70 ##########################################################
60 ##########################################################
50 ##########################################################
40 ##########################################################
30 ##########################################################
20 ##########################################################
10 ##########################################################
0....5....1....1....2....2....3....3....4....4....5....5....6
0 5 0 5 0 5 0 5 0 5 0
CPU% per minute (last 60 minutes)
* = maximum CPU% # = average CPU%
111 1111111111 111111111 1111111
000999999999999900000000009998999999999990000000009999999999999990000000
000699299599489900000000009886292299799990000000009998888899999890000000
100 ##**** ***** ****#########*** * ******##########***************#####
90 ##*************###########**************##########***************#####
80 ###**********##############************############*************######
70 #############################**#*#**##################################
60 ######################################################################
50 ######################################################################
40 ######################################################################
30 ######################################################################
20 ######################################################################
10 ######################################################################
0....5....1....1....2....2....3....3....4....4....5....5....6....6....7..
0 5 0 5 0 5 0 5 0 5 0 5 0
CPU% per hour (last 72 hours)
* = maximum CPU% # = average CPU%
001505: Oct 23 13:13:08.168: %SW_MATM-4-MACFLAP_NOTIF: Host b84f.d527.bbd3 in vlan 200 is flapping between port Gi1/0/12 and port Gi1/0/1
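When many of these messages pile up, grouping them by VLAN and port pair can show where the flapping concentrates. The following Python sketch assumes the log lines follow the same wording as the message above; the function name `count_flaps` and the sample list are illustrative only.

```python
import re
from collections import Counter

# Pattern matching the MACFLAP_NOTIF wording shown in the log line above.
FLAP = re.compile(
    r"%SW_MATM-4-MACFLAP_NOTIF: Host (?P<mac>\S+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<a>\S+) and port (?P<b>\S+)"
)

def count_flaps(log_lines):
    """Count MAC flap messages per (vlan, port pair), ignoring port order."""
    counts = Counter()
    for line in log_lines:
        m = FLAP.search(line)
        if m:
            ports = tuple(sorted((m["a"], m["b"])))
            counts[(int(m["vlan"]), ports)] += 1
    return counts

log = [
    "001505: Oct 23 13:13:08.168: %SW_MATM-4-MACFLAP_NOTIF: Host b84f.d527.bbd3 "
    "in vlan 200 is flapping between port Gi1/0/12 and port Gi1/0/1",
]
flaps = count_flaps(log)
for (vlan, ports), n in flaps.most_common():
    print(f"vlan {vlan}: {ports[0]} <-> {ports[1]}  {n} message(s)")
```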
I'll keep on troubleshooting this issue.
Thanks
10-23-2015 07:23 AM
The HULC process is a cosmetic bug, so you can deduct it from your total. Not that it really helps in your case, but it's best not to waste time tracking it. It's a known issue on 3750/3560/2960s etc.; just put "3750 hulc bug" into Google and you will see that most platforms/versions are affected.
You're process switching VLAN 1 and 240, so everything from those subnets is being punted through the CPU. You should turn CEF back on to prevent that: remove the "no ip route-cache cef" from under the SVIs.
10-23-2015 11:31 AM
Hello Mark,
Thanks for the HULC process advice.
About VLAN 1 and 240: I removed the "no ip route-cache cef" command and at least some packets started to be CEF-switched, but not all of them. The CPU problem still persists.
As you can see, almost all packets are process-switched:
#show interfaces vlan 240 switching
Vlan240
Throttle count 0
Drops RP 12209 SP 0
SPD Flushes Fast 0 SSE 0
SPD Aggress Fast 0
SPD Priority Inputs 0 Drops 0
Protocol IP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 71295805 5034710305 86603 7877115
Cache misses 0 - - -
Fast 633 45994 351 101781
Auton/SSE 0 0 0 0
Protocol ARP
Switching path Pkts In Chars In Pkts Out Chars Out
Process 771524 46291440 195426 11725560
Cache misses 0 - - -
Fast 0 0 0 0
Auton/SSE 0 0 0 0
Protocol Other
Switching path Pkts In Chars In Pkts Out Chars Out
Process 7 420 0 0
Cache misses 0 - - -
Fast 0 0 0 0
Auton/SSE 0 0 0 0
NOTE: all counts are cumulative and reset only after a reload.
This behavior is present on all SVI interfaces.
More evidence that a lot of packets aren't CEF-switched:
#show ip cef switching statistics
Reason Drop Punt Punt2Host
RP LES No route 7 0 3
RP LES No adjacency 261761 0 0
RP LES Incomplete adjacency 8241 0 0
RP LES TTL expired 0 0 4
RP LES IP options set 0 0 1669
RP LES IP redirects 0 0 1
RP LES Neighbor resolution req 673552 553 0
RP LES Total 943561 553 1677
All Total 943561 553 1677
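Breaking the Drop column down by reason shows which counters dominate. The Python sketch below uses the values from the output above. As an interpretation rather than a confirmed diagnosis, "Neighbor resolution req" and "No adjacency" suggest packets whose destination or next hop had no resolved ARP entry at the time.

```python
# Drop counters copied from "show ip cef switching statistics" above.
drops = {
    "No route": 7,
    "No adjacency": 261761,
    "Incomplete adjacency": 8241,
    "Neighbor resolution req": 673552,
}

total = sum(drops.values())

# Largest contributors first.
ranked = sorted(drops.items(), key=lambda kv: kv[1], reverse=True)
for reason, count in ranked:
    print(f"{reason:26} {count:>8}  {count / total:6.1%}")
print(f"{'Total':26} {total:>8}")
```

The sum of the per-reason drops matches the "RP LES Total" line, and the two adjacency-related reasons together account for nearly all of it.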
What should I look for now?
Thanks
10-23-2015 05:09 PM
What IOS is the stack running on?
IP Input normally means the switch (stack) is being hammered by a client, like a SAN server pushing a lot of data. If this is the case, then a command like "sh interface counter errors" will show which ports are incrementing "Total output drops".
10-26-2015 06:32 AM
Hi Leo,
The switch (it's just one switch, not a stack) is running IOS 12.2(58)SE2. As seen in an old post (3 years ago), you encouraged downgrading to 12.2(55)SE8. Is that still a valid solution in your opinion? Is there another stable IOS version at this time?
Running the "sh int counter errors" command made me notice a lot of errors coming from another Cat 3750 switch (L2), which connects a couple of hosts.
#sh int count err
Port      Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
Gi1/0/1           0       0        0       0         0           0
Gi1/0/2           0       0        0       0         0           0
Gi1/0/3           0       0        0       0         0           0
Gi1/0/4           0       0        0       0         0           0
Gi1/0/5           0       0        0       0         0           0
Gi1/0/6           0       0        0       0         0           0
Gi1/0/7           0       0        0       0         0           0
Gi1/0/8           0       0        0       0         0           0
Gi1/0/9           0       0        0       0         0           0
Gi1/0/10          0       0        0       0         0           0
Gi1/0/11          0       0        0       0         0           0
Gi1/0/12          0       0        0       0         0           0
Gi1/0/13          0       0        0       0         0           0
Gi1/0/14          0       3        0       4         0       26587
Gi1/0/15          0       0        0       0         0           0
Gi1/0/16          0       0        0       0         0           0
Gi1/0/17          0       0        0       0         0           0
Gi1/0/18          0       0        0       0         0           0
Gi1/0/19          0       0        0       0         0           0
Gi1/0/20          0       0        0       0         0           0
Gi1/0/21          0       0        0       0         0        4411
Gi1/0/22          0       0        0       0         0           0
Gi1/0/23          0       0        0       0         0          43
Gi1/0/24          0       0        0       0         0           0
Port      Single-Col Multi-Col Late-Col Excess-Col Carri-Sen Runts Giants
Gi1/0/1            0         0        0          0         0     0      0
Gi1/0/2            0         0        0          0         0     0      0
Gi1/0/3            0         0        0          0         0     0      0
Gi1/0/4            0         0        0          0         0     0      0
Gi1/0/5            0         0        0          0         0     0      0
Gi1/0/6            0         0        0          0         0     0      0
Gi1/0/7            0         0        0          0         0     0      0
Gi1/0/8            0         0        0          0         0     0      0
Gi1/0/9            0         0        0          0         0     0      0
Gi1/0/10           0         0        0          0         0     0      0
Gi1/0/11           0         0        0          0         0     0      0
Gi1/0/12           0         0        0          0         0     0      5
Gi1/0/13           0         0        0          0         0     0      0
Gi1/0/14           0         0        0          0         0     0      1
Gi1/0/15           0         0        0          0         0     0      0
Gi1/0/16           0         0        0          0         0     0      0
Gi1/0/17           0         0        0          0         0     0      0
Gi1/0/18           0         0        0          0         0     0      0
Gi1/0/19           0         0        0          0         0     0      0
Gi1/0/20           0         0        0          0         0     0      0
Gi1/0/21           0         0        0          0         0     0      0
Gi1/0/22           0         0        0          0         0     0      0
Gi1/0/23           0         0        0          0         0     0      0
Gi1/0/24           0         0        0          0         0     0      0
As a test, I shut down this interface, but the high CPU usage remains. I'm aware that I must deal with this OutDiscards problem, but I'll leave it until after solving the IP Input one.
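With 24 ports and more than a dozen counters, the non-zero entries are easy to miss by eye. A short Python sketch that filters them out of a counters table follows; the `SAMPLE` text holds a few rows taken from the first table above, and `nonzero_counters` is a hypothetical helper name.

```python
SAMPLE = """\
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
Gi1/0/13 0 0 0 0 0 0
Gi1/0/14 0 3 0 4 0 26587
Gi1/0/21 0 0 0 0 0 4411
Gi1/0/23 0 0 0 0 0 43
"""

def nonzero_counters(text: str) -> dict:
    """Return {port: {counter: value}} for every counter that is not zero."""
    lines = text.strip().splitlines()
    columns = lines[0].split()[1:]  # drop the leading "Port" header
    result = {}
    for line in lines[1:]:
        port, *values = line.split()
        hits = {name: int(v) for name, v in zip(columns, values) if int(v) != 0}
        if hits:
            result[port] = hits
    return result

report = nonzero_counters(SAMPLE)
for port, hits in report.items():
    print(port, hits)
```

Whitespace-based splitting keeps the sketch tolerant of both aligned and flattened copies of the table, as long as each row stays on its own line.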
I'll keep on troubleshooting. Any new ideas?
Thanks
10-26-2015 08:05 PM
As seen in an old post (3 years ago), you encouraged downgrading to 12.2(55)SE8. Is that still a valid solution in your opinion? Is there another stable IOS version at this time?
Yes. Upgrade to the latest 12.2(55)SE-train, which is 12.2(55)SE10.
I've tested 12.2(58)SE2 about four years ago and I wasn't impressed with this version at all. It's all got something to do with CPU spiking.
10-26-2015 11:49 PM
Hi Dago,
You could upgrade to 15.0(2)SE8 too, instead of a downgrade to 55SE10.
Also the MAC flap you mentioned above might not be expected. Are you sure the MAC addresses are that of the wireless clients? The wireless client MAC addresses are not learned on the 3750X, it would be learned on the controller. I feel there could be a loop in the network, causing the flap?
Regards,
Roopa
10-27-2015 08:10 AM
Hi roor,
About changing the IOS version: is there any problem with the license level? Right now it is "ipservice", which is permanent. Do I need another license to upgrade or downgrade? This switch was bought from a service provider, and we don't have a Cisco account with permission to download this IOS software. Do we have to ask the service provider for the new IOS software? Would a purchase be involved?
About the MAC flaps associated with wifi clients: I am pretty sure that at least the MACFLAP messages are coming from wireless clients, because they always mention our wifi VLANs (200, 201, 202) flapping between interfaces that indirectly connect access points.
As an overview: we started with a series of standalone APs (some Cisco and some Ubiquiti) broadcasting 3 SSIDs, whose clients were caught by a captive portal running on top of a pfSense box, which also provides NAT, routing and firewall capabilities. Attached is a simplified diagram of the wireless network (wifi_simplificado1.png).
We have now added a Cisco 5508 WLC and changed most of the APs to controller-based AIRCAP 2700 series (attached file wifi_simplificado2.png). Right now we are in the process of replacing all the APs, so the WLC is only managing AP functionality, while wifi client traffic still passes through the pfSense box and, eventually, through the 3750X switch. The pfSense-based captive portal is still there for backward compatibility.
In this scenario, do you still think the MAC flap messages are somehow related to the high CPU usage?
Thanks very much,
Dago
10-28-2015 12:29 PM
Hi Dago,
IPServices license is fine. You should be able to just download the new software.
I still feel the MAC Flap would have to do with the High CPU. Since 5508 is the controller used, the MACs flapping on 3750X might not be that of the wireless clients as the wireless client MAC should be learned over capwap between the APs and the controller.
HTH,
Roopa
11-06-2015 04:40 AM
So ... continuing to troubleshoot this problem, I was able to determine that most of the CPU load was coming from an interface that directly connects another 3750X (L2) and indirectly almost a third of our network devices. Shutting down that interface lowered CPU usage to approximately 50%.
So I did a packet capture with Wireshark on that interface. With just about 100 seconds of capture I got a 2GB file that had a great amount of ARP broadcast packets (check the attached file wireshark_iograph.pn).
I will work on this issue. Any suggestions on how I should proceed?
Thanks
11-06-2015 06:36 AM
You could turn off gratuitous ARPs:
no ip gratuitous-arps
Sometimes, though, when you see way too many of these, you could have an infected host.