02-28-2012 06:40 PM - edited 03-07-2019 05:14 AM
Hi There,
We are experiencing continuing high CPU issues on our 6500 (generally 80+% most times).
#show proc cpu sorted | exc 0.00
CPU utilization for five seconds: 75%/67%; one minute: 69%; five minutes: 68%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
273 2830496584 994142083 2847 2.95% 1.16% 1.10% 0 Port manager per
77 1364 820 1663 1.59% 0.33% 0.10% 1 SSH Process
118 1954622792 1584654679 1233 1.35% 1.96% 1.49% 0 IP Input
9 875221268 1261616535 693 0.55% 0.79% 0.83% 0 ARP Input
170 134354752 30707951 4375 0.31% 0.17% 0.16% 0 Adj Manager
206 563913644 3371313851 0 0.23% 0.50% 0.23% 0 Standby (HSRP)
171 145656920 222502762 654 0.23% 0.19% 0.19% 0 CEF process
3 751208932 1019079255 737 0.15% 0.22% 0.57% 0 IP-EIGRP(4): PDM
299 84202084 83653622 1006 0.15% 0.07% 0.08% 0 IPC LC Message H
124 47784372 96042068 497 0.15% 0.06% 0.04% 0 ARP HA
#sh ver
Cisco Internetwork Operating System Software
IOS (tm) s72033_rp Software (s72033_rp-ADVIPSERVICESK9_WAN-M), Version 12.2(18)SXF10, RELEASE SOFTWARE (fc1)
.
.
cisco WS-C6513 (R7000) processor (revision 1.0) with 458720K/65536K bytes of memory.
Processor board ID TSC072300KA
SR71000 CPU at 600Mhz, Implementation 0x504, Rev 1.2, 512KB L2 Cache
I have performed a "debug netdr capture rx" and a "debug netdr capture tx" and can see that the majority of the captured packets are NetBackup traffic (TCP 13724). Can anyone please tell me why this sort of traffic is being punted to the CPU, based on the captures below, and how we might stop it being punted?
------- dump of incoming inband packet -------
interface Te9/8, routine draco2_process_rx_packet_inline
dbus info: src_vlan 0x3FB(1019), src_indx 0x207(519), len 0x59E(1438)
bpdu 0, index_dir 0, flood 0, dont_lrn 0, dest_indx 0x380(896)
28020401 03FB0400 02070005 9E000000 00060520 09000040 00000000 03800000
mistral hdr: req_token 0x0(0), src_index 0x207(519), rx_offset 0x76(118)
requeue 0, obl_pkt 0, vlan 0x3FB(1019)
destmac 00.07.B3.0B.B7.40, srcmac 00.0E.D6.0B.9D.C0, protocol 0800
protocol ip: version 0x04, hlen 0x05, tos 0x00, totlen 1420, identifier 2675
df 1, mf 0, fo 0, ttl 127, src 172.17.3.51, dst 10.61.78.59
tcp src 13724, dst 50486, seq 3079014292, ack 4193993640, win 65524 off 5 checksum 0x9D45 ack
------- dump of outgoing inband packet -------
interface Te9/2, routine send_one_bufhdr_pkt
dbus info: src_vlan 0x5DC(1500), src_indx 0x207(519), len 0x59E(1438)
bpdu 0, index_dir 0, flood 0, dont_lrn 0, dest_indx 0x380(896)
00020000 05DC2C00 02070005 9E000000 00060520 00000040 00000000 03800000
mistral hdr: req_token 0x0(0), src_index 0x207(519), rx_offset 0x76(118)
requeue 0, obl_pkt 0, vlan 0x3FB(1019)
destmac 00.1A.30.2A.0A.40, srcmac 00.07.B3.0B.B7.40, protocol 0800
protocol ip: version 0x04, hlen 0x05, tos 0x00, totlen 1420, identifier 12067
df 1, mf 0, fo 0, ttl 126, src 172.17.3.51, dst 10.61.78.59
tcp src 13724, dst 50486, seq 1264642152, ack 4193993640, win 65524 off 5 checksum 0x98C ack
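For reference, the dumps above come from the Sup720 netdr facility. A minimal sketch of the commonly used command sequence (names as on 12.2SX releases; verify against your own image):

```
debug netdr capture rx
! capture packets punted up to the route processor
debug netdr capture tx
! capture packets the route processor sends down
show netdr captured-packets
! display the capture buffer (the dumps shown above)
debug netdr clear-capture
! clear the buffer before taking a fresh capture
```

Unlike most debugs, netdr capture only fills a buffer and is considered safe to run on a busy box.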
Thanks.
Andy
02-28-2012 08:39 PM
Hi Andy,
Te9/2 and Te9/8: are they part of an EtherChannel, or is there routing load balancing one hop before this router? I see that the same IP packets (same source and destination) are coming in from both of these interfaces.
Can you first of all check the routing for these packets? Which interface should they be sent out of?
show ip cef 10.61.78.59
and
show ip route 10.61.78.59
If they are sent out of the same interface they were received on, the packet is punted to the CPU to generate an ICMP redirect. If there is no route, the packet is punted to the CPU so an ICMP unreachable can be generated.
Please also check the MTU on the outgoing interface: if it is lower than 1438, the packets will be punted to the CPU and dropped, because the Don't Fragment (DF) bit is set.
So please check the above first so we can agree on the next step.
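A compact way to verify both conditions, the same-interface check and the MTU check (addresses and interface names taken from this thread; treat this as a sketch, as exact output wording varies by release):

```
show ip cef 10.61.78.59
! CEF's view of the egress interface and next hop
show ip route 10.61.78.59
! the routing table's view; the two should agree
show interface TenGigabitEthernet9/2 | include MTU
! egress MTU versus the 1438-byte packets seen in netdr
show ip traffic | include redirect
! running count of ICMP redirects this router has generated
```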
Nik
02-28-2012 10:19 PM
Hi Nik,
Thank you for the reply...
1/ Te9/8 and Te9/2 are not part of any Ether-channel.
The traffic flow looks like this:
172.17.3.51 -> 17rsw01 (Vlan120) -> Te9/8 [14rsw02] Te9/2 -> core17lsr01 -> [cloud] -> 10.61.78.59
The 6500 has the hostname of 14rsw02.
So the packet is received (rx) on Te9/8 and then sent out (tx) Te9/2.
2/ cef and routing table below.
#show ip cef 10.61.78.59
10.61.64.0/19, version 1845943, epoch 2, cached adjacency 10.42.255.197
0 packets, 0 bytes
Flow: AS 0, mask 19
via 10.42.255.197, 0 dependencies, recursive
next hop 10.42.255.197, TenGigabitEthernet9/2.1500 via 10.42.255.197/32 (Default)
valid cached adjacency
#show ip route 10.61.78.59
Routing entry for 10.61.64.0/19
Known via "bgp 64610", distance 20, metric 0
Tag 64600, type external
Redistributing via eigrp 10
Advertised by eigrp 10 metric 10000000 10 255 1 1500
Last update from 10.42.255.197 2w0d ago
Routing Descriptor Blocks:
* 10.42.255.197, from 10.42.255.197, 2w0d ago
Route metric is 0, traffic share count is 1
AS Hops 2
Route tag 64600
3/ MTU on outgoing interface
#show int TenGigabitEthernet9/2
TenGigabitEthernet9/2 is up, line protocol is up (connected)
Hardware is C6k 10000Mb 802.3, address is 0007.b30b.b740 (bia 0007.b30b.b740)
Description: Trunk interface to core17lsr01-TenGig1/1
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
interface TenGigabitEthernet9/2.1500
description Internal Subinterface
encapsulation dot1Q 1500
ip address 10.42.255.198 255.255.255.252
ip access-group test in
ip access-group test out
#sh access-lists test
Extended IP access list test
10 permit ip host 172.17.2.22 host 10.91.32.10 log-input
20 permit ip host 172.17.2.22 host 10.91.32.11 (615426 matches)
30 permit ip host 10.91.32.10 host 172.17.2.22 log-input
40 permit ip host 10.91.32.11 host 172.17.2.22 (7 matches)
50 permit ip any any (2000903900 matches)
What might be the next step?
thanks.
Andy
02-28-2012 10:35 PM
OK,
So, first of all, the problem seems to be these packets:
------- dump of outgoing inband packet -------
interface Te9/2, routine send_one_bufhdr_pkt <=============== Incoming int Te9/2
dbus info: src_vlan 0x5DC(1500), src_indx 0x207(519), len 0x59E(1438)
bpdu 0, index_dir 0, flood 0, dont_lrn 0, dest_indx 0x380(896)
00020000 05DC2C00 02070005 9E000000 00060520 00000040 00000000 03800000
mistral hdr: req_token 0x0(0), src_index 0x207(519), rx_offset 0x76(118)
requeue 0, obl_pkt 0, vlan 0x3FB(1019)
destmac 00.1A.30.2A.0A.40, srcmac 00.07.B3.0B.B7.40, protocol 0800
protocol ip: version 0x04, hlen 0x05, tos 0x00, totlen 1420, identifier 12067
df 1, mf 0, fo 0, ttl 126, src 172.17.3.51, dst 10.61.78.59
tcp src 13724, dst 50486, seq 1264642152, ack 4193993640, win 65524 off 5 checksum 0x98C ack
See, they arrive at Te9/2 while they should arrive at Te9/8.
Then, as per the routing, they are again sent out Te9/2:
#show ip cef 10.61.78.59
10.61.64.0/19, version 1845943, epoch 2, cached adjacency 10.42.255.197
0 packets, 0 bytes
Flow: AS 0, mask 19
via 10.42.255.197, 0 dependencies, recursive
next hop 10.42.255.197, TenGigabitEthernet9/2.1500 via 10.42.255.197/32 (Default)
- which will indeed punt packets to the CPU to generate an ICMP redirect.
So first of all you need to check why the core sends these packets back instead of forwarding them to the final destination through the cloud. As a temporary measure you can configure "no ip redirects" on Te9/8 and Te9/2 to suppress ICMP redirects and possibly stop the packets being punted to the CPU. That still does not explain the main problem with the core routing, which needs to be cleared up.
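A minimal sketch of the suggested workaround; note that "no ip redirects" belongs on the L3 interface, which for Te9/2 in this thread is the 9/2.1500 subinterface (interface names taken from the configs posted here):

```
configure terminal
 interface TenGigabitEthernet9/8
  no ip redirects
 interface TenGigabitEthernet9/2.1500
  no ip redirects
 end
```

This only suppresses the redirect generation; the underlying hairpin forwarding on the core still needs to be found and fixed.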
Nik
02-29-2012 03:14 PM
Hi Nik,
Thanks again for the reply.
I'm not sure why the core would send the packet back out to 14rsw02:Te9/2.
When I do a traceroute from the source VLAN, it routes fine and I don't see it looping from the core back to 14rsw02:Te9/2.
\\ 17rsw01
17rsw01#traceroute 10.61.78.59 source Vlan120
Type escape sequence to abort.
Tracing the route to p01540.internal.vic.gov.au (10.61.78.59)
1 10.42.255.133 4 msec 0 msec 4 msec <-- 14rsw02
2 10.42.255.197 4 msec 4 msec 4 msec <-- core17lsr01
3 10.60.0.34 [MPLS: Labels 1027/19503 Exp 0] 4 msec 4 msec 4 msec
4 10.61.106.5 [MPLS: Label 19503 Exp 0] 8 msec 4 msec 4 msec
5 10.61.106.6 4 msec 4 msec 4 msec
6 10.61.105.107 4 msec 4 msec 4 msec
7 10.61.78.59 4 msec 4 msec 4 msec
interface Vlan120
description Production Vlan
ip address 172.17.3.1 255.255.255.0
ip helper-address 152.147.128.60
ip helper-address 152.147.225.10
no ip redirects
ip directed-broadcast
ip flow ingress
17rsw01#sh ip cef 10.61.78.59
10.61.64.0/19
nexthop 10.42.255.133 TenGigabitEthernet9/8 <-- sends it to 14rsw02
\\ 14rsw02
interface TenGigabitEthernet9/8
description nh17rsw01:Te9/8
ip address 10.42.255.133 255.255.255.252
no ip redirects
ip directed-broadcast
ip route-cache flow
ip summary-address eigrp 10 192.168.110.0 255.255.254.0 5
ip summary-address eigrp 10 152.147.176.0 255.255.248.0 5
ip summary-address eigrp 10 152.147.160.0 255.255.248.0 5
ip summary-address eigrp 10 152.147.128.0 255.255.224.0 5
ip summary-address eigrp 10 10.42.0.0 255.255.192.0 5
ip policy route-map CSE105-DTFDPC
Any further ideas as the routing seems to be fine?
Thanks.
Andy
02-29-2012 06:21 PM
You need to check the routing on the core to see if it is flapping. It is the core that is sending those packets back, so you need to understand what is happening with the routing there. As a workaround for the high CPU you can configure "no ip redirects" on Te9/2.1500; did you try it?
As for the core, look closely at the routing and check whether this particular route is stable in your routing protocol database.
These two packets below:
------- dump of incoming inband packet -------
interface Te9/8, routine draco2_process_rx_packet_inline
dbus info: src_vlan 0x3FB(1019), src_indx 0x207(519), len 0x59E(1438)
bpdu 0, index_dir 0, flood 0, dont_lrn 0, dest_indx 0x380(896)
28020401 03FB0400 02070005 9E000000 00060520 09000040 00000000 03800000
mistral hdr: req_token 0x0(0), src_index 0x207(519), rx_offset 0x76(118)
requeue 0, obl_pkt 0, vlan 0x3FB(1019)
destmac 00.07.B3.0B.B7.40, srcmac 00.0E.D6.0B.9D.C0, protocol 0800
protocol ip: version 0x04, hlen 0x05, tos 0x00, totlen 1420, identifier 2675
df 1, mf 0, fo 0, ttl 127, src 172.17.3.51, dst 10.61.78.59
tcp src 13724, dst 50486, seq 3079014292, ack 4193993640, win 65524 off 5 checksum 0x9D45 ack
------- dump of outgoing inband packet -------
interface Te9/2, routine send_one_bufhdr_pkt
dbus info: src_vlan 0x5DC(1500), src_indx 0x207(519), len 0x59E(1438)
bpdu 0, index_dir 0, flood 0, dont_lrn 0, dest_indx 0x380(896)
00020000 05DC2C00 02070005 9E000000 00060520 00000040 00000000 03800000
mistral hdr: req_token 0x0(0), src_index 0x207(519), rx_offset 0x76(118)
requeue 0, obl_pkt 0, vlan 0x3FB(1019)
destmac 00.1A.30.2A.0A.40, srcmac 00.07.B3.0B.B7.40, protocol 0800
protocol ip: version 0x04, hlen 0x05, tos 0x00, totlen 1420, identifier 12067
df 1, mf 0, fo 0, ttl 126, src 172.17.3.51, dst 10.61.78.59
tcp src 13724, dst 50486, seq 1264642152, ack 4193993640, win 65524 off 5 checksum 0x98C ack
Those are the same in terms of IP addresses, but they come through opposite interfaces. The one below even has its TTL decremented, so it seems to be reaching the core and being sent back. Possibly you have some policy routing (or something else) enabled that still allows the traceroute probes through while routing this TCP traffic back. Or the route may just be flapping, so you need to inspect it on the core.
You can share the core configuration and "show ip route" / "show ip cef" output for a start if you want me to help with it.
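A sketch of what to look at on the core for route stability (the /19 prefix is the one from this thread; add the appropriate vrf keyword if the core holds this route in a VRF):

```
show ip route 10.61.64.0 255.255.224.0
! check the "Last update" age for recent churn
show ip bgp 10.61.64.0/19
! path age and best-path flags for the BGP-learned route
show ip cef 10.61.64.0/19 detail
! CEF entry version/epoch; rapid version bumps hint at flapping
```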
Nik
02-29-2012 09:07 PM
Hi Nik,
I'll update the network diagram, because once it hits the core it is leaked into an INTERNAL VRF and then carried across the MPLS core to the destination IP.
172.17.3.51 -> Gi13/30 [17rsw01] Te9/8 -> Te9/8 [14rsw02] Te9/2.1500 -> Te1/1.1500 [core17lsr01] Te2/2 -> [core17lsr02] --[MPLS Core]--> 10.61.78.59
\\ core17lsr01
interface TenGigabitEthernet1/1.1500
encapsulation dot1Q 1500
ip vrf forwarding INTERNAL
ip address 10.42.255.197 255.255.255.252
service-policy input IP-MPLS
service-policy output IP-IP
The service policy on this interface just places traffic into its traffic class (e.g. Gold, Silver, Bronze).
Routing Tables on Core Router:
mel80cs17lsr01#sh ip route vrf INTERNAL 10.61.78.59
Routing Table: INTERNAL
Routing entry for 10.61.64.0/19
Known via "bgp 64600", distance 200, metric 0
Tag 64613, type internal
Redistributing via eigrp 15
Advertised by eigrp 15
Last update from 10.60.3.2 2w1d ago
Routing Descriptor Blocks:
* 10.60.3.2 (default), from 10.60.3.2, 2w1d ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64613
MPLS Required
mel80cs17lsr01#show ip cef vrf INTERNAL 10.61.78.59
10.61.64.0/19
nexthop 10.60.0.34 TenGigabitEthernet2/2 label 1027 19503
interface TenGigabitEthernet2/2
description MPLS Network to core17lsr02
mtu 9216
ip address 10.60.0.33 255.255.255.252
ip mtu 9000
mpls mtu 9100
mpls ip
Not sure what more I can look at to see whether the route is flapping.
And on another issue (maybe for later): it doesn't explain why the captured netdr RX packets are being punted to the CPU even though they are coming in on the right interface (Te9/8).
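One way to narrow down why packets arriving on the correct interface are still punted (a sketch of commonly used Sup720 diagnostics; command availability varies by release, so verify on your image):

```
show mls rate-limit
! hardware rate limiters for classes of CPU-bound traffic
show ibc brief
! inband channel counters toward the route processor
show mls cef exception status
! whether hardware CEF has hit an exception and fallen back to software
remote command switch show mls cef summary
! FIB utilization as seen by the switch processor
```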
Thanks.
Andy
02-29-2012 11:20 PM
A few questions:
- Are those packets still hitting the CPU, or has that stopped?
- Can you run a new netdr capture and see whether they are still coming from Te9/2?
- Did you apply "no ip redirects" to Te9/2?
Just want to make sure the issue is still there.
Nik
03-01-2012 02:31 PM
Hi Nik,
I took a fresh netdr capture yesterday, and those packets are still coming in from Te9/2.
I have yet to re-apply "no ip redirects" to Te9/2. Is this needed just on Te9/2.1500, or is it also required on the physical Te9/2?
\\ nh14rsw02
interface TenGigabitEthernet9/2
description Trunk interface to core17lsr01-TenGig1/1
no ip address
mls qos trust dscp
service-policy output EGRESS
!
interface TenGigabitEthernet9/2.1500
description Internal Subinterface
encapsulation dot1Q 1500
ip address 10.42.255.198 255.255.255.252
ip access-group test in
ip access-group test out
Thanks.
Andy
03-01-2012 06:37 PM
Hi Andy,
That should be applied to the L3 interface, so Te9/2.1500 in your case. As for why the packets are still being sent, you may need detailed debugging and a careful examination of the config. Why not open a TAC case and have a WebEx session with a Cisco engineer to check this in more depth?
I guess some ELAM packet captures could be helpful here to understand the core's internal logic for sending those packets back.
Nik
03-01-2012 07:21 PM
Hi Nik,
You've been a great help in narrowing down this issue.
I'll apply "no ip redirects" on the L3 interface Te9/2.1500 and see if that reduces the High CPU.
Will also look at lodging a Cisco TAC case.
Not sure how to rate all your replies, but I'll go back and make sure I rate them all.
Cheers.
Andy
03-01-2012 08:25 PM
No worries Andy,
Please just update the thread with your later findings and the root cause. It would be very interesting to see what the root cause is, and good for the sake of documenting this thread.
Nik
03-07-2012 02:25 PM
Hi Nik,
I've had a chance to apply "no ip redirects" on the layer 3 interface Te9/2.1500, and it didn't have much of an impact. The CPU is still very high even after applying that command.
interface TenGigabitEthernet9/2.1500
description Internal Subinterface
encapsulation dot1Q 1500
ip address 10.42.255.198 255.255.255.252
ip access-group test in
ip access-group test out
no ip redirects
Any more ideas to try, or do we hand this off to the TAC?
Thanks.
Andy
03-11-2012 07:38 PM
Hi Andy,
I was away for a few days and just noticed your reply. Can you please run a new netdr capture and attach the full output to this thread?
Please also attach the output of the following commands:
show ver
show proc cpu sorted
show proc cpu history
show int (taken 3 times)
show spanning-tree detail | i ieee|occur|from|is exec
I will check it once again in more detail.
Nik
03-19-2012 04:26 PM
Hi Nik,
Sorry, I've also been away, so I lost visibility of this issue as well.
Can I email you the details instead of attaching them on a public forum?
Thanks.
Andy