Catalyst 3560 "IP Input" Process CPU Utilization

jordan.bean
Level 1

We just replaced our core network with two 3560Gs.  CORE1 is version 04.  CORE2 is version 01.  Both are running about 10 VLANs with HSRP and have 12 EIGRP peers.  Both are running auto QoS.  We're passing about 50 Mbps through CORE1 and 3 Mbps through CORE2.

CORE1's CPU ranges from 8-12%.  CORE2's CPU was showing about 7% this weekend when network utilization was virtually nothing.  Now, with 3 Mbps through it, the CPU is at about 25-30%.  The IP Input process is at about 10-20%:

CPU utilization for five seconds: 17%/3%; one minute: 21%; five minutes: 26%
PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
190     7080239  17633742        401  7.02% 10.04% 13.88%   0 IP Input

CORE2#show ip int | i CEF

Shows the following for each interface:

  IP CEF switching is enabled
  IP CEF switching turbo vector
  IP route-cache flags are Fast, CEF

There is no EIGRP reconvergence occurring; the queues are 0.

Could this difference be due to the different versions of the switches, and is it cause for concern?  I find it odd that CORE1's CPU utilization is minimal at ~8% under peak load, while CORE2, which is essentially idle, sits at ~20-30%.

Accepted Solution

Just enter the command "clear ip traffic" and hit Enter. No idea why that one was hidden.

HSRP should not really drive the ARP numbers.

If all the traffic coming through the core sees one of the routers as the best path towards the destination subnet, it would ARP. But remember the ARPs are only for directly connected IPs.

Did you confirm at what rate the 'sh ip traffic' counters were going up, and which ones were increasing the most in correlation with the higher CPU?
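
For example, a simple baseline-and-compare on the busy box:

CORE2#clear ip traffic
   (wait a minute or two while the CPU is elevated)
CORE2#show ip traffic

The counters that keep climbing between runs (the IP options line, the ICMP redirect/unreachable lines, and so on) usually point at why traffic is hitting the CPU.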

I agree... get someone from TAC on the box to take a closer look with you.

Post the resolution back to the thread once it's figured out.


25 Replies

Rodney Dunn
Cisco Employee

If you look at "show interface stat", can you identify which interface/VLAN is causing so many of the process-level packets? It's packets punted out of hardware to process level that drive the "IP Input" process up.

You may be able to catch and dump a few of them via "show buffers input-interface packet" once you identify which interface it is.

Then you have to do an analysis of those packets to see why they are not being hardware switched through the device.

It could be things such as TTL expired, packets to an IP address on the box, packets with IP options set, etc.
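
For example (Gig 0/13 below is just a placeholder until "show interfaces stats" points at the busy interface):

CORE2#show interfaces stats
   (look for interfaces whose "Processor" packet counts are incrementing quickly)
CORE2#show buffers input-interface gigabitEthernet 0/13 packet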

Okay, from what I can see in 'show inter stat', there are two L3 ports that connect to our voice service routers that handle PRI/CONF/XCODE/MTP.  The counters appear to be incrementing more quickly on CORE2 than on CORE1, so I believe the path on CORE2 is being chosen over the path on CORE1.  I assume the voice packets fall under the process-switched "IP Packets with Options"?   I'm seeing about 300 pps of voice traffic through that interface.  I'm now separately graphing PRI/CONF/XCODE/MTP utilization so I can compare those graphs to the CORE2 CPU utilization and see if there's a correlation.

Any suggestions on how to improve performance, or should I leave it as-is?  I suppose I could disable QoS going to the voice service routers since only voice traffic hits those links, but it would be better to leave things as-is for now since 20% CPU utilization is acceptable.

Unless you have some features configured that need to look into the payload, those voice packets should be hardware switched through the device.

Did you try the "show buffers input-interface packet" on the interfaces that show high numbers of "Process Switched" output in "show interface stat" to see if you can find out what type of packets they are?

So on both 3560s:

0/13 = Voice Service Router 1 (PRI/XCODE/MTP/CONF)

0/14 = Voice Service Router 2 (PRI/XCODE/MTP/CONF)

CORE1:

GigabitEthernet0/13 is up, line protocol is up (connected)
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
GigabitEthernet0/14 is up, line protocol is up (connected)
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0

CORE2:

GigabitEthernet0/13 is up, line protocol is up (connected)
  Input queue: 44/75/0/0 (size/max/drops/flushes); Total output drops: 0
GigabitEthernet0/14 is up, line protocol is up (connected)
  Input queue: 19/75/0/0 (size/max/drops/flushes); Total output drops: 0

The output from 'show buffers input-interface gig 0/13 packet' shows voice packets from the voice service router to various voice endpoints (phones, SIP gateway, etc.)

Gig 0/13 is configured as follows:

interface GigabitEthernet0/13
description TS-VSVC1-Gig0/1
no switchport
ip address 64.27.32.13 255.255.255.252
no ip redirects
no ip unreachables
ip pim sparse-dense-mode
ip igmp version 3
ip cgmp
load-interval 30
speed 1000
duplex full
srr-queue bandwidth share 1 30 35 5
queue-set 2
priority-queue out
mls qos trust dscp
auto qos trust
end

And I just noticed no QoS configured on the router side:

Voice Service Router 1:

interface GigabitEthernet0/1
description To TS-CORE2
ip address 64.27.32.14 255.255.255.252
no ip redirects
no ip unreachables
ip pim sparse-dense-mode
ip cgmp
load-interval 30
duplex full
speed 1000
end

And here is a packet in the buffer on CORE2:

CORE2#show buffers input-interface gig 0/13 packet

Buffer information for RxQ7 buffer at 0x4400A10
  data_area 0x6A8A670, refcount 1, next 0x5253DE8, flags 0x200
  linktype 7 (IP), enctype 1 (ARPA), encsize 14, rxtype 1
  if_input 0x47F83B8 (GigabitEthernet0/13), if_output 0x0 (None)
  inputtime 2d16h (elapsed 00:00:31.516)
  outputtime 00:00:00.000 (elapsed never), oqnumber 65535
  datagramstart 0x6A8A6B6, datagramsize 214, maximum size 2196
  mac_start 0x6A8A6B6, addr_start 0x6A8A6B6, info_start 0x0
  network_start 0x6A8A6C4, transport_start 0x6A8A6D8, caller_pc 0x185E23C

  source: 10.14.10.131, destination: 10.10.10.24, id: 0x4B96, ttl: 254,
  TOS: 184 prot: 17, source port 17186, destination port 17204

       0: 0012DAD9 2DCA0013 7F1021B1 080045B8  ..ZY-J....!1..E8
      16: 00C84B96 0000FE11 47240A0E 0A830A0A  .HK...~.G$......
      32: 0A184322 433400B4 00008000 3FB9BF3A  ..C"C4.4....?9?:
      48: 69482297 064BFF7E 7B7E7C7D 7D7C7F7E  iH"..K.~{~|}}|.~
      64: FF7F7FFD 7FFEFEFD FC7CFFFE 7D7D7CFE  ...}.~~}||.~}}|~
      80: 7E7B7F7D 7E7F7EFD 7FFE7E7B FD7F7C7E  ~{.}~.~}.~~{}.|~
      96: 7F7D7AFF 7E79FD7F 7CFCFFFE FE7EF97E  .}z.~y}.||.~~~y~
     112: 7DFB777F 7E79FC79 7CFD7BFC 7D7CF97C  }{w.~y|y|}{|}|y|
     128: 7DFB7DFE FD7A7EFE 7D7F7C7E FF7C7CFE  }{}~}z~~}.|~.||~
     144: FC7E7CFF FC7CFDFC 79FAFF79 FB7A7E7D  |~|.||}|yz.y{z~}
     160: 79FA777C FA7AF97F 7CF77BFF FC7DFB77  yzw|zzy.|w{.|}{w
     176: 7CFC79FE 7B7BFD7A 7DFEFBFD 7AFBFA7D  ||y~{{}z}~{}z{z}
     192: 7BFEFE7C FE7C7CFE 7DFF7CFF FA7BFF7D  {~~|~||~}.|.z{.}
     208: 7AF97C7B FE7B00                      zy|{~{.

What does 'sh ip cef 10.10.10.24 detail' say?

If you have a valid CEF rewrite for that IP address and you see a lot of those frames on the input queue, you probably need to have TAC take a closer look at the hardware programming.

The frame appears to be a small UDP frame, and QoS should not impact it on ingress for Gig 0/13.

What interface does the 10.10.10.24 destination go through, and what is the configuration on it?

Could you capture 'sh ip cef switching stat' to see if it gives an indication of the punt reason for any traffic?


10.10.10.24 is the destination IP in the output I included, but the destination changes based on the call, device, etc.  The source IP of 10.14.10.131 is consistent across the buffered packets and belongs to the voice service router connected to the interfaces that have non-zero input queues.  So it seems to me that the packets from the voice service router may have IP options or something else set that's causing the 3560 to process switch them.  I noticed that on the voice service routers the only configuration relating to QoS is:

sccp ip precedence 3

I may also remove that line and then add 'auto qos voip' (which should default to untrust) on the interfaces to CORE1 and CORE2.
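
If I go that route, the change would be roughly the following (untested; this assumes 'auto qos voip' is supported on the 2811's GigabitEthernet interfaces in this release, so verify before applying):

VSVC1(config)#no sccp ip precedence 3
VSVC1(config)#interface GigabitEthernet0/1
VSVC1(config-if)#auto qos voip
VSVC1(config-if)#end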

Here is the output you requested:

CORE2#show ip cef 10.10.10.24 detail
10.10.10.24/32, epoch 2, flags attached
  Adj source: IP adj out of Vlan2, addr 10.10.10.24 049461A0
   Dependent covered prefix type adjfib cover 10.10.0.0/16
  attached to Vlan2

Okay, so on our CORE2 we have:

interface Vlan2
description Voice/CallManager VLAN
ip address 10.10.10.8 255.255.0.0
ip helper-address 10.10.10.10
ip helper-address 10.10.10.14
no ip redirects
no ip unreachables
ip pim sparse-dense-mode
ip igmp version 3
ip cgmp
no ip mroute-cache
load-interval 30
standby 0 ip 10.10.10.1
standby 0 preempt
end

On CORE1 we have:

interface Vlan2
description Voice/CallManager VLAN
ip address 10.10.10.90 255.255.0.0 secondary
ip address 10.10.10.7 255.255.0.0
ip helper-address 10.10.10.10
ip helper-address 10.10.10.14
no ip redirects
no ip unreachables
ip pim sparse-dense-mode
no ip route-cache cef
no ip route-cache
ip cgmp
no ip mroute-cache
load-interval 30
standby 0 ip 10.10.10.1
standby 0 priority 105
standby 0 preempt
end

I'm not sure why the 'no ip route-cache cef' and 'no ip route-cache' statements are on CORE1's Vlan2.  I inherited this config and haven't changed them.  I think I'll remove them after-hours.

CORE2#show ip cef switching stat

       Reason                          Drop       Punt  Punt2Host
RP LES Neighbor resolution req           37          0          0
RP LES Total                             37          0          0

All    Total                             37          0          0

I'm having a hard time following your topology. Draw it out and show the source and destination and routers along the path.

You should never turn off CEF, so turn that back on.
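
On CORE1's Vlan2 that would be roughly the following (assuming the positive forms are accepted on the SVI; best done in a maintenance window):

CORE1(config)#interface Vlan2
CORE1(config-if)#ip route-cache
CORE1(config-if)#ip route-cache cef
CORE1(config-if)#end
CORE1#show ip interface vlan 2 | include CEF
   (should come back showing "IP CEF switching is enabled")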

If the problem still exists after that, post the topology.

Rodney


CORE1 ---------------------------------- CORE2
gig0/13                               gig0/13
   |                                     |
   |                                     |
   +-----gig0/0 --VOICERTR-- gig0/1------+

Gig 0/13 on both COREs is a routed interface.  Gig 0/13 on both COREs shows a non-zero input queue as traffic increases, so it appears that traffic from the voice router is being punted.  The voice router is routing packets that it sources to CORE2, which is why we're seeing a higher input queue depth and higher CPU on CORE2.

The voice router does not have QoS applied to gig0/0 or gig0/1 (though it should).  The only QoS-related entry on this router is:

sccp ip precedence 3

You said they are routed interfaces but posted the VLAN configuration, which is what confused me.

Okay, I ran some additional tests this evening since traffic was low.  With no traffic to the voice router connected to CORE2, you can see everything is idle:

CORE2: show proc cpu sort

CPU utilization for five seconds: 6%/0%; one minute: 21%; five minutes: 18%
PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
197       26737      2452      10904  0.47%  0.28%  0.33%   1 Virtual Exec
132      346717   9696901         35  0.15%  0.08%  0.07%   0 Hulc LED Process
204     2802256   5255833        533  0.15%  0.17%  0.24%   0 Spanning Tree
141      136523    104736       1303  0.15%  0.05%  0.04%   0 HRPC qos request

GigabitEthernet0/13 is up, line protocol is up (connected)
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0

Then, I set up a conference and enabled packet debugging.  With one conference going:

CPU utilization for five seconds: 10%/1%; one minute: 28%; five minutes: 18%
PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
190     9787979  25177958        388  4.31% 16.49%  8.70%   0 IP Input

GigabitEthernet0/13 is up, line protocol is up (connected)
  Input queue: 2/75/0/0 (size/max/drops/flushes); Total output drops: 0

So, it's obvious that traffic through this port is causing both the input queue depth and the CPU utilization to increase.  (I'm sure the packet debug increases it as well, but the CPU increase along with the growing input queue are the same symptoms as earlier.)
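
To keep watching that correlation in real time, the built-in CPU history graph is handy (both commands should be available on this release, but verify):

CORE2#show processes cpu history
   (ASCII graph of CPU over the last 60 seconds, minutes, and hours)
CORE2#show processes cpu sorted 5sec
   (process list sorted on the 5-second utilization column)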

So, I enabled 'debug ip packet' with an ACL matching the IP of the router connected to Gig0/13 and saw a lot of the following:

Oct  4 22:46:54: IP: tableid=0, s=10.14.10.132 (GigabitEthernet0/14), d=10.14.10.131 (GigabitEthernet0/13), routed via FIB
Oct  4 22:46:54: IP: s=10.14.10.132 (GigabitEthernet0/14), d=10.14.10.131 (GigabitEthernet0/13), len 200, output feature, Check hwidb(72), rtype 1, forus FALSE, sendself FALSE, mtu 0, fwdchk FALSE
Oct  4 22:46:54: IP: s=10.14.10.132 (GigabitEthernet0/14), d=10.14.10.131 (GigabitEthernet0/13), g=64.27.32.14, len 200, forward
Oct  4 22:46:54: IP: s=10.14.10.132 (GigabitEthernet0/14), d=10.14.10.131 (GigabitEthernet0/13), len 200, sending full packet
Oct  4 22:46:54: IP: s=10.14.10.131 (GigabitEthernet0/13), d=10.14.10.132, len 200, input feature, MCI Check(63), rtype 0, forus FALSE, sendself FALSE, mtu 0, fwdchk FALSE
Oct  4 22:46:54: IP: tableid=0, s=10.14.10.131 (GigabitEthernet0/13), d=10.14.10.132 (GigabitEthernet0/14), routed via FIB
Oct  4 22:46:54: IP: s=10.14.10.131 (GigabitEthernet0/13), d=10.14.10.132 (GigabitEthernet0/14), len 200, output feature, Check hwidb(72), rtype 1, forus FALSE, sendself FALSE, mtu 0, fwdchk FALSE
Oct  4 22:46:54: IP: s=10.14.10.131 (GigabitEthernet0/13), d=10.14.10.132 (GigabitEthernet0/14), g=64.27.32.22, len 200, forward
Oct  4 22:46:54: IP: s=10.14.10.131 (GigabitEthernet0/13), d=10.14.10.132 (GigabitEthernet0/14), len 200, sending full packet
Oct  4 22:46:54: IP: s=10.14.10.131 (GigabitEthernet0/13), d=10.14.10.132, len 200, input feature, MCI Check(63), rtype 0, forus FALSE, sendself FALSE, mtu 0, fwdchk FALSE
Oct  4 22:46:54: IP: tableid=0, s=10.14.10.131 (GigabitEthernet0/13), d=10.14.10.132 (GigabitEthernet0/14), routed via FIB
Oct  4 22:46:54: IP: s=10.14.10.131 (GigabitEthernet0/13), d=10.14.10.132 (GigabitEthernet0/14), len 200, output feature, Check hwidb(72), rtype 1, forus FALSE, sendself FALSE, mtu 0, fwdchk FALSE
Oct  4 22:46:54: IP: s=10.14.10.131 (GigabitEthernet0/13), d=10.14.10.132 (GigabitEthernet0/14), g=64.27.32.22, len 200, forward
Oct  4 22:46:54: IP: s=10.14.10.131 (GigabitEthernet0/13), d=10.14.10.132 (GigabitEthernet0/14), len 200, sending full packet
Oct  4 22:46:54: IP: s=10.14.10.132 (GigabitEthernet0/14), d=10.14.10.131, len 200, input feature, MCI Check(63), rtype 0, forus FALSE, sendself FALSE, mtu 0, fwdchk FALSE
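
For reference, the debug was set up roughly like this (the ACL number is arbitrary, and 'debug ip packet' with an ACL is still CPU-intensive, so keep the window short):

CORE2(config)#access-list 150 permit ip host 10.14.10.131 any
CORE2(config)#access-list 150 permit ip any host 10.14.10.131
CORE2(config)#end
CORE2#debug ip packet 150

As far as I know, traffic switched entirely in hardware never shows up in 'debug ip packet' on this platform, so the fact that these flows appear at all confirms they are reaching the CPU.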

So, "routed via FIB" means the packets are CEF switched, correct?  I'm stumped.  Could this be a bug?

Can you post the configuration of the gig 0/13 and 0/14 interfaces?

Are they L3 ports or L2 ports inside of a VLAN?

I ask because "s=10.14.10.132 (GigabitEthernet0/14), d=10.14.10.131 (GigabitEthernet0/13)" would be an IP packet coming in gig 0/14 that is having to be CEF forwarded out gig 0/13.

If they are on the same L3 interface, an ICMP redirect would be sent. I wonder if that is the forwarding scenario and, even though you have ip redirects turned off, whether the hardware isn't recognizing that and is still punting to try to get the redirect sent.
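
One quick way to check whether redirects are actually being generated (counter and message names from memory, so double-check the output):

CORE2#show ip traffic | include redirect
CORE2#debug ip icmp
   (watch for any redirect-sent messages while a call is up)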

I think what happened here was that one call was terminated on an MTP on the router on gig0/14 and then a second call on gig0/13.  I conferenced the calls together to make sure we'd see activity, so that is why you're seeing traffic between gig0/13 and gig0/14.  The same issue occurs for traffic coming into gig0/13 or gig0/14 destined elsewhere.

I'm thinking about putting a sniffer on gig0/13 today to capture some of the UDP packets in question and see if they have any options set, etc.  I'm not sure why they would, because the config on the voice routers (2811s) isn't that complex.

The config is as follows:

CORE2:

interface GigabitEthernet0/13
description TS-VSVC1-Gig0/1
no switchport
ip address x.x.x.x 255.255.255.252
no ip redirects
no ip unreachables
ip pim sparse-dense-mode
ip igmp version 3
ip cgmp
load-interval 30
speed 1000
duplex full
srr-queue bandwidth share 1 30 35 5
queue-set 2
priority-queue out
mls qos trust dscp
auto qos trust
end

CORE2:

interface GigabitEthernet0/14
description TS-VSVC2-Gig0/1
no switchport
ip address x.x.x.x 255.255.255.252
no ip redirects
no ip unreachables
ip pim sparse-dense-mode
ip igmp version 3
ip cgmp
load-interval 30
speed 1000
duplex full
srr-queue bandwidth share 1 30 35 5
queue-set 2
priority-queue out
mls qos trust dscp
auto qos trust
end

VSVC1:

interface GigabitEthernet0/1
description To TS-CORE2
ip address x.x.x.x 255.255.255.252
no ip redirects
no ip unreachables
ip pim sparse-dense-mode
ip cgmp
load-interval 30
duplex full
speed 1000
end

The next step would be to dig into the hardware forwarding entries for those destinations on the 3560 to figure out why they are being punted. I don't see any of the obvious reasons there. I would suggest opening a TAC SR to take a deeper look.
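
Before opening the SR, a couple of platform-level commands are worth capturing on the 3560 (names from memory; exact syntax varies by release, and TAC will want similar output anyway):

CORE2#show controllers cpu-interface
   (per-queue counts of what the CPU is actually receiving)
CORE2#show platform tcam utilization
   (a full TCAM is a common reason for routes/adjacencies being punted to the CPU)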
