Cisco 4507R problem

nawas · 04-19-2006

For the past couple of days I have been seeing a problem where some hosts connected to the 4507R stop responding to ping, and I see huge latency of up to 400 msec. (Normally it is 1 msec because they are directly connected hosts.) I couldn't find anything in the logs or any interface errors, but I ran show proc cpu history and found that the CPU went up to 100% in the last 60 sec. Does anyone know why it is doing this? Do I need to reboot?

Thanks.

Bobby Thekkekandam · 04-19-2006

Hi Nawas,

Where are you pinging from? If you're pinging from the switch, you'll definitely see inconsistent response times, especially if the CPU gets pegged.

If you're pinging from another host connected through the switch, then we shouldn't see any impact on response times unless, for some reason, the ICMP packets are being process switched.

Do you know what processes were spiking?

regards,

Bobby

nawas · 04-19-2006

Hi Bobby

I'm pinging from a different host that is directly connected to this switch, and our monitoring system (HPOV, also on the same VLAN) sends pages/email when it loses pings. The highest processes I see in show proc cpu are:

26 360995572 2129699394 169 9.03% 9.79% 9.92% 0 Cat4k Mgmt HiPri

27 3007793576 4176720830 720 8.47% 7.57% 7.61% 0 Cat4k Mgmt LoPri
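The two rows above follow the usual column order of show proc cpu: PID, Runtime (ms), Invoked, uSecs, 5Sec, 1Min, 5Min, TTY, Process. As a rough sketch of how such rows could be parsed and ranked offline, the Python below may help. The names `ProcRow`, `parse_rows`, `busiest` and `SAMPLE` are made up for illustration, and the split between Runtime and Invoked in `SAMPLE` is an assumption, chosen so that uSecs comes out close to Runtime * 1000 / Invoked.

```python
import re
from typing import NamedTuple


class ProcRow(NamedTuple):
    pid: int
    runtime_ms: int
    invoked: int
    usecs: int
    five_sec: float
    one_min: float
    five_min: float
    tty: int
    name: str


# One process row: eight numeric columns followed by the process name.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+(.+?)\s*$"
)


def parse_rows(text):
    """Return a ProcRow for every line that looks like a process row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            pid, rt, inv, us, s5, m1, m5, tty, name = m.groups()
            rows.append(ProcRow(int(pid), int(rt), int(inv), int(us),
                                float(s5), float(m1), float(m5),
                                int(tty), name))
    return rows


def busiest(rows, n=5):
    """Rows sorted by 5-second utilization, highest first."""
    return sorted(rows, key=lambda r: r.five_sec, reverse=True)[:n]


# Assumed column split of the two rows quoted in this thread.
SAMPLE = """\
26 360995572 2129699394 169 9.03% 9.79% 9.92% 0 Cat4k Mgmt HiPri
27 3007793576 4176720830 720 8.47% 7.57% 7.61% 0 Cat4k Mgmt LoPri
"""

if __name__ == "__main__":
    for row in busiest(parse_rows(SAMPLE)):
        print(row.pid, row.name, row.five_sec)
```

Ranking by the 5Sec column is only one choice; during a short spike it is the column most likely to show the culprit, while 1Min and 5Min smooth it out.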

But when I do show proc cpu history I see a spike of about 99%, and that doesn't show which process is responsible. Here is the capture:

xec-4507-25#sh proc cpu history

1111122222222221111111111111111111122222333331111122222111
6666600000000007777799999777777777788888111115555511111666
100
90
80
70
60
50
40
30 **********
20 **********************************************************
10 **********************************************************
0....5....1....1....2....2....3....3....4....4....5....5....
0 5 0 5 0 5 0 5 0 5
CPU% per second (last 60 seconds)

4444554444444455444443435343434375534343434352424342429942
6644889955996688224442328852534394512442231368995366779978
100 **
90 **
80 * **
70 * **
60 ** ** * * * * **
50 ** ************ * * * *** * * * * * ***
40 ********************* * *** * * *** * * * * * * * * * *#*
30 ********************************##********************##**
20 ##########################################################
10 ##########################################################
0....5....1....1....2....2....3....3....4....4....5....5....
0 5 0 5 0 5 0 5 0 5
CPU% per minute (last 60 minutes)
* = maximum CPU% # = average CPU%

9977787677655587777777766675556677877787666555777776777766755586677777
9571002931794143410410487627389561511711952155461009150261467110632133
100 **
90 ** *
80 *** * * * * ** * * *
70 *********** ************* ************ *********** * * ******
60 ************ ************** ************** ***************** ********
50 **********************************************************************
40 **********************************************************************
30
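The rows of digits printed above each graph are read vertically: every column is one sample, with its digits stacked top to bottom, and gives the peak CPU% for that second, minute or hour. A minimal Python sketch of that decoding follows. `decode_history_header` and `PER_MINUTE_ROWS` are illustrative names, and the function assumes the digit rows are pasted with their original column alignment intact.

```python
def decode_history_header(digit_rows):
    """Turn the stacked digit rows above a CPU history graph into integers.

    Each column holds one value written top to bottom, e.g. '4' over '6'
    means 46. Rows are padded on the right so a short row does not cut
    off the longer ones.
    """
    width = max(len(row) for row in digit_rows)
    padded = [row.ljust(width) for row in digit_rows]
    values = []
    for column in zip(*padded):
        digits = "".join(column).strip()
        values.append(int(digits) if digits else 0)
    return values


# The two digit rows above the "CPU% per minute (last 60 minutes)" graph
# quoted in this thread.
PER_MINUTE_ROWS = (
    "4444554444444455444443435343434375534343434352424342429942",
    "6644889955996688224442328852534394512442231368995366779978",
)

if __name__ == "__main__":
    peaks = decode_history_header(PER_MINUTE_ROWS)
    print(max(peaks))                                   # -> 99
    print([i for i, v in enumerate(peaks) if v >= 90])  # -> [54, 55]
```

Decoded this way, the per-minute graph shows two adjacent one-minute samples peaking at 99%, which matches the spike described above; the graph itself still does not say which process caused it.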

Bobby Thekkekandam · 04-19-2006

The HiPri and LoPri processes are normal. Your average CPU is also in the normal range, while the max CPU is clearly hitting 100% within the last 60 minutes. So we can certainly infer that there are occasional spikes.

The question is whether this truly correlates with the host response time. The first thing we need to determine is whether the spikes are caused by a process or by CPU-switched traffic.

You'll need to capture the output of "show proc cpu" and "show platform health" and catch it while the CPU is spiked so that we can determine what's causing it. If it is very intermittent, you may have a hard time catching it, so if you have a way of scripting it with a cron job, that may be easier.
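One way to script this is to poll the switch and save both outputs only when the CPU is high. The Python below is a sketch under assumptions, not a tested tool: `run_command` is a hypothetical callable that you supply (for example, a wrapper around whatever SSH or telnet session you already use) which takes a command string and returns its output, and the threshold, poll count and interval are arbitrary defaults. It relies on the first line of show proc cpu having the usual form "CPU utilization for five seconds: 99%/5%; one minute: ...".

```python
import re
import time
from datetime import datetime

# Total 5-second utilization from the first line of "show proc cpu".
CPU_LINE = re.compile(r"CPU utilization for five seconds:\s*(\d+)%")


def five_second_cpu(show_proc_cpu_output):
    """Return the 5-second CPU% as an int, or None if the line is missing."""
    m = CPU_LINE.search(show_proc_cpu_output)
    return int(m.group(1)) if m else None


def capture_on_spike(run_command, threshold=90, polls=60, interval=5.0,
                     sleep=time.sleep):
    """Poll the switch and keep both outputs whenever CPU >= threshold.

    run_command is a caller-supplied (hypothetical) function: it takes a
    CLI command string and returns that command's output as text.
    """
    captures = []
    for _ in range(polls):
        proc_output = run_command("show proc cpu")
        cpu = five_second_cpu(proc_output)
        if cpu is not None and cpu >= threshold:
            captures.append({
                "time": datetime.now().isoformat(timespec="seconds"),
                "cpu": cpu,
                "show proc cpu": proc_output,
                "show platform health": run_command("show platform health"),
            })
        sleep(interval)
    return captures
```

Run from cron or left running in a terminal, something like this only records data during a spike, so there is less output to sift through afterwards. Polling every few seconds adds a little load of its own, so a modest interval is probably wise.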

Also, here's a good doc on troubleshooting CPU issues on this platform.

http://www.cisco.com/en/US/products/hw/switches/ps663/products_tech_note09186a00804cef15.shtml

HTH,

Bobby

*Please rate helpful posts.