
Ping Timeouts and High CPU Utilization

Help, I am having trouble with my Cisco 1941 router.

We are experiencing intermittent ping timeouts. I suspected CPU utilization, but it shows around 20%/19%, and when sorted the highest value I see is IP Input at only about 0.80%. I tried troubleshooting the cause by enabling IP CEF; the problem was solved for a while, but the utilization still climbs to around 33% or more, and the ping tests still time out.

The router is used for internet service and as a DHCP server for a hospitality business. I have included some show command output; please review it.

I also adjusted the TCP MSS using ip tcp adjust-mss 1400, as I thought this would help.
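For reference, that command is applied per interface; a minimal sketch, where the LAN subinterface name is a hypothetical placeholder:

interface GigabitEthernet0/1.10
 ip tcp adjust-mss 1400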

12 Replies

Mark Malone
VIP Alumni

Hi, looking at your WAN interface, the buffers are being flooded, causing delays and drops; too much traffic is overwhelming it. The CPU looks fine, but post the output of show proc cpu history to be sure.

Were these pause counters increasing during the issue? If your buffer fills and traffic can't go anywhere, it can be the cause of your problem:

  15 lost carrier, 0 no carrier, 1822 pause output

Hi, here's the history:

RTR#show processes cpu history

RTR 09:05:39 AM Wednesday Sep 21 2016 UTC


[CPU% per second (last 60 seconds): roughly 10-30%, no spikes]

[CPU% per minute (last 60 minutes): average roughly 10-30%, maximum spikes to about 80%]

[CPU% per hour (last 72 hours): average roughly 20-50%, maximum spikes to 99-100%]

At the times the CPU was running that high, at 100%, it could have caused the pauses and dropped packets, as the router was under pressure.

The only way to find out exactly is to capture it in real time and see what the cause is; the CPU spike must be caught as it's happening, which can be difficult to do.

Use the script below and let it run. When the CPU hits 75% or more, it will start to collect logs showing which process is the cause and append them to flash; that way you will capture the real-time CPU problem.

.................................................................

event manager applet High_CPU
 event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.4.1 get-type exact entry-op ge entry-val "75" exit-time 10 poll-interval 5
 action 0.1 syslog msg "CPU Utilization is high"
 action 0.2 cli command "enable"
 action 0.4 cli command "show log | append flash:CPU_Profile.txt"
 action 0.5 cli command "show process cpu sorted | append flash:CPU_Profile.txt"
 action 0.6 cli command "show interfaces | append flash:CPU_Profile.txt"
 action 0.7 cli command "show ip cef switching stat | append flash:CPU_Profile.txt"
 action 0.8 cli command "show ip traffic | append flash:CPU_Profile.txt"
 action 0.9 cli command "show int switching | append flash:CPU_Profile.txt"
 action 1.0 cli command "no event manager applet High_CPU"
 action 1.1 cli command "end"
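Once the threshold trips, the applet collects the output and then removes itself. Afterwards you can confirm it registered and read the capture (file name as used in the script above), for example:

RTR#show event manager policy registered
RTR#more flash:CPU_Profile.txt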


As Mark has already noted, both your gig interfaces show ingress queue congestion. As both are connected at gig, I suspect gig bursts are overrunning the router's capacity to process them.

I recall a Cisco whitepaper describing that it's often relatively safe to increase the ingress queue, even to its maximum, to avoid dropping packets while the CPU works through the burst.
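If you want to experiment with that, the knob is the interface input hold queue; a minimal sketch, assuming the WAN port is Gi0/0 (verify the 4096 maximum on your IOS release):

interface GigabitEthernet0/0
 hold-queue 4096 in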

You might also consider running those gig interfaces at 100, or even 10, Mbps, to slow how fast packets can hit the router. (BTW, Cisco recommends the 1941 for only up to about 25 Mbps.)
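For example, to force 100 Mbps (a sketch only; the attached device must be configured to match, or a speed/duplex mismatch will cause its own errors):

interface GigabitEthernet0/0
 speed 100
 duplex full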

Hi, can you please give me an idea of what to do here?

I tried lowering the speed to 100 Mbps, but the pings still time out. Also, with processes like these, how can I tell whether the utilization is high or low?

And is the router's CPU utilization high due to interrupts?

RTR#show process cpu sorted
CPU utilization for five seconds: 23%/21%; one minute: 24%; five minutes: 24%
 PID  Runtime(ms)    Invoked    uSecs   5Sec   1Min   5Min TTY Process
 162    239022980  601579552      397  0.63%  0.66%  0.65%   0 IP Input
  32      1469600    7414926      198  0.15%  0.11%  0.09%   0 ARP Input
 158        96836   25113044        3  0.07%  0.07%  0.07%   0 IPAM Manager
 316       814460    1897301      429  0.07%  0.09%  0.09%   0 IP NAT Ager
 194     66869888     941727    71008  0.07%  0.05%  0.05%   0 ADJ background
 113        56620    1727166       32  0.07%  0.01%  0.00%   0 BPSM stat Proces
 116       115328   12853769        8  0.07%  0.07%  0.07%   0 VRRS Main thread
 322         6584      52998      124  0.07%  0.00%  0.00%   0 mdns Timer Proce
 334       179820     809876      222  0.07%  0.03%  0.02%   0 CFT Timer Proces
 242       203104    1047703      193  0.07%  0.03%  0.00%   0 DHCPD Receive
  10            0          1        0  0.00%  0.00%  0.00%   0 License Client N
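One note on reading that header: in "23%/21%", the first figure is total five-second CPU and the second is the share spent at interrupt level (packet switching), so process-level load is only about 23 - 21 = 2%, which matches the tiny per-process figures above. In other words, most of this utilization is indeed interrupt-driven.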

RTR 06:49:52 AM Friday Sep 23 2016 UTC


[CPU% per second (last 60 seconds): roughly 19-33%]

[CPU% per minute (last 60 minutes): average roughly 20-30%, maximum spikes to about 60%]

[CPU% per hour (last 72 hours): average roughly 10-30%, maximum spikes to 100%]

Hi

What does the WAN interface currently show? Can you post it please?

You're getting drops to multiple destinations, not just one, correct?

Hi, the interface gig0/0 is the WAN interface, and gig0/1 and its sub-interfaces are the internal LAN for WiFi and data use.

You need to reduce the amount of data going through the WAN port or get a better-performing router.

This counter has doubled since your original post, showing that the router is under serious pressure to handle the traffic; it is being oversubscribed. You can try disabling flow control on the interface to see if that helps.

Enabling QoS on an oversubscribed port like that may not help.

QoS cannot operate properly if a switch sends PAUSE frames, because this slows all of that port's traffic, including any traffic which may have high priority.



Pause Frames

When the receive part (Rx) of the port has its Rx FIFO queue filled and reaches the high water mark, the transmit part (Tx) of the port starts to generate pause frames. The remote device is expected to stop / reduce the transmission of packets for the interval time mentioned in the pause frame. If the Rx is able to clear the Rx queue or reach low water mark within this interval, Tx sends out a special pause frame that mentions the interval as zero (0x0). This enables the remote device to start to transmit packets. If the Rx still works on the queue, once the interval time expires, the Tx sends a new pause frame again with a new interval value.

19 lost carrier, 0 no carrier, 13389 pause output
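That counter can be watched directly during the issue, for example:

RTR#show interfaces GigabitEthernet0/0 | include pause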

EDIT: if you can see locally what's generating the large volumes of traffic, you could reduce its bandwidth through QoS by matching the IP/subnet with an ACL on the WAN interface; that may take the pressure off the interface and reduce the pause frames.
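For illustration, a minimal sketch of that approach, where the offending subnet (192.168.10.0/24) and the 5 Mbps policing rate are hypothetical placeholders:

ip access-list extended HEAVY-TALKER
 permit ip 192.168.10.0 0.0.0.255 any
!
class-map match-all CM-HEAVY
 match access-group name HEAVY-TALKER
!
policy-map PM-LIMIT
 class CM-HEAVY
  police 5000000 conform-action transmit exceed-action drop
!
interface GigabitEthernet0/0
 service-policy output PM-LIMIT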

Thanks, but can you give me the command so that I can try this?

You might not have the option, looking at the docs, but check in interface mode whether the flowcontrol option is available.

Use the flowcontrol interface configuration command to set the receive flow-control state for an interface. When flow control send is operable and on for a device and it detects any congestion at its end, it notifies the link partner or the remote device of the congestion by sending a pause frame. When flow control receive is on for a device and it receives a pause frame, it stops sending any data packets. This prevents any loss of data packets during the congestion period.

Use the receive off keywords to disable flow control.

flowcontrol receive {desired | off | on}


Note: The switch can receive, but not send, pause frames.


Syntax Description

receive: Set whether the interface can receive flow-control packets from a remote device.

desired: Allow an interface to operate with an attached device that is required to send flow-control packets, or with an attached device that is not required to but can send them.

off: Turn off the ability of an attached device to send flow-control packets to an interface.

on: Allow an interface to operate with an attached device that is required to send flow-control packets, or with an attached device that is not required to but can send them.
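On platforms that do support it, applying the command would look like this; a sketch only, since (as noted below) the 1941 may not offer it:

interface GigabitEthernet0/0
 flowcontrol receive off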

 

I'm sorry, but I already looked for that command on the 1941 earlier, and all I see on the router's interface is a flow-sampler WORD command.

Hi Joseph

Do you have a reference for the 25 Mbps recommendation?
