Solved: 100% CPU Usage on 887VA when network traffic is heavy

Ian Stephens · ‎09-04-2012

We are seeing 100% CPU usage and some packet loss when the router is doing NAT at full line speed (100Mb/s) and can't keep up.

We are not using any inspect commands, so there should be no overhead from inspection.

Why is the router slowing down and grinding to a halt?

We are running a basic NAT and our ISP has provided us with a 100Mb/s VDSL connection. It's when we hit these high speeds that the router's CPU usage hits 100% and we experience packet loss, for example when pinging (intermittent missing replies, etc.).

Below is our running config and process information.

Your thoughts, fixes, comments and suggestions are greatly appreciated.

show proc cpu sort

r1.xxx.xxxx.com#show proc cpu sort
CPU utilization for five seconds: 96%/96%; one minute: 96%; five minutes: 96%
 PID Runtime(ms)   Invoked  uSecs   5Sec   1Min   5Min TTY Process
  98      340532  13421485     25  1.89%  1.28%  1.21%   0 Ethernet Msec Ti
   2       35928     21401   1678  1.34%  1.04%  1.03%   0 Load Meter
  92     2372784    528420   4490  0.63%  0.83%  0.96%   0 COLLECT STAT COU
 146        1192     16749     71  0.47%  0.06%  0.01%   0 TCP Timer
 289       93284   3293892     28  0.39%  0.25%  0.24%   0 PPP Events
 281       18392    828131     22  0.23%  0.07%  0.06%   0 PPPoE Background
 115      156872    146421   1071  0.23%  0.19%  0.15%   0 IP Input
 288      134128   3293930     40  0.23%  0.41%  0.42%   0 PPP manager
  97       23132    775596     29  0.15%  0.06%  0.07%   0 Ethernet Timer C
 111       74836   3279452     22  0.15%  0.24%  0.23%   0 IPAM Manager
  63       69968    555090    126  0.15%  0.23%  0.23%   0 LED Timers
 283       17560    209076     83  0.07%  0.03%  0.05%   0 IP NAT Ager
 274        7452     21461    347  0.07%  0.03%  0.00%   0 Compute load avg
 188        7740    207739     37  0.07%  0.02%  0.00%   0 Inspect process
  68        4680    106699     43  0.07%  0.01%  0.00%   0 Console redirect
  32        7896    111262     70  0.07%  0.03%  0.00%   0 ARP Background
  17        4212    104127     40  0.07%  0.02%  0.00%   0 IPC Periodic Tim
  25        1380     21372     64  0.07%  0.00%  0.00%   0 IPC Loadometer
  56        6512     54449    119  0.07%  0.02%  0.00%   0 Fast Throttle Ti
 244        2860       189  15132  0.07%  0.14%  0.03%   8 Virtual Exec

There were more processes, but I omitted them because I kept getting the message "This message can not be displayed due to its content. Please use the Contact Us link with any questions".
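
A note for readers on the summary line of this output: in IOS, the figure after the slash in "five seconds: 96%/96%" is the share of CPU time spent at interrupt level, which is where CEF-switched packets are forwarded. That would explain why no single process in the list looks busy. Below is a minimal Python sketch of that reading; the function name and the regular expression are illustrative assumptions, not part of any Cisco tool.

```python
import re

def split_cpu(line: str) -> tuple[int, int, int]:
    """Return (total, interrupt, process) percentages from the five-second figure."""
    m = re.search(r"five seconds: (\d+)%/(\d+)%", line)
    if m is None:
        raise ValueError("not a 'show processes cpu' summary line")
    total, interrupt = int(m.group(1)), int(m.group(2))
    # Whatever is not interrupt-level time is attributed to scheduled processes.
    return total, interrupt, total - interrupt

line = "CPU utilization for five seconds: 96%/96%; one minute: 96%; five minutes: 96%"
total, interrupt, process = split_cpu(line)
print(f"total={total}% interrupt={interrupt}% process={process}%")
```

The percentages are rounded by the router, so the process-level remainder is only approximate.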

Our Running Config

version 15.1
no service pad
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname r1.essex.xxxx.xxx
!
boot-start-marker
boot system flash c880data-universalk9-mz.151-4.M3.bin
boot-end-marker
!
no logging buffered
enable secret 5 xxxxxx
enable password xxxxxx
!
no aaa new-model
memory-size iomem 10
no ip source-route
!
ip dhcp excluded-address 192.168.0.1
ip dhcp excluded-address 192.168.0.50 192.168.0.255
!
ip dhcp pool NET-POOL
 network 192.168.0.0 255.255.255.0
 default-router 192.168.0.1
 dns-server 8.8.8.8 8.8.4.4
!
ip cef
ip name-server 8.8.8.8
ip name-server 8.8.4.4
no ipv6 cef
!
controller VDSL 0
!
no ip ftp passive
!
interface Ethernet0
 no ip address
!
interface Ethernet0.101
 encapsulation dot1Q 101
 pppoe-client dial-pool-number 1
!
interface ATM0
 no ip address
 shutdown
 no atm ilmi-keepalive
!
interface FastEthernet0
 no ip address
!
interface FastEthernet1
 no ip address
 shutdown
!
interface FastEthernet2
 no ip address
 shutdown
!
interface FastEthernet3
 no ip address
 shutdown
!
interface Vlan1
 ip address 192.168.0.1 255.255.255.0
 ip nat inside
 ip virtual-reassembly in
 ip tcp adjust-mss 1452
!
interface Dialer0
 ip address 81.138.131.190 255.255.255.248
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip mtu 1492
 ip nat outside
 ip virtual-reassembly in
 encapsulation ppp
 dialer pool 1
 ppp authentication chap callin
 ppp chap hostname xxxxxxxx
 ppp chap password 0 xxxxxxxxx
 ppp ipcp route default
 no cdp enable
!
ip forward-protocol nd
no ip http server
ip http secure-server
!
ip nat inside source list 101 interface Dialer0 overload
ip nat inside source static 192.168.0.250 xxx.xxx.xxx.xxx
!
access-list 101 permit ip any any
!

Joseph W. Doherty · ‎09-04-2012

Disclaimer

The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.

Liability Disclaimer

In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.

Posting

"Software" based routers use their main CPU for everything, including forwarding packets. When you push enough traffic toward them, their CPU will max out, which seems to be your case judging from both your CPU stats and the volume of traffic you describe. An 880 series router is rated at 50 Kpps, or about 25 Mbps (unidirectional). That's for minimum-size packets, so higher throughput is possible with larger packets (often the norm). Cisco notes a maximum transfer rate (for 1500-byte packets) of about 200 Mbps. They also recommend the 880 for WAN links up to 8 Mbps (duplex).

Your configuration looks pretty "clean", so your only real solution would be a "faster" device.
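
As a rough check on the figures quoted in this post, the relationship between packet rate and bit rate can be sketched in a few lines of Python. This assumes "minimum-size" means 64-byte packets and ignores framing overhead; the helper names are made up for illustration.

```python
def mbps_from_pps(pps: float, packet_bytes: int) -> float:
    """Bit rate in Mbps for a given packet rate and packet size."""
    return pps * packet_bytes * 8 / 1_000_000

def pps_from_mbps(mbps: float, packet_bytes: int) -> float:
    """Packet rate needed to carry a given bit rate at a given packet size."""
    return mbps * 1_000_000 / (packet_bytes * 8)

# 50 Kpps of 64-byte packets (assumed minimum size)
print(mbps_from_pps(50_000, 64))          # 25.6
# Packets per second implied by 200 Mbps of 1500-byte packets
print(round(pps_from_mbps(200, 1500)))    # 16667
# Packets per second for 100 Mbps at the 1492-byte IP MTU used on Dialer0
print(round(pps_from_mbps(100, 1492)))    # 8378
```

The first result lines up with the "about 25 Mbps" quoted for 50 Kpps. The other two show that far fewer packets per second are needed at large packet sizes, although the per-packet cost of features such as NAT still determines where the CPU actually tops out.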

paolo bevilacqua · ‎09-09-2012

The CPU hits the 100% mark when we are pushing around 90Mb/s inbound from the Dialer0 to the Vlan1 (downloading a file for example). There are not many NAT clients behind the network yet, so it's purely throughput not bloating of the NAT translation table. I think even when we hit 100% CPU, the translation table only has 100 entries.

That is normal, and it is consistent with, or even exceeds, the performance figures tested by Cisco. See the attachment, NAT testing.

With such a fast circuit, you will need a faster router.

Ian Stephens · ‎09-05-2012

Thank you for your reply, Joseph.

That's what I thought too - I thought we had reached the limit of what the router could handle.

However, dropping packets is a bit of an issue; I didn't expect that. Are there any tweaks we could put in place to stop the packet loss?

Thanks again.

Joseph W. Doherty · ‎09-05-2012

However, dropping packets is a bit of an issue, I didn't expect that.  Are there any tweaks we could put in place to stop packet loss?

Yes and no. As your configuration is "clean", you're not "wasting" CPU.

What you could consider is using ingress shapers to selectively drop some traffic. Right now your drops are "random", but if you're going to have drops, one could argue it's better to be selective about them. For example, you might drop packets from TCP flows, which will both recover from the drops and slow their transmission rate. This would help preclude drops from traffic that doesn't deal well with packet loss.

Of course, this does add additional load to your processing, so your overall throughput is likely to be even less, yet it might seem better to your users.
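
To illustrate the idea of selective rather than random drops, here is a toy Python model. It is not IOS configuration and not how the router implements shaping: the packet dictionaries, the `selective_drop` name, and the per-interval `capacity` are all assumptions made for this sketch.

```python
def selective_drop(packets, capacity):
    """Forward at most `capacity` packets, sacrificing TCP packets first.

    Non-TCP traffic (which typically cannot recover from loss) is kept
    ahead of TCP traffic (which retransmits and backs off).
    """
    protected = [p for p in packets if p["proto"] != "tcp"]
    tcp = [p for p in packets if p["proto"] == "tcp"]
    forwarded = protected[:capacity]
    # Fill whatever room is left with TCP packets; the rest are dropped.
    forwarded += tcp[: capacity - len(forwarded)]
    return forwarded

burst = [{"proto": "udp"}] * 3 + [{"proto": "tcp"}] * 5
kept = selective_drop(burst, capacity=5)
print(sum(p["proto"] == "udp" for p in kept), "UDP kept,",
      sum(p["proto"] == "tcp" for p in kept), "TCP kept")
```

In this model all three UDP packets survive and only TCP packets are dropped, which is the trade-off described above. A real policy would also cost CPU cycles per packet.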

Ian Stephens · ‎09-05-2012

Also, I forgot to mention Joseph - when we disable CEF, traffic slows right down to just 20Mb/s - that's all we can push through it. However, with CEF enabled we can push almost 100Mb/s - is this normal?

Joseph W. Doherty · ‎09-05-2012

Also, I forgot to mention Joseph - when we disable CEF, traffic slows right down to just 20Mb/s - that's all we can push through it.  However, with CEF enabled we can push almost 100Mb/s - is this normal?

Can't say whether that much of a delta is normal, but CEF is Cisco's premier technology, designed to increase L3 forwarding performance. So a drop in forwarding performance with CEF disabled should be expected; again, though, only the size of the delta is the open question (NB: your mileage may vary).

Ian Stephens · ‎09-05-2012

Thanks for the information Joseph. I have just looked at our Dialer interface and noticed that "Stateful Inspection" is present:

Dialer0 is up (if_number 12)
  Corresponding hwidb fast_if_number 12
  Corresponding hwidb firstsw->if_number 12
  Internet address is xxx.xxx.xxx.xxx/29
  ICMP redirects are never sent
  Per packet load-sharing is disabled
  IP unicast RPF check is disabled
  Input features: Stateful Inspection, Dialer i/f override, Virtual Fragment Reassembly, Virtual Fragment Reassembly After IPSec Decryption, NAT Outside
  Output features: Post-routing NAT Outside, Stateful Inspection, Dialer idle reset, Dialer idle reset
  IP policy routing is disabled
  BGP based policy accounting on input is disabled
  BGP based policy accounting on output is disabled
  Interface is marked as point to point interface
  Hardware idb is Dialer0
  Fast switching type 15, interface type 98
  IP CEF switching enabled
  IP CEF switching turbo vector
  IP Null turbo vector
  IP prefix lookup IPv4 mtrie 8-8-8-8 optimized
  Input fast flags 0x400040, Output fast flags 0x10100
  ifindex 12(12)
  Slot Slot unit -1 VC -1
  IP MTU 1492

Is this normal? Is it needed for NAT? Can it be disabled?

Joseph W. Doherty · ‎09-05-2012

Ryan Barclay wrote:

Thanks for the information Joseph. I have just looked at our Dialer interface and noticed that "Stateful Inspection" is present:

Is this normal? Is it needed for NAT? Can it be disabled?

My guess is that it might be related to NAT.

paolo bevilacqua · ‎09-05-2012

Ryan Barclay wrote:

Also, I forgot to mention Joseph - when we disable CEF, traffic slows right down to just 20Mb/s - that's all we can push through it. However, with CEF enabled we can push almost 100Mb/s - is this normal?

You need to run with CEF exclusively. The router is not designed to work without it.

Ian Stephens · ‎09-05-2012

Thank you for the information Paolo.

Ian Stephens · ‎09-05-2012

Do you think that if I configure a single direct Ethernet interface on the inside instead of using the Vlan this would help with the CPU pegging? Just a thought.

Joseph W. Doherty · ‎09-05-2012

Ryan Barclay wrote:

Do you think that if I configure a single direct Ethernet interface on the inside instead of using the Vlan this would help with the CPU pegging? Just a thought.

I doubt it.

One other possible CPU consumer is the router dealing with fragmentation because of your PPPoE, but I don't see a way to avoid it because you are on PPPoE. With your configuration, since you also have the MSS adjust for TCP, it's unlikely there's much of this happening.
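
For reference, the two values in the posted configuration ("ip mtu 1492" on Dialer0 and "ip tcp adjust-mss 1452" on Vlan1) follow from simple header arithmetic. The Python sketch below assumes a standard 1500-byte Ethernet MTU and IPv4 and TCP headers without options; the constant names are illustrative.

```python
ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol field
IPV4_HEADER = 20     # assumed: no IP options
TCP_HEADER = 20      # assumed: no TCP options

# Largest IP packet that fits in one PPPoE-encapsulated Ethernet frame
ip_mtu = ETHERNET_MTU - PPPOE_OVERHEAD
# Largest TCP payload that fits in one such IP packet
tcp_mss = ip_mtu - IPV4_HEADER - TCP_HEADER

print(ip_mtu, tcp_mss)  # 1492 1452
```

Because the MSS clamp keeps TCP segments within the PPPoE MTU, only non-TCP traffic with large packets would still need fragmenting.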

ROBERTO TACCON · ‎09-06-2012

Any log entries with "fragment table has reached its maximum"?

Any change in CPU usage with "no ip virtual-reassembly"?

http://www.cisco.com/en/US/docs/ios-xml/ios/security/d1/sec-cr-i3.html#GUID-70035BE4-0286-4E4C-8B59-263F64069CA4

Ian Stephens · ‎09-06-2012

I have applied the following to both the Dialer0 and the Vlan1 interfaces:

no ip virtual-reassembly in

However, CPU is still 99% when traffic is heavy.

How can I check the fragment table? Sorry, I'm quite new to all of this.

Thank you very much for your help.

ROBERTO TACCON · ‎09-06-2012

>How can I check the fragment table?

sh ip virtual-reassembly