09-02-2013 03:14 AM
We have several ASR1002 and ASR1004 devices used as PPPoE concentrators. One of them, recently deployed, seems to be somewhat more loaded than it should be compared to others with similar configurations. The platform is an ASR1002 / RP1 / ESP10 / SIP10 running IOS XE 3.9.1S; with ~2700 simultaneous PPPoE sessions and 850 Mbps / 500 Mbps (up / down) of traffic, I see the following:
#show platform software status control-processor brief
Load Average
 Slot  Status  1-Min  5-Min 15-Min
  RP0 Healthy   0.30   0.42   0.43
 ESP0 Healthy   0.22   0.26   0.16
 SIP0 Healthy   0.02   0.01   0.00

Memory (kB)
 Slot  Status    Total     Used (Pct)     Free (Pct) Committed (Pct)
  RP0 Healthy  3874516  1830528 (47%)  2043988 (53%)   2433452 (63%)
 ESP0 Healthy  2009440   646056 (32%)  1363384 (68%)    449160 (22%)
 SIP0 Healthy   449316   296768 (66%)   152548 (34%)    290272 (65%)

CPU Utilization
 Slot  CPU   User System   Nice   Idle    IRQ   SIRQ IOwait
  RP0    0  27.34   8.28   0.00  62.87   1.19   0.29   0.00
 ESP0    0  15.38  16.78   0.00  67.33   0.09   0.39   0.00
 SIP0    0   2.30   1.10   0.00  96.59   0.00   0.00   0.00
On an identical platform (hardware-wise) running IOS XE 3.4.0S, I get lower CPU load with considerably more traffic (~4500 PPPoE sessions and 1.6 Gbps / 1.3 Gbps):
#show platform software status control-processor brief
Load Average
 Slot  Status  1-Min  5-Min 15-Min
  RP0 Healthy   0.82   0.85   0.84
 ESP0 Healthy   1.70   1.71   1.66
 SIP0 Healthy   0.00   0.00   0.00

Memory (kB)
 Slot  Status    Total     Used (Pct)     Free (Pct) Committed (Pct)
  RP0 Healthy  3874968  2083852 (54%)  1791116 (46%)   2663188 (69%)
 ESP0 Healthy  2009892   766344 (38%)  1243548 (62%)    618688 (31%)
 SIP0 Healthy   449768   328112 (73%)   121656 (27%)    360644 (80%)

CPU Utilization
 Slot  CPU   User System   Nice   Idle    IRQ   SIRQ IOwait
  RP0    0  24.75   5.28   0.00  68.76   0.79   0.39   0.00
 ESP0    0  16.31   5.90   0.00  77.27   0.10   0.40   0.00
 SIP0    0   0.70   0.60   0.00  98.50   0.00   0.20   0.00
What might be related to the CPU load issue is that the logs of the problematic ASR contain a lot of "%IOSXE_INFRA-3-PUNT_ADDR_RES_ENCAP_ERR: Punted address resolution packet with unknown encap PPP" entries. Another possibly relevant detail is that the problematic ASR uses port-channeled "downlink" GbE interfaces with manual (VLAN-based) load balancing.
My questions:
- Could the use of port-channels affect CPU load on the ASR1K?
- What could be causing those log entries?
- What else (which IOS commands) can I use to debug this CPU load issue further? (My guesses are just below.)
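For what it's worth, my assumption is that the punt causes and any forwarding-plane drops can be inspected with counters along these lines; the syntax is recalled from the ASR1K troubleshooting guides, so treat it as approximate:
show platform hardware qfp active infrastructure punt statistics type per-cause
show platform hardware qfp active statistics drop
show platform hardware qfp active datapath utilization
If the PUNT_ADDR_RES_ENCAP_ERR messages correspond to a rapidly increasing punt cause, that would at least confirm the extra RP load is punt-driven.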
Many thanks in advance.
09-04-2013 07:05 AM
In a Linux session on the RP, "top" shows the outputs below, sorted by CPU time (the first one is taken from the problematic ASR, the second from the other ASR, used for comparison). It may be just a coincidence, but on the abnormally loaded ASR the "hman" process takes quite a lot of CPU time... What exactly does the "Host Manager" process do?
1st (problematic):
top - 16:54:00 up 6 days, 10:07,  0 users,  load average: 1.15, 1.08, 1.02
Tasks: 149 total,   1 running, 148 sleeping,   0 stopped,   0 zombie
Cpu(s): 40.5%us, 31.9%sy,  0.0%ni, 25.9%id,  0.0%wa,  1.2%hi,  0.5%si,  0.0%st
Mem:   3874516k total,  1843064k used,  2031452k free,   152376k buffers
Swap:        0k total,        0k used,        0k free,   969184k cached

  PID USER      PR  NI  VIRT   RES   SHR S %CPU %MEM     TIME+   TIME COMMAND
29314 root      20   0 1949m  611m  176m S 31.7 16.2   2391:21  39,51 linux_iosd-imag
28162 root      20   0 20640  7196  4704 S  1.6  0.2 132:20.19 132:20 hman
27377 root      20   0  159m  100m   93m S  1.0  2.7  78:41.58  78:41 fman_rp
26551 root      20   0 36420   13m   10m S  0.7  0.4  71:03.11  71:03 cmand
16924 root      15  -5     0     0     0 S  0.6  0.0  46:21.40  46:21 lsmpi-xmit
16925 root      15  -5     0     0     0 S  0.6  0.0  44:53.33  44:53 lsmpi-rx
    4 root     -50   0     0     0     0 S  0.5  0.0  29:37.12  29:37 sirq-timer/0
    6 root     -50   0     0     0     0 S  0.3  0.0  28:38.28  28:38 sirq-net-rx/0
   12 root     -50   0     0     0     0 S  0.3  0.0  24:46.83  24:46 sirq-rcu/0
26099 root      20   0  4944  3220  1252 S  1.2  0.1  22:39.71  22:39 btrace_rotate.s
32212 root      20   0  122m  111m  6704 S  0.1  2.9   9:25.64   9:25 smand
26889 root      20   0 24720  7620  5124 S  0.0  0.2   5:40.07   5:40 emd
28436 root      20   0 29680   15m   13m S  0.0  0.4   4:05.69   4:05 imand
30341 root      20   0 18712   948   724 S  0.0  0.0   3:22.02   3:22 pcscd
16923 root      15  -5     0     0     0 S  0.0  0.0   2:36.59   2:36 lsmpi-refill
32611 root      20   0  4100  2308  1192 S  0.0  0.1   1:53.54   1:53 sort_files_by_i
2nd (ok):
top - 16:51:20 up 271 days, 11:10,  0 users,  load average: 1.28, 1.04, 0.76
Tasks: 132 total,   5 running, 127 sleeping,   0 stopped,   0 zombie
Cpu(s): 28.7%us,  9.6%sy,  0.0%ni, 60.7%id,  0.0%wa,  0.5%hi,  0.5%si,  0.0%st
Mem:   3874968k total,  2090688k used,  1784280k free,   133848k buffers
Swap:        0k total,        0k used,        0k free,  1229736k cached

  PID USER      PR  NI  VIRT   RES   SHR S %CPU %MEM     TIME+   TIME COMMAND
26121 root      20   0 1917m  585m  145m R 69.8 15.5  30640:29 510,40 linux_iosd-imag
25349 root      20   0 26820   14m   13m S  1.0  0.4   3499:32  58,19 imand
23360 root      20   0 28476   12m  8980 R  0.8  0.3   2639:35  43,59 cmand
14988 root      15  -5     0     0     0 R  0.3  0.0   1698:23  28,18 lsmpi-rx
24216 root      20   0  158m  111m  105m S  1.0  2.9   1570:34  26,10 fman_rp
14987 root      15  -5     0     0     0 S  0.4  0.0   1099:26  18,19 lsmpi-xmit
24935 root      20   0 16716  5932  4216 S  0.2  0.2 523:52.12 523:52 hman
23041 root      20   0  4960  3264  1252 S  0.5  0.1 507:25.29 507:25 btrace_rotate.s
28331 root      20   0  101m   91m  6468 S  0.1  2.4 194:33.48 194:33 smand
23708 root      20   0 19716  6556  4080 S  0.0  0.2 194:29.48 194:29 emd
28712 root      20   0  4208  2436  1184 S  0.0  0.1 129:20.18 129:20 sort_files_by_i
  315 root      20   0  5048  3336  1244 S  0.1  0.1  53:34.76  53:34 chasync.sh
14734 root      20   0  4460  2816  1308 S  0.1  0.1  21:53.70  21:53 reflector.sh
14986 root      15  -5     0     0     0 R  0.0  0.0  20:38.17  20:38 lsmpi-refill
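In case it helps with the comparison, I believe roughly the same per-process figures can be pulled from IOS without dropping to the shell; the syntax below is from memory and may differ by release:
show platform software process list rp active
show processes cpu platform sorted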
10-01-2013 11:34 PM
Do you have call admission control in your configuration?
If not, please try these commands (before reloading the box):
call admission new-model
call admission limit 700
call admission cpu-limit 70
call admission pppoe 10 1
call admission pppoa 10 1
call admission vpdn 10 1
call admission ip 10 1
In our (similar) case this helped.
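As I understand it, with the above each new PPPoE/PPPoA/VPDN/IP session attempt is charged 10 units for a lifetime of 1 second, and new attempts are throttled once the outstanding charge exceeds the limit (700) or the RP CPU goes above the cpu-limit (70%), so the values may need tuning to your session setup rate. If I remember correctly, the effect can be checked with:
show call admission statistics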