Hi,

Chris Russell · 10-09-2012

Has anyone else seen anything like this?

We have 4 WAP121s running firmware 1.0.1.10. After a couple of days of operation, they all become unusably slow. The admin web interface takes several minutes to load, and wireless clients time out before RADIUS authentication can complete. If we wait long enough, we can download the diag info, which shows hostapd using all of the available CPU time:

Top:

Mem: 44256K used, 17272K free, 0K shrd, 4748K buff, 19148K cached
CPU: 99% usr 0% sys 0% nic 0% idle 0% io 0% irq 0% sirq
Load average: 9.36 8.16 7.68 11/40 1404
PID PPID USER STAT VSZ %MEM CPU %CPU COMMAND
1137 870 root R < 2612 4% 0 98% /usr/sbin/hostapd /tmp/hostapd.con
870 1 root R < 10068 16% 0 1% /usr/sbin/dman
1358 1123 root S < 3404 6% 0 1% admin.cgi
1403 1375 root R < 1808 3% 0 1% top -n 1 -b
1069 870 root R < 13048 21% 0 0% /usr/sbin/snmpd -f -c /tmp/snmpd.c
1124 870 root S < 5420 9% 0 0% /usr/bin/mini_httpd-ssl -S -E /etc
1123 870 root S < 5376 9% 0 0% /usr/bin/mini_httpd-ssl -D -p1 80
1360 1358 root S < 5376 9% 0 0% /usr/bin/mini_httpd-ssl -D -p1 80
1373 1372 root S < 5376 9% 0 0% /usr/bin/mini_httpd-ssl -D -p1 80
797 1 root R < 4668 8% 0 0% /usr/sbin/tspec
1372 1123 root S < 2468 4% 0 0% download.cgi diag-log
1 0 root S < 1812 3% 0 0% init
1369 870 root S < 1812 3% 0 0% /sbin/udhcpc -l /tmp/udhcpc.lease.
1104 870 root S < 1812 3% 0 0% /sbin/getty -L 115200 ttyS0
1374 1372 root S < 1808 3% 0 0% sh -c sh /sbin/show_diagnostics.sh
1375 1374 root S < 1804 3% 0 0% sh /sbin/show_diagnostics.sh
721 1 root D < 1800 3% 0 0% insmod /lib/modules/2.6.21.5/extra
783 1 root R < 1800 3% 0 0% insmod /lib/modules/2.6.21.5/extra
752 1 root R < 1800 3% 0 0% insmod /lib/modules/2.6.21.5/extra
720 1 root D < 1800 3% 0 0% insmod /lib/modules/2.6.21.5/extra
768 1 root D < 1800 3% 0 0% insmod /lib/modules/2.6.21.5/extra
1393 870 root R < 1008 2% 0 0% /usr/sbin/sntp -s 3600 0 crdc01.sa
1371 870 root S < 848 1% 0 0% /sbin/dhcp6c brtrunk 0 /usr/share/
1096 870 root S < 668 1% 0 0% /usr/sbin/syslogd
882 870 root R < 660 1% 0 0% /usr/bin/eapd
668 1 root SW< 0 0% 0 0% [mtdblockd]
3 1 root SW< 0 0% 0 0% [events/0]
4 1 root SW< 0 0% 0 0% [khelper]
2 1 root SWN 0 0% 0 0% [ksoftirqd/0]
49 5 root SW< 0 0% 0 0% [unionfs_siod/0]
698 1 root SWN 0 0% 0 0% [jffs2_gcd_mtd2]
20 5 root SW< 0 0% 0 0% [kblockd/0]
21 5 root SW< 0 0% 0 0% [kseriod]
45 5 root DW 0 0% 0 0% [pdflush]
5 1 root DW< 0 0% 0 0% [kthread]
47 5 root SW< 0 0% 0 0% [kswapd0]
48 5 root SW< 0 0% 0 0% [aio/0]
1399 5 root SW< 0 0% 0 0% [kthread]
46 5 root RW 0 0% 0 0% [pdflush]
1404 5 root RW< 0 0% 0 0% [kthread]
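If you have to triage several of these diag dumps, a small script can pull the offending process out of the top snapshot. This is a minimal sketch, not a Cisco tool: `parse_top` and `cpu_hogs` are hypothetical helper names, and it assumes the BusyBox `top -n 1 -b` layout shown above, where the STAT column can itself contain a space ("R <"), so it finds %MEM and %CPU by their "%" suffix instead of by column position.

```python
# Hypothetical helper: flag CPU hogs in a BusyBox `top -n 1 -b` snapshot.


def parse_top(text):
    """Return one dict per process row with pid, cpu_pct and command."""
    procs = []
    for line in text.splitlines():
        fields = line.split()
        # Process rows start with a numeric PID; skip Mem/CPU/Load/header lines.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        # STAT may contain a space ("R <"), so locate columns by the "%" suffix:
        # the first such field is %MEM, the second is %CPU.
        pct_idx = [i for i, f in enumerate(fields) if f.endswith("%")]
        if len(pct_idx) < 2:
            continue
        cpu_idx = pct_idx[1]
        procs.append({
            "pid": int(fields[0]),
            "cpu_pct": int(fields[cpu_idx].rstrip("%")),
            # Everything after %CPU is the (possibly truncated) command line.
            "command": " ".join(fields[cpu_idx + 1:]),
        })
    return procs


def cpu_hogs(procs, threshold=50):
    """Processes using at least `threshold` percent CPU, busiest first."""
    return sorted((p for p in procs if p["cpu_pct"] >= threshold),
                  key=lambda p: p["cpu_pct"], reverse=True)


# A few rows copied from the snapshot above.
SNAPSHOT = """\
CPU: 99% usr 0% sys 0% nic 0% idle 0% io 0% irq 0% sirq
Load average: 9.36 8.16 7.68 11/40 1404
PID PPID USER STAT VSZ %MEM CPU %CPU COMMAND
1137 870 root R < 2612 4% 0 98% /usr/sbin/hostapd /tmp/hostapd.con
870 1 root R < 10068 16% 0 1% /usr/sbin/dman
668 1 root SW< 0 0% 0 0% [mtdblockd]
"""

for p in cpu_hogs(parse_top(SNAPSHOT)):
    print(p["pid"], p["cpu_pct"], p["command"])
```

On this excerpt only hostapd (PID 1137, 98%) crosses the default 50% threshold; the threshold is an arbitrary choice you would tune.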

Unless rebooted, the unit will continue in this state until about 3.5 days have passed since the last boot, when it will crash and restart, logging "CPU Unable to handle kernel paging request for process = swapper and pid = 0". The stack trace is slightly different each time, but the uptimes are pretty consistent:

crashdump1:
Software Version: 1.0.1.10
Crash Log:
Uptime of the AP: 3 Days, 12 Hours, 50 Minutes and 22 Seconds
CPU Unable to handle kernel paging request for process = swapper and pid = 0

crashdump2:
Software Version: 1.0.1.10
Crash Log:
Uptime of the AP: 3 Days, 15 Hours, 21 Minutes and 38 Seconds
CPU Unable to handle kernel paging request for process = swapper and pid = 0

crashdump3:
Software Version: 1.0.1.10
Crash Log:
Uptime of the AP: 3 Days, 13 Hours, 19 Minutes and 18 Seconds
CPU Unable to handle kernel paging request for process = swapper and pid = 0

crashdump4:
Software Version: 1.0.1.10
Crash Log:
Uptime of the AP: 3 Days, 17 Hours, 6 Minutes and 6 Seconds
CPU Unable to handle kernel paging request for process = swapper and pid = 0
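How consistent the crash uptimes are is easy to check by converting them to seconds. A minimal sketch using only the four uptimes from the crash dumps above; `uptime_seconds` is an illustrative name, and the regex assumes the exact "N Days, N Hours, N Minutes and N Seconds" wording these crash logs use.

```python
import re
from statistics import mean

UNIT_SECONDS = {"Day": 86400, "Hour": 3600, "Minute": 60, "Second": 1}


def uptime_seconds(text):
    """Convert '3 Days, 12 Hours, 50 Minutes and 22 Seconds' to seconds."""
    total = 0
    # Each match is a (number, unit) pair; the optional "s" handles plurals.
    for value, unit in re.findall(r"(\d+)\s+(Day|Hour|Minute|Second)s?", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total


CRASH_UPTIMES = [
    "3 Days, 12 Hours, 50 Minutes and 22 Seconds",
    "3 Days, 15 Hours, 21 Minutes and 38 Seconds",
    "3 Days, 13 Hours, 19 Minutes and 18 Seconds",
    "3 Days, 17 Hours, 6 Minutes and 6 Seconds",
]

secs = [uptime_seconds(u) for u in CRASH_UPTIMES]
days = [s / 86400 for s in secs]
print(f"min {min(days):.2f} d, max {max(days):.2f} d, mean {mean(days):.2f} d")
print(f"spread {(max(secs) - min(secs)) / 3600:.1f} h")
```

For these four dumps every crash falls between roughly 3.53 and 3.71 days of uptime, a spread of a little over four hours, which fits the "about 3.5 days" observation.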

As a workaround, we used to use a small script to reboot them every night via SSH, but someone at Cisco made a less-than-brilliant decision to remove this advertised feature.

Tom Watts · 10-09-2012

Hi Chris, this is a pretty interesting topic and finding.

I don't think you will get the attention you need on the forums. Judging this book by its cover, this looks like a potentially serious problem.

Would it be possible for you to collect a comprehensive data dump that includes the following?

Network topology, including

-ISP, modem model, router, switches, other access points, wireless LAN controllers, etc

-Servers, ACS, MS, UNIX, etc

-Number of connected clients and types, printers, computers (and OS)

-Length of wires from your switches to the APs

Config files and data dump

-Access point config files

-Access point logging

-Set up an external syslog and gather those logs

-Packet capture from the access points showing normal authentication

-RADIUS parameters (config file if possible)

-Working wired packet capture on the LAN when the AP seems to fail

-If able, a failing packet capture during the event from the AP

-If able, a pcap on the switchport the AP connects to, taken when it fails

-RADIUS logs
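For the external syslog item, any syslog server will do; if none is at hand, a bare-bones UDP collector is enough to capture what the APs send. This is a sketch under stated assumptions: it listens on UDP port 5514 (the standard port 514 usually needs root), writes to a hypothetical `wap-syslog.txt`, and only decodes the RFC 3164 `<PRI>` prefix, where PRI = facility × 8 + severity. Check the AP's remote log settings for which server address and port it can actually target.

```python
import re
import socket
from datetime import datetime

# RFC 3164 severity names, indexed by severity code 0-7.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
PRI_RE = re.compile(r"^<(\d{1,3})>")


def parse_pri(message):
    """Split a '<PRI>' prefix into (facility, severity_name, rest of message)."""
    m = PRI_RE.match(message)
    # Valid PRI values are 0..191 (facility 0..23, severity 0..7).
    if not m or int(m.group(1)) > 191:
        return None, None, message
    pri = int(m.group(1))
    return pri // 8, SEVERITIES[pri % 8], message[m.end():]


def serve(host="0.0.0.0", port=5514, logfile="wap-syslog.txt"):
    """Receive syslog datagrams forever, appending one line per message."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    with open(logfile, "a", encoding="utf-8") as out:
        while True:
            data, (src_ip, _src_port) = sock.recvfrom(4096)
            facility, severity, text = parse_pri(data.decode("utf-8", errors="replace"))
            out.write(f"{datetime.now().isoformat()} {src_ip} "
                      f"fac={facility} sev={severity} {text.strip()}\n")
            out.flush()  # keep the file current in case the AP crashes mid-run
```

Call `serve()` on a host the APs can reach and point their remote log server at it; the local receive timestamp helps line up log entries with the uptime at which each unit crashes.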

Environment details

-Where are the access points installed

-inSSIDer scan for possible RF interference

-Details of the environment around each AP, such as metal objects, walls, windows, elevation, cable distances, PoE or non-PoE

Other points to consider: if you are using PoE, what happens when you use power adapters instead?

Can you move the APs closer to the switches (to see whether that makes any difference)?

Do you have any similar deployments elsewhere, and do they show the same issue or no issues at all?

If you can collect most or all of this info, I'd recommend calling the SBSC to open a ticket and trying to push the case up to the L2 team.

-Tom
Please rate helpful posts

michael.newton11 · 02-12-2016

Hi,

Was this issue ever resolved? I'm having similar issues with my WAP121.

Kind regards

Michael

Chris Russell · 02-14-2016

There's no solution. We threw them all in the trash and put in another vendor's product.

michael.newton11 · 02-15-2016

Thanks for the reply, Chris. I thought as much!

I've never had as much trouble with any AP as with this one...

Cheers

WAP121 hostapd uses all CPU, crashes every 3.5 days