10-09-2012 03:09 PM
Has anyone else seen anything like this?
We have 4 WAP121s running 1.0.1.10. After a couple of days of operation, they all become unusably slow. The admin web interface takes several minutes to load, and wireless clients time out before RADIUS authentication can complete. If you wait long enough, the diagnostic info can be downloaded, and it shows hostapd using all of the available CPU time:
Top:
Mem: 44256K used, 17272K free, 0K shrd, 4748K buff, 19148K cached
CPU: 99% usr 0% sys 0% nic 0% idle 0% io 0% irq 0% sirq
Load average: 9.36 8.16 7.68 11/40 1404
PID PPID USER STAT VSZ %MEM CPU %CPU COMMAND
1137 870 root R < 2612 4% 0 98% /usr/sbin/hostapd /tmp/hostapd.con
870 1 root R < 10068 16% 0 1% /usr/sbin/dman
1358 1123 root S < 3404 6% 0 1% admin.cgi
1403 1375 root R < 1808 3% 0 1% top -n 1 -b
1069 870 root R < 13048 21% 0 0% /usr/sbin/snmpd -f -c /tmp/snmpd.c
1124 870 root S < 5420 9% 0 0% /usr/bin/mini_httpd-ssl -S -E /etc
1123 870 root S < 5376 9% 0 0% /usr/bin/mini_httpd-ssl -D -p1 80
1360 1358 root S < 5376 9% 0 0% /usr/bin/mini_httpd-ssl -D -p1 80
1373 1372 root S < 5376 9% 0 0% /usr/bin/mini_httpd-ssl -D -p1 80
797 1 root R < 4668 8% 0 0% /usr/sbin/tspec
1372 1123 root S < 2468 4% 0 0% download.cgi diag-log
1 0 root S < 1812 3% 0 0% init
1369 870 root S < 1812 3% 0 0% /sbin/udhcpc -l /tmp/udhcpc.lease.
1104 870 root S < 1812 3% 0 0% /sbin/getty -L 115200 ttyS0
1374 1372 root S < 1808 3% 0 0% sh -c sh /sbin/show_diagnostics.sh
1375 1374 root S < 1804 3% 0 0% sh /sbin/show_diagnostics.sh
721 1 root D < 1800 3% 0 0% insmod /lib/modules/2.6.21.5/extra
783 1 root R < 1800 3% 0 0% insmod /lib/modules/2.6.21.5/extra
752 1 root R < 1800 3% 0 0% insmod /lib/modules/2.6.21.5/extra
720 1 root D < 1800 3% 0 0% insmod /lib/modules/2.6.21.5/extra
768 1 root D < 1800 3% 0 0% insmod /lib/modules/2.6.21.5/extra
1393 870 root R < 1008 2% 0 0% /usr/sbin/sntp -s 3600 0 crdc01.sa
1371 870 root S < 848 1% 0 0% /sbin/dhcp6c brtrunk 0 /usr/share/
1096 870 root S < 668 1% 0 0% /usr/sbin/syslogd
882 870 root R < 660 1% 0 0% /usr/bin/eapd
668 1 root SW< 0 0% 0 0% [mtdblockd]
3 1 root SW< 0 0% 0 0% [events/0]
4 1 root SW< 0 0% 0 0% [khelper]
2 1 root SWN 0 0% 0 0% [ksoftirqd/0]
49 5 root SW< 0 0% 0 0% [unionfs_siod/0]
698 1 root SWN 0 0% 0 0% [jffs2_gcd_mtd2]
20 5 root SW< 0 0% 0 0% [kblockd/0]
21 5 root SW< 0 0% 0 0% [kseriod]
45 5 root DW 0 0% 0 0% [pdflush]
5 1 root DW< 0 0% 0 0% [kthread]
47 5 root SW< 0 0% 0 0% [kswapd0]
48 5 root SW< 0 0% 0 0% [aio/0]
1399 5 root SW< 0 0% 0 0% [kthread]
46 5 root RW 0 0% 0 0% [pdflush]
1404 5 root RW< 0 0% 0 0% [kthread]
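With several APs affected, picking the runaway process out of each diag download by eye gets tedious. Below is a minimal Python sketch that parses BusyBox `top -b` rows like the ones above and flags anything over a CPU threshold. It assumes the column order shown in the header (PID PPID USER STAT VSZ %MEM CPU %CPU COMMAND) and that STAT may contain a space (as in "R <"). The 50% threshold and the function names are illustrative choices, not anything from the AP firmware.

```python
import re

# One process row of BusyBox `top -b` output, in the column order shown above.
# STAT is matched lazily because it can contain a space (e.g. "R <").
ROW = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<ppid>\d+)\s+(?P<user>\S+)\s+(?P<stat>\S.*?)\s+"
    r"(?P<vsz>\d+)\s+(?P<mem>\d+)%\s+(?P<cpu>\d+)\s+(?P<pcpu>\d+)%\s+(?P<cmd>.+)$"
)

def parse_top(text):
    """Return one dict per process row; header and summary lines are skipped."""
    procs = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            procs.append({
                "pid": int(m["pid"]),
                "stat": m["stat"],
                "pcpu": int(m["pcpu"]),
                "cmd": m["cmd"].strip(),
            })
    return procs

def cpu_hogs(text, threshold=50):
    """Processes using at least `threshold` percent CPU (illustrative cutoff)."""
    return [p for p in parse_top(text) if p["pcpu"] >= threshold]

# A few rows copied from the diag output above.
SAMPLE = """\
PID PPID USER STAT VSZ %MEM CPU %CPU COMMAND
1137 870 root R < 2612 4% 0 98% /usr/sbin/hostapd /tmp/hostapd.con
870 1 root R < 10068 16% 0 1% /usr/sbin/dman
1069 870 root R < 13048 21% 0 0% /usr/sbin/snmpd -f -c /tmp/snmpd.c
668 1 root SW< 0 0% 0 0% [mtdblockd]
"""

for proc in cpu_hogs(SAMPLE):
    print(proc["pid"], proc["pcpu"], proc["cmd"])
```

Run against each AP's diag download, this makes it easy to confirm whether hostapd is the offender on every unit or only some of them.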
Unless rebooted, the unit stays in this state until about 3.5 days have passed since the last boot, at which point it crashes and restarts, logging "CPU Unable to handle kernel paging request for process = swapper and pid = 0". The stack trace is slightly different each time, but the uptimes are fairly consistent:
crashdump1:
Software Version: 1.0.1.10
Crash Log:
Uptime of the AP: 3 Days, 12 Hours, 50 Minutes and 22 Seconds
CPU Unable to handle kernel paging request for process = swapper and pid = 0
crashdump2:
Software Version: 1.0.1.10
Crash Log:
Uptime of the AP: 3 Days, 15 Hours, 21 Minutes and 38 Seconds
CPU Unable to handle kernel paging request for process = swapper and pid = 0
crashdump3:
Software Version: 1.0.1.10
Crash Log:
Uptime of the AP: 3 Days, 13 Hours, 19 Minutes and 18 Seconds
CPU Unable to handle kernel paging request for process = swapper and pid = 0
crashdump4:
Software Version: 1.0.1.10
Crash Log:
Uptime of the AP: 3 Days, 17 Hours, 6 Minutes and 6 Seconds
CPU Unable to handle kernel paging request for process = swapper and pid = 0
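To put a number on "fairly consistent", here is a small Python sketch that converts the "Uptime of the AP" lines from the crash dumps into hours and summarizes them. The regex assumes the exact wording shown in these dumps; the function names are illustrative.

```python
import re

# Matches the uptime line format used in the crash dumps above.
UPTIME = re.compile(
    r"Uptime of the AP: (\d+) Days?, (\d+) Hours?, (\d+) Minutes? and (\d+) Seconds?"
)

CRASH_UPTIMES = [
    "Uptime of the AP: 3 Days, 12 Hours, 50 Minutes and 22 Seconds",
    "Uptime of the AP: 3 Days, 15 Hours, 21 Minutes and 38 Seconds",
    "Uptime of the AP: 3 Days, 13 Hours, 19 Minutes and 18 Seconds",
    "Uptime of the AP: 3 Days, 17 Hours, 6 Minutes and 6 Seconds",
]

def uptime_seconds(line):
    """Convert one 'Uptime of the AP: ...' line to a total number of seconds."""
    m = UPTIME.search(line)
    if m is None:
        raise ValueError(f"no uptime found in {line!r}")
    days, hours, minutes, seconds = (int(g) for g in m.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds

def summarize(lines):
    """Return (min, max, mean) uptime in hours."""
    hours = [uptime_seconds(line) / 3600 for line in lines]
    return min(hours), max(hours), sum(hours) / len(hours)

lo, hi, mean = summarize(CRASH_UPTIMES)
print(f"min {lo:.1f} h, max {hi:.1f} h, mean {mean:.1f} h ({mean / 24:.2f} days)")
```

For the four dumps above this works out to a mean of roughly 86.7 hours (about 3.6 days), with all four crashes falling within a window of about 4.3 hours, which is consistent with a slow leak or counter building up at a steady rate rather than a random fault.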
As a workaround, we used to use a small script to reboot them every night via SSH, but someone at Cisco made a less-than-brilliant decision to remove this advertised feature.
10-09-2012 03:59 PM
Hi Chris, this is a pretty interesting topic and finding.
I don't think you will get the attention you need using the forums. Judging this book by the cover, this looks like a potentially serious problem.
Would it be possible for you to collect a comprehensive data dump that includes:
-ISP, modem model, router, switches, other access points, wireless LAN controllers, etc
-Servers, ACS, MS, UNIX, etc
-Number of connected clients and types, printers, computers (and OS)
-Cable lengths from your switches to the APs
-Access point config files
-Access point logging
-Set up an external syslog and gather those logs
-Packet capture from the access points showing normal authentication
-RADIUS parameters (config file if possible)
-Working wired packet capture on the LAN when the AP seems to fail
-If possible, a packet capture from the AP taken during a failure
-If possible, a pcap on the switchport the AP connects to, taken when it fails
-RADIUS logs
-Where the access points are installed
-An inSSIDer scan for possible RF interference
-Details of the environment around each AP, such as metal objects, walls, windows, elevation, cable distances, PoE or non-PoE
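For the external syslog item, even a very small collector is enough to keep the AP logs off the box before a crash wipes them. This is a minimal sketch in Python using only the standard library, assuming the APs are pointed at the collector's IP through their remote log server setting and send plain UDP syslog to port 514. The function names and the `ap-syslog.log` file name are illustrative; a full syslog daemon (rsyslog, syslog-ng) would be the more robust choice.

```python
import socket

def open_syslog_socket(host="0.0.0.0", port=514):
    """Bind a UDP socket for incoming syslog datagrams.

    514 is the conventional syslog port; binding to it usually requires
    root (or an equivalent privilege) on Linux/Unix hosts.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def messages(sock, bufsize=4096):
    """Yield each received datagram as 'sender-ip message-text'."""
    while True:
        data, (addr, _port) = sock.recvfrom(bufsize)
        yield f"{addr} {data.decode('utf-8', errors='replace').rstrip()}"

def run_forever(logpath="ap-syslog.log", host="0.0.0.0", port=514):
    """Append every received message to logpath, flushing after each line
    so nothing is lost if the collector itself is stopped abruptly."""
    sock = open_syslog_socket(host, port)
    with open(logpath, "a", encoding="utf-8") as log:
        for line in messages(sock):
            log.write(line + "\n")
            log.flush()
```

Calling `run_forever()` on a host that stays up while the APs degrade gives a timestamped trail (if the AP includes timestamps in its messages) leading up to each crash, with the sender IP prefixed so the four APs can be told apart.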
Other points to consider: if you are using PoE, what happens when you use power adapters instead?
Can you move the APs closer to the switches (to see whether that changes anything)?
Do you have any similar deployments elsewhere, and do they show the same issue or none at all?
If you can collect most or all of this info, I'd recommend calling the SBSC to open a ticket and try to push the case up to the L2 team.
-Tom
Please rate helpful posts
02-12-2016 06:55 AM
Hi,
Was this issue ever resolved? I'm having similar issues with my WAP121.
Kind regards
Michael
02-14-2016 09:46 AM
There's no solution. We threw them all in the trash and put in another vendor's product.
02-15-2016 02:18 AM
Thanks for the reply, Chris. I thought as much!
I've never had so much trouble with an AP as with this one...
Cheers