09-25-2016 05:55 PM - edited 07-05-2021 05:52 AM
Hi All,
I would like to share my experience after upgrading my WLCs in an HA pair from 8.1.131.0 to 8.2.121.0.
It has been a week since I upgraded the 8540 controllers in the HA pair to the 8.2.121.0 code, which is termed "stable code".
I have already come across 3 issues with it, but I am not sure whether all of them have an associated bug ID.
1) One of the Ethernet ports on the 8540 controller goes down
I have connected 4 x 10 Gig uplinks from each controller and noticed (multiple times) that one of the ports in the port channel connecting to the secondary controller goes down.
Fix: Reboot the controller (only the affected unit) and all the interfaces come up.
2) Secondary and primary WLC crashed
With a high load of close to 15000 users on one WLC, the secondary (standby) WLC failed and rebooted itself (still under investigation by Dimension Data and Cisco).
Soon after this unit came back up, within 10 to 15 minutes, the primary (active) controller crashed and failed over to the secondary controller.
2a) Not all the APs fail over (50%); the rest get stuck in transit. RADIUS authentications from this controller freeze.
*spamApTask4: Sep 24 08:25:12.817: %RRM-3-RRM_LOGMSG: [PS]rrmClient.c:1159 RRM LOG: IAPP AGGR Neigh, Unable to find AP f8:4f:57:e2:0e:50
*spamApTask6: Sep 24 08:25:12.751: %RRM-3-RRM_LOGMSG: [PS]rrmClient.c:1159 RRM LOG: IAPP AGGR Neigh, Unable to find AP f8:4f:57:a1:e0:50
*spamApTask1: Sep 24 08:25:12.547: %RRM-3-RRM_LOGMSG: [PS]rrmClient.c:1159 RRM LOG: IAPP AGGR Neigh, Unable to find AP f8:4f:57:a1:ac:c0
*spamApTask2: Sep 24 08:25:12.343: %RRM-3-RRM_LOGMSG: [PS]rrmClient.c:1159 RRM LOG: IAPP AGGR Neigh, Unable to find AP f8:4f:57:66:f8:90
*spamApTask5: Sep 24 08:25:11.764: %RRM-3-RRM_LOGMSG: [PS]rrmClient.c:1159 RRM LOG: IAPP AGGR Neigh, Unable to find AP ec:e1:a9:f1:ab:20
2b) It looks like the RADIUS AAA process was stuck after the failover. Attached is the screenshot from ISE with failed authentications.
*Dot1x_NW_MsgTask_2: Sep 23 11:54:25.971: %DOT1X-3-AAA_AUTH_SEND_FAIL: [SA]1x_aaa.c:849 Unable to send AAA message for client e8:b1:fc:50:54:42
*Dot1x_NW_MsgTask_0: Sep 23 11:54:25.969: %DOT1X-3-ABORT_AUTH: [SA]1x_bauth_sm.c:450 Authentication Aborted for client d8:bb:2c:57:db:d0 Abort Reason:DOT1X RESTARTED DUE TO EAPOL-START/CLIENT ROAM
*Dot1x_NW_MsgTask_2: Sep 23 11:54:25.968: %DOT1X-3-ABORT_AUTH: [SA]1x_bauth_sm.c:450 Authentication Aborted for client e8:b1:fc:50:54:42 Abort Reason:DOT1X RESTARTED DUE TO EAPOL-START/CLIENT ROAM
*Dot1x_NW_MsgTask_5: Sep 23 11:54:25.965: %DOT1X-3-ABORT_AUTH: [SA]1x_bauth_sm.c:450 Authentication Aborted for client 2c:f0:a2:68:67:bd Abort Reason:DOT1X RESTARTED DUE TO EAPOL-START/CLIENT ROAM
Fix: Performed a "Redundancy force switchover" back to the primary WLC once it was back up. All the APs joined the active controller and authentications started functioning normally.
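For anyone who wants to quantify how widespread messages like the ones above are, here is a rough Python sketch that tallies log lines by message ID and collects the MAC addresses they mention. It assumes the lines follow the same "%FACILITY-SEVERITY-MNEMONIC:" layout as the excerpts in this post; the function name summarize and the sample list are just illustrative, not anything provided by the WLC.

```python
import re
from collections import Counter

# Matches the "%FACILITY-SEVERITY-MNEMONIC:" token seen in the excerpts above.
MSG_ID = re.compile(r"%([A-Z0-9_]+)-(\d)-([A-Z0-9_]+):")
# Colon-separated MAC address, e.g. f8:4f:57:e2:0e:50.
MAC = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b", re.IGNORECASE)

def summarize(lines):
    """Count messages per ID and collect the distinct MACs seen for each ID."""
    counts = Counter()
    macs = {}
    for line in lines:
        m = MSG_ID.search(line)
        if not m:
            continue  # not a log line in the assumed format
        msg_id = "-".join(m.groups())
        counts[msg_id] += 1
        found = (x.lower() for x in MAC.findall(line))
        macs.setdefault(msg_id, set()).update(found)
    return counts, macs

# A few lines copied from the excerpts in this post.
sample = [
    "*spamApTask4: Sep 24 08:25:12.817: %RRM-3-RRM_LOGMSG: [PS]rrmClient.c:1159 RRM LOG: IAPP AGGR Neigh, Unable to find AP f8:4f:57:e2:0e:50",
    "*Dot1x_NW_MsgTask_2: Sep 23 11:54:25.971: %DOT1X-3-AAA_AUTH_SEND_FAIL: [SA]1x_aaa.c:849 Unable to send AAA message for client e8:b1:fc:50:54:42",
    "*Dot1x_NW_MsgTask_0: Sep 23 11:54:25.969: %DOT1X-3-ABORT_AUTH: [SA]1x_bauth_sm.c:450 Authentication Aborted for client d8:bb:2c:57:db:d0 Abort Reason:DOT1X RESTARTED DUE TO EAPOL-START/CLIENT ROAM",
    "*Dot1x_NW_MsgTask_2: Sep 23 11:54:25.968: %DOT1X-3-ABORT_AUTH: [SA]1x_bauth_sm.c:450 Authentication Aborted for client e8:b1:fc:50:54:42 Abort Reason:DOT1X RESTARTED DUE TO EAPOL-START/CLIENT ROAM",
]

counts, macs = summarize(sample)
for msg_id, n in counts.most_common():
    print(f"{msg_id}: {n} messages, {len(macs[msg_id])} distinct MACs")
```

Feeding it a full saved message log instead of the sample list should show quickly whether the failures are concentrated on a few clients or APs.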
Bug ID: CSCux28505: WLC 8510 Crashed with "fp_main_task" in the 8_2_1_124 image
WLC crashes on boot with high traffic
Conditions:
WLC crashes on boot with high traffic
Workaround:
Stop the IXIA traffic and then boot the WLC; no crash observed.
Not a relevant workaround for my case.
I am seeking inputs from Cisco for these issues.
Cheers,
Dev.
09-25-2016 06:02 PM
8.2.124.0 was supposed to come out over the weekend.
8.2 MR4 Beta program information can be found HERE.
09-25-2016 07:20 PM
Thanks a lot Leo for the updates. I shall stay tuned to see if these get addressed.
Hope to see the public release of 8.2 MR3 soon.
09-25-2016 10:06 PM
Hi DNK,
Looks like you are having a fun time with 8.2.121.0.
I upgraded our 8540 pair to 8.2.121.7 instead of 8.2.121.0 after seeing the below.
09-26-2016 05:13 AM
Hi Rasika,
Yes, very unpredictable behaviour since the upgrade.
I am monitoring the system, and hardware memory utilisation is very high, which might trigger a crash.
Could you kindly send the release notes/fixed bugs in the 8.2.121.7 code?
8.2 MR3 might be released in a couple of days. Many such bugs (related to unexpected crashes) are fixed in that release.
If not, it will be safe to take the 8.2.121.7 code.
Thanks a lot for sharing your experience too.
Cheers,
Dev.
10-14-2016 02:19 PM
Hi,
We have also noticed that since 8.2.121.0, SNMP traps and polling stop working.
We are struggling to find stable code for our environment, but no luck so far. In 8.2.111.0 and earlier there is a bug: the AP radio gets stuck, people are not able to associate, and the APs have to be rebooted to fix the issue.
Did you experience the SNMP issue or the stuck radio on 8.2.121.7? Also, TAC recommended 8.2.121.14.
Regards,
Dan
10-14-2016 02:20 PM
Another important thing that we have noticed in 8.2.111.0: a lot of RADIUS stops with zero octets in/out.
10-26-2016 03:13 PM
Hi Mihai,
Sorry for the late response. Yes, I had sometimes noticed the APs in a stuck state, even while they were still broadcasting the SSID.
Yes, a reboot was the fix we found to be effective internally.
I have also sometimes noticed messages in the logs saying the RADIUS server stopped responding, for port 1813 (accounting server).
However, I have validated that logs are being created on the RADIUS server, so it looks cosmetic unless it is for port 1812 (authentication server).
I see some improvements in the 8.2.121.130 code, but it has its own known caveats.
09-26-2016 08:10 PM
I have upgraded our 8540 pair to 8.2.121.7
Makes me want to roll back to 8.2.100.0, Rasika.
09-28-2016 07:33 PM
Hi Rasika,
Any major issues/concerns noted so far on 8.2.121.7?
On the 8.2.121.0 code, these are the processes that are causing high hardware memory utilization.
Name                PID   Priority   CPU Use  (usr/sys)%  hwm  CPU  Reaper
osapiBsnTimer       5568  ( 70/ 71)  0        ( 0/ 0)%    94   8
redXmlTransferMain  2416  (240/ 7)   0        ( 0/ 0)%    87   4
fp_main_task        2401  (240/ 7)   0        ( 0/ 0)%    0    0
Note: if the utilization crosses 90%, it is alarming and can potentially cause the controller to crash.
Workaround: In an HA pair, perform a "redundancy force switchover" to the standby unit. It brings down the utilization, but this has to be monitored frequently.
The tasks specified in the bug are comparatively low.
spamApTask7         5690  (118/ 53)  0        ( 0/ 0)%    22   5
spamApTask6         5689  (118/ 53)  0        ( 0/ 0)%    22   2
spamApTask5         5688  (118/ 53)  0        ( 0/ 0)%    22   20
spamApTask4         5687  ( 53/ 78)  0        ( 0/ 0)%    22   3
spamApTask3         5686  (118/ 53)  0        ( 0/ 0)%    20   10
spamApTask2         5685  (118/ 53)  0        ( 0/ 0)%    22   23
spamApTask1         5684  (118/ 53)  0        ( 0/ 0)%    22   2
spamApTask0         5683  (118/ 53)  0        ( 0/ 0)%    23   8
spamReceiveTask     5682  (120/ 52)  0        ( 0/ 0)%    58   0
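Since this has to be watched frequently, here is a minimal Python sketch that scans saved process output for tasks above the 90% mark mentioned in the note. It assumes each row has the same layout as the rows pasted in this post, and that the first number after "(usr/sys)%" is the hwm value; the names ROW and high_hwm are made up for the example.

```python
import re

# One row, e.g. "osapiBsnTimer 5568 ( 70/ 71) 0 ( 0/ 0)% 94 8".
# Assumption: the number right after "(usr/sys)%" is the hwm column.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<pid>\d+)\s+"
    r"\(\s*\d+/\s*\d+\)\s+"          # priority pair
    r"(?P<cpu>\d+)\s+"               # CPU use
    r"\(\s*\d+/\s*\d+\)%\s+"         # (usr/sys)%
    r"(?P<hwm>\d+)\s+(?P<core>\d+)"  # hwm, then the CPU column
)

def high_hwm(output, threshold=90):
    """Return (name, pid, hwm) for rows whose hwm exceeds the threshold."""
    flagged = []
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if m and int(m["hwm"]) > threshold:
            flagged.append((m["name"], int(m["pid"]), int(m["hwm"])))
    # Highest hwm first.
    return sorted(flagged, key=lambda row: row[2], reverse=True)

# Rows copied from this post.
sample = """\
osapiBsnTimer 5568 ( 70/ 71) 0 ( 0/ 0)% 94 8
redXmlTransferMain 2416 (240/ 7) 0 ( 0/ 0)% 87 4
fp_main_task 2401 (240/ 7) 0 ( 0/ 0)% 0 0
spamReceiveTask 5682 (120/ 52) 0 ( 0/ 0)% 58 0
"""

for name, pid, hwm in high_hwm(sample):
    print(f"{name} (PID {pid}) is at hwm {hwm}")
```

Lowering the threshold argument (for example to 80) gives an earlier warning for tasks such as redXmlTransferMain that are approaching the limit.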
09-28-2016 08:28 PM
Hi Dev,
Which CLI command gives the above output? I can check mine and see. We haven't experienced any crashes since the upgrade 4 weeks ago.
I think 8.2.130.0 has been released. You can go with that instead of 8.2.121.7.
http://www.cisco.com/c/en/us/td/docs/wireless/controller/release/notes/crn82mr2.html
HTH
Rasika
09-29-2016 06:30 AM
Hi Rasika,
Thanks for your comments.
"sh process cpu" gives me this output (you will notice these tasks at the end of the output).
I was offered 8.2.121.14, but the redXmlTransferMain task issue isn't fixed in this code.
This might be a random issue, as I am not seeing it on all 8540 HA setups.
I just went through the release notes of 8.2.121.130 and it is fixed in that release.
I might have to go with this code now.
CSCva55011
Task Name redXmlTransferMain reloads unexpectedly with HA SSO
10-07-2016 02:55 AM
We also had some of these issues with 8.2.121.0. We upgraded to 8.2.130.0 last week and have had no more issues since then.
03-26-2018 11:49 PM
The Cisco 8540 Wireless Controller provides centralized control, management, and troubleshooting for high-scale deployments in service provider and large campus deployments. It offers flexibility to support multiple deployment modes in the same controller: for example, centralized mode for campus, Cisco FlexConnect mode for lean branches managed over the WAN, and mesh (bridge) mode for deployments where full Ethernet cabling is unavailable. As a component of the Cisco Unified Wireless Network, this controller provides real-time communications between Cisco Aironet access points, the Cisco Prime Infrastructure, and the Cisco Mobility Services Engine, and is inter-operable with other Cisco controllers.
1) Two Type A 3.0 USB ports
2) CIMC port 10/100/1000 Base-T
3) Serial COM Connector: Standard RS-232 Serial COM port using RJ-45 connector
4) Ethernet Service Port (SP): Management 10/100/1000 Base-T
5) Redundancy Port (RP)
6) VGA Connector: Rear panel has a standard VGA port using a female D-Sub-15 Connector
7) ID Switch and LED
The Cisco Integrated Management Controller (CIMC) is the management service for the C-Series servers. CIMC runs within the server.
CIMC is a separate management module that is built into the motherboard. CIMC has its own ARM-based processor which runs the CIMC software. It is shipped with a running version of the firmware. Users can update CIMC firmware through the Firmware Update Management page. You need not worry about installing the initial CIMC firmware.
Step 1 Connect a cable to the CIMC 10/100/1000 Base-T port.
Step 2 Use the command imm dhcp enable on the WLC CLI to enable DHCP to set the IP.
Step 3 If DHCP is not available, use the command imm address <ip address> <net mask> <gateway ip>.
View the IP and other details using the command imm summary.
imm ?
address IMM Static IP Configuration
dhcp Enable | Disable | Fallback DHCP
restart Saves settings and Restarts IMM Module
summary Displays IMM Parameters
username Configures Login Username for IMM
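When DHCP is not available and the static form from Step 3 is used, a typo in the mask or gateway is easy to make. The following is only a sketch of how the three values could be sanity-checked before pasting the command into the WLC CLI; the helper name imm_address_command is hypothetical, and the addresses are documentation-range placeholders, not real settings.

```python
import ipaddress

def imm_address_command(ip, netmask, gateway):
    """Build the 'imm address' command string after basic sanity checks."""
    # Raises ValueError if the IP or netmask is malformed.
    iface = ipaddress.ip_interface(f"{ip}/{netmask}")
    gw = ipaddress.ip_address(gateway)
    if gw == iface.ip:
        raise ValueError("gateway must differ from the CIMC address")
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is outside {iface.network}")
    return f"imm address {iface.ip} {iface.netmask} {gw}"

# Placeholder values from the 192.0.2.0/24 documentation range.
print(imm_address_command("192.0.2.10", "255.255.255.0", "192.0.2.1"))
```

The script only formats and checks the text of the command; it does not talk to the controller.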