
Observations and feedback on 8.2.121.0 code: not recommended for customers with high traffic

Devaiah N K
Level 1

Hi All,

I would like to share my experience after upgrading my WLCs in an HA pair from 8.1.131.0 to 8.2.121.0.

It has been a week since I upgraded the 8540 controllers in the HA pair to 8.2.121.0, which is termed "stable code".

I have already come across 3 issues with it, but I am not sure whether each of them has a bug ID associated with it.

1) One of the Ethernet ports on the 8540 controller goes down

I have connected 4 x 10 Gig uplinks from each controller and noticed (multiple times) that one of the ports in the port channel connecting to the secondary controller goes down.

Fix: Reboot the controller (only the affected unit); all the interfaces then come up.

2) Secondary and primary WLC crashed

With a high load of close to 15000 users on one WLC, the secondary (standby) WLC failed and rebooted itself (still under investigation by Dimension Data and Cisco).

Soon after this unit came back up, within 10 to 15 minutes, the primary (active) controller crashed and failed over to the secondary controller.

2a) About 50% of the APs don't fail over and get stuck in transit. RADIUS authentications from this controller freeze.

*spamApTask4: Sep 24 08:25:12.817: %RRM-3-RRM_LOGMSG: [PS]rrmClient.c:1159 RRM LOG: IAPP AGGR Neigh, Unable to find AP f8:4f:57:e2:0e:50

*spamApTask6: Sep 24 08:25:12.751: %RRM-3-RRM_LOGMSG: [PS]rrmClient.c:1159 RRM LOG: IAPP AGGR Neigh, Unable to find AP f8:4f:57:a1:e0:50

*spamApTask1: Sep 24 08:25:12.547: %RRM-3-RRM_LOGMSG: [PS]rrmClient.c:1159 RRM LOG: IAPP AGGR Neigh, Unable to find AP f8:4f:57:a1:ac:c0

*spamApTask2: Sep 24 08:25:12.343: %RRM-3-RRM_LOGMSG: [PS]rrmClient.c:1159 RRM LOG: IAPP AGGR Neigh, Unable to find AP f8:4f:57:66:f8:90

*spamApTask5: Sep 24 08:25:11.764: %RRM-3-RRM_LOGMSG: [PS]rrmClient.c:1159 RRM LOG: IAPP AGGR Neigh, Unable to find AP ec:e1:a9:f1:ab:20

2b) It looks like the RADIUS AAA process was stuck after the failover. Attached is the screenshot from ISE with failed authentications.

*Dot1x_NW_MsgTask_2: Sep 23 11:54:25.971: %DOT1X-3-AAA_AUTH_SEND_FAIL: [SA]1x_aaa.c:849 Unable to send AAA message for client e8:b1:fc:50:54:42

*Dot1x_NW_MsgTask_0: Sep 23 11:54:25.969: %DOT1X-3-ABORT_AUTH: [SA]1x_bauth_sm.c:450  Authentication Aborted for  client d8:bb:2c:57:db:d0 Abort Reason:DOT1X RESTARTED DUE TO EAPOL-START/CLIENT ROAM

*Dot1x_NW_MsgTask_2: Sep 23 11:54:25.968: %DOT1X-3-ABORT_AUTH: [SA]1x_bauth_sm.c:450  Authentication Aborted for  client e8:b1:fc:50:54:42 Abort Reason:DOT1X RESTARTED DUE TO EAPOL-START/CLIENT ROAM

*Dot1x_NW_MsgTask_5: Sep 23 11:54:25.965: %DOT1X-3-ABORT_AUTH: [SA]1x_bauth_sm.c:450  Authentication Aborted for  client 2c:f0:a2:68:67:bd Abort Reason:DOT1X RESTARTED DUE TO EAPOL-START/CLIENT ROAM
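To see at a glance which message types dominate a log capture like the ones above, the lines can be tallied by their %FACILITY-SEVERITY-MNEMONIC tag. This is only a rough sketch in Python: the function name summarize and the regular expressions are my own assumptions based on the lines pasted in this post, not an official log format definition.

```python
import re
from collections import Counter

# Assumed shape of the tag, taken from the lines above: "%FACILITY-SEVERITY-MNEMONIC:"
TAG_RE = re.compile(r"%([A-Z0-9_]+)-(\d)-([A-Z0-9_]+):")
# Colon-separated MAC address such as e8:b1:fc:50:54:42
MAC_RE = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b", re.IGNORECASE)

def summarize(lines):
    """Return (message count per tag, set of MAC addresses seen per tag)."""
    counts = Counter()
    macs = {}
    for line in lines:
        tag = TAG_RE.search(line)
        if not tag:
            continue  # not a tagged log line
        key = "-".join(tag.groups())
        counts[key] += 1
        macs.setdefault(key, set()).update(m.lower() for m in MAC_RE.findall(line))
    return counts, macs

# Three of the lines quoted above, used as sample input.
sample = [
    "*Dot1x_NW_MsgTask_2: Sep 23 11:54:25.971: %DOT1X-3-AAA_AUTH_SEND_FAIL: [SA]1x_aaa.c:849 Unable to send AAA message for client e8:b1:fc:50:54:42",
    "*Dot1x_NW_MsgTask_0: Sep 23 11:54:25.969: %DOT1X-3-ABORT_AUTH: [SA]1x_bauth_sm.c:450  Authentication Aborted for  client d8:bb:2c:57:db:d0 Abort Reason:DOT1X RESTARTED DUE TO EAPOL-START/CLIENT ROAM",
    "*Dot1x_NW_MsgTask_2: Sep 23 11:54:25.968: %DOT1X-3-ABORT_AUTH: [SA]1x_bauth_sm.c:450  Authentication Aborted for  client e8:b1:fc:50:54:42 Abort Reason:DOT1X RESTARTED DUE TO EAPOL-START/CLIENT ROAM",
]

counts, macs = summarize(sample)
for tag, n in counts.most_common():
    print(tag, n, sorted(macs[tag]))
```

Feeding it the full message log would show whether the failures are concentrated on a few clients or APs, or spread across all of them.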

Fix: Perform a "Redundancy force switchover" back to the primary WLC once it is back up. All the APs then joined the active controller and authentications started functioning normally.

Bug ID: CSCUx28505: WLC 8510 Crashed with "fp_main_task" in the 8_2_1_124 image

WLC crashes on boot with high traffic

Conditions:
WLC crashes on boot with high traffic

Workaround:

After stopping the IXIA traffic and booting the WLC, no crash is observed.

Not a relevant workaround for my case.

I am seeking inputs from Cisco for these issues.

Cheers,

Dev.

13 Replies

Leo Laohoo
Hall of Fame

8.2.124.0 was supposed to come out over the weekend.  

8.2 MR4 Beta program information can be found HERE.

Thanks a lot Leo for the updates. I shall stay tuned to see if these get addressed.

Hope to see the public release of 8.2 MR3 soon.

Hi DNK,

Looks like you are having a fun time with 8.2.121.0.

I upgraded our 8540 pair to 8.2.121.7 instead of 8.2.121.0 after seeing the bug below.

CSCva93401 - WLC system crash (spamApTask) immediately after upgrade to 8.2.121.0
I think we have been lucky so far; it has not crashed yet.
Thanks for sharing your observations with 8540/8.2.x
Regards
Rasika

Hi Rasika,

Yes, very unpredictable behaviour since the upgrade. 

I am monitoring the system, and hardware memory utilisation is very high, which might trigger a crash.

Could you kindly send the release notes/fixed bugs in the 8.2.121.7  code?

8.2 MR3 might be released in a couple of days. Many such bugs (related to unexpected crashes) are fixed in that release.

If not, it will be safe to take the 8.2.121.7 code.

Thanks a lot for sharing your experience too.

Cheers,
Dev.

Hi,

We have also noticed that after 8.2.121.0 snmp traps and polling stop working.

We are struggling to find stable code for our environment, but no luck so far. In 8.2.111.0 and earlier there is a bug: AP radios get stuck, people are not able to associate, and the APs have to be rebooted to fix the issue.

Did you experience the SNMP issue or the stuck radio on 8.2.121.7? Also, TAC recommended 8.2.121.14.

Regards,

Dan

Another important thing that we have noticed in 8.2.111.0: a lot of RADIUS stops with zero octets in/out.

Hi Mihai,

Sorry for the late response. Yes, I have sometimes noticed APs in a stuck state while they are still broadcasting the SSID.

Yes, a reboot was the internal fix we found to be effective.

I have also sometimes noticed messages in the logs saying the RADIUS server stopped responding on port 1813 (accounting server).

However, I have validated that logs are being created on the RADIUS server. So it looks cosmetic, unless it were port 1812 (auth server).

I see some improvements in 8.2.121.130 code, but it has its own known caveats.

I have upgraded our 8540 pair to 8.2.121.7 

Makes me want to roll back to 8.2.100.0, Rasika.  

Hi Rasika,

Any major issues/concerns noted till now on the 8.2.121.7?

On 8.2.121.0 code these are the processes that are causing high utilization in hardware memory.

Name                 PID    Priority    CPU Use  (usr/sys)%  hwm  CPU  Reaper
osapiBsnTimer        5568   ( 70/ 71)   0        (  0/  0)%  94   8
redXmlTransferMain   2416   (240/  7)   0        (  0/  0)%  87   4
fp_main_task         2401   (240/  7)   0        (  0/  0)%  0    0

Note: if the utilization crosses 90%, it is alarming and can potentially cause the controller to crash.

Workaround: In an HA pair, perform a "redundancy force switchover" to the standby unit. It brings down the utilization, but it has to be monitored frequently.

The values for the tasks specified in the bug are comparatively low.

 spamApTask7        5690   (118/ 53)        0     (  0/  0)%  22    5
 spamApTask6        5689   (118/ 53)        0     (  0/  0)%  22    2
 spamApTask5        5688   (118/ 53)        0     (  0/  0)%  22   20
 spamApTask4        5687   ( 53/ 78)        0     (  0/  0)%  22    3
 spamApTask3        5686   (118/ 53)        0     (  0/  0)%  20   10
 spamApTask2        5685   (118/ 53)        0     (  0/  0)%  22   23
 spamApTask1        5684   (118/ 53)        0     (  0/  0)%  22    2
 spamApTask0        5683   (118/ 53)        0     (  0/  0)%  23    8
 spamReceiveTask    5682   (120/ 52)        0     (  0/  0)%  58    0
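The 90% check mentioned above can be automated against saved CLI output. The following is a minimal sketch in Python, assuming the row layout shown in this post (name, PID, priority pair, CPU use, usr/sys pair, then hwm and CPU). The regular expression and the names parse_rows and over_threshold are hypothetical helpers of mine, not anything provided by the WLC.

```python
import re

# Assumed row layout, inferred from the output pasted above:
# name  pid  (prio/prio)  cpu_use  (usr/sys)%  hwm  cpu
ROW_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<pid>\d+)\s+\(\s*\d+/\s*\d+\)\s+(?P<cpu_use>\d+)\s+"
    r"\(\s*\d+/\s*\d+\)%\s+(?P<hwm>\d+)\s+(?P<cpu>\d+)"
)

def parse_rows(text):
    """Return a list of (task name, hwm) for every line that matches the layout."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            rows.append((m.group("name"), int(m.group("hwm"))))
    return rows

def over_threshold(rows, threshold=90):
    """Return the task names whose hwm value is above the threshold."""
    return [name for name, hwm in rows if hwm > threshold]

# Sample rows copied from the output above.
sample = """\
osapiBsnTimer             5568   ( 70/ 71)        0     (  0/  0)%                94     8
redXmlTransferMain 2416   (240/  7)        0     (  0/  0)%                87     4
 spamApTask7        5690   (118/ 53)        0     (  0/  0)%  22    5
 spamReceiveTask    5682   (120/ 52)        0     (  0/  0)%  58    0
"""

rows = parse_rows(sample)
print(over_threshold(rows))
```

Header lines and anything else that does not fit the layout are skipped, so the whole command output can be pasted in as-is. The threshold of 90 is just the figure quoted in this post.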

Hi Dev,

Which CLI command gives the above output? I can check mine and see. We haven't experienced any crashes in the 4 weeks since the upgrade.

I think 8.2.130.0 has been released. You can go with that instead of 8.2.121.7

http://www.cisco.com/c/en/us/td/docs/wireless/controller/release/notes/crn82mr2.html

HTH

Rasika

Hi Rasika,

Thanks for your comments.

"sh process CPU" gives me this output (you will notice these tasks at the end of the output).

I was offered 8.2.121.14, but the redXmlTransferMain task issue isn't fixed in this code.

This might be a random issue, as I am not seeing it on all 8540 HA setups.

I just went through the release notes of 8.2.121.130 and it is fixed in that release.

I might have to go with this code now.

CSCva55011
Task Name redXmlTransferMain reloads unexpectedly with HA SSO

We also had some of these issues with 8.2.121.0. We upgraded to 8.2.130.0 last week and have had no more issues since then.
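For anyone auditing several controllers, a quick way to check whether a reported software version is at or above a chosen baseline (8.2.130.0 in this thread) is a plain numeric comparison of the dotted fields. This is only a sketch in Python; parse_version and is_at_least are hypothetical helper names, and a higher number does not by itself prove that a given bug fix is included, so the release notes still need to be checked.

```python
def parse_version(text):
    """Turn a dotted version such as '8.2.121.0' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))

def is_at_least(version, baseline):
    """True if version is numerically equal to or newer than baseline."""
    return parse_version(version) >= parse_version(baseline)

# Versions mentioned in this thread, compared against 8.2.130.0.
for v in ["8.2.121.0", "8.2.121.7", "8.2.121.14", "8.2.130.0"]:
    print(v, is_at_least(v, "8.2.130.0"))
```

Comparing integer tuples avoids the string-comparison trap where "8.2.121.14" would sort before "8.2.121.7".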

The Cisco 8540 Wireless Controller provides centralized control, management, and troubleshooting for high-scale deployments in service provider and large campus deployments. It offers flexibility to support multiple deployment modes in the same controller: for example, centralized mode for campus, Cisco FlexConnect mode for lean branches managed over the WAN, and mesh (bridge) mode for deployments where full Ethernet cabling is unavailable. As a component of the Cisco Unified Wireless Network, this controller provides real-time communications between Cisco Aironet access points, the Cisco Prime Infrastructure, and the Cisco Mobility Services Engine, and is inter-operable with other Cisco controllers.
1. Two Type A 3.0 USB ports
2. CIMC port 10/100/1000 Base-T
3. SerialCOM Connector: Standard RS-232 Serial COM port using RJ-45 connector
4. Ethernet Service Port (SP): Management 10/100/1000 Base-T
5. Redundancy Port (RP)
6. VGA Connector: Rear panel has a standard VGA port using a female D-Sub-15 Connector
7. ID Switch and LED
The Cisco Integrated Management Controller (CIMC) is the management service for the C-Series servers. CIMC runs within the server.

CIMC is a separate management module that is built into the motherboard. CIMC has its own ARM-based processor which runs the CIMC software. It is shipped with a running version of the firmware. Users can update CIMC firmware through the Firmware Update Management page. You need not worry about installing the initial CIMC firmware.
Step 1 Connect the CIMC cable to the 10/100/1000 Base-T port.

Step 2 Use the command imm dhcp enable on WLC CLI to enable DHCP to set the IP.

Step 3 If DHCP is not available, use the command imm address <ip address> <net mask> <gateway ip>.

View the IP and details using the command imm summary.
imm ?
address    IMM Static IP Configuration
dhcp       Enable | Disable | Fallback DHCP
restart    Saves settings and Restarts IMM Module
summary    Displays IMM Parameters
username   Configures Login Username for IMM
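The steps above boil down to a short command sequence that depends on whether DHCP is available. As a rough illustration, the Python sketch below only builds the command strings named in the steps (imm dhcp enable, imm address, imm summary); it does not connect to a controller. The function name imm_commands and the example addresses are assumptions for illustration.

```python
import ipaddress

def imm_commands(dhcp_available, address=None, netmask=None, gateway=None):
    """Return the WLC CLI commands for setting the CIMC IP, per the steps above."""
    if dhcp_available:
        commands = ["imm dhcp enable"]
    else:
        for label, value in (("address", address), ("netmask", netmask), ("gateway", gateway)):
            if value is None:
                raise ValueError(f"{label} is required when DHCP is not available")
            # Raises ValueError if the value is not a dotted-quad IPv4 string.
            ipaddress.IPv4Address(value)
        commands = [f"imm address {address} {netmask} {gateway}"]
    # Finish by displaying the resulting IP details.
    commands.append("imm summary")
    return commands

# Example with documentation-range addresses (placeholders, not real settings).
print(imm_commands(True))
print(imm_commands(False, "192.0.2.10", "255.255.255.0", "192.0.2.1"))
```

Validating the three values before building the static command catches typos before anything is pasted into the CLI.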
