01-08-2015 03:08 AM - edited 07-05-2021 02:14 AM
Hello,
We had an unexpected reboot of our active WLC 5508 in an HA cluster two weeks ago. The standby unit became active, and because of the Christmas and New Year holidays we only noticed it yesterday.
The controller did not generate a crash file during the reboot. The only thing we have is the log messages the WLC sent to the syslog server at the time of the reboot:
Dec 19 00:34:39 wlc2 wlc2: *rmgrMain: Dec 19 00:34:40.113: #RMGR-0-RED_HA_RELOAD: rmgr_utils.c:216 System reboot: reason: category Peer reload req object redundancy management interface and redundancy port are down
Dec 19 00:39:57 141.79.131.4 wlc2: *rfacMain: Dec 19 00:39:58.957: #RMGR-0-RED_HA_RELOAD: rmgr_utils.c:216 System reboot: reason: category New XML downloaded object rsyncmgrXferTrasport
The first timestamp matches the time of the redundancy switchover exactly:
(Cisco Controller) >show redundancy summary
....
Switchover Reason = Active controller failed, Switchover Time = Fri Dec 19 00:34:40 2014
Both WLCs in the HA pair run 7.6.130.0.
I found two bugs involving unexpected reboots for this software version:
CSCur86730 and CSCuq97965
But I found nothing regarding the log messages sent by the controller.
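For anyone who wants to pull these reload messages out of a larger syslog file, here is a minimal Python sketch. The field layout (category, then object) is inferred only from the two sample lines quoted above, not from Cisco documentation, and the names `RELOAD_RE` and `parse_reload` are made up for illustration.

```python
import re

# Pattern for the RMGR reload lines quoted above. The layout is an
# assumption based on those two samples, not on Cisco documentation.
RELOAD_RE = re.compile(
    r"#RMGR-0-RED_HA_RELOAD: \S+ System reboot: reason: "
    r"category (?P<category>.+?) object (?P<object>.+)$"
)

def parse_reload(line):
    """Return (category, object) for an RMGR reload line, else None."""
    match = RELOAD_RE.search(line.strip())
    if match is None:
        return None
    return match.group("category"), match.group("object")

sample = (
    "Dec 19 00:34:39 wlc2 wlc2: *rmgrMain: Dec 19 00:34:40.113: "
    "#RMGR-0-RED_HA_RELOAD: rmgr_utils.c:216 System reboot: reason: "
    "category Peer reload req object redundancy management interface "
    "and redundancy port are down"
)
print(parse_reload(sample))
```

Lines that do not follow this layout simply return None, so the function can be run over a whole log file without special handling.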
One detail that may or may not be important: the WLC that rebooted/crashed had been replaced by Cisco one or two weeks before the reboot.
Has anybody had the same or a similar issue?
Thank you and kind regards
Sergej
01-08-2015 05:39 AM
- It looks like your peer redundancy link may have been interrupted, making the HA standby think it had to take over. Check these links and their status over your holiday period. Some offices or rooms may have had outages, for instance, during the holidays.
M.
01-08-2015 06:43 AM
Hello marce1000,
Yesterday we manually switched the controller that rebooted on Dec 19 back to the active role. Today I checked the uptime on both WLCs in the HA cluster:
active wlc (xxx.xxx.131.4)
(Cisco Controller) >show sysinfo
Manufacturer's Name.............................. Cisco Systems Inc.
Product Name..................................... Cisco Controller
Product Version.................................. 7.6.130.0
Bootloader Version............................... 1.0.20
Field Recovery Image Version..................... 7.6.95.16
Firmware Version................................. FPGA 1.7, Env 1.8, USB console 2.2
Build Type....................................... DATA + WPS
System Name...................................... wlc2
System Location..................................
System Contact...................................
System ObjectID.................................. 1.3.6.1.4.1.9.1.1069
Redundancy Mode.................................. SSO (Both AP and Client SSO)
IP Address....................................... xxx.xxx.131.3
Last Reset....................................... Software reset
System Up Time................................... 20 days 14 hrs 44 mins 7 secs
and the standby wlc (xxx.xxx.131.6)
(Cisco Controller-Standby) >show sysinfo
Manufacturer's Name.............................. Cisco Systems Inc.
Product Name..................................... Cisco Controller
Product Version.................................. 7.6.130.0
Bootloader Version............................... 1.0.20
Field Recovery Image Version..................... 7.6.95.16
Firmware Version................................. FPGA 1.7, Env 1.8, USB console 2.2
Build Type....................................... DATA + WPS
System Name...................................... wlc2
System Location..................................
System Contact...................................
System ObjectID.................................. 1.3.6.1.4.1.9.1.1069
Redundancy Mode.................................. SSO (Both AP and Client SSO)
IP Address....................................... xxx.xxx.131.3
Last Reset....................................... Watchdog reset
System Up Time................................... 0 days 23 hrs 56 mins 33 secs
It looks like the active unit really crashed on Dec 19 because of an unknown software error, and the standby unit rebooted yesterday after the switchover to the other WLC, with a watchdog reset and this log message:
Jan 7 15:25:18 xxx.xxx.131.6 wlc2: *rfacMain: Jan 07 15:25:19.710: #RMGR-0-RED_HA_RELOAD: rmgr_utils.c:216 System reboot: reason: category New XML downloaded object rsyncmgrXferTrasport
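To line the "System Up Time" values up against the syslog timestamps, the uptime can be converted into an approximate boot time. This is a rough sketch only: the time at which the command was run is a hypothetical value here, and `parse_uptime` and `boot_time` are illustrative names.

```python
import re
from datetime import datetime, timedelta

# Matches the uptime format shown in the 'show sysinfo' output above.
UPTIME_RE = re.compile(
    r"(?P<days>\d+) days (?P<hours>\d+) hrs "
    r"(?P<mins>\d+) mins (?P<secs>\d+) secs"
)

def parse_uptime(text):
    """Convert a 'System Up Time' line into a timedelta."""
    m = UPTIME_RE.search(text)
    if m is None:
        raise ValueError("no uptime found in text")
    return timedelta(
        days=int(m["days"]),
        hours=int(m["hours"]),
        minutes=int(m["mins"]),
        seconds=int(m["secs"]),
    )

def boot_time(checked_at, uptime_text):
    """Approximate boot time: when the command was run minus the uptime."""
    return checked_at - parse_uptime(uptime_text)

line = ("System Up Time..................................."
        " 0 days 23 hrs 56 mins 33 secs")
# Hypothetical moment at which 'show sysinfo' was run (local time).
checked = datetime(2015, 1, 8, 15, 0, 0)
print(boot_time(checked, line))
```

The result is only as accurate as the recorded check time, and the controller, the syslog server and the person running the command may be in different time zones.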
Is it actually normal for the active WLC to reboot itself when a switchover is started manually or the HA link goes down?
Thanks.
Sergej
01-08-2015 11:51 PM
I just forgot to mention: the switchover was done with the command "redundancy force-switchover" over an SSH connection.
07-31-2015 12:39 PM
Hi Sergej, I'm wondering if you eventually resolved your issue?
I also have WLC 5508s in HA running v7.6.130.0, with a twist: if my primary is the active WLC, it reboots after a couple of days with the same error as yours. However, if my backup is the active WLC, it doesn't reboot and is pretty much stable. Very strange.
I would appreciate an update from you.
Thanks !
Tony
12-02-2015 05:50 AM
Dear all,
We have configured HA using WLC 5508s on version 8.0. After a few days the controller rebooted unexpectedly, and we saw the message below on the syslog server:
[root@syslog ~]# tail -f /var/log/messages
Nov 29 03:29:01 nms rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="16103" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Dec 2 06:44:00 10.21.21.16 bkash-WLC-Primary: *rmgrMain: Dec 03 00:38:55.034: #RMGR-0-RED_HA_RELOAD: rmgr_utils.c:239 System is rebooting, reason: XMLs were not trasferred from Active to Standby
Can anyone help me find a solution? I am eagerly awaiting your reply.
Thanks and regards
Erfan
12-02-2015 06:25 AM
v8.0.120.0? Dumb question, but was everything set up correctly, and how long had this been working before you noticed the reboots? Do you have a back-to-back Ethernet cable between the RP ports?
-Scott
12-02-2015 08:52 AM
Hi Scott, we have already connected a back-to-back cable on the RP ports, and we have configured the peer service port under Redundancy. We have already faced this kind of problem 3 to 4 times.
The WLC software version is 8.0.115.0.
Can you tell us why these unexpected reboots happen, and why it reports that the XMLs are not being transferred from active to standby?
For your information, we have seen the unexpected reboot happen when the secondary WLC is in active mode.
I look forward to your prompt reply and support.
Thanks and regards
Erfan
12-02-2015 09:07 AM
I would not use that code. Go with v8.0.121.0 and verify that the image on both controllers is the same. It might be the image you have. I have SSO working in many environments with v7.6.130.0, v8.0.120.0, and v8.0.121.0.
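If you save the "show sysinfo" output from each controller to a text file, the image comparison can be scripted. A minimal sketch in Python, assuming the "Product Version" line looks like the output posted earlier in this thread; `product_version` and `same_image` are illustrative names, not part of any Cisco tool.

```python
import re

def product_version(sysinfo_text):
    """Extract the 'Product Version' value from 'show sysinfo' output."""
    m = re.search(r"^Product Version\.*\s*(\S+)", sysinfo_text, re.MULTILINE)
    return m.group(1) if m else None

def same_image(sysinfo_a, sysinfo_b):
    """True only if both outputs report the same, non-missing version."""
    version_a = product_version(sysinfo_a)
    version_b = product_version(sysinfo_b)
    return version_a is not None and version_a == version_b

primary = "Product Name..... Cisco Controller\nProduct Version.... 8.0.115.0\n"
secondary = "Product Name..... Cisco Controller\nProduct Version.... 8.0.121.0\n"
print(same_image(primary, secondary))
```

This compares only the running version string. It does not look at the backup image or the bootloader, which would need their own checks.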
-Scott
12-03-2015 04:51 AM
Hi Scott ,
Thanks for your reply.
Is it a bug in WLC software version 8.0.115.0?
For your information, it works fine for a few days after each reboot and then suddenly reboots again. We couldn't find any reason for this strange behaviour of the WLC in HA mode.
We have configured the peer service port IP on the Redundancy tab. Is that required to configure HA on the WLC?
I would highly appreciate a reply.
Thanks and regards
Erfan
12-03-2015 05:03 AM
I don't know for sure if it's a bug, but that code isn't recommended. I also have to assume that when you set up SSO, you followed the guide and it is set up properly.
http://www.cisco.com/c/en/us/td/docs/wireless/controller/technotes/7-5/High_Availability_DG.html
There are two ways to set up HA: one is SSO, which you are doing, and the other is N+1.
http://www.cisco.com/c/en/us/td/docs/wireless/technology/hi_avail/N1_High_Availability_Deployment_Guide.pdf
I would break up the SSO pair and upgrade both controllers to v8.0.121.0. But first I would probably factory reset the secondary HA controller and go through the startup wizard just to bring it online. Then, after you have upgraded the code and installed any third-party certs and webauth portal pages, set up SSO again. This is what I would do, since you have been having these reboot issues for a long time. I don't think you're going to fix that with the code you're running, and you would probably have to break up SSO anyway. The steps to break up SSO are in the guide. You pretty much just have to disable SSO on both units via the CLI.
-Scott
12-08-2015 12:12 AM
Hi Scott,
Can you tell us whether we need to enable or disable the gateway reachability option in HA mode on the WLC, given that we are getting unexpected reboots?
Today we replaced the RP port cable to check whether the cable is causing the problem.
Please have a look at the details below for your reference:
(Cisco Controller) >show redundancy summary ?
(Cisco Controller) >show redundancy summary
Redundancy Mode = SSO ENABLED
Local State = ACTIVE
Peer State = STANDBY HOT
Unit = Primary
Unit ID = 74:A2:E6:C7:6F:E0
Redundancy State = SSO
Mobility MAC = 74:A2:E6:C7:6F:E0
Management Gateway Failover = DISABLED
BulkSync Status = Complete
Average Redundancy Peer Reachability Latency = 436 Micro Seconds
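For repeated checks, the "Key = Value" lines of this output can be parsed and compared against the values you expect. A minimal Python sketch follows. The expected values are simply copied from the output above and are assumed to describe a healthy pair when the command is run on the active unit; `parse_summary`, `EXPECTED` and `ha_problems` are illustrative names.

```python
def parse_summary(text):
    """Turn 'Key = Value' lines into a dict, skipping lines without '='."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

# Assumption: these values, copied from the output above, indicate a
# healthy SSO pair when the command is run on the active unit.
EXPECTED = {
    "Redundancy Mode": "SSO ENABLED",
    "Local State": "ACTIVE",
    "Peer State": "STANDBY HOT",
    "BulkSync Status": "Complete",
}

def ha_problems(fields):
    """Return a description of every field that differs from EXPECTED."""
    return [
        f"{key}: expected {want!r}, got {fields.get(key)!r}"
        for key, want in EXPECTED.items()
        if fields.get(key) != want
    ]

sample = """\
Redundancy Mode = SSO ENABLED
Local State = ACTIVE
Peer State = STANDBY HOT
BulkSync Status = Complete
"""
print(ha_problems(parse_summary(sample)))
```

An empty list means all four fields match. Anything else is a list of human-readable mismatches that could be logged or mailed out.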
Need your feedback on this
Thanks and regards
Erfan
12-08-2015 06:08 AM
You do not need this since you have a back to back cable. Did you upgrade the controller to v8.0.121.0?
-Scott
12-10-2015 07:22 AM
Hi Scott,
We will now start the upgrade process. Should we disable SSO on the active WLC and then do the upgrade as per the deployment guide?
We are eagerly awaiting your reply.
Thanks and regards
Erfan
12-13-2015 01:30 PM
Sorry for the late response. Yes, you need to disable SSO on the primary. This will require a reboot of the controllers so that they can come back up as separate units.
Hope that helps
-Scott
*** Please rate helpful post ***