Frequent WLC 5508 Crashes and EmWeb process using too much CPU.

nikolas-pereira
Spotlight

Hello everyone,

I am experiencing frequent crashes on a Cisco WLC 5508 running version 8.0.140.0. The WLC is out of support and I cannot upgrade to a newer version, and I have also noticed reports of bugs with this same task in newer versions. The customer has a 9800 controller with more than 4,000 clients, but still uses the 5508 for the legacy environment while the devices are being refreshed. We have around 1,100 clients on this legacy controller.

  • The crash handler points to the EMWEB task consuming almost all available memory before reboot.

  • I have analyzed logs and the core dump generated at the crash time, but have not been able to precisely identify the root cause.

  • The customer encounters this mainly while browsing the web GUI, but does not know on which page.

 

Does anyone know about this error, how to work around it, or how to find the trigger for the crash?

 

Similar problems:

https://www.findbugzero.com/operational-defect-database/vendors/cisco/defects/CSCus77368

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCva98597 (seems like the same error, but in a newer version)

10 Replies

eglinsky2012
Spotlight

Oh, boy. 8.0.140.0 is forever burned in my mind, the first WLC code upgrade I did that had catastrophic results. I don't remember the details of the issue, but we had to do an emergency downgrade back to 8.0.121.0 soon after the upgrade to restore stability. That was a decade ago.

When did these crashes start? Did any configuration or upstream network changes occur at that time?

You could obtain newer code by identifying a security vulnerability in the current version that's fixed in a newer version and requesting the latest version from TAC. Other users here will have more insight on that. But you'd have to check the release notes of the new version (in this case, 8.5.x is the latest available) to make sure your AP models are compatible with it. For example, here's the list of APs supported in 8.5.182.0:

https://www.cisco.com/c/en/us/td/docs/wireless/controller/release/notes/crn85mr8.html#supported-aps

First of all, thanks for the answer. In recent years it has been working smoothly, recently they added a new SSID with central web auth with cisco ISE. I remember they tried to put it in the latest version, but they had to perform a downgrade on the same day for some reason. 

marce1000
Hall of Fame

 

  -  @nikolas-pereira    >...running version 8.0.140.0. The WLC is out of support and I cannot upgrade to a newer version,
     That is definitely a complete dead-end situation; AireOS controllers are being phased out and, if still used, must run the last release made available; and 8.0 is from the stone age...

  M.



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '

Leo Laohoo
Hall of Fame

One of the biggest problems with AireOS in 8.X.X.X is the web process.  This is when people have left the web interface running for long periods of time (more than a whole day).  And that's only for one person.  If there are multiple people with an active browser to the controller running in their background it all adds up.  

I believe "EMWEB" is that process.  

Oh, and another thing:  Always save the config.  This is a controller with 8.0.X.X so there is this "feature" that if the running-config does not "synch" with the startup-config, the WLC process will get "stressed out" and can be one of the causes of a crash.  So make it a habit that whoever logs in to the controller MUST save the config before exit.  

Saikat Nandy
Cisco Employee

emweb has been a lifelong pain in AireOS, whatever the version. A quick bug scrub tells me the 5508 is listed with 98 defects related to emweb. Please go for an upgrade. Nothing else can be done for emweb (although there is no guarantee that the problem won't come back again).

Rich R
VIP

"In recent years it has been working smoothly, recently they added a new SSID with central web auth with cisco ISE."
Well, I think you've answered your own question right there @nikolas-pereira - that's what has caused the instability!

As everyone has pointed out already - this is so badly out of support that adding any new features/APs/clients to this setup was sheer madness/idiocy and just asking for trouble!

It's not even on the last release of 8.0!  So the absolute minimum thing you should do is upgrade to the final 8.0 release (which won't have any impact in terms of supported APs)
https://software.cisco.com/download/home/282600534/type/280926587/release/8.0.152.0
There actually were a few escalation image engineering updates after 8.0.152.0 but without TAC support you won't be able to get those.  Depending on APs which need to be supported (check Compatibility Matrix link below) other options are:
https://software.cisco.com/download/home/282600534/type/280926587/release/8.3.150.0
https://software.cisco.com/download/specialrelease/9a6a7cf84f9fdf04b95c76e2ac7820e7 (8.5.182.12)

As this is being caused by web auth, one thing you can try which might help is to ensure that https redirect is disabled.
https://www.cisco.com/c/en/us/support/docs/wireless-mobility/wireless-lan-wlan/118826-config-https-webauth-00.html
It serves no useful purpose these days, because 95%+ of operating systems and browsers will ignore the redirect due to certificate errors, and it also creates excessive load for the WLC.
See https://www.cisco.com/c/en/us/support/docs/wireless-mobility/wlan-security/115951-web-auth-wlc-guide-00.html
"In Version 8.0 and later, you can enable redirection of HTTPS traffic with the CLI command config network web-auth https-redirect enable. This uses a lot of resources for the WLC in cases where many HTTPS requests are sent. It is not advisable to use this feature before WLC version 8.7 where the scalability of this feature was enhanced. Also note that a certificate warning is unavoidable in this case. If the client requests any URL (such as https://www.cisco.com), the WLC still presents its own certificate issued for the virtual interface IP address. This never matches the URL/IP address requested by client and the certificate is not trusted unless the client forces the exception in their browser."
There's more info on the performance below that paragraph - bottom line is turn it off (if it's currently on).
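
For reference, a minimal sketch of checking and then disabling the feature from the AireOS CLI. The disable form is assumed to be the direct counterpart of the enable command quoted above. The network summary should report the current web-auth redirect state, but the exact wording of that line may differ between releases, so verify it on your own controller before relying on it.

```
show network summary
config network web-auth https-redirect disable
save config
```

Saving afterwards keeps the change across the next reboot or crash.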

We found the trigger for the crash. Basically, when the WLC had accumulated a very large list of dynamically excluded clients and we opened that list through the GUI, it crashed. To mitigate this, we can configure the exclusion to last only a few days or automate a weekly cleanup of this list.

How long is the Client Exclusion timer?

It was set to zero, so that devices would remain on the list until a complaint reached support and they removed the MAC from the list. I have now set it to keep entries for only 3 days.
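
As a rough sketch, the 3-day timer described above could be set per WLAN from the AireOS CLI as follows. The WLAN ID 1 is a hypothetical placeholder (take the real ID from the WLAN summary), the timeout is given in seconds (3 x 24 x 3600 = 259200), and the command syntax is assumed from the AireOS command reference rather than taken from this thread, so check it against your release first.

```
show wlan summary
config wlan exclusionlist 1 259200
show wlan 1
save config
```

A timeout of 0 keeps excluded clients on the list until an administrator removes them, which matches the original setting described here.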


@nikolas-pereira wrote:
was set to zero

So this means that clients that failed to join the SSID DDoS-ed the controller until it crashed.

And if the SSID requires an external authentication server, continuous failed authentication can also hammer the authentication server to its knees.
