Re: High ping latency to 8510 WLC

powys · ‎06-05-2019

We're having issues with high ping latency to our 8510 WLC running 8.3.143.0 (although we've tried rolling back to earlier versions without making a difference).

What we've tried:-
Changing firmware (rolling back to the 8.2.x releases we used previously, then forward again) - no difference to performance, but it seems to have broken things with Cisco Prime Infrastructure, which now reports "Device is in encrypted mode" when we try to back up the WLC config

Switching between controllers (we have an active-standby HA pair) - no difference

Inserting a 2960-X switch between the controller and the Nexus core switch so we can ping "in the middle of the cable" to determine whether the latency is WLC-side or Nexus-side (it's WLC-side)

When the system's quiet the ping response drops to sub-1ms, but when it's misbehaving we'll see three-figure ping responses. The bulk of the traffic on these breaks out at the APs themselves so the 10Gbit interface isn't showing as particularly busy.

Does anyone have any suggestions we can try?

patoberli · ‎06-06-2019

Not sure I can help here, but where are you pinging from, and where to?
For example: wireless client to virtual interface

powys · ‎06-13-2019

Pinging from my wired desk connection to the Nexus core switch (fine)
Pinging from my wired desk connection to the intermediate 2960X installed for testing (fine)

Pinging from my wired desk connection to the WLC (high latency)

Pinging from the Nexus core switch to the intermediate 2960X (fine)

Pinging from the Nexus core switch to the WLC (high latency)

Pinging from the intermediate 2960X to the Nexus (fine)

Pinging from the intermediate 2960X to the WLC (high latency)

See below example (***.***.***.74 is the WLC, .1 is the Nexus), tested from the 2960X about a minute ago:-

WLCTestSwitch#ping ***.***.***.74
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to ***.***.***.74, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 737/867/975 ms

WLCTestSwitch#ping ***.***.***.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to ***.***.***.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

UPDATE:

Half an hour later and it's back down to <1ms ping response.
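
To compare readings like the two above over time, the summary line can be pulled apart with a small script. This is only a sketch: it assumes the IOS summary format shown in the output above, and the function names and the 10 ms threshold are made up for illustration.

```python
import re

# Matches the summary line printed by IOS ping, for example:
# "Success rate is 100 percent (5/5), round-trip min/avg/max = 737/867/975 ms"
# The round-trip part is optional because it is absent when no replies arrive.
SUMMARY_RE = re.compile(
    r"Success rate is (\d+) percent \((\d+)/(\d+)\)"
    r"(?:, round-trip min/avg/max = (\d+)/(\d+)/(\d+) ms)?"
)


def parse_ping_summary(line):
    """Return the numbers from a ping summary line, or None if it does not match."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    result = {
        "success_pct": int(m.group(1)),
        "received": int(m.group(2)),
        "sent": int(m.group(3)),
    }
    if m.group(4) is not None:
        result["min_ms"] = int(m.group(4))
        result["avg_ms"] = int(m.group(5))
        result["max_ms"] = int(m.group(6))
    return result


def is_slow(summary, threshold_ms=10):
    """Flag a reading as slow; threshold_ms is an arbitrary illustrative value."""
    if "avg_ms" not in summary:
        return True  # no replies at all counts as a problem
    return summary["avg_ms"] > threshold_ms
```

Fed the two summary lines above, the WLC reading (average 867 ms) is flagged as slow and the Nexus reading (average 2 ms) is not.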

patoberli · ‎06-14-2019

This could indicate an overloaded connection between the WLC and the switch it's attached to, or an overloaded CPU on the WLC.
The next time it happens, please check the bandwidth in use on the link and the CPU load.

powys · ‎06-14-2019

The 10Gbit connection to the WLC is showing 3/255 load inbound and outbound. The WLC itself shows 1% CPU usage and 33% memory usage.
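
For anyone reading the load figure: it is a fraction out of 255 of the interface bandwidth, so 3/255 on a 10 Gbit link is a very small share. A quick sketch of the arithmetic (the function names are just for illustration, and 10,000,000 Kbit is the nominal bandwidth of a 10 Gbit port):

```python
def load_to_percent(load, scale=255):
    """Convert an interface load value (out of 255) to a percentage."""
    return 100 * load / scale


def load_to_mbps(load, bw_kbit, scale=255):
    """Convert an interface load value to an approximate rate in Mbit/s.

    bw_kbit is the interface bandwidth in Kbit, as reported by "show interface".
    """
    return load / scale * bw_kbit / 1000


# 3/255 of a 10 Gbit link: a little over 1%, or roughly 118 Mbit/s
print(round(load_to_percent(3), 2), "%")
print(round(load_to_mbps(3, 10_000_000), 1), "Mbit/s")
```

The load value is coarse, so treat the result as a rough figure rather than an exact rate.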

patoberli · ‎06-14-2019

Now, or while the ping latency was high?

It might also be a hung process on the WLC, but I can't help with that.

powys · ‎06-14-2019

Both. The port and CPU usage seem constant regardless of whether the pings are poor or not.

The WLCs (they're an HA pair) have had numerous reboots as part of their firmware upgrades, but won't have had a power-cycle for a while. I'll try shutting each down and restarting.

Scott Fella · ‎06-16-2019

You should look at the switch port counters for anything that might indicate an issue. Since you have an HA SSO pair, you can also issue a redundancy failover and see whether the other unit shows the same high latency. That will help isolate whether the problem is with a link connection or perhaps a module. Take some latency readings during the failover, as the active unit will reboot and the secondary unit will become active. See whether the latency goes away during the reboot, and whether it comes back once both units have synced. You can then provide this data back to us, or to TAC if you have a case open. Basically, you are isolating each controller to see whether the problem is reproducible on both or only on one.
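
One way to take timestamped latency readings during a failover is a small logging loop run from a wired host. This is a rough sketch, not a tested tool: it assumes a Linux or macOS style `ping` (the `-c` flag and `time=... ms` output), and the function names are invented for illustration.

```python
import re
import subprocess
import time
from datetime import datetime

# Assumption: Linux/macOS style reply lines, e.g. "... icmp_seq=1 ttl=64 time=0.42 ms"
RTT_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtt_ms(output):
    """Extract the round-trip time in ms from ping output, or None if no reply."""
    m = RTT_RE.search(output)
    return float(m.group(1)) if m else None


def probe(host):
    """Send one echo request and return the RTT in ms, or None on loss or timeout."""
    try:
        proc = subprocess.run(
            ["ping", "-c", "1", host],
            capture_output=True, text=True, timeout=5,
        )
    except subprocess.TimeoutExpired:
        return None
    return parse_rtt_ms(proc.stdout)


def monitor(host, samples, interval_s=1.0):
    """Print one timestamped reading per interval so spikes can be lined up
    against the failover and sync events afterwards."""
    for _ in range(samples):
        rtt = probe(host)
        stamp = datetime.now().isoformat(timespec="seconds")
        status = "lost" if rtt is None else f"{rtt:.1f} ms"
        print(f"{stamp} {host} {status}")
        time.sleep(interval_s)
```

For example, `monitor("192.0.2.74", samples=600)` would log about ten minutes of readings; the address is a placeholder for the WLC management IP.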

-Scott

powys · ‎06-18-2019

We've already failed the pair over and identified this as an issue on both controllers.

I initially suspected it could be due to the core switch replacement (from a Catalyst 6500 to a Nexus 7000) hence the temporary 2960X inline with one of the controllers, but the results with the 2960X inline suggest it's at the controller end, not the Nexus end.

Last thing I tested was a power-cycle (watching the LAN interface lights following a failover request and pulling the plug as soon as they go out). Both controllers have had this.

It's behaving at the moment. If the latency ramps up later today, I'll do the during-failover checks you suggested. Thanks.

powys · ‎06-20-2019

I failed over in the middle of a moderate incident (high two-figure to low three-figure ping times). It made no difference to the ping response, even during the failover itself.

Here are the port stats from the Nexus, freshly cleared 5 minutes ago:-

admin state is up, Dedicated Interface
Hardware: 1000/10000 Ethernet, address: 706e.6d**.**** (bia 706e.6d**.****)
Description: WLC-04 (Secondary) via 2960X
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 3/255, rxload 2/255
Encapsulation ARPA, medium is broadcast
Port mode is trunk
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Auto-Negotiation is turned on
Input flow-control is off, output flow-control is off
Auto-mdix is turned on
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
EEE (efficient-ethernet) : n/a
Last link flapped 9week(s) 1day(s)
Last clearing of "show interface" counters 00:05:01
0 interface resets
Load-Interval #1: 30 seconds
30 seconds input rate 117574272 bits/sec, 23960 packets/sec
30 seconds output rate 118693504 bits/sec, 25719 packets/sec
input rate 117.57 Mbps, 23.96 Kpps; output rate 118.69 Mbps, 25.72 Kpps
Load-Interval #2: 5 minute (300 seconds)
300 seconds input rate 114731432 bits/sec, 23774 packets/sec
300 seconds output rate 116003008 bits/sec, 25503 packets/sec
input rate 114.73 Mbps, 23.77 Kpps; output rate 116.00 Mbps, 25.50 Kpps
RX
7230303 unicast packets 8398 multicast packets 984 broadcast packets
7239796 input packets 4374089576 bytes
0 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC/FCS 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
7760531 unicast packets 1275 multicast packets 2198 broadcast packets
7764088 output packets 4421211334 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
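
When comparing several captures like this one, it can be quicker to let a script pick out any error counters that are not zero. A minimal sketch, assuming the NX-OS "show interface" counter layout shown above; the list of counter names is a hand-picked subset and the function name is hypothetical:

```python
import re

# Hand-picked subset of the counters in the output above that point to
# link or buffering problems when they are non-zero.
ERROR_COUNTERS = [
    "runts", "giants", "CRC/FCS", "no buffer", "input error", "overrun",
    "underrun", "input discard", "output error", "collision",
    "late collision", "output discard", "Rx pause", "Tx pause",
]


def nonzero_errors(show_interface_text):
    """Return {counter name: value} for every listed counter above zero."""
    found = {}
    for name in ERROR_COUNTERS:
        # The counter value comes directly before its name: "0 CRC/FCS"
        pattern = r"(?<!\S)(\d+) " + re.escape(name) + r"(?!\S)"
        m = re.search(pattern, show_interface_text)
        if m and int(m.group(1)) > 0:
            found[name] = int(m.group(1))
    return found
```

For the counters pasted above this returns an empty dict, which matches what you see by eye: no errors, discards or pause frames on the port.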

patoberli · ‎06-20-2019

Hmm weird, I would have expected the link to flap if the WLC gets failed over (and reboots):
Last link flapped 9week(s) 1day(s)

Are you sure you're looking at the right port? (Sorry for asking this!)

powys · ‎06-20-2019

I was wondering if anyone would pick up on that :)
This is the secondary controller with the temporary 2960X inline. The port's been up ever since the 2960X went in.

I've just failed back to the primary controller (with link flap from the earlier failover), here are the port stats after 15 minutes:-
Ethernet3/22 is up
admin state is up, Dedicated Interface
Hardware: 1000/10000 Ethernet, address: 706e.6d**.**** (bia 706e.6d**.****)
Description: WLC-04 (Primary)
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 2/255, rxload 2/255
Encapsulation ARPA, medium is broadcast
Port mode is trunk
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Auto-Negotiation is turned on
Input flow-control is off, output flow-control is off
Auto-mdix is turned on
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
EEE (efficient-ethernet) : n/a
Last link flapped 02:17:56
Last clearing of "show interface" counters 00:15:35
0 interface resets
Load-Interval #1: 30 seconds
30 seconds input rate 83163888 bits/sec, 17433 packets/sec
30 seconds output rate 85537392 bits/sec, 19152 packets/sec
input rate 83.16 Mbps, 17.43 Kpps; output rate 85.54 Mbps, 19.15 Kpps
Load-Interval #2: 5 minute (300 seconds)
300 seconds input rate 92726472 bits/sec, 19262 packets/sec
300 seconds output rate 94194336 bits/sec, 20736 packets/sec
input rate 92.73 Mbps, 19.26 Kpps; output rate 94.19 Mbps, 20.74 Kpps
RX
20469286 unicast packets 3907 multicast packets 3904 broadcast packets
20477058 input packets 12869085381 bytes
0 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC/FCS 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
21776425 unicast packets 27153 multicast packets 9976 broadcast packets
21813444 output packets 12986331538 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause

patoberli · ‎06-20-2019

As expected, the interface looks fine.
I would suggest you upgrade to 8.3.150.0 (it fixes a lot of security issues and an important flash bug with the 2700 and 3700 series) or open a TAC case.
Release notes: http://www.cisco.com/c/en/us/td/docs/wireless/controller/release/notes/crn83mr5.html#resolved_caveats