02-03-2021 05:21 AM - edited 07-05-2021 01:10 PM
Hi,
I'm seeing the exact behaviour referenced in CSCvb85177 with a pair of 1532E APs attempting to stay joined to an existing mesh of LAP1142s.
WLC 8.2.170 at remote datacentre, L3 discovery required
Discovery by DNS
1x 1142 RAP [Flex-bridge]
4x 1142 MAP [Flex-bridge]
1x 1142 non-mesh [Flexconnect]
2x 1532 MAP [Flex-bridge]
The network of 1142s has been stable for a year or so, with no unusual issues. I've moved the 5 GHz backhaul from the indoor-only (GB) channel 44 to channel 108 at 40 MHz (indoor-outdoor) across all mesh APs. There is absolutely nothing on those channels in this location: DFS detects nothing, and surveying can't hear anything.
Symptoms are that the 1532s will select a parent, DHCP, find the controller and be happy. This lasts for anywhere between 8 and 24hrs so far. At some point in that timeframe, the 1532 will report that its parent has gone down, flap a bit between 'down' and 'control' states on the backhaul radio, then reselect the same parent which it'd lost. At that point, the mesh adjacency looks correct, parent 1142 is happy, 1532 is happy as a child, keys installed and running.
Mesh adjacency from RAP:
INST-LAP1142-hall#show mesh adjacency child
show MESH Adjacency Child
ADJ 2 Identity 84b8.02ac.ff42 MA: 8890.8d92.d8ef ver 0x20 minver 0x0 on device Dot11Radio:1
txpkts 616416 txretries 22182
Flags: CHILD BEACON
worstDv 255 Ant 0, channel 108, biters 0, ppiters 10, fwd_state 3
Numroutes 0, snr 0, snrUp 37 snrDown 0 linkSnr 0
blistExp 3 bliters 0
adjustedEase 0 unadjustedEase 0 stickyEase 0
txParent 0 rxParent 0
BGN newnham
Vector through 84b8.02ac.ff42:
Per antenna smoothed snr values: 0 0 0 0
Subordinate neighbors: 84b8.02ac.ff42
Hop-Count Extension: ON, Version: 1
Mesh adjacency from 1532MAP:
INST-CAP1532-OUTDOOR#show mesh adjacency parent
show MESH Adjacency Parent
ADJ 4 Identity c47d.4f39.f4fe MA: c47d.4f52.d1ff ver 0x20 minver 0x20 on device Dot11Radio:1
txpkts 13684 txretries 2394
Flags: UPDATED NEIGH PARENT BEACON
worstDv 0 Ant 0, channel 108, biters 0, ppiters 10, fwd_state 3
Numroutes 1, snr 0, snrUp 49 snrDown 37 linkSnr 34
blistExp 3 bliters 0
adjustedEase 6248576 unadjustedEase 6248576 stickyEase 8648576
txParent 9469 rxParent 5411
Authentication: EAP, Encryption: AES-CCMP, Fwd-state: OPEN/CONTROL
BGN newnham
Vector through c47d.4f39.f4fe:
Vector ease 1 -1, FWD: c47d.4f39.f4fe
Per antenna smoothed snr values: 33 0 0 0
Hop-Count Extension: ON, Version: 1
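One figure that can be read straight off the two adjacency outputs is the backhaul retry rate. Below is a minimal Python sketch, assuming txpkts and txretries are the transmitted-packet and retransmission counters for that adjacency; the function name retry_rate is mine, not part of any Cisco tool.

```python
import re


def retry_rate(adjacency_text: str) -> float:
    """Return txretries / txpkts as a percentage from 'show mesh adjacency' text.

    Assumption: the two counters appear as 'txpkts N txretries M', as in the
    outputs pasted in this post.
    """
    match = re.search(r"txpkts\s+(\d+)\s+txretries\s+(\d+)", adjacency_text)
    if match is None:
        raise ValueError("no txpkts/txretries counters found")
    pkts, retries = int(match.group(1)), int(match.group(2))
    if pkts == 0:
        return 0.0
    return 100.0 * retries / pkts


# Counters copied from the RAP (child view) and 1532 MAP (parent view) outputs.
rap_view = "txpkts 616416 txretries 22182"
map_view = "txpkts 13684 txretries 2394"

print(round(retry_rate(rap_view), 1))
print(round(retry_rate(map_view), 1))
```

With the counters above this comes out at roughly 3.6% on the RAP side and 17.5% on the 1532 side. The two counters cover different uptimes, so treat the comparison as a hint rather than a measurement.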
Mesh and other status on MAP:
INST-CAP1532-OUTDOOR#show mesh status
show MESH Status
MeshAP in state Maint
Uplink Backbone: Virtual-Dot11Radio0, hw Dot11Radio1
Configured BGN: newnham, Extended mode 0
Children: Not accept child
No.of Children: 0
rxNeighReq 67348 rxNeighRsp 94721 txNeighReq 58769 txNeighRsp 67348
rxNeighRsp 588146 txNeighUpd 148618
nextchan 0 nextant 0 downAnt 0 downChan 0 curAnts 0
nextNeigh 1, malformedNeighPackets 0,poorNeighSnr 431
excludedPackets 0,insufficientMemory 0, authenticationFailures 0
Parent Changes 3, Neighbor Timeouts 4
Vector through c47d.4f39.f4fe:
Vector ease 1 -1, FWD: c47d.4f39.f4fe
Authentication Failure statistics
Child MAC No.Of PSK Failures
Preferred Parent 0000.0000.0000
---
INST-CAP1532-OUTDOOR#show capwap ip config
LWAPP Static IP Configuration
IP Address 172.20.0.105
IP netmask 255.255.255.0
Default Gateway 172.20.0.1
---
INST-CAP1532-OUTDOOR#show capwap client rcb
AdminState : ADMIN_ENABLED
SwVer : 8.2.170.0
NumFilledSlots : 2
Name : INST-CAP1532-OUTDOOR
Location : default location
MwarName : wlc-cov-greenferret
MwarMacAddr : 5ee5.4fac.0000
MwarHwVer : 0.0.0.0
ApMode : Flexconnect+Bridge
ApSubMode : Not Configured
OperationState : DISCOVERY
CAPWAP Path MTU : 1485
Link-Encryption (AP) : Disabled
Link-Encryption (MWAR) : Disabled
Prefer-mode : IPv4
LinkAuditing : disabled
ApRole : MeshAP
ApBackhaul : 802.11a
ApBackhaulChannel : 0
ApBackhaulSlot : 3
ApBackhaul11gEnabled : 0
ApBackhaulTxRate : 0
Ethernet Bridging State : 0
Daisy Chaining State : Disabled
Public Safety State : disabled
AP Rogue Detection Mode : Enabled
AP Tcp MSS Adjust : Disabled
Predownload Status : None
Auto Immune Status : Disabled
RA Guard Status : Enabled
Efficient Upgrade State : Disabled
Efficient Upgrade Role : None
TFTP Server : Disabled
Antenna Band Mode : Dual Band
Universal AP Priming mode : Unprimed
802.11bg(0) Radio
ADMIN State = ENABLE [1]
OPER State = UP [2]
CONFIG State = UP [2]
HW State = UP [4]
Radio Mode : Flexconnect+Bridge
GPR Period : 10
Beacon Period : 100
DTIM Period : 0
World Mode : 1
VoceraFix : 0
Dfs peakdetect : 0
Fragmentation Threshold : 2346
Current Tx Power Level : 3
Current Channel : 13
Current Bandwidth : 20
802.11a(1) Radio
ADMIN State = ENABLE [1]
OPER State = UP [2]
CONFIG State = UP [2]
HW State = UP [4]
Radio Mode : Flexconnect+Bridge
GPR Period : 10
Beacon Period : 100
DTIM Period : 0
World Mode : 1
VoceraFix : 0
Dfs peakdetect : 1
Fragmentation Threshold : 2346
Current Tx Power Level : 1
Current Channel : 108
Current Bandwidth : 40
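The stuck state shows up in the rcb output above as OperationState : DISCOVERY, even though the mesh adjacency looks fine. For anyone wanting to spot this from saved console logs, here is a rough Python sketch that pulls "key : value" fields out of `show capwap client rcb` text. The helper names are hypothetical, and treating "UP" as the healthy value is my assumption, not something taken from this capture.

```python
import re
from typing import List


def rcb_field(rcb_text: str, key: str) -> List[str]:
    """Return every single-token value printed as 'key : value' in the text."""
    pattern = rf"{re.escape(key)}\s*:\s*(\S+)"
    return re.findall(pattern, rcb_text)


def looks_stuck(rcb_text: str, healthy: str = "UP") -> bool:
    """True if an OperationState is present and differs from the healthy value.

    Assumption: a joined AP reports 'UP' here; adjust 'healthy' if yours differs.
    """
    states = rcb_field(rcb_text, "OperationState")
    return bool(states) and states[0] != healthy


# Excerpt of the output above (works on one-line or multi-line pastes).
sample = (
    "AdminState : ADMIN_ENABLED SwVer : 8.2.170.0 "
    "OperationState : DISCOVERY "
    "Current Channel : 13 Current Channel : 108"
)

print(rcb_field(sample, "OperationState"))
print(rcb_field(sample, "Current Channel"))
print(looks_stuck(sample))
```

The regex only captures the first whitespace-delimited token of each value, so multi-word fields such as Location would need a different pattern.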
Despite the healthy-looking adjacency, the 1532 then tries its (recently attempted) static address, but the gateway is unreachable, causing it to fall back to DHCP. It never receives a DHCP response, and so sits there offline, awaiting an IP.
The issue in the bug is described as a DHCP issue, which is what it initially looks like. However, with packet capture on the wire from the RAP, it's clear that the DHCP response to the 1532 MAP is being sent from the DHCP server back towards it, but is never received by the MAP. After adding a static IP to avoid this DHCP problem, the AP ARPs out for its gateway, and again the response from the gateway is visible on the wire, but is never received by the MAP.
Bridge forwarding tables on the MAP and RAP both show 0 packets for the relevant MACs.
Capture from RAP here, showing the 1532 continually trying to ARP out for its gateway:
What do we think is going on here then?
My current guess is that there's some incompatibility between the 1532 and the 1142. The referenced bug has two support cases noted, but no workaround or fix.
Any help appreciated!
Cheers, Paul
02-07-2021 07:16 AM
I haven't got very far with the 1532 access points still.
The two behave differently, which is somewhat odd given they have very close serial numbers and were made in the same week; same software (obviously), same bootloader, same revision, same antennas.
So far, I've flattened them properly again, by booting into rommon and deleting everything but the image from flash, then booting them off-controller and doing a capwap clear, then letting them join over the wire and reconfiguring.
Both seem sensitive to dropping off the network and getting into weird states if there are other mesh APs coming and going on the network.
One goes into a flashing green state and *sometimes* rejoins correctly. The other goes into a flashing red state, and is the source of the debug output in my first post, where it refuses to receive packets. I can't see that the flashing red is a relevant indication: the docs say that's 'ethernet link down', which it clearly is, as it's a mesh AP. Anyone seen that on one of these? Video below.
I've replaced the RAP, originally an 1142, with a 3702. Cisco's docs state that the 3702/1532 combination is an approved mesh solution. That hasn't made any difference.
I'm not understanding why the 1142 mesh works fine; any AP can be rebooted or drop off due to marginal signal and rejoin without getting into a state. These 1532 seem incredibly unreliable and flaky. Sigh.
Anyone think that power might be an issue here? They're DC powered from 48 V PSUs which can supply 24 W, which is only just above the requirements. I've got a pair of 120 W PSUs arriving to try.
While I'm asking for help: has anyone else noticed that Prime (3.9 in this case) can't pull mesh link SNR from the controller / APs? I get 0 dB and blank graphs in Prime. Looking manually at the Neighbour Details on the controller shows the link SNR correctly. I saw one report of it being an SNMP bug, but nothing else.
Ta, PC
02-09-2021 02:56 AM
Somewhat talking to myself here, but it might help someone else seeing the same issue with Prime; the controller returns zeros for mesh link SNR in the mesh neighbour table via SNMP, which is presumably why Prime reports it as zero.
Odd that it hasn't been seen more often.
root@host:/home/paul# snmpwalk -v2c -c redacted 94.229.79.172 1.3.6.1.4.1.9.9.616.1.3.1.1.3
iso.3.6.1.4.1.9.9.616.1.3.1.1.3.8.204.104.138.242.240.124.14.206.11.94.224 = INTEGER: 0
iso.3.6.1.4.1.9.9.616.1.3.1.1.3.8.204.104.138.242.240.136.144.141.146.216.224 = INTEGER: 0
iso.3.6.1.4.1.9.9.616.1.3.1.1.3.8.204.104.138.242.240.136.144.141.146.218.48 = INTEGER: 0
iso.3.6.1.4.1.9.9.616.1.3.1.1.3.124.14.206.11.94.224.8.204.104.138.242.240 = INTEGER: 0
iso.3.6.1.4.1.9.9.616.1.3.1.1.3.124.14.206.11.94.224.136.144.141.146.216.224 = INTEGER: 0
iso.3.6.1.4.1.9.9.616.1.3.1.1.3.124.14.206.11.94.224.196.125.79.82.209.240 = INTEGER: 0
iso.3.6.1.4.1.9.9.616.1.3.1.1.3.136.144.141.146.216.224.8.204.104.138.242.240 = INTEGER: 0
iso.3.6.1.4.1.9.9.616.1.3.1.1.3.136.144.141.146.216.224.124.14.206.11.94.224 = INTEGER: 0
iso.3.6.1.4.1.9.9.616.1.3.1.1.3.136.144.141.146.216.224.136.144.141.146.218.48 = INTEGER: 0
iso.3.6.1.4.1.9.9.616.1.3.1.1.3.136.144.141.146.218.48.8.204.104.138.242.240 = INTEGER: 0
iso.3.6.1.4.1.9.9.616.1.3.1.1.3.136.144.141.146.218.48.136.144.141.146.216.224 = INTEGER: 0
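To make those rows easier to match against the APs, here is a small Python sketch that decodes the OID index. It assumes the last 12 sub-identifiers of each row are two 6-octet MAC addresses (AP, then neighbour), which is my reading of the index rather than something confirmed from the MIB; the function name is hypothetical.

```python
from typing import Tuple


def decode_neighbour_row(line: str) -> Tuple[str, str, int]:
    """Split one snmpwalk line into (ap_mac, neighbour_mac, value).

    Assumption: the final 12 sub-identifiers of the OID are two MAC
    addresses, one decimal number per octet.
    """
    oid, sep, value = line.partition(" = INTEGER: ")
    if not sep:
        raise ValueError("not an INTEGER snmpwalk line")
    octets = [int(part) for part in oid.split(".")[-12:]]

    def as_mac(six):
        return ":".join(f"{octet:02x}" for octet in six)

    return as_mac(octets[:6]), as_mac(octets[6:]), int(value)


# One row copied from the walk above.
row = (
    "iso.3.6.1.4.1.9.9.616.1.3.1.1.3."
    "124.14.206.11.94.224.196.125.79.82.209.240 = INTEGER: 0"
)
print(decode_neighbour_row(row))
```

For that row the neighbour decodes to c4:7d:4f:52:d1:f0, which shares its first five octets with the c47d.4f52.d1ff address in the 1532's adjacency output, so the table does seem to know about the link even though the SNR value comes back as 0.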
This morning I connected the flakiest 1532 to a new 120 W 48 V PSU. Uptime before going weird has been variable; it can do 15 hrs, then 'crash' twice in an hour. Let's see if more power is helpful.
Cheers, PC