Solved: Re: Prime 3.10.2 Coral to 9800 WLC version 17.8/9/10.x reports down

Ethan Grinnell · ‎12-25-2022

After upgrading our 9800 WLC to 17.9.2, Prime has started indicating that the Coral connection is down. It was fine in 17.6.x. Prime is 3.10.2 with all of the latest patches. As far as I have been able to determine the actual telemetry stream is fine though. AP discovery completes with no issues, I have client details, etc.

This document and this bug both mention the command:

show telemetry internal protocol cntp-tcp manager x.x.x.x 20828

https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/214286-managing-catalyst-9800-wireless-controll.html

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvs40684

It appears that Prime uses that command in the "Catalyst 9800 Telemetry Coral Health" service check. However, as of 17.8.1+ that command requires additional arguments (it wants the VRF ID and the management IP), and Prime doesn't use the VRF-aware command syntax. I've tested with both 9800-80 and 9800-L-F models; I can't speak for any other 9800 model.

wc9800-06#show telemetry internal protocol cntp-tcp manager x.x.x.x 20830 ?
  source-vrf  Source VRF

wc9800-06#
wc9800-06#show telemetry internal protocol cntp-tcp manager PrimeInfrastructure_IP 20830 source-vrf 0 WLC_MGMT_IP
Telemetry protocol manager stats:

Con str                : PrimeInfrastructure_IP:20830:0:WLC_MGMT_IP
Sockfd                 : 109
Protocol               : cntp-tcp
State                  : CNDP_STATE_CONNECTED
Table id               : 0
Wait Mask              : 
Connection Retries     : 0
Send Retries           : 0
Pending events         : 0
Source ip              : WLC_MGMT_IP
Bytes Sent             : 22288781
Msgs Sent              : 20578
Msgs Received          : 0
Creation time:         : Sun Dec 25 16:32:27:746
Last connected time:   : Sun Dec 25 16:32:27:747
Last disconnect time:  : 
Last error:            : 
Connection flaps:      : 0
Last flap Reason:      : 
Keep Alive Timeouts:   : 0
Last Transport Error   : No Error
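
For anyone who wants to check this output from a script rather than by eye, here is a minimal Python sketch. It assumes the output keeps the "Key : value" layout shown above; the function names `parse_manager_stats` and `is_connected` are made up for illustration and are not part of Prime or IOS-XE.

```python
import re

# Matches "Key   : value" lines. Some keys in the output end in a colon
# themselves (e.g. "Creation time:         : Sun Dec 25 ..."), so the
# separator is "whitespace followed by a colon".
_LINE = re.compile(r"^(.+?)\s+:\s*(.*)$")


def parse_manager_stats(output: str) -> dict:
    """Turn the manager stats text into a dict of field name -> value."""
    stats = {}
    for raw in output.splitlines():
        match = _LINE.match(raw.strip())
        if match:
            key, value = match.groups()
            # Drop the trailing colon some field names carry.
            stats[key.rstrip(":").strip()] = value.strip()
    return stats


def is_connected(output: str) -> bool:
    """True when the State field reports CNDP_STATE_CONNECTED."""
    return parse_manager_stats(output).get("State") == "CNDP_STATE_CONNECTED"


if __name__ == "__main__":
    sample = (
        "Telemetry protocol manager stats:\n"
        "\n"
        "Protocol               : cntp-tcp\n"
        "State                  : CNDP_STATE_CONNECTED\n"
        "Connection flaps:      : 0\n"
    )
    print(parse_manager_stats(sample))
    print(is_connected(sample))
```

The header line "Telemetry protocol manager stats:" has no whitespace before its colon, so it is skipped rather than parsed as a field.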

As a test to see if this was the only thing that is affecting that service, I made an EEM script that translates from the original command to the new syntax. With that script in place Prime's "Catalyst 9800 Telemetry Coral Health" service is happy. So it seems it was that single command's syntax changing that broke it. To be clear, this doesn't affect telemetry actually functioning, it only impacts Prime's reporting of the telemetry status.

event manager applet telemManagerCmd authorization bypass
 event cli pattern "^show telemetry internal protocol cntp-tcp manager [[:digit:]\.]+ [[:digit:]]+$" enter
 action 000 set wlc_mgmt_ip "Unknown"
 action 001 set wlc_mgmt_vrf_id "0"
 action 002 set pi_ip "Unknown"
 action 003 set pi_port "Unknown"
 action 004 regexp "([[:digit:]\.]+) ([[:digit:]]+)" "$_cli_msg" ignore pi_ip pi_port
 action 005 cli command "enable"
 action 006 cli command "terminal length 0"
 action 007 cli command "terminal width 0"
 action 008 cli command "show wireless interface summary"
 action 009 regexp "[[:alnum:]\/]+[[:space:]]+Management[[:space:]]+[[:digit:]]+[[:space:]]+([[:digit:]\.]+)" "$_cli_result" ignore wlc_mgmt_ip
 action 010 cli command "show telemetry internal protocol cntp-tcp manager $pi_ip $pi_port source-vrf $wlc_mgmt_vrf_id $wlc_mgmt_ip"
 action 011 puts "$_cli_result"
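
The translation the applet performs can also be sketched off-box in Python, which makes the two regular expressions easier to test. This is only an illustration of the same logic, not something Prime or the WLC runs. The function name `translate` is hypothetical, and the interface summary line used in the example is invented to match the applet's pattern (interface name, "Management", a number, then an IP); real output may have more columns.

```python
import re

# Same shape as the applet's "event cli pattern": old syntax only,
# i.e. manager <ip> <port> with nothing after the port.
OLD_CMD = re.compile(
    r"^show telemetry internal protocol cntp-tcp manager ([\d.]+) (\d+)$"
)

# Same shape as action 009: pull the IP from the Management interface row.
MGMT_LINE = re.compile(r"[A-Za-z0-9/]+\s+Management\s+\d+\s+([\d.]+)")


def translate(old_cmd, interface_summary, vrf_id="0"):
    """Return the VRF-aware command, or None if old_cmd is not the old syntax."""
    cmd = OLD_CMD.match(old_cmd.strip())
    if cmd is None:
        return None
    pi_ip, pi_port = cmd.groups()
    mgmt = MGMT_LINE.search(interface_summary)
    # Like the applet, fall back to "Unknown" when no management IP is found.
    wlc_mgmt_ip = mgmt.group(1) if mgmt else "Unknown"
    return (
        "show telemetry internal protocol cntp-tcp manager "
        f"{pi_ip} {pi_port} source-vrf {vrf_id} {wlc_mgmt_ip}"
    )


if __name__ == "__main__":
    # Invented sample row; addresses are documentation-range placeholders.
    summary = "Vlan100   Management   100   192.0.2.10   255.255.255.0"
    print(translate(
        "show telemetry internal protocol cntp-tcp manager 192.0.2.50 20830",
        summary,
    ))
```

As in the applet, the VRF ID is simply assumed to be 0; a deployment with the management interface in a non-default VRF would need a different value.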

Is anyone else seeing this with 9800 WLC versions 17.8.1 and higher? I'm curious whether something particular to our setup makes the older CLI command disappear, or whether Prime somehow does use the VRF-aware command syntax.

Ethan Grinnell · ‎07-11-2023

3.10.4 was released last week. I confirmed that it fixes the issue. They changed to the new VRF-aware syntax:

show telemetry internal protocol cntp-tcp manager PrimeInfrastructure_IP 20830 source-vrf 0 WLC_MGMT_IP


marce1000 · ‎12-25-2022

FYI: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvt65754. Check whether that could help with your current predicament too.

M.


Ethan Grinnell · ‎12-26-2022

Thanks for the link.

I'm not sure how the Coral version ties into the WLC version, though it doesn't need to be a 1-to-1 correspondence. I think that Coral is 17.9.1 in Prime 3.10.2.

balaji.bandi · ‎12-26-2022

Do you have a firewall in between? Check that these ports are allowed:

Cisco Prime Infrastructure to controller: TCP port 830 is used by Cisco Prime Infrastructure to push the telemetry configuration to the controller (using NETCONF).
Controller to Cisco Prime Infrastructure: TCP port 20828 is used for Cisco IOS-XE 16.10.x and 16.11.x, and TCP port 20830 is used for Cisco IOS-XE 16.12.x, 17.1.x and later releases.
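
The port rule above is easy to get wrong when writing firewall rules for a mix of releases, so here is a small Python sketch of it. The function name `telemetry_port` is hypothetical, and raising an error for releases older than 16.10 is an assumption on my part, since the text above says nothing about them.

```python
NETCONF_PORT = 830  # Prime Infrastructure -> controller (telemetry config push)


def telemetry_port(ios_xe_version: str) -> int:
    """Return the controller -> Prime telemetry TCP port for an IOS-XE release."""
    major, minor = (int(part) for part in ios_xe_version.split(".")[:2])
    if (major, minor) in ((16, 10), (16, 11)):
        return 20828
    if (major, minor) >= (16, 12):
        # 16.12.x, 17.1.x and later releases
        return 20830
    # Assumption: releases before 16.10 are not covered by the rule above.
    raise ValueError(f"no telemetry port listed for IOS-XE {ios_xe_version}")


if __name__ == "__main__":
    for version in ("16.11.1", "16.12.4", "17.9.2"):
        print(version, telemetry_port(version))
```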

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help

Ethan Grinnell · ‎12-26-2022

There is a firewall in production, no interesting ports blocked.

In the lab I have them in the same network/VLAN and no firewall. Production Prime and WLC aren't in the same network.

balaji.bandi · ‎12-26-2022

That was just a suggestion; it is not necessarily the problem. Since you mentioned it was working before the upgrade and throwing errors after it, this will need some TAC involvement.

Personally, since you went to the latest version of code: Cisco is moving towards DNAC deployments rather than focusing on Prime Infrastructure, so I'm just thinking some features may be fading.

BB


eglinsky2012 · ‎12-26-2022

Prime 3.10.3 is out. Not sure if this will be resolved, but it might be worth checking out if only for the long list of other bugs resolved.

Ethan Grinnell · ‎12-26-2022

Thanks, I'll give that a shot. Not sure how I missed that it came out last Thursday. I guess I need to confirm my notification settings.

Ethan Grinnell · ‎12-26-2022

No change. Oh well, it was worth doing anyway.

I opened a TAC case earlier this afternoon.

balaji.bandi · ‎12-27-2022

Sure. The new PI release only contains fixes for bugs that were reported earlier, but TAC can help you fix the issue.

It's worth posting back what TAC suggests, for the benefit of the wider community.

BB


Rich R · ‎12-30-2022

Nice analysis and workaround @Ethan Grinnell

Make sure TAC opens a bug for it so that it gets fixed.


jvodeb · ‎02-21-2023

@Ethan Grinnell thanks for the workaround.

What did TAC say?

Ethan Grinnell · ‎02-24-2023

Still investigating internally

brian.smith · ‎03-20-2023

PI = Prime Infrastructure; 9800 is the Cisco 9800-CL (WLC switch)

I too have been running into issues with the Coral Services since upgrading to PI 3.10.x. TAC has always just had me stop and restart services to resolve.

Ethan Grinnell · ‎06-09-2023

This will probably be the fix:

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwc30033

TAC says: "There is not workaround (as of now), however, it will be fixed in 3.10.4. ETA for that version has been changing, it is expected to be released late June, early July"