03-29-2023 02:19 AM
Hello Cisco WLAN Experts,
Today I upgraded our central 9800-80 WLC from 17.3.5a to 17.3.7.
After the upgrade, the following event message appeared several times in the GUI:
Chassis 1 R0/0: ncsshd_bp: NETCONF/SSH: fatal: mm_answer_sign: Xkey_sign failed: error in libcrypto
On our Prime I also noticed that some of the APs connected to the 9800-80 WLC are reported as disassociated:
AP `xyz-123' disassociated from Controller 9800-80
I already ran a sync on Prime for the 9800-80, but the APs are still reported as "Not Registered"
with Last Reboot Reason "Image Upgrade Success".
I also reset one of these APs, without improvement.
Does anyone know more about the event message and the Prime problem?
Thank you for any hints and tips.
Kind regards
Wini
03-29-2023 02:51 AM
This looks like a bug to me, but before we go into that:
What was the reason for the upgrade?
On the Catalyst 9800 WLC itself, do you see the APs associated?
03-29-2023 02:52 AM
Most of the time you can Google the message to get an idea of the issue. If you can't find anything, opening a TAC case is your best option. As for Prime, that kind of issue is typically down to the version and compatibility. I spun up a new PI 3.10 when I added our 9800s and never really tried a lower version than that. If you have the resources, spin up a new PI 3.10, add the controller, and see whether the issue goes away. At least then you know the answer: compatibility.
03-29-2023 03:59 AM
- It's probably similar to this one (https://bst.cisco.com/bugsearch/bug/CSCvt43974). I would try to go beyond 17.3.x, for example to 17.9.3 (if you still have Wave 1 series APs and need support for them).
M.
03-29-2023 03:38 PM
Agreed with Marce - no excuse to stay on 17.3 now that 17.9.3 supports the wave 1 ac APs.
Note that 17.3 reaches end of software maintenance in 2 days' time, with no more security fixes after September!
https://www.cisco.com/c/en/us/products/collateral/ios-nx-os-software/ios-xe-17/ios-xe-17-3-x-eol.html
03-31-2023 01:03 AM
Hello,
Thank you for your recommendation to use version 17.9.3 instead.
But according to your software compatibility matrix, this version does not support Cisco Prime 3.8.1; it lists Cisco Prime 3.10.2 instead. Would you still recommend 17.9.3 in this case?
Today I dug a little deeper into our Prime 3.8.1. It now shows a failed telemetry status for the 9800-80 WLC.
I also tried to open a TAC case, but our service provider failed to add the serial numbers of our 9800 WLCs to the service contract.
What a mess! Who can help with telemetry between a 9800 and Prime?
It is interesting to note that all 160 APs are working on the 9800. Exactly 16 of them, all of them 3800-series APs, are reported as gone on the Prime system.
And how can I roll back the software to 17.3.5a on a 9800 WLC?
Thank you for your tips.
Greetings
Wini
03-31-2023 01:30 AM
>...But according to your software compatibility matrix, this version does not support Cisco Prime 3.8.1; it lists Cisco Prime 3.10.2 instead. Would you still recommend 17.9.3 in this case?
Since Prime is a product line that is being phased out, I would always advise using its latest version; that one will support 17.9.3.
M.
03-31-2023 03:19 AM - edited 03-31-2023 04:01 AM
Hello Marce,
Thank you for your advice to use the latest Prime version. Setting up a new one on version 3.10 will cost me weeks of work.
I simply trusted Cisco that a 9800-80 HA pair running 17.3.7 would work with Prime 3.8.1, as the software compatibility matrix states.
The upgrade was meant to solve a field-notice problem: new 9103 V03 APs not joining the WLC because of new drivers from new Cisco suppliers.
But apparently it does not work. It seems to me that the telemetry link has been broken since the upgrade!
I also can no longer find any telemetry configuration for the connection to our Prime in the config.
For example, this block, and many more like it, is missing after the update:
telemetry ietf subscription 113891536
 encoding encode-tdl
 filter tdl-transform BsnMobileStationStats
 stream native
 update-policy periodic 90000
 receiver ip address a.b.c.d 20830 protocol cntp-tcp
telemetry ietf subscription 117267423
 encoding encode-tdl
 filter tdl-transform LradIfChannelNoise
 stream native
 update-policy periodic 180000
 receiver ip address 10.200.67.67 20830 protocol cntp-tcp
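To see exactly which subscriptions disappeared, the backup config can be compared with the current running config offline. The following is only a minimal Python sketch, assuming both configs are available as plain text and that sub-commands are indented by a space as in a saved IOS-XE config. The function names (`telemetry_blocks`, `missing_subscriptions`, `subscriptions_for_receiver`) are made up for this illustration and are not part of any Cisco tool.

```python
import re

# A subscription block starts with an unindented header line like this one.
SUB_HEADER = re.compile(r"^telemetry ietf subscription (\d+)\s*$")


def telemetry_blocks(config_text):
    """Map subscription ID -> list of its sub-commands (whitespace stripped)."""
    blocks = {}
    current = None
    for line in config_text.splitlines():
        header = SUB_HEADER.match(line)
        if header:
            current = header.group(1)
            blocks[current] = []
        elif current is not None and line.startswith(" ") and line.strip():
            blocks[current].append(line.strip())
        else:
            current = None  # any other unindented line ends the block
    return blocks


def missing_subscriptions(backup_text, running_text):
    """IDs that exist in the backup config but not in the running config."""
    backup = telemetry_blocks(backup_text)
    running = telemetry_blocks(running_text)
    return sorted(set(backup) - set(running), key=int)


def subscriptions_for_receiver(config_text, receiver_ip):
    """IDs whose receiver line points at the given IP (e.g. the Prime server)."""
    wanted = "receiver ip address " + receiver_ip + " "
    return sorted(
        (sub_id for sub_id, cmds in telemetry_blocks(config_text).items()
         if any(cmd.startswith(wanted) for cmd in cmds)),
        key=int,
    )
```

Calling `missing_subscriptions(backup_text, running_text)` with the two config files read into strings lists the subscription IDs that were lost. Whether those blocks should be restored by hand or by re-adding the controller in Prime is a separate question that this sketch does not answer.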
A check also shows that the telemetry connection has been gone since the software upgrade:
WLC-9800#show telemetry internal protocol cntp-tcp manager 10.200.67.67 20830 protocol cntp-tcp source-address 10.222.126.4
% Error: Connection '10.200.67.67:20830::10.222.126.4' doesn't exist
How can I make telemetry work again after the software upgrade?
We are using an HA setup, by the way. Since the upgrade, we are running on the former standby unit.
Does it make sense to force a failover?
Would the second unit do a better job?
Please check and come back with information.
Kind regards
Wini
03-31-2023 03:38 AM
Prime-WLC problems often require a delete and re-add.
Note 3.8 is approaching EOL so you need to be planning the move to 3.10 anyway - software maintenance ending 15 July 2023:
https://www.cisco.com/c/en/us/products/collateral/cloud-systems-management/prime-infrastructure/prime-infrastructure-pids-pb.html
And the entire product is going EOL - you'll need to use 3.10 until then because there won't be any new major releases and software maintenance only till 28 September 2024:
https://www.cisco.com/c/en/us/products/collateral/cloud-systems-management/prime-infrastructure/prime-infrast-gen-appliance-lic-eol.html
03-31-2023 05:33 AM - edited 03-31-2023 05:34 AM
Hello Marce,
In the meantime I found bug CSCvx46784, "9800 Controller telemetry failure issue":
Symptom: Prime Infrastructure may fail to collect telemetry data from a managed Catalyst 9800 wireless LAN controller. On closer inspection, the "tdlcold" process in the Coral service became stuck in the CNDP_STATE_CON_INIT state. When the Coral container was restarted, the eWLC controller was able to reconnect and the Coral service's state was able to transition to CNDP_STATE_CON_CONNECTED.
Conditions: This was observed in Prime Infrastructure 3.8, managing a Catalyst 9800 wireless LAN controller.
Can someone please tell me how to restart a Coral Container on a Prime?!?
What a mess
Wini
03-31-2023 08:11 AM - edited 03-31-2023 08:21 AM
>...Can someone please tell me how to restart a Coral Container on a Prime?!?
Ref : https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/214286-managing-catalyst-9800-wireless-controll.html
>...Note: On Prime 3.8, Coral service can be restarted outside of container using 'sudo /opt/CSCOlumos/coralinstances/coral2/coral/bin/coral restart 1'
Appendix: below are some other useful commands for debugging telemetry on the 9800:
show telemetry ietf subscription all
show telemetry ietf subscription 23 detail (e.g.)
show telemetry internal subscription all stats
show telemetry internal connection <0-4294967294> detail
show telemetry ietf subscription configured
M.
06-06-2024 02:51 AM
Restarting the Coral service solved the issue on my side. Thx.
04-02-2023 11:39 PM - edited 04-02-2023 11:57 PM
Hello marce1000,
Thank you very much for your explanations and commands. I have restarted the Coral container on Prime, but it did not help.
When I run the command "show telemetry ietf subscription configured" on the 9800-80, I see only four IDs:
WLC-9800#show telemetry ietf subscription configured
Telemetry subscription brief
ID          Type        State   Filter type
--------------------------------------------------------
124         Configured  Valid   tdl-uri
125         Configured  Valid   transform-name
126         Configured  Valid   transform-name
127         Configured  Valid   tdl-uri
These entries are used by DNA Spaces and point to our DNA Spaces connector.
When I do the same on the 9800-L-C guest WLC, which is still running 17.3.6, I see a long list of entries like:
WLC-9800-Guest#show telemetry ietf subscription configured
Telemetry subscription brief
ID          Type        State   Filter type
--------------------------------------------------------
102687940   Configured  Valid   tdl-uri
147114719   Configured  Valid   transform-name
221543406   Configured  Valid   transform-name
276706442   Configured  Valid   transform-name
310032319   Configured  Valid   transform-name
......
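Comparing the two outputs by eye gets tedious when the list is long. As a rough illustration only, the brief table can be parsed from captured text with a few lines of Python. The function name `parse_subscription_brief` is hypothetical, and the sketch assumes the four-column layout shown above.

```python
def parse_subscription_brief(show_output):
    """Parse 'show telemetry ietf subscription configured' text into rows.

    Only lines with four whitespace-separated fields and a numeric first
    field are treated as data; header, separator and '......' lines are skipped.
    """
    rows = []
    for line in show_output.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0].isdigit():
            rows.append({
                "id": int(parts[0]),
                "type": parts[1],
                "state": parts[2],
                "filter": parts[3],
            })
    return rows


def subscription_ids(show_output):
    """Convenience helper: just the set of subscription IDs."""
    return {row["id"] for row in parse_subscription_brief(show_output)}
```

With the captured output of both controllers, `len(subscription_ids(text))` gives a quick count per controller. The IDs themselves are not expected to match between two different controllers, so only the number and filter types of the entries are meaningful to compare.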
Looking into the details of these IDs, I can find the IP of our Prime.
So in principle, telemetry is running between our Prime and the 9800-L-C guest WLC with 17.3.6 software.
But no longer to the upgraded 9800-80 HA pair, now running 17.3.7!
As already said, the whole telemetry block for connections to our Prime is missing from the running config of the 9800-80.
Can you please explain how this big block of commands gets into the running config?
Can I instead copy this block manually from my backup config of the 9800-80 into the running config?
The 9800-80 changed its active unit after the ISSU upgrade. Does it make sense to test a "redundancy force-failover" to switch back to the unit that was active before the software upgrade to 17.3.7?
Will this bring back the telemetry function?
I also tried a Telnet connection to Prime, in comparison with the working 9800-L-C for guest:
WLC-9800-Guest#telnet Prime-IP 20830
Trying Prime-IP, 20830 ... Open
[Connection to Prime-IP closed by foreign host]
WLC-9800#telnet Prime-IP 20830
Trying Prime-IP, 20830 ... Open
(The session keeps hanging at "Open".)
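In both Telnet tests the TCP connection opens, so the difference is only in what happens afterwards: one session is closed by the far end, the other stays idle. As a sketch under stated assumptions, the same distinction can be checked from any host with Python's standard `socket` module. The function name `probe_port` and its result labels are invented for this example, and a test run from a management host is not the same as a test sourced from the WLC itself.

```python
import socket


def probe_port(host, port, timeout=5.0):
    """Classify how a TCP listener behaves right after the handshake.

    Returns one of:
      "unreachable"    - the TCP connection could not be established
      "closed-by-peer" - connected, then the far end closed the session
      "open-idle"      - connected, nothing received before the timeout
      "open-data"      - connected and the far end sent data
      "reset"          - connected, then the read failed with a socket error
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    with sock:
        try:
            data = sock.recv(1)
        except socket.timeout:
            return "open-idle"
        except OSError:
            return "reset"
        return "closed-by-peer" if data == b"" else "open-data"
```

For example, `probe_port("Prime-IP", 20830)` with the real address in place of the placeholder would return "closed-by-peer" for behaviour like the first Telnet test and "open-idle" for behaviour like the second.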
From Prime to the non-working 9800-80 and to the working 9800-L-C:
prime1/admin# ssh 9800-IP admin port 830
protocol identification string lack carriage return
Connection closed by 9800-IP port 830
prime1/admin#
prime1/admin#ssh 9800-L-C-IP admin port 830
protocol identification string lack carriage return
admin@9800-L-C-IP's password:
<?xml version="1.0" encoding="UTF-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
.......
Please check and advise
Kind regards
Wini
04-03-2023 02:10 AM
>...The 9800-80 changed its active unit after the ISSU upgrade. Does it make sense to test a "redundancy force-failover" to switch back to the unit that was active before the software upgrade to 17.3.7?
I don't think that will help, and it's not advisable. You may want to try 17.9.3, to which you can upgrade directly from 17.3.x and which again supports the Wave 1 series APs.
M.
04-03-2023 04:28 AM - edited 04-03-2023 04:31 AM
Hello Marce1000,
Thank you for your advice. But your recommended version 17.9.3 does not support the Wave 1 2702 APs, of which we have around 600 in our zoo. Also, our Prime 3.8.1 is not supported by 17.9.3! Therefore the upgrade to 17.9.3 is not a good idea in my opinion.
By the way, the complaining process on the 9800-80 is called "ncsshd_bp".
I would like to debug this process, but cannot find it using the command
WLC-9800#set platform software trace ?
Does anyone know how I can do per-process debugging of this process?
I also haven't found the alarm message in the Error and Messages Guide for 9800 WLCs:
%DMI-2-NETCONF_SSH_CRITICAL: Chassis 1 R0/0: ncsshd_bp: NETCONF/SSH: fatal: mm_answer_sign: Xkey_sign failed: error in libcrypto
What does this mean, and how can I solve this telemetry problem?
Thank you for any tips.
Kind regards
Wini