
Cisco DNA Center SWIM using HTTPS does not work on Cat9500

dominikhug
Level 1

Hi all

I have an issue deploying images to Catalyst 9500-24Y4C switches using SWIM.

Some background: I added two manually configured C9500-24Y4C switches to DNAC using device discovery. This went fine, and I was able to upgrade them to IOS-XE 17.6.3 using SWIM without any issues. Since IOS-XE 17.6.4 is now the suggested release for use with SDA, I wanted to upgrade both switches to that version, but image distribution using HTTPS does not work.

Using an EEM catchall script, which logs every command entered on the switch, I could see that DNAC sends no copy command to the switch. When I try the same on a Cat9300, I can clearly see DNAC sending a copy command to copy the image:

001041: Oct 20 12:50:32.645 CEST: %HA_EM-6-LOG: catchall: copy https://DNAC/api/v1/file/temporary/ae258075-360f-41af-99cb-3c48c8f526a9 flash:cat9k_iosxe.17.06.04.SPA.bin

 

Using SCP as transfer method, the image was deployed successfully.

 

The image was added manually by uploading it to DNAC, since DNAC is running in air-gapped mode.

Has anyone experienced something similar and has an idea how to solve this?

DNAC version is 2.2.3.6

8 Replies

dominikhug
Level 1

dominikhug_0-1666610361169.png

 

Hi @dominikhug 

I've noticed that you don't see DNAC log in to the device and execute the 'copy https' or 'copy scp' commands (using the catchall EEM script) if the device is configured to use NETCONF. Instead, shortly after starting the image distribution, DNAC connects to the device using NETCONF over SSH. I can only assume that DNAC executes the relevant commands over NETCONF to copy the image to the device (along with the commands to unpack and install it), which may explain why we don't see anything in the logs.

I've just tested this with one of my spare 9300 switches. When DNAC is configured to connect to the switch using NETCONF, no copy command is observed in the logs during image distribution. However, if I disable NETCONF and distribute the image again, I can see DNAC log in to the switch and execute the copy command as follows:

copy https://<dnac ip>/api/v1/file/temporary/2a4ad238-8b1c-42da-90d8-e22f238ed570 flash:cat9k_iosxe.17.06.04.SPA.bin 

Can you check whether the Cat9500 is configured to use NETCONF and the Cat9300 is not? If so, this may explain what you are observing.

As to why DNAC is using SCP instead of HTTPS for image distribution: I've experienced a similar issue before when the HTTPS file transfer check failed on the target device, which in my case was due to a certificate issue. If you navigate to Provision -> Inventory, select the Catalyst 9500, select Actions -> Software Image -> Image Update, and then select 'Update Readiness Report', do you see any errors next to 'File Transfer Check' such as the following?

willwetherman_0-1666655033340.png

In my case, the DNAC-CA trustpoint was missing on the switch. To fix this, I had to force an update of the telemetry settings on the switch: select the switch in the inventory, select Actions -> Telemetry -> Update Telemetry Settings, select 'Force Configuration Push', and then update.

Hope this helps

Will

EDIT

For info: you can test whether HTTPS file transfer from DNAC to the Catalyst 9500 is working by executing the following command on the switch (this is what DNAC does when running the SWIM readiness check). If this fails with (I/O error), you may have a certificate issue that needs to be checked.

copy https://<dnac ip>//core/img/cisco-bridge.png null:

 

Hi Will

Thank you for your reply and the tests you've made.

Based on your input, I ran the same tests you did, and my results confirm yours:

When NETCONF is disabled on the Cat9500 in question (i.e. the NETCONF port is removed from the device settings), I can see that the copy command is sent by DNAC:
001920: Oct 25 10:07:31.920 CEST: %HA_EM-6-LOG: catchall: copy https://DNAC/api/v1/file/temporary/ee99d5a3-6556-46d9-b440-fadf41ced78b bootflash:cat9k_iosxe.17.06.03.SPA.bin
I can even see the file on flash: growing, so the file transfer works as expected.

Interestingly, my Cat9300 switches still have the NETCONF port configured in their settings, and SWIM works like a charm. The only difference between the Cat9300 and Cat9500 switches is that the Cat9300s were added by LAN automation and the Cat9500s were imported manually. Could this somehow lead to a problem?

With NETCONF enabled, the catchall log looks like this:

001007: Oct 25 08:41:06.485 CEST: %HA_EM-6-LOG: catchall: enable
001008: Oct 25 08:41:06.536 CEST: %HA_EM-6-LOG: catchall: terminal length 0
001009: Oct 25 08:41:06.553 CEST: %HA_EM-6-LOG: catchall: terminal width 0
001010: Oct 25 08:41:06.568 CEST: %HA_EM-6-LOG: catchall: show install summary
001011: Oct 25 08:41:06.605 CEST: %HA_EM-6-LOG: catchall: terminal width 0
001012: Oct 25 08:41:06.640 CEST: %HA_EM-6-LOG: catchall: dir bootflash:cat9k_iosxe.17.06.03.SPA.bin
001013: Oct 25 08:41:06.705 CEST: %HA_EM-6-LOG: catchall: dir /recursive bootflash:
001014: Oct 25 08:41:15.170 CEST: %HA_EM-6-LOG: catchall: dir /all bootflash:
001015: Oct 25 08:41:15.466 CEST: %HA_EM-6-LOG: catchall: show file systems
001016: Oct 25 08:41:23.499 CEST: %HA_EM-6-LOG: catchall: dir /all bootflash:
001017: Oct 25 08:41:23.770 CEST: %HA_EM-6-LOG: catchall: show file systems
001018: Oct 25 08:41:23.806 CEST: %HA_EM-6-LOG: catchall: show install summary
001019: Oct 25 08:41:23.864 CEST: %HA_EM-6-LOG: catchall: show platform software yang-management process
001020: Oct 25 08:41:24.021 CEST: %HA_EM-6-LOG: catchall: show ip http server status
001021: Oct 25 08:41:24.112 CEST: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'dnac-user' authenticated successfully from dnac-ip:51392 for netconf over ssh. External groups: PRIV15
001022: Oct 25 08:41:32.488 CEST: %SSH-5-SSH2_SESSION: SSH2 Session request from dnac-ip (tty = 2) using crypto cipher 'aes256-ctr', hmac 'hmac-sha2-256' Succeeded
001023: Oct 25 08:41:32.512 CEST: %SEC_LOGIN-5-LOGIN_SUCCESS: Login Success [user: dnac-user] [Source: dnac-ip] [localport: 22] at 08:41:32 CEST Tue Oct 25 2022
001024: Oct 25 08:41:32.512 CEST: %SSH-5-SSH2_USERAUTH: User 'dnac-user' authentication for SSH2 Session from dnac-ip (tty = 2) using crypto cipher 'aes256-ctr', hmac 'hmac-sha2-256' Succeeded
001025: Oct 25 08:41:32.541 CEST: %HA_EM-6-LOG: catchall: enable
001026: Oct 25 08:41:32.589 CEST: %HA_EM-6-LOG: catchall: terminal length 0
001027: Oct 25 08:41:32.605 CEST: %HA_EM-6-LOG: catchall: terminal width 0
001028: Oct 25 08:41:32.656 CEST: %HA_EM-6-LOG: catchall: dir /all bootflash:

The most important line, in my opinion, is the following:
001021: Oct 25 08:41:24.112 CEST: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'dnac-user' authenticated successfully from dnac-ip:51392 for netconf over ssh. External groups: PRIV15

After that, DNAC regularly sends a "dir /all bootflash:" command to check the status of the file transfer. Since no transfer is running, there is no file.
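
For longer log captures, the same reading of the log can be automated. The following is only a minimal sketch, assuming the log lines have the format shown in this thread; the function name summarize_log and the result keys are made up for illustration and are not part of any Cisco tooling.

```python
import re

# Commands logged by the EEM catchall script, in the format seen in this thread:
# "... %HA_EM-6-LOG: catchall: dir /all bootflash:"
CATCHALL = re.compile(r"%HA_EM-6-LOG: catchall: (?P<cmd>.+)$")
# Successful NETCONF-over-SSH login, as in the %DMI-5-AUTH_PASSED line above.
NETCONF_AUTH = re.compile(r"%DMI-5-AUTH_PASSED: .* for netconf over ssh")


def summarize_log(lines):
    """Summarize what a log excerpt shows about an image distribution attempt."""
    commands = []
    netconf_login = False
    for line in lines:
        line = line.strip()
        if NETCONF_AUTH.search(line):
            netconf_login = True
        match = CATCHALL.search(line)
        if match:
            commands.append(match.group("cmd").strip())
    return {
        "netconf_login": netconf_login,
        # A CLI-driven transfer shows up as an explicit copy command.
        "copy_commands": [c for c in commands if c.startswith("copy ")],
        # Number of "dir /all ..." polls used to check transfer progress.
        "flash_polls": sum(c.startswith("dir /all ") for c in commands),
    }


if __name__ == "__main__":
    sample = [
        "001021: Oct 25 08:41:24.112 CEST: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: "
        "User 'dnac-user' authenticated successfully from dnac-ip:51392 "
        "for netconf over ssh. External groups: PRIV15",
        "001027: Oct 25 08:41:32.605 CEST: %HA_EM-6-LOG: catchall: terminal width 0",
        "001028: Oct 25 08:41:32.656 CEST: %HA_EM-6-LOG: catchall: dir /all bootflash:",
    ]
    print(summarize_log(sample))
```

A result with netconf_login set, an empty copy_commands list and repeated flash_polls would match the failing case described here, while a non-empty copy_commands list matches the case with NETCONF disabled.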


Regarding the other points you've mentioned:

DNAC-CA trustpoint:
All devices have the DNAC-CA trustpoint configured; nevertheless, I've updated the telemetry settings on all nodes.

HTTPS file transfer test:
File transfer over HTTPS works and can also be tested using the Update Readiness Report.

To summarize: it seems that when NETCONF is enabled on the Cat9500, the file does not get copied from DNAC to the switch and the distribution fails after 10 minutes. As soon as the NETCONF port is removed from the device settings, the file gets copied as expected and SWIM works.

Dominik

OK, that is interesting. If image distribution to the Catalyst 9500 fails when NETCONF is enabled and works when NETCONF is disabled, then I can only assume that you are hitting a bug on the 9500. When you look at the 'Image Update Status', do you see any errors under Distribution similar to the one below?

willwetherman_0-1666691685010.png

I got an error similar to the one in your screenshot:

dominikhug_0-1666692954022.png

But it's not very informative.

I went into Activities - Audit Logs and found proof that DNAC wants to initiate the file transfer using NETCONF:

dominikhug_1-1666693041517.png

After about 8 minutes I get the error "HTTPS.Image copy is not progressing".

I think I'll raise a TAC request to investigate this.

 

TAC was going to be my next suggestion. Hopefully this has helped to narrow down the issue. Please post back if you find a fix for this.

Hi Will

Finally, an update. I did a lot of troubleshooting, packet captures, and RCA together with TAC. TAC was not able to find the cause and assigned a bug ID to this problem:

CSCwe17731

I think I'll have some more TAC sessions

dominikhug
Level 1

The SR has finally been closed.

It seems that when SWIM starts the file transfer over NETCONF, the switch sources the traffic from the DNAC-facing interface, which is part of the underlay. That means it does not honor "ip http client source-interface Loopback0". Since that is not what I expected, the IP of this DNAC-facing interface was not included in the firewall rules and was blocked. Because I always assumed the traffic would originate from Lo0, I did not search the firewall logs for the DNAC-facing interface IP.
The customer has now added this IP to the rule, and everything is working as expected.
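
For anyone checking their own firewall rules against this behaviour, here is a minimal sketch of the idea: list every address the switch could use as a source towards DNAC and see which ones the rule does not cover. All addresses, interface labels and the function name blocked_sources are hypothetical (documentation ranges), not values from this case.

```python
import ipaddress


def blocked_sources(candidate_sources, allowed_networks):
    """Return the candidate source IPs that no allowed network covers."""
    allowed = [ipaddress.ip_network(net) for net in allowed_networks]
    return {
        name: ip
        for name, ip in candidate_sources.items()
        if not any(ipaddress.ip_address(ip) in net for net in allowed)
    }


if __name__ == "__main__":
    # Hypothetical addresses for illustration only.
    switch_sources = {
        "Loopback0": "192.0.2.10",
        "DNAC-facing underlay interface": "198.51.100.2",
    }
    # Hypothetical rule that only permits the loopback range towards DNAC.
    firewall_allowed = ["192.0.2.0/24"]
    print(blocked_sources(switch_sources, firewall_allowed))
```

With these example values only the DNAC-facing underlay interface is reported, which corresponds to the situation described above, where the rule covered Lo0 but not the interface the switch actually used.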
