Switch Upgrade in SD-Access environment

AminK · 11-02-2025

Hello Community.

I have an SD-Access network with approximately 50 Catalyst switches (C9500, C9300, C9200). When these switches were first provisioned in SD-Access with DNA Center, they ran IOS-XE version 17.03.03. Now, due to security requirements, I have to upgrade them. First I upgraded DNA Center (now Catalyst Center) to the gold version (2.3.7.9 at the time of writing), and then I started upgrading the switches. I can upgrade them in two ways:

1) Upgrade manually by uploading the .bin file from TFTP and reloading, then re-provision with Catalyst Center.

2) Upgrade with Catalyst Center and then provision the switch afterwards.

Now, no matter which method and which IOS-XE version I choose (17.12.05 or 17.15.03, currently gold), I run into some weird problems.

The switch boots up normally without any errors, and IS-IS and LISP come up without problems. But for example:

1) After the upgrade and reboot, one client on the switch can't see another client on the same switch in the same VLAN. They need to communicate with each other. I can reach both of them from my switch on another floor. Downgrading to 17.03.03 solved the problem.

2) After the upgrade and reboot, one VRF completely stopped working on the switch. They just couldn't see their gateway which was up with no problems. All other VRFs were working without problems. Downgrading solved the problem.

I cleared LISP caches such as the eid-table and tried to refresh them, with no luck. I searched the internet, also with no luck. So if anyone can give me a clue about what I should check, or any requirement that I don't know about, I would highly appreciate it.

Thank you.

Andrii Oliinyk · 11-03-2025

A lot of diagnostic output is missing before I can advise:
1) show mac address-table vlan X / show ip arp vrf X vlan X / basic ICMP connectivity from the AcGW on the adjacent switch. Endpoints on the same switch in the same VLAN must see each other, since ARPs are flooded locally on the switch.
2) show vrf X / show ip arp vrf X. "They (Edge nodes?) just couldn't see their gateway which was up with no problems" - how so? Did the AcGWs in the affected VRF X come up without the VRF, or what exactly was observed?

AminK · 11-03-2025

Thank you so much for your time.

In case number one, I downgraded and it fixed the problem. And yes, exactly: ARP must at least be flooded within the same switch.

In case number two, yes, the anycast gateway was in up/up state in my VRF, but clients in the same VLAN as the AcGW couldn't ping it. Describing it as a VRF problem was not accurate, and I apologize for that.

But aside from all of this, my main concern is something else: what is the underlying problem, and why do I get a different kind of problem on each switch? They are not limited to the problems above. Should I change my workflow, for example by upgrading the border/control plane nodes first? Do you have similar experience with upgrading switches in an SD-Access environment?

Andrii Oliinyk · 11-03-2025

With upgrades I've never met the problems you described, whether it was via DNAC SWIM or manual.
The root cause I can think of for case (2) is that the MAC address of the AcGW changed on some SVI while the clients' ARP caches kept an entry with the previous MAC. The AcGW will drop packets destined to it with the wrong MAC. Why the clients didn't refresh their ARP caches during the switch reboot (which implies a down/up cycle on the client ports) is another puzzle in this case. But I've met cases where iDRACs/iLOs of servers kept their ARP cache entry for the default GW even through an upstream switch replacement (the down/up cycle on their ports just didn't work).
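
To make the stale-ARP idea for case (2) concrete, here is a minimal Python sketch of the comparison one could otherwise do by hand: take the AcGW MAC as seen on the SVI after the upgrade and compare it with what each client has cached for its default gateway. The function names, client names and MAC values are all made up for illustration; the real values would have to be collected manually from the switch and from the clients.

```python
import re


def normalize_mac(mac: str) -> str:
    """Return a MAC as 12 lowercase hex digits.

    Accepts Cisco dotted (0000.0c9f.f45c), colon and hyphen notation.
    """
    digits = re.sub(r"[.:\-]", "", mac.strip()).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def find_stale_clients(gateway_mac: str, client_arp: dict) -> list:
    """Return the clients whose cached gateway MAC differs from the current one.

    client_arp maps a client name to the MAC it has cached for its default gateway.
    """
    current = normalize_mac(gateway_mac)
    return sorted(
        client
        for client, cached in client_arp.items()
        if normalize_mac(cached) != current
    )


# Hypothetical values, for illustration only.
gateway_mac_after_upgrade = "0000.0c9f.f45c"   # assumed AcGW MAC on the SVI
client_arp = {
    "client-a": "00:00:0c:9f:f4:5c",           # matches the current AcGW MAC
    "idrac-1": "00-00-0C-9F-F1-23",            # still points at an older MAC
}

print(find_stale_clients(gateway_mac_after_upgrade, client_arp))  # ['idrac-1']
```

Any client this flags would be a candidate for clearing its ARP entry for the gateway by hand. An empty result would suggest the MAC-change theory does not apply to that VLAN.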