Leaf/Spine upgrade unreachable Status

KVS7
Level 1

I'd like some ideas on how to troubleshoot a leaf/spine upgrade where the firmware installed but the switches then sat in "unreachable" status for about an hour before I called it a day and left. Here are the upgrade hops I did.

ACI infrastructure just out of the box

Hardware is 93180 + 9332C 

APIC firmware version 5.2(8) upgraded to 6.0.2(f) then to 6.0.8(f)

The spines were already at 6.0.2(h), so I just created odd and even upgrade groups to get them to 6.0.8(f).

Summary: 5.2.8 > 6.0.2 > 6.0.8 is all I did, starting with the APICs, and then the switches from 6.0.2(h) to 6.0.8(f).

I installed both the 32-bit and 64-bit images. I left the "advanced" settings at their defaults, which is continue on error and no graceful insertion/removal, or whatever the option is called.

I did notice that the TEP addresses are not pingable. I did not put management IPs on the switches before the upgrade (it's pretty much out of the box), but I thought the communication is handled via the infra network, i.e. the TEP addresses. I tested an even leaf by setting up an OOB address so that NTP would work, but it also did not boot.

I do not have vPCs configured yet since it's out of the box; however, everything is physically cabled correctly. The pre-validation article on GitHub mentions that vPCs are not required for a successful upgrade. When the leafs went down, I noticed that APIC1 bond0 eth2-1 (odd spine) went down and the bond is now active on eth2-2, probably because the TEP address is down; I assume that has nothing to do with there not being a vPC.

Any ideas? 

6 Replies

AshSe
VIP

Hey @KVS7 

Please check the points below as initial troubleshooting steps:

  1. Power cycle the affected switches.
  2. Get console access to one of the unreachable leaf switches.
  3. Check the VLAN configuration on the APICs, spines, and leaves. Make sure the infrastructure VLAN is configured.
  4. Check MTU consistency across the fabric.

Thanks Ash, I will try a power cycle, but that's about all I can do. It's out of the box, and the 3-node APIC cluster was healthy (fully fit), which means it has an infra VLAN.

The switch is in a different country so I can't access it until our Raritan orders come in. Any other ideas? I'll get back after power cycle. 

Hey @KVS7 

Until your Raritan orders come in (I am not fully clear what this order means), you can do the activities below:

1. Review APIC Configuration (Remotely):

  • Fabric Membership: In the APIC GUI, go to Fabric > Inventory > Pod 1 > Fabric Membership. Verify that the leaf switch is listed. If it's listed but showing as "unreachable," check the "Last Update" time. Is it recent, or does it indicate that the APIC hasn't heard from the leaf in a while?
  • Node Configuration: If the leaf is listed, click on it. Check the following:
    • Node ID: Is the Node ID correct? A mismatch can cause issues.
    • Serial Number: Is the serial number correct?
    • Model: Is the model correctly identified?
    • Management Address: Even if you don't have dedicated management IPs, is any management address configured (even if it's just the TEP address)? Sometimes, a misconfigured management address can cause problems.
  • Firmware Version: Check that the 6.0.8(f) firmware is listed as installed and active.
  • Firmware Upgrade History: Look for the upgrade history for the specific leaf switch. Did the upgrade complete successfully, or did it fail with an error? The error messages might provide clues.
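The fabric membership checks above can also be scripted against the APIC REST API instead of clicking through the GUI. The following is only a sketch under assumptions: it assumes that a class query such as `GET /api/node/class/fabricNode.json` (after logging in) returns objects carrying `id`, `name`, `role`, `serial`, `model`, and `fabricSt` attributes, and that a healthy switch reports `fabricSt` as `active`. The sample payload, node names, and serial numbers are made up for illustration, and `switches_not_active` is a hypothetical helper name.

```python
import json

# Made-up sample of what a fabricNode class query is ASSUMED to return.
# Node names, serials, and states are illustrative, not from a real fabric.
SAMPLE_RESPONSE = json.loads("""
{
  "imdata": [
    {"fabricNode": {"attributes": {"id": "1", "name": "apic1", "role": "controller",
                                   "serial": "SAMPLE0001", "model": "APIC", "fabricSt": "unknown"}}},
    {"fabricNode": {"attributes": {"id": "101", "name": "leaf-101", "role": "leaf",
                                   "serial": "SAMPLE0101", "model": "93180", "fabricSt": "active"}}},
    {"fabricNode": {"attributes": {"id": "102", "name": "leaf-102", "role": "leaf",
                                   "serial": "SAMPLE0102", "model": "93180", "fabricSt": "inactive"}}},
    {"fabricNode": {"attributes": {"id": "201", "name": "spine-201", "role": "spine",
                                   "serial": "SAMPLE0201", "model": "9332C", "fabricSt": "inactive"}}}
  ]
}
""")


def switches_not_active(payload):
    """Return (id, name, role, fabricSt) for every switch whose fabricSt is not 'active'.

    Controllers are skipped because this sketch assumes their fabricSt
    is not meaningful for switch reachability.
    """
    rows = []
    for item in payload.get("imdata", []):
        attrs = item.get("fabricNode", {}).get("attributes", {})
        if not attrs or attrs.get("role") == "controller":
            continue
        if attrs.get("fabricSt") != "active":
            rows.append((attrs.get("id"), attrs.get("name"),
                         attrs.get("role"), attrs.get("fabricSt")))
    return rows


if __name__ == "__main__":
    for node_id, name, role, state in switches_not_active(SAMPLE_RESPONSE):
        print(f"node {node_id} ({name}, {role}) is {state}")
```

In practice you would replace `SAMPLE_RESPONSE` with the JSON body returned by your own APIC and compare the reported serial numbers and node IDs against what you registered.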

2. Check APIC Health (Remotely):

  • APIC Cluster Status: Ensure that all three APICs are healthy and in the "fully fit" state. A degraded APIC cluster can cause communication problems.
  • Faults: Check for any active faults in the APIC GUI (Faults tab). Filter by severity (critical, major, minor) and look for any faults related to the leaf switch or the infrastructure VLAN.
  • Events: Check the event logs for any events related to the leaf switch or the upgrade process.
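The fault check above can be narrowed to one switch with a few lines of Python once you have the fault list as JSON. This is a hedged sketch, not an official procedure: it assumes a class query on `faultInst` returns objects with `severity`, `code`, `descr`, and `dn` attributes, and that the `dn` of a switch fault contains a `/node-<id>/` segment. The fault codes and descriptions in the sample are placeholders, and `faults_for_node` is a hypothetical helper name.

```python
# Made-up sample of what a faultInst class query is ASSUMED to return.
# Fault codes and descriptions are placeholders, not real APIC faults.
SAMPLE_FAULTS = {
    "imdata": [
        {"faultInst": {"attributes": {"severity": "critical", "code": "F0001",
                                      "descr": "placeholder: node unreachable",
                                      "dn": "topology/pod-1/node-102/sys/fault-F0001"}}},
        {"faultInst": {"attributes": {"severity": "minor", "code": "F0002",
                                      "descr": "placeholder: minor issue",
                                      "dn": "topology/pod-1/node-102/sys/fault-F0002"}}},
        {"faultInst": {"attributes": {"severity": "major", "code": "F0003",
                                      "descr": "placeholder: other node",
                                      "dn": "topology/pod-1/node-1021/sys/fault-F0003"}}},
        {"faultInst": {"attributes": {"severity": "major", "code": "F0004",
                                      "descr": "placeholder: other node",
                                      "dn": "topology/pod-1/node-101/sys/fault-F0004"}}},
    ]
}


def faults_for_node(payload, node_id, severities=("critical", "major")):
    """Return attribute dicts of faults on one node, limited to the given severities."""
    # Match the full path segment so node 102 does not also match node 1021.
    segment = f"/node-{node_id}/"
    matches = []
    for item in payload.get("imdata", []):
        attrs = item.get("faultInst", {}).get("attributes", {})
        if segment in attrs.get("dn", "") and attrs.get("severity") in severities:
            matches.append(attrs)
    return matches


if __name__ == "__main__":
    for fault in faults_for_node(SAMPLE_FAULTS, "102"):
        print(fault["severity"], fault["code"], fault["descr"])
```

Widening `severities` to include `"minor"` gives the same view as removing the severity filter in the GUI for that node.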

KVS7
Level 1

Thanks Ash. Yes, all of that is good. It's a brand new build out of the box. The leafs were discovered automatically and registered, everything was fully fit, and the software was installed via the correct upgrade path. Everything was fine. Management IPs and TEPs are configured and pingable. However, when the leafs reboot, they take down the APIC eth2-1 or eth2-2 ports just from the reboot, so I feel like the issue is something else and not the upgrade.

Hello @KVS7 ,

I am not sure if I understood the current status of your fabric. You wrote earlier that the switches were unreachable after the upgrade. Then you wrote that everything is discovered and registered. 

Does the question relate to the unregistered state or to the leaf reboot impacting the APIC reachability?

We saw a bug report indicating that a reboot after the upgrade can send the switches into ROMMON mode, but we're still testing. We're also testing on our other network, which mirrors the bad one, and noticed the same issues there.
