
ACI Fabric Full Reboot - Issue

HPCS
Level 1

My ACI-managed datacenter recently went through a full power-down and startup for planned maintenance. All ACI devices were powered off completely for more than 24 hours. Once maintenance was complete, the ACI fabric was restarted in the following order: OOB switch, APIC 1, the leaf directly attached to APIC 1, Spines 1 and 2, all leaf switches, then APICs 2 and 3. Firmware is 14.2(6h) on all switches and 4.2(6h) on the controllers.

 

The fabric was powered down in this order: APICs 3, 2, and 1 through the GUI, followed by the spines, then all leaf switches. Power was then pulled from all devices.

 

ACI rebooted into a “Fabric Discovery in progress” state on all spine and leaf switches and “fabric recovery has not yet started” on APICs 2 and 3. Running the command “show discoveryissue” on the spine showed almost every check as Fail. About an hour into this state, the “show discoveryissue” results started to change from Fail to Ok. The fabric slowly started to recover and eventually came back up completely after around 2-3 hours.

 

The issues had fully resolved by the time an IE was able to get involved, and no root cause was identified.

 

Any thoughts on this? Thanks. 

1 Accepted Solution

Accepted Solutions

Robert Burns
Cisco Employee

This is likely bug CSCvy87277.  The public bug search seems to be down currently, so pasting the info here.

Affected Versions: includes 4.2(6h), as well as many other versions
Fixed in Version: 5.2(3e) and later
Symptom:
While the leaf is coming back up, the apicconnectivity.xml file is found to be empty, so the controller-facing ports on the leaf nodes stay in an admin-down state until the bootstrap timer completes (~90 minutes) and forces the links to come up.

Conditions:
Power cycle/outage event in the following order:
1. APICs get power cycled or go down due to a power event
2. While the APICs are still down and the APIC-facing ports are down, leaf nodes get power cycled or go down due to a power event
Workaround: The leaf ports facing the APIC can be shut/no shut to bring them up before the bootstrap timer forces them up.
Robert
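
For anyone planning a similar maintenance window, here is a rough sketch of how the two conditions above could be checked against a planned power-down timeline. It is only an illustration: the `Outage` record, the `leaves_matching_conditions` helper, the node names, and the hour values are all made up for this example and do not come from the bug report or any Cisco tool. It also simplifies by treating any APIC outage as relevant to every leaf, whereas the bug concerns the leaves with APIC-facing ports.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One device's outage window (hypothetical record, hours on any common clock)."""
    name: str
    role: str       # "apic", "spine", or "leaf"
    down_at: float  # time the device lost power
    up_at: float    # time the device was powered back on


def leaves_matching_conditions(outages):
    """Return names of leaves that went down while at least one APIC was already down.

    This mirrors the ordering described in the conditions: APICs go down first,
    then leaf nodes go down while the APICs are still down.
    """
    apics = [o for o in outages if o.role == "apic"]
    matched = []
    for leaf in (o for o in outages if o.role == "leaf"):
        # The leaf matches if its power-down time falls inside any APIC outage window.
        if any(a.down_at <= leaf.down_at < a.up_at for a in apics):
            matched.append(leaf.name)
    return matched


# Illustrative timeline only: APICs shut down first, leaves afterwards.
timeline = [
    Outage("apic3", "apic", 0.0, 30.0),
    Outage("apic1", "apic", 0.2, 26.0),
    Outage("spine1", "spine", 0.5, 26.3),
    Outage("leaf101", "leaf", 1.0, 26.5),
]

print(leaves_matching_conditions(timeline))
```

A leaf that shows up in the result is one where the roughly 90-minute bootstrap timer wait, or the shut/no shut workaround, may apply on affected versions.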

HPCS
Level 1

Thanks for the update, Robert. In this case, should I still have had full connectivity within my fabric (not including the APICs)? A few devices connected to the ACI fabric were booted for troubleshooting purposes and were unable to communicate either within the fabric or outside it. This connectivity issue did resolve itself.