Abey K. George
Cisco Employee

Overview

This document covers the important troubleshooting data to collect, and the collection procedures, when you experience an issue with an ACI fabric. Collecting the relevant troubleshooting data is critical for finding the root cause of the issue and for taking the steps needed to prevent a similar issue from happening again. If you engage Cisco ACI Solution Support TAC to troubleshoot an issue, the TAC engineer will ask for this data for analysis. So, I suggest you collect the relevant data/logs before that important evidence is lost.

 

Here are some general issue scenario groups and the troubleshooting data required for each category. However, additional data not covered in this document may be required, depending on the complexity of the issue.

 

APIC controller related issue

This category of issues involves only the APIC controllers (fabric nodes are not involved). Common examples are APIC connectivity issues to vCenter/SCVMM, APIC management (SNMP, AAA, OOB, etc.) issues, and controller upgrade issues. If your initial assessment of the issue confirms that the suspected area is only the APIC controllers, collect the following troubleshooting data.

 

1. Tech-support logs from all APIC controllers. If the issue appeared after an upgrade, don't forget to check the "Include pre-upgrade logs" option in the On-Demand TechSupport export policy.

2. Configuration export. This is useful for reviewing the configuration, as the techsupport doesn't include the config.

 

The following guides can be used as a reference for collecting the data.

https://supportforums.cisco.com/t5/data-center-documents/generating-and-downloading-techsupport-files-from-aci-fabric/ta-p/3185813

 

https://supportforums.cisco.com/t5/data-center-documents/aci-on-demand-techsupport-collection-when-first-opening-an-sr/ta-p/3215947

 

https://supportforums.cisco.com/t5/data-center-documents/aci-apic-configuring-an-export-policy-using-the-gui/ta-p/3305088

 

 

ACI fabric related issue

This category of issues involves both the APIC controllers and the ACI fabric switches, so it is important to collect troubleshooting data from both. However, collecting troubleshooting data from every switch in a large fabric is cumbersome, so it is important to identify the impacted/involved fabric switches and collect the data from only those switches. The following guidelines help isolate the issue to the relevant fabric switches. Use one or a combination of the steps below, depending on the type of issue.

 

1. For all fabric related issues, collect the troubleshooting data from the APIC controllers: follow the previous section, "APIC controller related issue", and collect all the data listed there.

 

2. If you are experiencing an endpoint connectivity issue, collect the leaf techsupport logs from the node where the endpoint is learned (or is expected to be learned). If the impacted host is connected to the ACI fabric over a vPC, collect the techsupport logs from both switches of the vPC pair.

 

3. If your endpoint fails to communicate with a remote device that is learned (or is expected to be learned) on another leaf node (or vPC pair), collect the techsupport logs from those leaf(s) as well.

 

4. If your endpoint fails to communicate with a remote device outside the ACI fabric (L3Out, L2Out), collect the techsupport logs from the border leaf(s). If the neighbor is connected to the ACI fabric over a vPC, collect the techsupport logs from both switches of the vPC pair.

 

5. If you experience an issue with routing protocol neighborship, route learning, or redistribution, collect the techsupport logs from the related border leaf(s).

 

6. If the impacted endpoint is hosted in a VMware vCenter environment, collect the diagnostic information/system logs from the ESXi host where the impacted endpoint is hosted. The KB article https://kb.vmware.com/s/article/653 is a useful reference.

 

7. If your setup involves VMM integration with AVS, follow the AVS troubleshooting guide to collect the relevant data.

https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus1000/avs/troubleshoot/5-2-1-SV3-1-x/b_AVS_Troubleshooting_5-2-1-SV3-1-x/overview_of_troubleshooting.html

 

8. If your connectivity issue involves communication across multiple pods (Multi-Pod), collect the techsupport logs from the IPN-connected spine switches and the IPN devices.
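
The guidelines above can be sketched as a small planning helper that turns what you know about the issue into a list of what to collect from where. This is only an illustration and not a Cisco tool: the FabricIssue class, its field names, the collection_plan function and the node names in the usage lines are all hypothetical, and the sketch does not talk to the APIC or trigger any collection.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FabricIssue:
    """Hypothetical description of an issue; every field name is illustrative."""
    apics: List[str]                                           # all APIC controllers
    after_upgrade: bool = False                                # issue appeared after an upgrade
    endpoint_leafs: List[str] = field(default_factory=list)   # leaf(s) of the impacted endpoint, incl. vPC peer
    remote_leafs: List[str] = field(default_factory=list)     # leaf(s) of the remote device, incl. vPC peer
    border_leafs: List[str] = field(default_factory=list)     # L3Out/L2Out or routing protocol issues
    esxi_hosts: List[str] = field(default_factory=list)       # ESXi hosts of impacted endpoints
    uses_avs: bool = False                                     # VMM integration with AVS
    ipn_spines: List[str] = field(default_factory=list)       # IPN-connected spines (Multi-Pod)
    ipn_devices: List[str] = field(default_factory=list)      # IPN devices (Multi-Pod)


def collection_plan(issue: FabricIssue) -> Dict[str, List[str]]:
    """Map each target to the data to collect from it, following steps 1-8."""
    plan: Dict[str, List[str]] = {}

    def add(target: str, item: str) -> None:
        # A node can match several steps; list each item only once per target.
        items = plan.setdefault(target, [])
        if item not in items:
            items.append(item)

    # Step 1: APIC data is collected for every fabric related issue.
    apic_item = "techsupport"
    if issue.after_upgrade:
        apic_item += ' with "Include pre-upgrade logs" checked'
    for apic in issue.apics:
        add(apic, apic_item)
    add("fabric", "configuration export")

    # Steps 2-5 and 8: techsupport only from the involved switches and IPN devices.
    switches = (issue.endpoint_leafs + issue.remote_leafs
                + issue.border_leafs + issue.ipn_spines)
    for node in switches + issue.ipn_devices:
        add(node, "techsupport")

    # Step 6: ESXi hosts of the impacted endpoints.
    for host in issue.esxi_hosts:
        add(host, "ESXi diagnostic information / system logs")

    # Step 7: AVS data as described in the AVS troubleshooting guide.
    if issue.uses_avs:
        add("AVS", "data listed in the AVS troubleshooting guide")

    return plan


# Usage with made-up node names: endpoint behind a vPC pair, remote device behind an L3Out.
issue = FabricIssue(
    apics=["apic1", "apic2", "apic3"],
    endpoint_leafs=["leaf101", "leaf102"],
    border_leafs=["leaf103", "leaf104"],
)
for target, items in collection_plan(issue).items():
    print(f"{target}: {', '.join(items)}")
```

Keeping the plan keyed by target makes it easy to see that a switch matching several steps (for example a border leaf that also hosts the endpoint) still needs only one techsupport.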

 

 

Note: This is a growing document, and it will be updated as more scenarios are added.

 

 
