I was setting up 2 Fabrics today.
Fabric #1 is already set up and upgraded, with a bit of pain, reloading, and manual help, but it is up, active, and alive.
Fabric #2 behaves like this:
The APIC sees the first leaf; I registered it for the first time (running 4.1.1), and after registration the leaf went inactive and stayed inactive.
(I fixed this on the first fabric by upgrading the APIC to 4.2.3.)
Ultimately I managed to register the leaf and upgrade both the APIC and the first leaf, thinking: OK, it will not go inactive and will
not cause trouble once everything is on the same software. Good.
Now the first leaf is active and I see both spines discovered (even the second leaf). I register them correctly, but they just stay there in Discovered and never transition to Active.
The APIC and the first leaf are running 14.2(3l); the 2 spines and the other 3 leaves are running 14.1(1j).
Spines show this:
User Access Verification
(none) login: admin
Fabric discovery in progress, show commands are not fully functional
Logout and Login after discovery to continue to use show commands.
so there is something going on
I already wiped the fabric completely: setup-clean-config.sh on the switches, then
acidiag touch clean, acidiag touch setup, and a reboot on the APIC, and set up the APIC again,
but it behaves the same way.
Any hints please?
PS: Any hints for a good, short technical blog or book about the technicalities of the fabric, Multi-Site, and service insertion would be appreciated as well, so I do not need to read five 500-page books to understand it all. I'm a CCIE R&S but need to transition to BGP EVPN and ACI.
First, make sure all fabric nodes are running the same version.
Also, try the following command on the spine/leaf switches that are experiencing problems:
leaf101# show discoveryissues
This command will assist in the diagnosis of common discovery issues.
Depending on the output, you might want to take a different approach to continue the troubleshooting.
You can follow the troubleshooting steps from the Troubleshooting ACI book, Chapter 1 (Initial Fabric Setup).
I manually set up all 4 leaves and 2 spines to the same software,
did a complete cleanup of the whole topology (APIC + all the switches), set up the APIC,
and ended up in the same situation as before.
Spines are discovered, I register them and they stay there.
The first leaf shows this:
DCB-L2101-Rxy# show discoveryissues
Checking the platform type................LEAF!
Check01 - System state - in-service [ok]
Check02 - DHCP status [ok]
TEP IP: 172.18.74.64 Node Id: 2101 Name: DCB-L2101-Rxy
Check03 - AV details check [ok]
Check04 - IP rechability to apic [ok]
Ping from switch to 172.18.64.1 passed
Check05 - infra VLAN received [ok]
Check06 - LLDP Adjacency [ok]
Found adjacency with SPINE
Found adjacency with APIC
Check07 - Switch version [ok]
version: n9000-14.2(3l) and apic version: 4.2(3l)
Check08 - FPGA/BIOS out of sync test [ok]
Check09 - SSL check [check]
SSL certificate details are valid
Check10 - Downloading policies [ok]
Check11 - Checking time [ok]
Check12 - Checking modules, power and fans [FAIL]
Power supply state is shut
Ignore if it's a redudant power supply
Check13 - infra VLAN received [FAIL]
No isis adjacencies found
Ignore if first leaf/spine in an ACI fabric or remote leaf
What is weird: the first leaf is shown as inactive after some time (I only noticed this now)... maybe that is why the spines have trouble.
I will reload the APIC one more time and we will see.
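Incidentally, the failing checks in a long `show discoveryissues` paste like the one above can be filtered out with a few lines of Python (a minimal sketch; the sample string just reuses lines from the paste):

```python
import re

# Sample lines in the same shape as the `show discoveryissues` output above.
output = """\
Check11 - Checking time [ok]
Check12 - Checking modules, power and fans [FAIL]
Check13 - infra VLAN received [FAIL]
"""

# Match "CheckNN - description [status]" and keep everything that is not [ok].
check_re = re.compile(r"^(Check\d+)\s+-\s+(.+?)\s+\[(\w+)\]$")
failures = [
    (m.group(1), m.group(2))
    for line in output.splitlines()
    if (m := check_re.match(line)) and m.group(3).lower() != "ok"
]
print(failures)
# [('Check12', 'Checking modules, power and fans'), ('Check13', 'infra VLAN received')]
```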
The output looks OK, as does the first discovered leaf. And yes, if the first leaf goes into the inactive state, the rest of the fabric cannot be discovered correctly. How is the connectivity between the APIC and the first leaf? Which ports on the controller do you use to connect to the leaf? Is the interface up? Does it go down when the leaf goes down?
I had a guy flap the interfaces of the APIC, and suddenly I saw this (the output below) and the spines went from
"Nodes Pending Registration" to "Registered Nodes".
Subsequently I was able to register all the leaves.
Another weird thing: I checked the cabling on the APIC interfaces.
2-1 is connected to 1/1 on leaf1.
2-2 (and this one is down) is connected to 1/1 on leaf2.
Should that interface not be up?
And the last question from today's questionnaire...
the APIC console just flooded me with millions of rows (below).
Is there any proper cleanup procedure for all the logging on the APICs, please?
DCB-APIC1# [ 3828.268697] Modify tunnel received
[ 3828.273065] [tep0]: vers 4 prot 17 ihl 5 daddr 0x406612ac saddr 0x100000a 0x0 type 0x0
[ 3828.282955] Setting l3 proxy ip 0x406612ac
[ 3828.291674] Modify tunnel received
[ 3828.296052] [tep1]: vers 4 prot 17 ihl 5 daddr 0x406612ac saddr 0x200000a 0x0 type 0x0
[ 3828.305928] Setting l3 proxy ip 0x406612ac
[ 3828.314316] Modify tunnel received
[ 3828.318710] [tep2]: vers 4 prot 17 ihl 5 daddr 0x406612ac saddr 0x300000a 0x0 type 0x0
[ 3828.328597] Setting l3 proxy ip 0x406612ac
[ 3828.335534] Modify tunnel received
[ 3828.339917] [tep3]: vers 4 prot 17 ihl 5 daddr 0x406612ac saddr 0x400000a 0x0 type 0x0
[ 3828.349782] Setting l3 proxy ip 0x406612ac
[ 3828.358022] Modify tunnel received
[ 3828.362423] [tep4]: vers 4 prot 17 ihl 5 daddr 0x406612ac saddr 0x500000a 0x0 type 0x0
[ 3828.372290] Setting l3 proxy ip 0x406612ac
[ 3828.380595] Modify tunnel received
[ 3828.384982] [tep5]: vers 4 prot 17 ihl 5 daddr 0x406612ac saddr 0x600000a 0x0 type 0x0
[ 3828.394875] Setting l3 proxy ip 0x406612ac
[ 3828.403432] Modify tunnel received
[ 3828.407776] [tep6]: vers 4 prot 17 ihl 5 daddr 0x406612ac saddr 0x700000a 0x0 type 0x0
[ 3828.417171] Setting l3 proxy ip 0x406612ac
[ 3828.425285] Modify tunnel received
[ 3828.429425] [tep7]: vers 4 prot 17 ihl 5 daddr 0x406612ac saddr 0x800000a 0x0 type 0x0
[ 3828.438861] Setting l3 proxy ip 0x406612ac
[ 3828.453129] Modify tunnel received
[ 3828.457360] [teplo-1]: vers 4 prot 17 ihl 5 daddr 0x406612ac saddr 0x0 0x0 type 0x0
[ 3828.466509] Setting l3 proxy ip 0x406612ac
[ 3828.548086] Modify tunnel received
[ 3828.552226] [tep0]: vers 4 prot 17 ihl 5 daddr 0x416612ac saddr 0x100000a 0x0 type 0x1
[ 3828.561661] Setting l2 proxy ip 0x416612ac
[ 3828.568260] Modify tunnel received
[ 3828.572371] [tep1]: vers 4 prot 17 ihl 5 daddr 0x416612ac saddr 0x200000a 0x0 type 0x1
[ 3828.581768] Setting l2 proxy ip 0x416612ac
[ 3828.590094] Modify tunnel received
[ 3828.594220] [tep2]: vers 4 prot 17 ihl 5 daddr 0x416612ac saddr 0x300000a 0x0 type 0x1
[ 3828.603614] Setting l2 proxy ip 0x416612ac
[ 3828.611852] Modify tunnel received
[ 3828.616011] [tep3]: vers 4 prot 17 ihl 5 daddr 0x416612ac saddr 0x400000a 0x0 type 0x1
[ 3828.625484] Setting l2 proxy ip 0x416612ac
[ 3828.633780] Modify tunnel received
[ 3828.637913] [tep4]: vers 4 prot 17 ihl 5 daddr 0x416612ac saddr 0x500000a 0x0 type 0x1
[ 3828.647313] Setting l2 proxy ip 0x416612ac
[ 3828.655545] Modify tunnel received
[ 3828.659701] [tep5]: vers 4 prot 17 ihl 5 daddr 0x416612ac saddr 0x600000a 0x0 type 0x1
[ 3828.669154] Setting l2 proxy ip 0x416612ac
[ 3828.677537] Modify tunnel received
[ 3828.681671] [tep6]: vers 4 prot 17 ihl 5 daddr 0x416612ac saddr 0x700000a 0x0 type 0x1
[ 3828.691084] Setting l2 proxy ip 0x416612ac
[ 3828.699399] Modify tunnel received
[ 3828.703563] [tep7]: vers 4 prot 17 ihl 5 daddr 0x416612ac saddr 0x800000a 0x0 type 0x1
[ 3828.713034] Setting l2 proxy ip 0x416612ac
[ 3828.727568] Modify tunnel received
[ 3828.731807] [teplo-1]: vers 4 prot 17 ihl 5 daddr 0x416612ac saddr 0x0 0x0 type 0x1
[ 3828.740973] Setting l2 proxy ip 0x416612ac
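Side note on reading that flood: the daddr/saddr values are IPv4 addresses printed as hex in host (little-endian) byte order. A minimal Python sketch to decode them, using values from the log above (the decoded addresses line up with the 172.18.x.x TEP pool and a 10.0.0.x APIC infra TEP, but verify against your own pool):

```python
import socket
import struct

def hex_to_ip(value: int) -> str:
    """Decode an IPv4 address printed as little-endian hex in the tunnel log."""
    return socket.inet_ntoa(struct.pack("<I", value))

# daddr from the "Setting l3 proxy ip" lines, and the saddr from tep0:
print(hex_to_ip(0x406612AC))  # 172.18.102.64
print(hex_to_ip(0x0100000A))  # 10.0.0.1
```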
Do you have an APIC M3 or L3?
If so, note that the VIC 1455 has 4 ports: port-1, port-2, port-3, and port-4 from left to right.
Port-1 and port-2 form one pair, corresponding to eth2-1 on the APIC, and port-3 and port-4 form another pair, corresponding to eth2-2. Only one connection is allowed per pair. For example, you can connect one cable to either port-1 or port-2, and another cable to either port-3 or port-4 (please do not connect two cables within the same pair).
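Since the APIC runs Linux underneath, the bond member state can also be read from /proc/net/bonding. A sketch of what to look for (the bond0 name and eth2-x member names are assumptions based on typical APIC setups; the sample text is inlined so the snippet runs anywhere):

```shell
# Illustrative /proc/net/bonding/bond0 content; on a real APIC you would
# read the actual file instead of this sample string.
bond_status='Slave Interface: eth2-1
MII Status: up
Slave Interface: eth2-2
MII Status: down'

# Count bond members whose link is up.
up_count=$(printf '%s\n' "$bond_status" | grep -c '^MII Status: up')
echo "members up: $up_count"   # members up: 1
```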
The fabric clean is pretty simple. On the APIC:
acidiag touch clean
acidiag touch setup
acidiag reboot
On the leaf/spine switches, run setup-clean-config.sh and then reload.
OH!!!! Oh, my fault! *facepalm*
I moved the cable to port-3 and it is up; the bond has 2 interfaces up.
...and the problem with the spines was probably caused by this, because when those ports were flapped one after another, it suddenly worked.
Thank you very much!
Any hint on the log cleanup on APICs please?
Appreciate your attention and speed of response.
No worries. I have seen the miscabling for APIC M3/L3 several times, that's why I knew about it :-)
About log cleanup, sorry about suggesting the fabric clean reload. I misunderstood the question.
Normally, you should not worry too much about the logs on the APIC and the leaf/spine switches. First, some of the DME logs are read-only for the admin user, and second, the log files rotate and are archived by default:
-rw-r--r-- 2 ifc  admin 13069288 Mar 31 15:09 svc_ifc_policymgr.bin.log
-rw-r--r-- 1 root root   1678019 Mar 30 10:36 svc_ifc_policymgr.bin.log.1.gz
-rw-r--r-- 1 root root   1514324 Mar 30 11:39 svc_ifc_policymgr.bin.log.2.gz
-rw-r--r-- 1 root root   1436470 Mar 30 15:03 svc_ifc_policymgr.bin.log.3.gz
The APIC will not run out of space because of the logs (unless there is a bug in the software).
If you really, really want to clear the logs, you can check which log files have admin ownership (ls -l) and run echo "" > filename on those. Or you can open a TAC case and ask them to log in as root and clear all of the log files. But again, I really think this is unnecessary.
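For reference, a sketch of that manual truncation (the directory and file name here are made up for the demo; on a real APIC you would target the admin-owned files that ls -l shows; truncating rather than deleting keeps the owning process's open file handle valid):

```shell
# Demo directory and log file (illustrative names, not real APIC paths).
LOGDIR=/tmp/demo_apic_logs
mkdir -p "$LOGDIR"
echo "old log data" > "$LOGDIR/svc_demo.bin.log"

ls -l "$LOGDIR"                      # inspect ownership first, as suggested above

# Truncate (empty) every writable .log file in place.
for f in "$LOGDIR"/*.log; do
    [ -w "$f" ] && : > "$f"          # ':' is a no-op; '>' empties the file
done

wc -c < "$LOGDIR/svc_demo.bin.log"   # 0
```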