08-18-2020 10:25 AM
Hi, I have just built a new ACI fabric on version 4.2(3j). I configured the first APIC and registered all the nodes OK. I then built the 2nd and 3rd APICs with an identical fabric name, fabric ID, pod ID and infra VLAN; however, the 2nd and 3rd APICs do not join the cluster.
When I try to log in to one of these APICs I get this message:
'REST Endpoint user authorization datastore is not initialized check fabric membership status of this fabric node'
Thinking I must have made a typo when setting up the 1st APIC, I then reset all 3 APICs:
acidiag touch clean
acidiag touch setup
acidiag reboot
and all fabric nodes:
setup-clean-config.sh
I then rebuilt the fabric, accepting all APIC setup defaults and just specifying infra VLAN 3930.
Cluster configuration ...
Enter the fabric name: ACI Fabric1
Fabric ID: 1
Number of controllers: 3
Controller name apic1
POD ID: 1
Controller ID: 1
TEP address pool: 10.0.0.0/16
Infra VLAN ID: 3930
Multicast address pool: 225.0.0.0/15
Again I registered all nodes OK, then built the 2nd APIC, again accepting all defaults:
Cluster configuration ...
Enter the fabric name: ACI Fabric1
Fabric ID: 1
Number of controllers: 3
Controller name apic2
POD ID: 1
Controller ID: 2
TEP address pool: 10.0.0.0/16
Infra VLAN ID: 3930
Multicast address pool: 225.0.0.0/15
And again the 2nd APIC did not join the cluster, and it displays the below message when logging into its GUI:
'REST Endpoint user authorization datastore is not initialized check fabric membership status of this fabric node'
I'm sure the fabric settings I typed in match on both APICs and there is no whitespace in the fabric names, etc.
Any ideas?
Thanks in advance.
Colin
Output of acidiag avread from both APICs:
APIC1
apic1# acidiag avread
Local appliance ID=1 ADDRESS=10.0.0.1 TEP ADDRESS=10.0.0.0/16 ROUTABLE IP ADDRESS=0.0.0.0 CHASSIS_ID=7a8eb93a-e144-11ea-879b-65cd6756013c
Cluster of 1 lm(t):1(zeroTime) appliances (out of targeted 3 lm(t):1(2020-08-18T11:23:18.960+00:00)) with FABRIC_DOMAIN name=ACI Fabric1 set to version=4.2(3j) lm(t):1(2020-08-18T11:23:29.944+00:00); discoveryMode=PERMISSIVE lm(t):0(zeroTime); drrMode=OFF lm(t):0(zeroTime); kafkaMode=OFF lm(t):0(zeroTime)
appliance id=1 address=10.0.0.1 lm(t):1(2020-08-18T11:21:50.220+00:00) tep address=10.0.0.0/16 lm(t):1(2020-08-18T11:21:50.220+00:00) routable address=0.0.0.0 lm(t):1(zeroTime) oob address=43.218.45.14/24 lm(t):1(2020-08-18T11:21:57.415+00:00) version=4.2(3j) lm(t):1(2020-08-18T11:21:57.902+00:00) chassisId=7a8eb93a-e144-11ea-879b-65cd6756013c lm(t):1(2020-08-18T11:21:57.902+00:00) capabilities=0X3EEFFFFFFFFF--0X2020--0X1 lm(t):1(2020-08-18T11:28:27.760+00:00) rK=(stable,present,0X206173722D687373) lm(t):1(2020-08-18T11:21:57.421+00:00) aK=(stable,present,0X206173722D687373) lm(t):1(2020-08-18T11:21:57.421+00:00) oobrK=(stable,present,0X206173722D687373) lm(t):1(2020-08-18T11:21:57.421+00:00) oobaK=(stable,present,0X206173722D687373) lm(t):1(2020-08-18T11:21:57.421+00:00) cntrlSbst=(APPROVED, WZP24231391) lm(t):1(2020-08-18T11:21:57.902+00:00) (targetMbSn= lm(t):0(zeroTime), failoverStatus=0 lm(t):0(zeroTime)) podId=1 lm(t):1(2020-08-18T11:21:50.220+00:00) commissioned=YES lm(t):1(zeroTime) registered=YES lm(t):1(2020-08-18T11:21:50.220+00:00) standby=NO lm(t):1(2020-08-18T11:21:50.220+00:00) DRR=NO lm(t):0(zeroTime) apicX=NO lm(t):1(2020-08-18T11:21:50.220+00:00) virtual=NO lm(t):1(2020-08-18T11:21:50.220+00:00) active=YES(2020-08-18T11:21:50.220+00:00) health=(applnc:255 lm(t):1(2020-08-18T11:23:04.601+00:00) svc's)
---------------------------------------------
*******Additional elements outside of cluster*******
appliance id=2 address=10.0.0.2 lm(t):104(2020-08-18T11:51:03.608+00:00) tep address=0.0.0.0 lm(t):0(zeroTime) routable address=0.0.0.0 lm(t):0(zeroTime) oob address=0.0.0.0 lm(t):0(zeroTime) version= lm(t):0(zeroTime) chassisId=98c35844-e148-11ea-b955-d5bd81fad0b6 lm(t):104(2020-08-18T11:51:03.608+00:00) capabilities=0XFFFFFFF--0X2020--0 lm(t):0(zeroTime) rK=(stable,absent,0) lm(t):0(zeroTime) aK=(stable,absent,0) lm(t):0(zeroTime) oobrK=(stable,absent,0) lm(t):0(zeroTime) oobaK=(stable,absent,0) lm(t):0(zeroTime) cntrlSbst=(DO_SOMETHING, WZP242312SB) lm(t):104(2020-08-18T11:51:03.608+00:00) (targetMbSn= lm(t):0(zeroTime), failoverStatus=0 lm(t):0(zeroTime)) podId=1 lm(t):104(2020-08-18T11:51:03.608+00:00) commissioned=NO lm(t):1(zeroTime) registered=NO lm(t):0(zeroTime) standby=NO lm(t):104(2020-08-18T11:51:03.608+00:00) DRR=NO lm(t):0(zeroTime) apicX=NO lm(t):104(2020-08-18T11:51:03.608+00:00) virtual=NO lm(t):0(zeroTime) active=NO(zeroTime) health=(applnc:1 lm(t):0(zeroTime))
---------------------------------------------
clusterTime=<diff=1 common=2020-08-18T11:57:27.006+00:00 local=2020-08-18T11:57:27.005+00:00 pF=<displForm=0 offsSt=0 offsVlu=0 lm(t):1(2020-08-18T11:23:19.028+00:00)>>
---------------------------------------------
apic1#
APIC2
apic2 login: rescue-user
********************************************************************************
Fabric discovery in progress, show commands are not fully functional
Logout and Login after discovery to continue to use show commands.
********************************************************************************
apic2# acidiag avread
Local appliance ID=2 ADDRESS=10.0.0.2 TEP ADDRESS=10.0.0.0/16 ROUTABLE IP ADDRESS=0.0.0.0 CHASSIS_ID=98c35844-e148-11ea-b955-d5bd81fad0b6
Cluster of 2 lm(t):2(zeroTime) appliances (out of targeted 3 lm(t):2(zeroTime)) with FABRIC_DOMAIN name=ACI Fabric1 set to version=4.2(3j) lm(t):2(zeroTime); discoveryMode=PERMISSIVE lm(t):0(zeroTime); drrMode=OFF lm(t):0(zeroTime); kafkaMode=OFF lm(t):0(zeroTime)
appliance id=1 address=10.0.0.1 lm(t):2(2020-08-18T11:51:16.046+00:00) tep address=0.0.0.0 lm(t):0(zeroTime) routable address=0.0.0.0 lm(t):0(zeroTime) oob address=0.0.0.0 lm(t):0(zeroTime) version= lm(t):0(zeroTime) chassisId=7a8eb93a-e144-11ea-879b-65cd6756013c lm(t):2(2020-08-18T11:51:16.045+00:00) capabilities=0XFFFFFFF--0X2020--0 lm(t):0(zeroTime) rK=(stable,absent,0) lm(t):0(zeroTime) aK=(stable,absent,0) lm(t):0(zeroTime) oobrK=(stable,absent,0) lm(t):0(zeroTime) oobaK=(stable,absent,0) lm(t):0(zeroTime) cntrlSbst=(UNDEFINED, ) lm(t):0(zeroTime) (targetMbSn= lm(t):0(zeroTime), failoverStatus=0 lm(t):0(zeroTime)) podId=0 lm(t):0(zeroTime) commissioned=YES lm(t):2(zeroTime) registered=YES lm(t):2(2020-08-18T11:51:16.901+00:00) standby=NO lm(t):0(zeroTime) DRR=NO lm(t):0(zeroTime) apicX=NO lm(t):0(zeroTime) virtual=NO lm(t):0(zeroTime) active=NO(zeroTime) health=(applnc:1 lm(t):0(zeroTime))
appliance id=2 address=10.0.0.2 lm(t):2(2020-08-18T11:51:04.892+00:00) tep address=10.0.0.0/16 lm(t):2(2020-08-18T11:51:04.892+00:00) routable address=0.0.0.0 lm(t):2(zeroTime) oob address=43.218.45.15/24 lm(t):2(2020-08-18T11:51:10.698+00:00) version=4.2(3j) lm(t):2(2020-08-18T11:51:10.793+00:00) chassisId=98c35844-e148-11ea-b955-d5bd81fad0b6 lm(t):2(2020-08-18T11:51:10.793+00:00) capabilities=0X3EEFFFFFFFFF--0X2020--0X2 lm(t):2(2020-08-18T11:51:10.793+00:00) rK=(stable,present,0X206173722D687373) lm(t):2(2020-08-18T11:51:10.704+00:00) aK=(stable,present,0X206173722D687373) lm(t):2(2020-08-18T11:51:10.704+00:00) oobrK=(stable,present,0X206173722D687373) lm(t):2(2020-08-18T11:51:10.704+00:00) oobaK=(stable,present,0X206173722D687373) lm(t):2(2020-08-18T11:51:10.704+00:00) cntrlSbst=(APPROVED, WZP242312SB) lm(t):2(2020-08-18T11:51:10.793+00:00) (targetMbSn= lm(t):0(zeroTime), failoverStatus=0 lm(t):0(zeroTime)) podId=1 lm(t):2(2020-08-18T11:51:04.892+00:00) commissioned=YES lm(t):2(zeroTime) registered=YES lm(t):2(2020-08-18T11:51:04.892+00:00) standby=NO lm(t):2(2020-08-18T11:51:04.892+00:00) DRR=NO lm(t):0(zeroTime) apicX=NO lm(t):2(2020-08-18T11:51:04.892+00:00) virtual=NO lm(t):2(2020-08-18T11:51:04.892+00:00) active=YES(2020-08-18T11:51:04.892+00:00) health=(applnc:112 lm(t):2(2020-08-18T11:51:35.795+00:00) svc's)
---------------------------------------------
clusterTime=<diff=0 common=2020-08-18T11:55:36.619+00:00 local=2020-08-18T11:55:36.619+00:00 pF=<displForm=1 offsSt=0 offsVlu=0 lm(t):0(zeroTime)>>
---------------------------------------------
apic2#
08-18-2020 11:45 AM
Hi @colin.lynch
You can verify that the setup script values match using these two commands:
APIC# cat /data/data_admin/sam_exported.config
APIC# avread
*Note that the second command is "avread" and not "acidiag avread" (the output of both contains the same info, but the newer command is cleaner and easier to read).
There is also this command, available in 4.2, for troubleshooting APIC clustering issues:
apic1# acidiag cluster
Would you mind sharing the outputs of these commands?
Stay safe,
Sergiu
08-19-2020 05:30 AM
Thanks for the quick response and the useful commands, @Sergiu.Daniluk.
I have attached a side-by-side screenshot of the following outputs:
APIC# cat /data/data_admin/sam_exported.config
APIC# avread
apic1# acidiag cluster
Outputs below
APIC1
EUGBCHESVR01-APIC-ACI# acidiag cluster
Admin password:
Product-name = APIC-SERVER-M3
Serial-number = WZP24231391
Running...
Operational cluster size and target cluster size are not identical
Checking Core Generation: OK
Checking Wiring and UUID: switch(103) reports apic(2) has wireIssue: ctrlr-uuid-mismatch; switch(104) reports apic(2) has wireIssue: ctrlr-uuid-mismatch
Checking AD Processes: Running
Checking All Apics in Commission State: OK
Checking All Apics in Active State: OK
Checking Fabric Nodes: OK
Checking Apic Fully-Fit: OK
Checking Shard Convergence: OK
Ping OOB IPs:
APIC-1: 43.218.45.14 - OK
Ping Infra IPs:
APIC-1: 10.0.0.1 - OK
Checking APIC Versions: Same (4.2(3j))
Checking SSL: OK
Done!
EUGBCHESVR01-APIC-ACI#
APIC2
apic2# acidiag cluster
Admin password:
Product-name = APIC-SERVER-M3
Serial-number = WZP242312SB
Running...
Operational cluster size and target cluster size are not identical
Checking Core Generation: OK
Checking Wiring and UUID: Password Required!
Checking AD Processes: Running
Checking All Apics in Commission State: OK
Checking All Apics in Active State: Inactive Apics: IFC-1
Checking Fabric Nodes: OK
Checking Apic Fully-Fit: Not Fully Fit Apics: IFC-1 IFC-2
Checking Shard Convergence: Password Required!
Ping OOB IPs:
APIC-1: Decommissioned. Invalid IP
APIC-2: 43.218.45.15 - OK
Ping Infra IPs:
APIC-1: 10.0.0.1 - Ping failed
APIC-2: 10.0.0.2 - OK
Checking APIC Versions: Cluster Version:4.2(3j) Imcompatible Apics: IFC-1()
Checking SSL: OK
Looking at the above outputs, it seems APIC2 cannot reach APIC1.
It mentions that leaf 103 and leaf 104 are detecting a 'wire issue' on APIC 2, but I have confirmed that APIC 2 is connected to the fabric correctly (ports 1 and 3); this checks out on the LLDP tables on both leafs (see the sketch below). APIC 1 is connected identically to leafs 101/102.
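For reference, the LLDP check was done with something along these lines on each leaf (the leaf103 prompt is illustrative, not taken from the outputs above):
leaf103# show lldp neighbors
and confirming that APIC 2 appears as a neighbour on the expected ports.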
In the meantime I'll see if APIC 1 can ping all the other TEPs on the leafs/spines over the infra VLAN, and the same from APIC 2, as that seems to be the main issue; a rough sketch of that test is below.
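A minimal way to run this check from the APIC CLI, assuming the APIC's infra sub-interface is bond0.&lt;infra-vlan&gt; (here bond0.3930) and taking the node TEP addresses from acidiag fnvread (the placeholder address is mine, not from the outputs above):
apic1# acidiag fnvread
apic1# ping -c 3 -I bond0.3930 &lt;leaf-or-spine-TEP-from-fnvread&gt;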
Regards
Colin
08-22-2020 12:23 AM
Hi @colin.lynch
From the outputs you shared, I could see this issue:
Checking Wiring and UUID: switch(103) reports apic(2) has wireIssue: ctrlr-uuid-mismatch; switch(104) reports apic(2) has wireIssue: ctrlr-uuid-mismatch
This is not a wiring issue; it is actually a UUID mismatch.
If you look at the avread output, you will see that APIC1 reports APIC2's chassisId as 98c35844-.-blabla, while the same output on APIC2 shows a different local chassisId, 327d4a22-.-blabla.
APIC1's UUID is the same on both chassis. Can you verify the same output on APIC3? It could be that the APIC IDs are overlapping.
Stay safe,
Sergiu
08-24-2020 02:38 AM
The issue is now resolved. I had to contact Cisco TAC as I was right out of ideas.
I was hitting bug CSCvu62127, where some new APICs were shipped with invalid certificates.
https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvu62127
Before:
EUGBCHESVR01-APIC-ACI# acidiag verifyapic
openssl_check: certificate details
subject= serialNumber=PID:APIC-SERVER-M3 SN:WZP242406QN,CN=WZP242406QN
issuer= CN=Cisco Manufacturing CA,O=Cisco Systems
notBefore=Jul 30 07:25:43 2020 GMT
notAfter=May 14 20:25:41 2029 GMT
openssl_check: passed
ssh_check: passed
all_checks: passed
After:
EUGBCHESVR01-APIC-ACI# acidiag verifyapic
openssl_check: certificate details
subject= CN=WZP242406QN,serialNumber=PID:APIC-SERVER-M3 SN:WZP242406QN
issuer= CN=Cisco Manufacturing CA,O=Cisco Systems
notBefore=Aug 21 11:52:40 2020 GMT
notAfter=May 14 20:25:41 2029 GMT
openssl_check: passed
ssh_check: passed
all_checks: passed
Also, we used the below commands to verify:
Total Objects shown: 1
# fault.Inst
code : F3031
ack : no
annotation :
cause : cert-invalid
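(The exact query didn't make it into the paste above; the output format matches moquery, so, as an assumption, a likely way to pull this fault from the APIC CLI would be:
apic# moquery -c faultInst -f 'fault.Inst.code=="F3031"'
which should return the fault.Inst object shown above.)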
Thanks for the help.
Regards
Colin
08-24-2020 04:30 AM
Glad to hear that the issue is resolved :-)
Cheers,
Sergiu