09-04-2020 09:33 PM
Hi community,
I got my hands on a new ACI fabric (used for development first, might become production later). Before the first discovery, I configured all APICs, then used the Leaf/Spine serial numbers to automate the fabric node addition via the API. The idea was that when I plugged the cables in, the fabric nodes would be discovered and would provision themselves. I plugged APIC 1, 2 and 3 into the Leaf, in that order. I have done this for a previous fabric and it worked every time. But this time, I forgot to check the Faults tab.
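For readers unfamiliar with this kind of pre-registration, here is a minimal Python sketch of the sort of payload involved. It assumes the node is registered through a `fabricNodeIdentP` object posted under `uni/controller/nodeidentpol`; the function name, serial number, node ID and node name are hypothetical, and the exact attributes should be checked against the APIC REST API documentation for your release.

```python
import json


def node_registration_payload(serial: str, node_id: int, name: str) -> dict:
    """Build a fabricNodeIdentP payload for one switch (illustrative only)."""
    return {
        "fabricNodeIdentP": {
            "attributes": {
                # Assumption: the DN is keyed on the switch serial number.
                "dn": f"uni/controller/nodeidentpol/nodep-{serial}",
                "serial": serial,
                "nodeId": str(node_id),  # APIC attributes are sent as strings
                "name": name,
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical values, not taken from the thread.
    payload = node_registration_payload("SAL0000TEST", 101, "leaf-101")
    print(json.dumps(payload, indent=2))
```

The resulting JSON would then be posted to the APIC after authenticating; that step is left out here because it depends on the environment.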
Things went wrong. APIC 1 saw all nodes as Inactive (TEP IPs were assigned from the pool, though). When I ran show discoveryissues on the directly connected Leaf, I could see that the DHCP, infra VLAN, time and SSL checks succeeded, but the LLDP check reported ctrlr-uuid-mismatch. I could also see that the Leaf had changed its hostname from (none) to something else, but the admin credentials from APIC 1 weren't pushed to it (so I still logged in as admin with no password). I couldn't log in to APIC 2 or 3, apparently because they had not been discovered.
I tried to factory reset APIC 1 and the directly connected Leaf. On the second discovery attempt, I registered the Leaf manually in the web GUI and could see it, but after I added it to the fabric it was still Inactive. When I checked the Faults tab, it showed F3031 - cert-invalid (probably due to the APIC's cert, but this is the APIC's default manufacturer cert).
I have to wait until the service contract starts before I can open a TAC case. In the meantime, how would I factory reset the other two APICs? I tried using the virtual KVM on the CIMC, but it wouldn't let me log in either.
Thanks a lot.
09-06-2020 08:12 AM
Hi @Timothy ACI
First, verify that the date and time are consistent on all APICs and Leaf/Spine switches.
Second, verify that the certificates are valid:
On APIC: acidiag verifyapic
On Leafs: cd /securedata/ssl && openssl x509 -noout -subject -in server.crt
For your reference: https://quickview.cloudapps.cisco.com/quickview/bug/CSCva68310
Correct pattern: /serialNumber=PID:<PID> SN:<Serial number>/CN=<Serial number>
Incorrect pattern: /CN=<Serial number>/serialNumber=PID:<PID> SN:<Serial number>
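To illustrate the pattern check, here is a minimal Python sketch that classifies a subject string as printed by the openssl command above. It assumes the legacy slash-separated subject format quoted in this thread (newer OpenSSL releases may print the subject differently), and it only checks the field order, not whether the CN actually matches the serial number. The function name and the serial number in the usage example are made up for illustration.

```python
import re

# Field order described as correct: serialNumber first, then CN.
CORRECT = re.compile(r"/serialNumber=PID:[^/\s]+ SN:[^/\s]+/CN=[^/]+$")
# Field order described as incorrect: CN first, then serialNumber.
INCORRECT = re.compile(r"/CN=[^/]+/serialNumber=PID:[^/\s]+ SN:[^/\s]+$")


def classify_subject(subject: str) -> str:
    """Return 'correct', 'incorrect' or 'unknown' for a certificate subject."""
    s = subject.strip()
    # openssl prefixes the line with "subject="; drop it if present.
    if s.startswith("subject="):
        s = s[len("subject="):].strip()
    if CORRECT.search(s):
        return "correct"
    if INCORRECT.search(s):
        return "incorrect"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical subject line, not real output from any device.
    line = "subject= /serialNumber=PID:APIC-SERVER-L3 SN:FCH0000TEST/CN=FCH0000TEST"
    print(classify_subject(line))
```

Anything reported as "unknown" would need a manual look, since the sketch deliberately does not guess at other subject layouts.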
Stay safe,
Sergiu
09-06-2020 11:10 AM - edited 09-06-2020 11:10 AM
Hello @Timothy ACI,
Just to add to Sergiu's accurate response, you can try to log in to APIC-2 and APIC-3 with the username rescue-user. It should work only because these APICs have not been able to form a cluster with APIC-1 due to the cert issue.
Going back to the original issue, you need to urgently engage Cisco TAC to assist you in generating new certs for your APICs (first check whether all of them are impacted, following Sergiu's suggestion).
Regards.
09-07-2020 01:41 AM
Hello @Timothy ACI,
I'm glad you can now log in to APICs 2 & 3. You need to check on those whether the cert fault is also seen, using moquery -c faultInst -f 'fault.Inst.code=="F3031"'
If so, check the cert pattern as you are already doing.
This problem is with the APIC controllers and not the Switches. You will need to engage Cisco TAC to get support in generating new certs for the APIC controllers impacted.
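If the APIC REST API is reachable, the same fault lookup can be scripted. The following is a minimal Python sketch, assuming a class query on `faultInst` with a `query-target-filter` and the usual `imdata` JSON envelope in the response. The helper names and the sample payload are made up for illustration, and authentication is left out.

```python
import json
from urllib.parse import quote


def fault_query_path(code: str) -> str:
    """Build the class-query path for faults with the given fault code."""
    flt = f'eq(faultInst.code,"{code}")'
    # Keep parentheses and commas readable; percent-encode the quotes.
    return "/api/class/faultInst.json?query-target-filter=" + quote(flt, safe="(),")


def extract_faults(payload: str) -> list:
    """Return (dn, code, descr) tuples from a faultInst class-query response."""
    data = json.loads(payload)
    faults = []
    for item in data.get("imdata", []):
        attrs = item.get("faultInst", {}).get("attributes", {})
        if attrs:
            faults.append((attrs.get("dn"), attrs.get("code"), attrs.get("descr")))
    return faults


if __name__ == "__main__":
    # Made-up response body, only to show the expected shape.
    sample = json.dumps({
        "totalCount": "1",
        "imdata": [{"faultInst": {"attributes": {
            "dn": "example/dn", "code": "F3031", "descr": "example description"}}}],
    })
    print(fault_query_path("F3031"))
    print(extract_faults(sample))
```

An empty list from `extract_faults` would suggest that the fault is not raised on that controller, which is the same conclusion the moquery command gives when it returns no objects.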
Regards.
09-06-2020 09:22 PM
Thanks for your responses @Sergiu.Daniluk and @Hector Gustavo Serrano Gutierrez,
I was able to log in to the other APICs using rescue-user. All leaves and spines have the correct pattern for the certificate. The APIC, though, appears to pass all the checks in acidiag verifyapic, yet the GUI still shows F3031. On further investigation I could see that its certificate has the wrong pattern (if that applies to the APIC as well?).
09-12-2020 03:56 AM
Thanks Hector for your response.
I just had TAC regenerate the certificates for all 3 APICs. They said there's probably a bug with my batch of APIC-SERVER-L3.
09-16-2020 01:45 AM
I will just leave this one here as it is related to the topic:
https://www.cisco.com/c/en/us/support/docs/field-notices/705/fn70594.html
Cheers,
Sergiu