07-10-2023 06:48 AM
Hey!
So we have 3 APICs that are extremely slow, both the CLI and the GUI, so we wanted to upgrade them, but the upgrade failed for unknown reasons.
While trying to understand and troubleshoot, we ran into something weird.
phmoapc-166014-24# acidiag cluster
Admin password:
Running...
Checking Wiring and UUID: OK
Checking AD Processes: Running
Checking All Apics in Commission State: OK
Checking All Apics in Active State: OK
Checking Fabric Nodes: OK
Checking Apic Fully-Fit: Not Fully Fit Apics: IFC-1 IFC-2 IFC-3
Checking Shard Convergence: OK
Checking Leadership Degration: Non optimal leader for shards : 3:1,3:2,3:4,3:5,3:7,3:8,3:10,3:11,3:13,3:16,3:17,3:19,3:20,3:22,3:23,3:25,3:26,3:28,3:31,6:1,6:2,6:4,6:5,6:7,6:8,6:10,6:11,6:13,6:16,6:17,6:19,6:20,6:22,6:23,6:25,6:26,6:28,6:31,6:32,9:1,9:2,9:4,9:5,9:7,9:8,9:10,9:11,9:13,9:16,9:17,9:19,9:20,9:22,9:23,9:25,9:26,9:28,9:29,9:31,10:1,10:2,10:4,10:5,10:7,10:8,10:10,10:11,10:13,10:14,10:16,10:17,10:19,10:20,10:22,10:23,10:25,10:26,10:28,10:31,11:1,11:2,11:4,11:5,11:7,11:8,11:10,11:11,11:13,11:14,11:16,11:17,11:19,11:20,11:22,11:23,11:25,11:26,11:28,11:31,14:1,14:2,14:4,14:5,14:7,14:8,14:10,14:11,14:13,14:14,14:16,14:17,14:19,14:20,14:22,14:23,14:25,14:26,14:28,14:31,16:1,16:2,16:4,16:5,16:7,16:10,16:11,16:13,16:14,16:16,16:17,16:19,16:20,16:22,16:23,16:25,16:26,16:28,16:31,22:1,22:2,22:4,22:5,22:7,22:8,22:10,22:11,22:13,22:14,22:16,22:17,22:19,22:20,22:22,22:25,22:28,22:31,22:32,23:1,23:2,23:4,23:5,23:7,23:8,23:10,23:11,23:13,23:14,23:16,23:17,23:19,23:20,23:22,23:23,23:25,23:26,23:28,23:31,33:1,34:1,34:2,34:4,34:5,34:7,34:8,34:10,34:11,34:13,34:16,34:17,34:19,34:20,34:22,34:23,34:25,34:26,34:28,34:31,35:1,35:2,35:4,35:5,35:7,35:8,35:10,35:11,35:13,35:14,35:16,35:17,35:19,35:22,35:25,35:26,35:28,35:31,36:1,39:1,39:2,39:4,39:5,39:7,39:8,39:10,39:11,39:13,39:14,39:16,39:17,39:19,39:20,39:22,39:23,39:25,39:26,39:28,39:31
Ping OOB IPs:
APIC-1: 192.168.199.9 - OK
APIC-2: 192.168.199.10 - OK
APIC-3: 192.168.199.11 - OK
Ping Infra IPs:
APIC-1: 10.0.0.1 - OK
APIC-2: 10.0.0.2 - OK
APIC-3: 10.0.0.3 - Ping failed
Checking APIC Versions: Cluster Version:5.2(7g) Imcompatible Apics: IFC-1(5.2(4e)) IFC-2(5.2(4e)) IFC-3(5.2(4e))
Checking SSL: OK
Full file system(s): None
Done!
For some reason the ping between the infra IPs fails. Does anyone have a clue where to begin searching for the error?
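With output this long, it can help to filter out the healthy lines first. Below is a minimal Python sketch that does so for text shaped like the `acidiag cluster` output above. The function name `failed_checks` and the set of values treated as healthy (`OK`, `Running`, `None`) are assumptions made for illustration, not part of any Cisco tooling, and the sample is a shortened copy of the output in this post.

```python
import re

# Values treated as healthy here; this set is an assumption based on the output above.
HEALTHY = {"OK", "Running", "None"}

# "Checking <name>: <result>" and "Full file system(s): <result>" lines.
CHECK_RE = re.compile(r"^(Checking [^:]+|Full file system\(s\)):\s*(.*)$")
# "APIC-<n>: <ip> - <result>" lines from the ping sections.
PING_RE = re.compile(r"^(APIC-\d+): (\S+) - (.*)$")


def failed_checks(output):
    """Return (check, detail) tuples for every line that is not healthy."""
    failures = []
    for raw in output.splitlines():
        line = raw.strip()
        m = CHECK_RE.match(line)
        if m:
            if m.group(2) not in HEALTHY:
                failures.append((m.group(1), m.group(2)))
            continue
        m = PING_RE.match(line)
        if m and m.group(3) != "OK":
            failures.append((f"{m.group(1)} {m.group(2)}", m.group(3)))
    return failures


# Shortened excerpt of the output posted above.
SAMPLE = """\
Checking Wiring and UUID: OK
Checking AD Processes: Running
Checking Apic Fully-Fit: Not Fully Fit Apics: IFC-1 IFC-2 IFC-3
Checking Shard Convergence: OK
Ping Infra IPs:
APIC-1: 10.0.0.1 - OK
APIC-3: 10.0.0.3 - Ping failed
Checking APIC Versions: Cluster Version:5.2(7g) Imcompatible Apics: IFC-1(5.2(4e)) IFC-2(5.2(4e)) IFC-3(5.2(4e))
Full file system(s): None
"""

for name, detail in failed_checks(SAMPLE):
    print(f"{name}: {detail}")
```

On the excerpt this leaves three items: the fully-fit check, the failed infra ping to 10.0.0.3, and the version mismatch.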
08-13-2023 06:50 AM
I finally found the problem.
APIC3 and APIC1 had 4 connections to the leafs instead of 2.
The APICs use a VIC 1455, and all 4 ports on this NIC were cabled to the leafs.
After shutting 2 of the ports, it worked!
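For anyone checking their own cabling, here is a small Python sketch of the rule that was violated. It assumes, as I understand Cisco's cabling guidance for the VIC 1455, that ports 1 and 2 form one pair behind `eth2-1`, ports 3 and 4 form another behind `eth2-2`, and only one cable per pair is allowed. The mapping `PORT_PAIRS` and the function `overcabled_pairs` are hypothetical names for illustration only; please verify the pairing against the documentation for your hardware.

```python
# Assumed pairing: physical ports 1-2 back eth2-1, ports 3-4 back eth2-2.
PORT_PAIRS = {"eth2-1": (1, 2), "eth2-2": (3, 4)}


def overcabled_pairs(cabled_ports):
    """Return {interface: [ports]} for every pair with more than one cable."""
    cabled = set(cabled_ports)
    problems = {}
    for iface, ports in PORT_PAIRS.items():
        in_use = sorted(cabled & set(ports))
        if len(in_use) > 1:
            problems[iface] = in_use
    return problems


# All four ports cabled, as described in this post: both pairs are flagged.
print(overcabled_pairs({1, 2, 3, 4}))
# One cable per pair: nothing is flagged.
print(overcabled_pairs({1, 3}))
```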
07-10-2023 06:56 AM
Is the 3rd APIC in the same Pod as 1 & 2, or a different one?
Robert
07-10-2023 06:58 AM
All in the same pod.
07-10-2023 07:41 AM
Check both fabric links from APIC3 CLI:
acidiag run lldptool in eth2-1 | grep topo
acidiag run lldptool in eth2-2 | grep topo
If all looks fine, initiate a ping from APIC3 to the TEP addresses of APIC1 and APIC2.
Robert
07-10-2023 07:51 AM
phmoapc-166018-25# acidiag run lldptool in eth2-1 | grep topo
topology/pod-1/paths-103/pathep-[eth1/1]
topology/pod-1/node-103
phmoapc-166018-25# acidiag run lldptool in eth2-2 | grep topo
topology/pod-1/paths-104/pathep-[eth1/1]
topology/pod-1/node-104
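To make the result explicit: each fabric interface reports a different leaf (103 and 104), both on `eth1/1`. A minimal Python sketch that extracts this from the `topology/...` lines is below. The function name `parse_lldp_topology` and the input dictionary are made up for illustration; the regular expression only assumes the path format shown in the output above.

```python
import re

# Matches lines such as: topology/pod-1/paths-103/pathep-[eth1/1]
PATH_RE = re.compile(r"topology/pod-(\d+)/paths-(\d+)/pathep-\[([^\]]+)\]")


def parse_lldp_topology(text):
    """Return pod, leaf node ID and leaf port from a topology path, or None."""
    m = PATH_RE.search(text)
    if m is None:
        return None
    return {"pod": int(m.group(1)), "node": int(m.group(2)), "port": m.group(3)}


# The two path lines posted above, keyed by the APIC interface they came from.
outputs = {
    "eth2-1": "topology/pod-1/paths-103/pathep-[eth1/1]",
    "eth2-2": "topology/pod-1/paths-104/pathep-[eth1/1]",
}

neighbors = {iface: parse_lldp_topology(line) for iface, line in outputs.items()}
for iface, info in neighbors.items():
    print(iface, "->", info)

# Sanity check: the two fabric links should land on different leafs.
nodes = {info["node"] for info in neighbors.values()}
print("distinct leafs:", len(nodes) == len(neighbors))
```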
07-10-2023 09:01 AM
From APIC3 and one other APIC:
acidiag avread
acidiag rvread
Robert
07-10-2023 02:43 PM - edited 07-10-2023 02:50 PM
Hi @Rem Markov ,
TIP: Let me modify @Robert Burns' suggestion.
Instead of issuing the commands...
acidiag avread
acidiag rvread
... ditch the acidiag part and just issue the commands:
avread
rvread
You'll get much more readable output, and (in the case of rvread) some extra information. And BTW - fnvread also gives virtually the same information as acidiag fnvread and saves typing.
Caveat: avread does not give you timestamps or the complete serial number. (Which is why the output is more readable.)
07-10-2023 11:04 PM - edited 07-10-2023 11:09 PM
APIC 3:
```
phmoapc-166018-25# avread
Cluster:
-------------------------------------------------------------------------
fabricDomainName MedOne
discoveryMode PERMISSIVE
clusterSize 3
version apic-5.2(7g)
drrMode OFF
operSize 3
APICs:
-------------------------------------------------------------------------
APIC 1 APIC 2 APIC 3
version 5.2(4e) 5.2(4e) 5.2(4e)
address 10.0.0.1 10.0.0.2 10.0.0.3
oobAddress 192.168.199.9/24 192.168.199.10/24 192.168.199.11/24
routableAddress 0.0.0.0 0.0.0.0 0.0.0.0
tepAddress 10.0.0.0/16 10.0.0.0/16 10.0.0.0/16
podId 1 1 1
chassisId 8ebc498a-.-7af699ea 043f74d8-.-67674e20 596a5378-.-1857d6f8
cntrlSbst_serial (APPROVED,WMP251900CG) (APPROVED,WMP251900D4) (APPROVED,WMP251900BK)
active YES YES YES
flags cra- cra- cra-
health 255 255 255
phmoapc-166018-25# rvread
\- unexpected state; /-unexpected mutator;
s->R 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32lcl
r->R123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123lcl
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
Replicas are in expected states and are mutated by proper apic's
---------------------------------------------
clusterTime=<diff=0 common=2023-07-11T06:03:53.660+00:00 local=2023-07-11T06:03:53.660+00:00 pF=<displForm=0 offsSt=0 offsVlu=0 lm(t):3(2022-06-06T07:34:08.367+00:00)>>
Non optimal leader for shards : 3:1,3:2,3:4,3:5,3:7,3:8,3:10,3:11,3:13,3:14,3:16,3:17,3:19,3:20,3:22,3:23,3:25,3:26,3:28,3:29,3:31,3:32,6:1,6:2,6:4,6:5,6:7,6:8,6:10,6:11,6:13,6:14,6:16,6:17,6:19,6:20,6:22,6:23,6:25,6:26,6:28,6:29,6:31,6:32,9:1,9:2,9:4,9:5,9:7,9:8,9:10,9:11,9:13,9:14,9:16,9:17,9:19,9:20,9:22,9:23,9:25,9:26,9:28,9:29,9:31,9:32,10:1,10:2,10:4,10:5,10:7,10:8,10:10,10:11,10:13,10:14,10:16,10:17,10:19,10:20,10:22,10:23,10:25,10:26,10:28,10:29,10:31,10:32,11:1,11:2,11:4,11:5,11:7,11:8,11:10,11:11,11:13,11:14,11:16,11:17,11:19,11:20,11:22,11:23,11:25,11:26,11:28,11:29,11:31,11:32,14:1,14:2,14:4,14:5,14:7,14:8,14:10,14:11,14:13,14:14,14:16,14:17,14:19,14:20,14:22,14:23,14:25,14:26,14:28,14:29,14:31,14:32,16:1,16:2,16:4,16:5,16:7,16:8,16:10,16:11,16:13,16:14,16:16,16:17,16:19,16:20,16:22,16:23,16:25,16:26,16:28,16:29,16:31,16:32,22:1,22:2,22:4,22:5,22:7,22:8,22:10,22:11,22:13,22:14,22:16,22:17,22:19,22:20,22:22,22:23,22:25,22:26,22:28,22:29,22:31,22:32,23:1,23:2,23:4,23:5,23:7,23:8,23:10,23:11,23:13,23:14,23:16,23:17,23:19,23:20,23:22,23:23,23:25,23:26,23:28,23:29,23:31,23:32,33:1,34:1,34:2,34:4,34:5,34:7,34:8,34:10,34:11,34:13,34:14,34:16,34:17,34:19,34:20,34:22,34:23,34:25,34:26,34:28,34:29,34:31,34:32,35:1,35:2,35:4,35:5,35:7,35:8,35:10,35:11,35:13,35:14,35:16,35:17,35:19,35:20,35:22,35:23,35:25,35:26,35:28,35:29,35:31,35:32,36:1,39:1,39:2,39:4,39:5,39:7,39:8,39:10,39:11,39:13,39:14,39:16,39:17,39:19,39:20,39:22,39:23,39:25,39:26,39:28,39:29,39:31,39:32
---------------------------------------------
clusterTime=<diff=1 common=2023-07-11T06:03:54.448+00:00 local=2023-07-11T06:03:54.447+00:00 pF=<displForm=0 offsSt=0 offsVlu=0 lm(t):3(2022-06-06T07:34:08.367+00:00)>>
```
APIC 2:
```
phmoapc-166014-25# avread
Cluster:
-------------------------------------------------------------------------
fabricDomainName MedOne
discoveryMode PERMISSIVE
clusterSize 3
version apic-5.2(7g)
drrMode OFF
operSize 3
APICs:
-------------------------------------------------------------------------
APIC 1 APIC 2 APIC 3
version 5.2(4e) 5.2(4e) 5.2(4e)
address 10.0.0.1 10.0.0.2 10.0.0.3
oobAddress 192.168.199.9/24 192.168.199.10/24 192.168.199.11/24
routableAddress 0.0.0.0 0.0.0.0 0.0.0.0
tepAddress 10.0.0.0/16 10.0.0.0/16 10.0.0.0/16
podId 1 1 1
chassisId 8ebc498a-.-7af699ea 043f74d8-.-67674e20 596a5378-.-1857d6f8
cntrlSbst_serial (APPROVED,WMP251900CG) (APPROVED,WMP251900D4) (APPROVED,WMP251900BK)
active YES YES YES
flags cra- cra- cra-
health 255 255 255
phmoapc-166014-25# rvread
\- unexpected state; /-unexpected mutator;
s->R 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32lcl
r->R123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123lcl
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
Replicas are in expected states and are mutated by proper apic's
---------------------------------------------
clusterTime=<diff=198200 common=2023-07-11T06:05:21.084+00:00 local=2023-07-11T06:02:02.884+00:00 pF=<displForm=0 offsSt=0 offsVlu=0 lm(t):3(2022-06-06T07:34:08.367+00:00)>>
Non optimal leader for shards : 3:1,3:2,3:4,3:5,3:7,3:8,3:10,3:11,3:13,3:14,3:16,3:17,3:19,3:20,3:22,3:23,3:25,3:26,3:28,3:29,3:31,3:32,6:1,6:2,6:4,6:5,6:7,6:8,6:10,6:11,6:13,6:14,6:16,6:17,6:19,6:20,6:22,6:23,6:25,6:26,6:28,6:29,6:31,6:32,9:1,9:2,9:4,9:5,9:7,9:8,9:10,9:11,9:13,9:14,9:16,9:17,9:19,9:20,9:22,9:23,9:25,9:26,9:28,9:29,9:31,9:32,10:1,10:2,10:4,10:5,10:7,10:8,10:10,10:11,10:13,10:14,10:16,10:17,10:19,10:20,10:22,10:23,10:25,10:26,10:28,10:29,10:31,10:32,11:1,11:2,11:4,11:5,11:7,11:8,11:10,11:11,11:13,11:14,11:16,11:17,11:19,11:20,11:22,11:23,11:25,11:26,11:28,11:29,11:31,11:32,14:1,14:2,14:4,14:5,14:7,14:8,14:10,14:11,14:13,14:14,14:16,14:17,14:19,14:20,14:22,14:23,14:25,14:26,14:28,14:29,14:31,14:32,16:1,16:2,16:4,16:5,16:7,16:8,16:10,16:11,16:13,16:14,16:16,16:17,16:19,16:20,16:22,16:23,16:25,16:26,16:28,16:29,16:31,16:32,22:1,22:2,22:4,22:5,22:7,22:8,22:10,22:11,22:13,22:14,22:16,22:17,22:19,22:20,22:22,22:23,22:25,22:26,22:28,22:29,22:31,22:32,23:1,23:2,23:4,23:5,23:7,23:8,23:10,23:11,23:13,23:14,23:16,23:17,23:19,23:20,23:22,23:23,23:25,23:26,23:28,23:29,23:31,23:32,33:1,34:1,34:2,34:4,34:5,34:7,34:8,34:10,34:11,34:13,34:14,34:16,34:17,34:19,34:20,34:22,34:23,34:25,34:26,34:28,34:29,34:31,34:32,35:1,35:2,35:4,35:5,35:7,35:8,35:10,35:11,35:13,35:14,35:16,35:17,35:19,35:20,35:22,35:23,35:25,35:26,35:28,35:29,35:31,35:32,36:1,39:1,39:2,39:4,39:5,39:7,39:8,39:10,39:11,39:13,39:14,39:16,39:17,39:19,39:20,39:22,39:23,39:25,39:26,39:28,39:29,39:31,39:32
---------------------------------------------
clusterTime=<diff=198200 common=2023-07-11T06:05:21.848+00:00 local=2023-07-11T06:02:03.648+00:00 pF=<displForm=0 offsSt=0 offsVlu=0 lm(t):3(2022-06-06T07:34:08.367+00:00)>>
```
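One difference between the two outputs above is the `clusterTime` line: APIC 3 shows `diff=0` and `diff=1`, while APIC 2 shows `diff=198200`. The Python sketch below recomputes the gap between the `common` and `local` timestamps. It suggests that `diff` is `common` minus `local` in milliseconds, which would put APIC 2's local clock about 198 seconds behind the common time. That reading of the field is inferred from the arithmetic only, and nothing in this thread establishes whether the offset is related to the problem. The function name `clock_offset_ms` is hypothetical.

```python
import re
from datetime import datetime, timedelta

CLUSTER_TIME_RE = re.compile(r"clusterTime=<diff=(-?\d+) common=(\S+) local=(\S+)")


def clock_offset_ms(line):
    """Return (reported diff, common - local in whole milliseconds)."""
    m = CLUSTER_TIME_RE.search(line)
    if m is None:
        raise ValueError("no clusterTime field found")
    reported = int(m.group(1))
    common = datetime.fromisoformat(m.group(2))
    local = datetime.fromisoformat(m.group(3))
    computed = (common - local) // timedelta(milliseconds=1)
    return reported, computed


# Beginnings of the clusterTime lines posted above.
apic3 = "clusterTime=<diff=1 common=2023-07-11T06:03:54.448+00:00 local=2023-07-11T06:03:54.447+00:00 pF=<"
apic2 = "clusterTime=<diff=198200 common=2023-07-11T06:05:21.084+00:00 local=2023-07-11T06:02:02.884+00:00 pF=<"

for name, line in (("APIC 3", apic3), ("APIC 2", apic2)):
    reported, computed = clock_offset_ms(line)
    print(f"{name}: reported diff={reported}, common - local = {computed} ms")
```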
07-11-2023 05:02 AM
And lastly, do a ping from APIC3 to APIC1 & 2 using their TEP addresses.
Robert
07-11-2023 05:42 AM
Ping from APIC3 to APIC1:
phmoapc-166018-25# ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
--- 10.0.0.1 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8133ms
Ping from APIC3 to APIC2:
phmoapc-166018-25# ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=61 time=0.179 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=61 time=0.127 ms
64 bytes from 10.0.0.2: icmp_seq=3 ttl=61 time=0.202 ms
--- 10.0.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2051ms
rtt min/avg/max/mdev = 0.127/0.169/0.202/0.033 ms
07-11-2023 07:05 AM - edited 07-11-2023 07:08 AM
Let's complete the test and ping from APIC2 => APIC1 and then to APIC3 (see if initiating traffic in the reverse direction has any impact).
So far we know we have an infra connectivity issue between APIC3 & APIC1. Let's get a clear understanding of the full extent.
Robert
07-11-2023 07:10 AM
Ping from APIC2 to APIC1:
phmoapc-166014-25# ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=61 time=0.224 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=61 time=0.289 ms
--- 10.0.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.224/0.256/0.289/0.036 ms
Ping from APIC2 to APIC3:
phmoapc-166014-25# ping 10.0.0.3
PING 10.0.0.3 (10.0.0.3) 56(84) bytes of data.
64 bytes from 10.0.0.3: icmp_seq=1 ttl=61 time=0.110 ms
64 bytes from 10.0.0.3: icmp_seq=2 ttl=61 time=0.192 ms
--- 10.0.0.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.110/0.151/0.192/0.041 ms
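Putting the four ping tests from this thread side by side, only APIC3 to APIC1 fails. A rough Python sketch of that tally is below. It parses the standard Linux `ping` summary line; the names `packet_loss`, `failing_pairs` and `RESULTS` are illustrative, and the summary lines are copied from the outputs posted in this thread.

```python
import re

STATS_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) received, (\d+(?:\.\d+)?)% packet loss"
)


def packet_loss(ping_output):
    """Return the packet-loss percentage from a ping summary, or None if absent."""
    m = STATS_RE.search(ping_output)
    return float(m.group(3)) if m else None


def failing_pairs(results):
    """Return (source, destination) pairs with any loss or an unparsable summary."""
    bad = []
    for pair, summary in results.items():
        loss = packet_loss(summary)
        if loss is None or loss > 0:
            bad.append(pair)
    return bad


# Summary lines from the pings posted in this thread.
RESULTS = {
    ("APIC3", "APIC1"): "9 packets transmitted, 0 received, 100% packet loss, time 8133ms",
    ("APIC3", "APIC2"): "3 packets transmitted, 3 received, 0% packet loss, time 2051ms",
    ("APIC2", "APIC1"): "2 packets transmitted, 2 received, 0% packet loss, time 1000ms",
    ("APIC2", "APIC3"): "2 packets transmitted, 2 received, 0% packet loss, time 1001ms",
}

for src, dst in failing_pairs(RESULTS):
    print(f"{src} -> {dst}: packet loss")
```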
07-11-2023 07:23 AM
Immediately after successfully pinging APIC3 from APIC2, does the reverse work? The address would be resolved by then, if that's the issue.
Robert
07-11-2023 07:29 AM
I didn't check immediately, but it does work in general. The problem is that APIC 3 and APIC 1 can't ping each other.
07-11-2023 05:17 AM
I'd also suggest a controller reboot of each controller, starting with APIC3. Don't proceed with the next controller until the cluster returns to fully fit state.
Robert