
Multiple disk failure in APIC

Abhishek Nanda
Level 1

One APIC in a cluster of 5 has two failed disks. I would like to understand the storage layout. From the CIMC I can see two RAID volumes: one is RAID-1 and the other is RAID-0. On the other APICs, the boot variable is set to true on the RAID-1 volume. Should I copy the configuration of the other APICs, that is, PD 1 and 2 as VD-1 (RAID-1, boot) and PD 3 as VD-2 (RAID-0)? Here PD 1 is an SSD and PD 2 and 3 are HDDs.

When I received the first replacement disk, I tried to install the OS with the boot option set on each of the VDs in turn, but every attempt got stuck during boot-up. Both disks have now been replaced. Can anyone shed some light on how this works? Also, how do shards replicate in a 5-controller cluster?

5 Replies

Sergiu.Daniluk
VIP Alumni

Hi @Abhishek Nanda 

I believe you will find all the answers to your questions about SSD replacement on APICs here:

https://www.cisco.com/c/en/us/support/docs/cloud-systems-management/application-policy-infrastructure-controller-apic/215166-apic-ssd-replacement.html

 

About shard replication: APIC uses a replication factor of 3, meaning each shard has 3 instances/replicas (one active and two backup) across the cluster, regardless of the number of APIC nodes present in the cluster. In your case, the replicas will be distributed across all 5 APICs. Because of this, it is important to know what happens in case of node failures:

- If one node fails: all good, the cluster is still in read-write (RW).

- If two nodes fail: depending on the distribution, SOME shards will definitely be in read-only. Something like this:

[Image: Screenshot 2022-05-11 085117.png]
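
To make the failure cases above a bit more concrete, here is a minimal Python sketch. It is only an illustration, not how APIC actually places replicas: the round-robin layout in `place_replicas` is hypothetical, and the rule that a shard stays writable only while a majority (2 of 3) of its replicas are reachable is an assumption used for the model. The function names are made up for this example.

```python
from itertools import combinations

REPLICATION_FACTOR = 3  # from the post: each shard has 3 replicas


def place_replicas(num_shards, num_nodes):
    """Hypothetical layout: shard s lives on 3 consecutive nodes (round-robin).

    The real APIC placement is not described in this thread.
    """
    return {
        shard: {(shard - 1 + i) % num_nodes + 1 for i in range(REPLICATION_FACTOR)}
        for shard in range(1, num_shards + 1)
    }


def read_only_shards(placement, failed_nodes):
    """Return shards that lost their majority (assumed rule: need 2 of 3 replicas)."""
    majority = REPLICATION_FACTOR // 2 + 1
    failed = set(failed_nodes)
    return sorted(
        shard
        for shard, nodes in placement.items()
        if len(nodes - failed) < majority
    )


if __name__ == "__main__":
    placement = place_replicas(num_shards=5, num_nodes=5)
    print("one node down:", read_only_shards(placement, {1}))
    for pair in combinations(range(1, 6), 2):
        print("nodes down", pair, "-> read-only shards:", read_only_shards(placement, pair))
```

With this toy layout, a single failed node never leaves a shard below majority, while every pair of failed nodes leaves at least one shard with a single surviving replica, which matches the "SOME shards will be in read-only" behaviour described above.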

 

Hope it helps,

Sergiu

Hi Sergiu, have a good day!

Thank you for this valuable reply. Is there any way to see how many shards are created and which replica leads each shard?

Hi @Abhishek Nanda 

Yes, you can check the state of the shards/replicas.

The basic commands are:

 

acidiag rvread 
acidiag rvread <service>
acidiag rvread <service> <shard>
acidiag rvread <service> <shard> <replica>

Check Table 1 at the following link for the list of services:

 

https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/1-x/troubleshooting/b_APIC_Troubleshooting/b_APIC_Troubleshooting_appendix_010001.html#reference_E1C4EF57684F4736AEE735FEC3B35CD3

 

Also there is an example of how to check the leader for a shard:

 

apic1# acidiag rvread 6 3  
(6,3,1)  st:6 lm(t):3(2014-10-16T08:48:20.238+00:00) le: reSt:LEADER voGr:0 cuTerm:0x19 lCoTe:0x18 
    lCoIn:0x1800000000001b2a veFiSt:0x31 veFiEn:0x31 lm(t):3(2014-10-16T08:48:20.120+00:00) 
    lastUpdt 2014-10-16T09:08:30.240+00:00
(6,3,2)  st:6 lm(t):1(2014-10-16T08:47:25.323+00:00) le: reSt:FOLLOWER voGr:0 cuTerm:0x19 lCoTe:0x18 
    lCoIn:0x1800000000001b2a veFiSt:0x49 veFiEn:0x49 lm(t):1(2014-10-16T08:48:20.384+00:00) lp: clSt:2 
    lm(t):1(2014-10-16T08:47:03.286+00:00) dbSt:2 lm(t):1(2014-10-16T08:47:02.143+00:00) stMmt:1 
    lm(t):0(zeroTime) dbCrTs:2014-10-16T08:47:02.143+00:00 lastUpdt 2014-10-16T08:48:20.384+00:00
(6,3,3)  st:6 lm(t):2(2014-10-16T08:47:13.576+00:00) le: reSt:FOLLOWER voGr:0 cuTerm:0x19 lCoTe:0x18 
    lCoIn:0x1800000000001b2a veFiSt:0x43 veFiEn:0x43 lm(t):2(2014-10-16T08:48:20.376+00:00) 
    lastUpdt 2014-10-16T09:08:30.240+00:00
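
If you want to pull the leader out of that output programmatically, something like the Python sketch below could work. It is only an illustration built on the sample above: it assumes each record starts with `(service,shard,replica)` at the beginning of a line, that continuation lines are indented, and that the `reSt:` field carries the replica's role (LEADER/FOLLOWER). The helper names `parse_rvread` and `find_leader` are made up for this example, and other APIC versions may format the output differently.

```python
import re

# Sample copied from the "acidiag rvread 6 3" output above.
SAMPLE = """\
(6,3,1)  st:6 lm(t):3(2014-10-16T08:48:20.238+00:00) le: reSt:LEADER voGr:0 cuTerm:0x19 lCoTe:0x18
    lCoIn:0x1800000000001b2a veFiSt:0x31 veFiEn:0x31 lm(t):3(2014-10-16T08:48:20.120+00:00)
    lastUpdt 2014-10-16T09:08:30.240+00:00
(6,3,2)  st:6 lm(t):1(2014-10-16T08:47:25.323+00:00) le: reSt:FOLLOWER voGr:0 cuTerm:0x19 lCoTe:0x18
    lCoIn:0x1800000000001b2a veFiSt:0x49 veFiEn:0x49 lm(t):1(2014-10-16T08:48:20.384+00:00) lp: clSt:2
    lastUpdt 2014-10-16T08:48:20.384+00:00
(6,3,3)  st:6 lm(t):2(2014-10-16T08:47:13.576+00:00) le: reSt:FOLLOWER voGr:0 cuTerm:0x19 lCoTe:0x18
    lCoIn:0x1800000000001b2a veFiSt:0x43 veFiEn:0x43 lm(t):2(2014-10-16T08:48:20.376+00:00)
    lastUpdt 2014-10-16T09:08:30.240+00:00
"""

HEADER_RE = re.compile(r"^\((\d+),(\d+),(\d+)\)")  # (service,shard,replica)
ROLE_RE = re.compile(r"reSt:(\w+)")                 # assumed to hold the role


def parse_rvread(text):
    """Parse rvread-style text into one dict per replica record."""
    records = []
    for line in text.splitlines():
        if HEADER_RE.match(line):
            records.append(line)                    # start of a new record
        elif records and line.strip():
            records[-1] += " " + line.strip()       # indented continuation line
    parsed = []
    for record in records:
        service, shard, replica = map(int, HEADER_RE.match(record).groups())
        role = ROLE_RE.search(record)
        parsed.append({
            "service": service,
            "shard": shard,
            "replica": replica,
            "role": role.group(1) if role else None,
        })
    return parsed


def find_leader(replicas):
    """Return the first record whose role is LEADER, or None if there is none."""
    return next((r for r in replicas if r["role"] == "LEADER"), None)


if __name__ == "__main__":
    for rec in parse_rvread(SAMPLE):
        print(rec)
    print("leader:", find_leader(parse_rvread(SAMPLE)))
```

You would feed it the captured text of the command (for example, saved from an SSH session) instead of `SAMPLE`.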

 

Stay safe,

Sergiu

 

Hi Sergiu,

Thank you for the reply and for sharing your knowledge. It is really helpful  

You're very welcome! Happy to hear that the information I shared is useful to you.
