Re: Unraveling the Concept of Database Sharding in Cisco ACI APICs

tomy.tim · ‎12-06-2023

Introduction to the World of Cisco ACI APICs
At the heart of Cisco ACI (Application Centric Infrastructure) lies the APIC (Application Policy Infrastructure Controller). It is deployed as a cluster that forms the backbone of policy management and network configuration. A fundamental element of this architecture is "database sharding", a concept IT professionals need to grasp in order to fully understand how the technology works.
In this article, we'll explore the architecture of the Cisco APIC Cluster and the innovative database technology that drives it.

The Cisco APIC Cluster:
Cisco APIC operates as a cluster of at least three controllers. The cluster is sized according to the needs of the ACI deployment, specifically its transaction rate requirements. The model is flexible: any controller in the cluster can serve any user for any operation, and controllers can be added or removed transparently. The APICs form a cluster and communicate with each other over the network provided by the leaf and spine switches. This allows policies and configurations to be synchronized continuously across the cluster, keeping the environment consistent.

Figure 1

High availability of the APIC
For a robust production environment, deploy a minimum of three redundant APIC nodes; the cluster can be expanded up to seven nodes. In lab environments, the ACI fabric can be managed with a single node, provided it is properly configured. The number of APIC nodes must be matched to the number of managed switches in order to achieve the desired transaction rate performance. Cisco's general scalability guidelines recommend the following:

- Three-node cluster: up to 80 leaf switches.
- Four-node cluster: up to 200 leaf switches.
- Cluster of five or six nodes: up to 400 leaf switches.
- Seven-node cluster: 400-500 leaf switches.
- For implementations requiring an even greater number of switches, it is advisable to consider implementing the Multi-Site architecture, using several APIC clusters.
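
The sizing list above can be turned into a small lookup. This is only a sketch: the ceilings are copied from the list, while the function name and the choice to return `None` above 500 leaf switches (the point where the list suggests Multi-Site) are my own assumptions, not a Cisco tool.

```python
from typing import Optional

# Leaf-switch ceilings per cluster size, copied from the list above.
# Five- and six-node clusters share one ceiling in that list.
MAX_LEAVES_BY_CLUSTER_SIZE = {3: 80, 4: 200, 5: 400, 6: 400, 7: 500}


def recommended_cluster_size(leaf_count: int) -> Optional[int]:
    """Return the smallest cluster size whose ceiling covers leaf_count.

    Returns None when the count exceeds the seven-node ceiling
    (hypothetical convention meaning "consider Multi-Site").
    """
    if leaf_count < 0:
        raise ValueError("leaf_count must be non-negative")
    for size in sorted(MAX_LEAVES_BY_CLUSTER_SIZE):
        if leaf_count <= MAX_LEAVES_BY_CLUSTER_SIZE[size]:
            return size
    return None
```

For example, `recommended_cluster_size(150)` falls in the four-node bracket, while a count above 500 returns `None`.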

APIC discovery and clustering is an automated process based on consistent initial configuration and LLDP information. To optimize performance, an APIC cluster employs the horizontal database scaling technique called sharding, illustrated in Figure 2. A "shard" is essentially a unit of data, a subset of the database, or a group of database rows distributed across the nodes of the cluster to enable parallel processing and data protection in the event of a failure. Each APIC database shard has exactly three replicas, distributed across the cluster based on shard layouts defined for each cluster size.

The data itself is placed on the shards as a result of a hash function. One APIC takes on the role of master (or shard leader) for a specific shard and has read and write access to it; the other two hold read-only replicas. When a master becomes inactive, the remaining APICs negotiate among themselves to determine who will become the new master. This is the origin of the APIC quorum: only shards that have at least one backup replica are writable, which is why at least two of the three APICs must be available to configure the ACI fabric.

Figure 2 APIC sharding
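
To make the two ideas above concrete (hash-based placement and the two-out-of-three write rule), here is a minimal sketch. The shard count, the use of SHA-256 and the function names are assumptions for illustration; the article does not say which hash function or how many shards the APIC actually uses.

```python
import hashlib

NUM_SHARDS = 32          # hypothetical shard count, for illustration only
REPLICAS_PER_SHARD = 3   # from the article: every shard has three replicas


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash (SHA-256 is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def is_writable(live_replicas: int) -> bool:
    """A shard is writable only while a majority of its replicas is up,
    i.e. the leader plus at least one backup replica."""
    return live_replicas >= REPLICAS_PER_SHARD // 2 + 1
```

The same key always lands on the same shard, and a shard with a single surviving replica drops to read-only.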

But be aware: when the APIC cluster is expanded beyond three nodes, each shard still has only three replicas. More nodes therefore do not necessarily mean more reliability and redundancy. Each additional APIC mainly contributes to greater scalability and the ability to manage a greater number of switches in the fabric.

Let's analyze a failure scenario in a larger APIC cluster made up of five nodes. If we lose two nodes, technically we still maintain a quorum. However, take a look at Figure 3. With this layout of shards, some will be in a read-only state, while others can still be written to. It is strongly recommended to avoid any changes in this situation and prioritize restoring the cluster as soon as possible. If more than two nodes are lost, there is a high probability that some information will be irreversibly lost.

Figure 3 APIC Three+ node cluster failure
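
The mixed state described above can be reproduced with a toy model. The ring-style layout below is hypothetical (the real APIC layouts per cluster size are not given in this article); it only shows how three replicas spread over five nodes react to losing two or three nodes.

```python
from typing import Dict, List, Set


def ring_layout(num_nodes: int, replicas: int = 3) -> Dict[int, List[int]]:
    """Hypothetical layout: shard s sits on `replicas` consecutive nodes."""
    return {
        s: [(s + k) % num_nodes + 1 for k in range(replicas)]
        for s in range(num_nodes)
    }


def shard_states(layout: Dict[int, List[int]], failed: Set[int]) -> Dict[int, str]:
    """Classify each shard by how many of its replicas survive."""
    states = {}
    for shard, nodes in layout.items():
        alive = sum(1 for n in nodes if n not in failed)
        if alive == 0:
            states[shard] = "lost"          # no replica left at all
        elif alive * 2 > len(nodes):
            states[shard] = "read-write"    # majority still available
        else:
            states[shard] = "read-only"     # minority: reads only
    return states
```

With this layout, `shard_states(ring_layout(5), {1, 2})` leaves two shards read-only and three writable, while failing nodes 1, 2 and 3 loses one shard completely.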

Your aim should always be to distribute the APICs across the ACI fabric in a way that protects them against the failure of more than two nodes at a time. Of course, within a single fabric, you don't have much room for maneuver. In that case, try placing the APIC nodes in different racks (or rows) and connecting each of them to independent power outlets.

In a Multi-Pod architecture, it is recommended to connect two APICs in Pod 1 and the third APIC in one of the other pods. In the worst case, two nodes can be lost at once, leaving the cluster without quorum and the ACI fabric operating in read-only mode. To get around this, you can prepare and deploy a standby APIC in advance, as illustrated in Figure 4. This hardware appliance has the same initial configuration as the active APICs, but it is instructed not to form a cluster with them. Instead, it remains in the background, ready to take over in the event of a failure. While in standby mode, the APIC does not replicate data or participate in ACI operations.

However, the main cluster is aware of the presence of the standby APIC and allows any failed node to be replaced by it when necessary. The active APIC replicates its database to the standby APIC, re-establishing a read and write cluster to ensure quorum continuity.

Figure 4 Multi-Pod Standby APIC node
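
The quorum risk behind the standby recommendation can be checked with a few lines. This is a simplified sketch for a three-node cluster, where every APIC holds a replica of every shard, so a plain majority count is enough; the pod names and the function name are made up for the example.

```python
from collections import Counter
from typing import Dict, List


def pods_whose_loss_breaks_quorum(placement: Dict[int, str]) -> List[str]:
    """Return the pods whose outage leaves fewer than a majority of APICs.

    `placement` maps an APIC ID to the (hypothetical) name of its pod.
    """
    total = len(placement)
    needed = total // 2 + 1                 # majority of the cluster
    per_pod = Counter(placement.values())   # number of APICs in each pod
    return sorted(pod for pod, count in per_pod.items() if total - count < needed)
```

For the recommended placement `{1: "pod1", 2: "pod1", 3: "pod2"}` the result is `["pod1"]`: losing Pod 1 takes two of three APICs with it, which is exactly the case a standby APIC in another pod is meant to cover.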

If you plan to deploy between 80 and 200 leaf switches in the Multi-Pod environment, the best performance and redundancy can be obtained by using a four-node cluster, distributed as much as possible across the pods. In a design with only two pods, place an additional standby APIC in each pod.

For a five-node cluster in Multi-Pod, refer to the recommended distribution shown in Table 5.

Table 5 Five-Node APIC Cluster in ACI Multi-Pod

*Note: When you completely lose some shards during a Pod 1 outage, there is still a chance of data recovery using a procedure called ID Recovery. It uses configuration snapshots and must be performed by the business unit or Cisco technical support.

For a highly scaled ACI deployment consisting of more than 400 switches, a seven-node APIC cluster should be distributed as shown in Table 6. Personally, I would not deploy that number of switches and APICs in just two pods: you would end up with four nodes in the first pod and, if that pod fails, there is no guarantee that the configuration can be recovered. For me, a safe minimum would be four pods.

Table 6 Seven-Node APIC Cluster in ACI Multi-Pod

*Note: the same ID recovery procedure also applies here

The APIC as a Database:

Each ACI service is organized and stored as part of a database. These services are encapsulated in shards, with each shard replicated three times to ensure high availability. For each shard, there is a leader and two followers, contributing to a robust and fault-tolerant architecture.

The APIC cluster status should always remain "Fully Fit", indicating the health of all APICs, as illustrated in Figure 2. If the health status of one or more APIC controllers in the cluster is anything other than Fully Fit, such as "Data Layer Partially Divergent", there may be problems with APIC infrastructure communication or with APIC processes. In this case, avoid making any changes to the configuration until the problem has been properly diagnosed and resolved.

Figure 2
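
The "no changes unless Fully Fit" rule lends itself to a simple pre-change gate. The sketch below works on plain status strings as they appear in this article; how those strings are collected from the controllers is deliberately left out, and the function name is hypothetical.

```python
from typing import Iterable

FULLY_FIT = "fully fit"  # normalized form of the "Fully Fit" status text


def safe_to_change_config(health_states: Iterable[str]) -> bool:
    """Return True only if every controller reports Fully Fit.

    An empty list counts as unsafe, because no health data was collected.
    """
    states = [s.strip().lower() for s in health_states]
    return bool(states) and all(s == FULLY_FIT for s in states)
```

A single controller reporting "Data Layer Partially Divergent" is enough for the check to return `False`.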

When administering the Cisco APIC cluster, several tasks can be performed, each requiring a specific procedure:
1. APIC cluster size expansion:
- Operation that increases the configured number of APICs in the cluster from a size of N to N+1, within the supported limits.
- During expansion, discovery and expansion take place sequentially based on the APIC ID numbers. Initially, APIC1 is expanded, followed by APIC2, APIC3, and so on.

2. Reducing the size of the APIC cluster:
- Operation that decreases the configured number of APICs, going from a size of N to N-1, within the supported limits.
- When shrinking, you must decommission the last APIC in the cluster first and then work backwards in reverse ID order. It starts with APIC4 (if it is the last APIC ID), followed by APIC3, APIC2 and finally APIC1.

3. Replacing an APIC in the cluster:
- Decommission the APIC you want to replace and commission the replacement APIC using the same configuration and image as the APIC being replaced.

4. Preparing a cold standby APIC in the cluster:
- A standby APIC provides a cold standby option, recommended in Cisco ACI Stretched Fabric or Multi-Pod deployments, where the possibility of an APIC split-brain situation is higher.
- It is recommended to have at least three APICs active in the cluster and one or more APICs in standby.

5. Shutting down and restarting APICs in the cluster:
- When necessary, you can shut down APICs in the cluster, for example when moving them to another location.
- Make sure you bring the APIC online as soon as possible and check that all the controllers in the cluster return to a fully functional state.
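
The ordering rules for expanding and shrinking the cluster (tasks 1 and 2 above) boil down to "lowest ID first when growing, highest ID first when shrinking". A minimal sketch with hypothetical helper names; it only computes the order and does not talk to any controller.

```python
from typing import List


def expansion_order(current_size: int, target_size: int) -> List[int]:
    """APIC IDs to commission when growing the cluster, lowest ID first."""
    if target_size < current_size:
        raise ValueError("target_size must not be smaller than current_size")
    return list(range(current_size + 1, target_size + 1))


def shrink_order(current_size: int, target_size: int) -> List[int]:
    """APIC IDs to decommission when shrinking, highest ID first."""
    if target_size > current_size:
        raise ValueError("target_size must not be larger than current_size")
    return list(range(current_size, target_size, -1))
```

Growing from three to five nodes gives `[4, 5]`; shrinking back from five to three gives `[5, 4]`.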

Database Technology
Before diving into the explanation, a personal note: I had always wanted to understand the three-replica concept used in the Cisco ACI solution, so I undertook an in-depth study of database sharding. Only then did the subject become fully clear to me.

Database Technology - Sharding:
A key component in the scalability of Cisco APIC clustering is the database technology called sharding. The APIC configuration database is divided into logical subsets called shards, each with three replicas to ensure redundancy and resilience. The shards are distributed evenly among the APICs, with a predetermined hash function determining data assignment, which balances the workload effectively.
This approach not only optimizes performance, but also increases reliability, as the loss of one APIC does not compromise the entire system.
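
"Distributed evenly" can be checked by counting replicas per APIC for a given layout. The four-node layout below is invented for the example (the actual APIC layouts are not published in this article); the point is only that each controller ends up holding the same number of replicas.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical static layout: shard ID -> the three APIC IDs holding a replica.
LAYOUT_4_NODES: Dict[int, List[int]] = {
    0: [1, 2, 3],
    1: [2, 3, 4],
    2: [3, 4, 1],
    3: [4, 1, 2],
}


def replicas_per_node(layout: Dict[int, List[int]]) -> Counter:
    """Count how many shard replicas each APIC holds under a layout."""
    return Counter(node for nodes in layout.values() for node in nodes)
```

Here `replicas_per_node(LAYOUT_4_NODES)` reports three replicas on each of the four APICs, so no controller carries more of the data load than another.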

Scalability and Reliability:
The strength of this approach lies in the ability to scale dynamically as needs grow. The predetermined hash function and static shard layout determine how shards are assigned to devices. This database-specific methodology provides remarkable scalability, allowing APIC to manage large data sets efficiently and reliably.

Why Three Replicas or More is Better:

1. Resilience to Failure:
Having three replicas means that even if one of them fails, we still have two copies available. This creates a fundamental robustness, essential for mitigating unexpected failures.
2. Continuous Availability:
Maintaining three replicas helps to guarantee the continuous availability of services, even during planned maintenance or unexpected failures. We will always have at least two copies available to meet demands.
3. Rapid recovery:
In cases of failure, recovery becomes faster. With three copies, we can restore functionality quickly, minimizing the impact on operations.
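
The arithmetic behind "why three" is simple majority math, sketched below under the assumption (stated earlier in the article) that a shard stays writable only while most of its replicas are available. The function names are mine.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can fail while a majority is still available."""
    return replicas - majority(replicas)
```

With one or two replicas, `tolerated_failures` is 0, so any single failure blocks writes. Three is the smallest replica count that survives the loss of one copy and still leaves a majority of two.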

In short, the practice of having three replicas is an investment in the reliability and resilience of IT systems. This not only protects against failures and interruptions, but also allows operations to continue smoothly. By understanding the value of redundancy, we can build more reliable and resilient IT environments, which are fundamental to the success of modern organizations.

The explanation of database sharding in the context of Cisco APIC clustering involves understanding how the APIC configuration database is managed to offer scalability and reliability. Let's break this explanation down into parts:
1. Cluster formation:
- The Cisco APIC (Application Policy Infrastructure Controller) is an integral part of Cisco's software-defined networking (SDN) solution.
- Clustering in the context of APIC involves creating groups of APICs that work together to manage the network infrastructure.
- An APIC cluster consists of several designated active APICs and designated standby APICs.

2. APIC heartbeats:
- Heartbeats are periodic signals sent between the APICs in a cluster to check the availability and status of each APIC.
- They are essential to ensure that each APIC is aware of the status of the others in the cluster.
- If an active APIC fails, the standby APICs can take over its functions to ensure continuity of infrastructure management.

3. Database Sharding:
- Sharding is a technique used in databases to divide large data sets into smaller parts called shards.
- In the case of APIC, the configuration database is partitioned into logical subsets called shards.
- Each shard has three replicas to ensure redundancy and resilience.
- These shards are distributed evenly among the APICs in the cluster.

4. Distribution of Shards:
- The distribution of shards among the APICs is done in order to balance the workload and ensure operational efficiency.
- A predetermined hash function is used to determine the data assignment of the shards.
- A static shard layout defines how shards are assigned to specific devices in the cluster.

5. Benefits of Sharding Technology:
- Scalability: Allows the system to grow efficiently as more resources are required by distributing the data load among the APICs.
- Reliability: The presence of replica shards guarantees high availability. If one APIC fails, the others can continue operating with replicas of the corresponding shards.
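
The heartbeat idea from point 2 above can be illustrated with a timeout check. This is a generic sketch, not the APIC's actual mechanism: the timeout value, the data structure and the function name are all assumptions.

```python
from typing import Dict, List


def unresponsive_apics(last_seen: Dict[int, float], now: float, timeout_s: float) -> List[int]:
    """Return the IDs of APICs whose last heartbeat is older than timeout_s.

    `last_seen` maps an APIC ID to the timestamp (in seconds) of its most
    recent heartbeat; `timeout_s` is a hypothetical threshold.
    """
    return sorted(apic for apic, ts in last_seen.items() if now - ts > timeout_s)
```

For instance, with heartbeats last seen at 100.0, 95.0 and 80.0 seconds, a current time of 101.0 and a 10-second timeout, only the third APIC is flagged.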

Conclusion:
In summary, the Cisco APIC Cluster in the ACI infrastructure represents the backbone of policy, configuration and service management. The combination of dynamic cluster architecture and sharding database technology offers a robust, scalable and reliable solution to meet the growing demands of modern networks. By understanding the inner workings of this ecosystem, network professionals can optimize the performance and effectiveness of their infrastructures. Cisco APIC Clustering is not only a response to current needs, but also a preparation for future challenges in the evolution of network infrastructures.

What if 3 APICs have not yet been installed?
When you bring up the Cisco ACI fabric, you may have a single APIC or two APICs before you have a fully functional cluster. This is not the desired end state, but Cisco ACI allows you to configure the fabric with one APIC or with two APICs because the bootstrap is considered an exception.

- When you start a fabric with a single APIC, the shards are not in the minority because the cluster has not yet been fully formed.
- This is a special case designed to support Zero Touch Provisioning.
- There is only one replica of each shard.
- The APIC will create new replicas as soon as a new APIC is added.
- This explains why, when you install the ACI fabric the first time, you can operate the fabric even with a single APIC
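
The bootstrap behaviour in the list above can be summarized as "one replica per available APIC, up to three". The sketch below encodes that reading; the two-APIC case is my inference from the statement that new replicas are created as APICs are added, not something the article spells out.

```python
def replicas_during_bootstrap(active_apics: int, target_replicas: int = 3) -> int:
    """Replicas per shard while the cluster is still being brought up.

    Assumes one replica per APIC until the target of three is reached.
    """
    if active_apics < 1:
        raise ValueError("at least one APIC is required")
    return min(active_apics, target_replicas)
```

A single APIC holds the only replica of each shard, and from three APICs onward the count stays at three no matter how many controllers join.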

What if all the APICs are inactive?

Traffic forwarding persists for existing and new sessions. Some important considerations include:
- Leaf switches operate completely self-sufficiently in traffic forwarding and policy enforcement.
- Link failures are automatically recovered to ensure service continuity.
- The addition of new endpoints may or may not succeed, depending on the resolution immediacy and deployment immediacy settings.
- Whether vMotion succeeds likewise depends on the resolution and deployment immediacy settings.
- In the event of problems, such as the need to recover the Fabric ID, the existence of a snapshot can be crucial. In such cases, Cisco Technical Support (TAC) or Cisco Business Unit (BU) can provide assistance to facilitate recovery.

If you enjoyed the article, don't forget to leave your like or kudos!

Reference:

https://www.linkedin.com/posts/janjanovic_cisco-aci-zero-to-hero-a-comprehensive-activity-7014321865728925696-tYyt/

https://www.ciscolive.com/on-demand/learning-maps/data-center/aci.html

Tomy Tim

LearnWithSalman · ‎01-29-2025

Nice article Tomy, thank you.
For those who prefer video content, I would like to mention my blog: Tips You Need to Know About: ACI Clustering & Sharding

joon12 · ‎01-29-2025

Great breakdown of Cisco ACI APIC clustering! The explanation of database sharding and redundancy strategies is particularly insightful. This article is a valuable resource for IT professionals looking to deepen their understanding of ACI infrastructure.