Single Spine Fabric Upgrade Behaviour

jakobsvc · 12-02-2019

Hello,

Today I upgraded a single-spine ACI fabric (lab: 1x baby spine (C9336PQ), 2x leafs (C93180YC-EX)) from 3.2(6n) to 4.2(2g).

During the upgrade I had a ping running from Server A (connected via vPC to Leaf1 and Leaf2) to Server B (also connected via vPC to Leaf1 and Leaf2), plus a ping from my machine to both servers.

When the spine rebooted, all vPCs on the leaf that was still running (and waiting for its vPC peer to come back up) went down on the local side, so both servers went offline.

The single-homed hosts (Leaf Access Ports) connected to the same switch stayed online.

Now I am not sure whether this is normal behaviour, i.e. that the vPCs go down as well when all uplinks to the spines are down, even though Port Tracking is disabled!

Maybe someone can explain the technical reason why this is happening.

Thanks in advance

joezersk · 12-02-2019

Just off the top of my head, could it be because the VPC peer link traverses the spine?

Claudia de Luna · 12-02-2019

Hi @jakobsvc,

This behavior is not surprising at all in a single-spine deployment. As @joezersk pointed out, you don't connect two leafs operating in a vPC protection group (pair) to each other, because the peer link runs across the fabric, which means across the spine. The spines are pretty important in an ACI fabric, so it's best practice to have redundancy there. Remember that they serve as your fabric route reflectors, hold the endpoint location tables for all hosts, and are the go-to for unknown unicast traffic (spine proxy), just to name a few pretty critical functions.
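To make the peer-link point concrete, here is a minimal Python sketch that treats the fabric as a plain graph in which leafs only connect to spines. It is only a toy reachability check, not an APIC tool; the node names and the `reachable` helper are hypothetical. With one spine, taking that spine down leaves no path at all between the two vPC peers; with a second spine, a path survives.

```python
from collections import deque


def reachable(links, src, dst, failed=frozenset()):
    """Breadth-first search over an undirected leaf-spine graph,
    skipping any node listed in `failed` (e.g. a rebooting spine)."""
    if src in failed or dst in failed:
        return False
    # Build an adjacency map from the (a, b) link pairs.
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Leafs connect only to spines; there is no direct leaf-to-leaf cable,
# so the vPC peers can only reach each other through a spine.
single_spine = [("leaf1", "spine1"), ("leaf2", "spine1")]
dual_spine = single_spine + [("leaf1", "spine2"), ("leaf2", "spine2")]

print(reachable(single_spine, "leaf1", "leaf2"))                     # True
print(reachable(single_spine, "leaf1", "leaf2", failed={"spine1"}))  # False
print(reachable(dual_spine, "leaf1", "leaf2", failed={"spine1"}))    # True
```

In this model the single spine is a cut vertex between the peers, while in the dual-spine case upgrading one spine at a time always leaves the other one as a path.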

With a single spine, I'm afraid I would not expect the hitless upgrade experience you can certainly achieve with ACI.

jakobsvc · 12-03-2019

Hi, thanks for your input.
I'm aware that the peer link is formed over the fabric, but I thought it would trigger split-brain behaviour (the secondary vPC peer would become operational primary) and the ports would stay up. I also found Cisco slides from 2015 that mention implicit uplink tracking in the context of vPCs, which would explain this behaviour, but I haven't found it mentioned anywhere else.
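A rough way to picture the implicit uplink tracking those slides describe is the toy Python model below. It is an assumption built only from what was observed in this thread (Port Tracking disabled, vPC ports down, single-homed ports up) and is not Cisco's actual implementation; `PortKind` and `port_stays_up` are hypothetical names.

```python
from enum import Enum


class PortKind(Enum):
    VPC = "vpc"        # member of a vPC port-channel
    ORPHAN = "orphan"  # single-homed leaf access port


def port_stays_up(kind: PortKind, fabric_uplinks_up: int) -> bool:
    """Hypothetical model of the behaviour seen in this thread, with Port
    Tracking disabled: an isolated leaf (zero spine uplinks) brings its vPC
    member ports down, while single-homed ports are unaffected."""
    if kind is PortKind.VPC and fabric_uplinks_up == 0:
        return False  # assumed implicit uplink tracking for vPC members
    return True


# The leaf that stayed up during the upgrade: its vPC peer is rebooting
# and the only spine is down, so it has no fabric uplinks left.
for kind in PortKind:
    print(kind.value, port_stays_up(kind, fabric_uplinks_up=0))
# vpc False
# orphan True
```

The contrast with the split-brain expectation is that, in this model, becoming operational primary does not matter: losing every fabric uplink is enough to take the vPC members down.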

Claudia de Luna · 12-03-2019

Hi @jakobsvc,

You are right that there is no good information in one place. If you recall that the spines perform the route reflector (RR) function, IS-IS for the fabric underlay (TEP reachability), and COOP, and then consider that a single-spine design has no redundancy for any of those critical functions, it may make a bit more sense. Also, while they are called vPCs in ACI, they behave a bit differently under the hood. They don't use Cisco Fabric Services (CFS) (this is where I've seen those split-brain discussions) but leverage IFS (ACI Fabric Services), which is based on ZMQ (also used by COOP, which sends endpoint info to the spines). Not sure if any of that helps at all, but I've included two slides from an old CL presentation below that you might find useful.

I also recommend looking at the Endpoint Movement and Bounce Entries section of the ACI Fabric Endpoint Learning White Paper, which is quite good.

Bottom line, you won't get the level of redundancy you should have in your data center with a single spine, but I suspect you know that already!

From a 2015 Cisco Live (CL) presentation