Solved: FI L1/L2 links loss

delahais · ‎11-20-2012

Hi,

One of my customers asked me what happens if the L1/L2 links between the two FIs are lost.

To my mind, this should not happen, which is why we have two links, but I do not know exactly what would happen in this case. Would both FIs become master in the UCS cluster?

Any ideas, or has anyone already tested this?

thx

Robert Burns · ‎11-22-2012

The L1/L2 links are used by the Data Management Engine (DME) to sync config only - there's no datapath on these links.

In the event these links both go down, there's a backup mechanism built into the system to prevent split brain. Each chassis has an SEEPROM (flash). Within the SEEPROM there is a shared area that each FI can read and write, and two sub-areas, one owned by each FI, to which only the owning FI can write. The sub-areas are used to store information about config DB ownership in the event an FI fails, changes are made, and it's re-introduced. This prevents "stale" config from overwriting any changes made while one FI was offline.
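To make the layout concrete, here is a toy Python model of the areas described above. It is only a sketch and not how the firmware is implemented: the class name, method names, and the "A"/"B" labels are hypothetical, and it assumes each sub-area is readable by both FIs, since only writes are described as restricted.

```python
class ChassisSeeprom:
    """Toy model of the per-chassis SEEPROM areas (hypothetical names)."""

    def __init__(self, fabric_ids=("A", "B")):
        self._fabric_ids = tuple(fabric_ids)
        # Shared area: either FI may read and write it.
        self.shared = {}
        # One sub-area per FI: only the owning FI may write to it.
        self._sub_areas = {fi: {} for fi in self._fabric_ids}

    def _check_known(self, fi):
        if fi not in self._fabric_ids:
            raise ValueError(f"unknown FI: {fi!r}")

    def write_shared(self, writer, key, value):
        """Any known FI can write to the shared area."""
        self._check_known(writer)
        self.shared[key] = value

    def write_sub_area(self, writer, owner, key, value):
        """Only the owner of a sub-area is allowed to write to it."""
        self._check_known(writer)
        self._check_known(owner)
        if writer != owner:
            raise PermissionError(f"FI {writer} cannot write to FI {owner}'s sub-area")
        self._sub_areas[owner][key] = value

    def read_sub_area(self, owner):
        """Assumption: both FIs can read either sub-area. Returns a copy."""
        self._check_known(owner)
        return dict(self._sub_areas[owner])
```

For example, `write_sub_area("A", "A", ...)` succeeds, while `write_sub_area("B", "A", ...)` raises `PermissionError`, so FI B can never alter what FI A has recorded for itself.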

If there's a cluster link (L1/L2) failure, the FIs read and write small counters in the shared area. This serves as a secondary heartbeat that tells each FI the other is still active. During this time the cluster puts itself into a "Failed-Link" state and prevents any primary/secondary elections. Everything stays as-is - primary remains primary, secondary remains secondary.
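The counter-based heartbeat can be pictured with a small Python sketch. This is an illustration under assumptions, not the real implementation: all names are hypothetical, the shared area is modeled as a plain dict, and "the counter changed since the last poll" is assumed as the liveness test.

```python
from dataclasses import dataclass

FAILED_LINK = "Failed-Link"  # state name taken from the description above


@dataclass
class FabricInterconnect:
    name: str                    # e.g. "A" or "B" (hypothetical labels)
    role: str                    # "primary" or "secondary"
    last_seen_peer_count: int = 0


def beat(shared_area, fi):
    """The FI bumps its own counter in the shared area."""
    shared_area[fi.name] = shared_area.get(fi.name, 0) + 1


def peer_alive(shared_area, fi, peer_name):
    """The peer counts as alive if its counter changed since the last poll."""
    current = shared_area.get(peer_name, 0)
    alive = current != fi.last_seen_peer_count
    fi.last_seen_peer_count = current
    return alive


def election_allowed(cluster_state):
    """No primary/secondary election is allowed in the Failed-Link state."""
    return cluster_state != FAILED_LINK
```

If FI A calls `beat` and FI B then polls with `peer_alive`, B sees A as active; a second poll without a new beat returns `False`. Because `election_allowed(FAILED_LINK)` is `False`, neither FI changes role while the links are down.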

Regards,

Robert
