11-20-2012 01:56 AM - edited 03-01-2019 10:43 AM
Hi,
One of my customers asked me what happens when both L1/L2 links between the 2 FIs are lost.
To my mind, this should never happen, which is why we have 2 links, but I do not know exactly what happened in this case... both FIs became master in the UCS cluster....
Any ideas, or has anyone already tested this?
Thanks
11-22-2012 10:33 AM
The L1/L2 links are used by the Data Management Engine (DME) to sync config only - there's no datapath on these links.
In the event these links both go down, there's a backup mechanism built into the system to prevent split brain. Each chassis has an SEEPROM (flash). Within the SEEPROM there is a shared area that each FI can read and write, and two sub-areas, one owned by each FI, to which only that FI can write. The sub-areas store information about config DB ownership in the event that an FI fails, changes are made, and it is re-introduced. This prevents "stale" config from overwriting any changes made while one FI was offline.
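To make the layout concrete, here is a rough Python sketch of the shared area and the two FI-owned sub-areas. It is only a toy model, not how UCSM actually stores anything: the class name, the `config_version` field and the `has_stale_config` helper are all hypothetical, and I'm assuming "config DB ownership" can be pictured as a simple version number.

```python
class ChassisSeeprom:
    """Toy model of a chassis SEEPROM: one shared area, one sub-area per FI."""

    def __init__(self, fi_ids=("A", "B")):
        self.shared = {}                         # any FI may read and write here
        self._owned = {fi: {} for fi in fi_ids}  # only the owning FI may write

    def write_owned(self, writer, owner, key, value):
        # Enforce the rule that a sub-area is writable only by its owner.
        if writer != owner:
            raise PermissionError(f"FI {writer} cannot write the sub-area of FI {owner}")
        self._owned[owner][key] = value

    def read_owned(self, owner, key, default=None):
        # Both FIs may read either sub-area.
        return self._owned[owner].get(key, default)


def has_stale_config(seeprom, returning_fi, peer_fi):
    """Hypothetical check: is the returning FI's config older than its peer's?"""
    mine = seeprom.read_owned(returning_fi, "config_version", 0)
    theirs = seeprom.read_owned(peer_fi, "config_version", 0)
    return mine < theirs


seeprom = ChassisSeeprom()
seeprom.write_owned("A", "A", "config_version", 5)  # FI A goes offline at version 5
seeprom.write_owned("B", "B", "config_version", 7)  # FI B keeps accepting changes
print(has_stale_config(seeprom, "A", "B"))
```

In this model, FI A would be treated as stale when it comes back, so its older config is not the one that wins.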
If there's a cluster link (L1/L2) failure, the FIs read and write small counters in the shared area. This serves as a secondary heartbeat that tells each FI the other is still active. During this time the cluster puts itself into a "Failed-Link" state and prevents any primary/secondary elections. Everything stays as-is - primary remains primary, secondary remains secondary.
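The counter heartbeat can be sketched roughly as follows. Again this is only an illustrative model under my own assumptions: the `FabricInterconnect` class, the `tick` method, the polling loop and the "peer-unknown" label are hypothetical, and the sketch deliberately does not say what the real system does once the peer's counter stops moving.

```python
class FabricInterconnect:
    """Toy FI that uses counters in a shared area as a secondary heartbeat."""

    def __init__(self, name, peer, role):
        self.name = name
        self.peer = peer
        self.role = role                # "primary" or "secondary"
        self.last_peer_counter = None   # peer counter value seen on the previous poll

    def tick(self, shared, cluster_link_up):
        """Run one polling interval and return the cluster state this FI sees."""
        shared[self.name] = shared.get(self.name, 0) + 1  # write own heartbeat
        peer_counter = shared.get(self.peer, 0)           # read peer heartbeat
        peer_active = (self.last_peer_counter is not None
                       and peer_counter != self.last_peer_counter)
        self.last_peer_counter = peer_counter
        if cluster_link_up:
            return "up"
        if peer_active:
            # Peer is alive but L1/L2 is down: freeze roles, hold no election.
            return "failed-link"
        return "peer-unknown"


shared = {}
a = FabricInterconnect("A", "B", "primary")
b = FabricInterconnect("B", "A", "secondary")
for _ in range(3):
    state_a = a.tick(shared, cluster_link_up=False)
    state_b = b.tick(shared, cluster_link_up=False)
print(state_a, state_b, a.role, b.role)
```

As long as both counters keep changing, each FI sees the other as active, reports the failed-link state and leaves its role untouched, which is the "everything stays as-is" behaviour described above.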
Regards,
Robert
11-22-2012 10:35 PM
Just to add to this, there is a great doc on the UCSM architecture.
The split-brain scenario is discussed under the "Availability" section.
Cisco UCS Manager Architecture
http://www.cisco.com/en/US/prod/collateral/ps10265/ps10281/white_paper_c11-525344.html
HTH
Padma
11-23-2012 08:32 AM
Thanks Robert and Padma for your replies.
While searching, I found this URL with a pretty nice explanation:
http://ucsguru.com/2012/11/07/ha-with-ucsm-integrated-rack-mounts/
best regards.
Nicolas.