Solved: Re: OSPF DR/BDR election with VLANs

penguinp · ‎08-30-2023

Hello,

Could anyone explain how DR and BDR are supposed to be elected in this environment?

I configured OSPF for all of the routers, and provided each router the networks adjacent to its interfaces. But I did not change any priorities so I could see how router IDs would be determined, and how DR and BDR would be elected.

The g0/0 interfaces on R1 and R3 have no IP address; each has two sub-interfaces.

Initially, I assigned VLAN 10 to g0/0.1 on R1, VLAN 20 to g0/0.2 on R1, VLAN 30 to g0/0.1 on R3, and VLAN 40 to g0/0.2 on R3.

Then I recreated the same environment, but assigned VLAN 30 to g0/0.2 and VLAN 40 to g0/0.1 on R3.

I have tried a couple of other variations, and the outcome of the DR/BDR election was different every single time. Sometimes the loopback address on R2 was chosen as its router ID and R2 became DR/BDR; at other times an IP address from one of R1's or R3's interfaces showed up as DR/BDR, seemingly at random.

The process was extremely inconsistent, and it seems like there are no rules or patterns. I assume it differs depending on the pairs of sub-interfaces and VLANs, but I am not quite sure since I am new to this field.

I would be grateful if anyone could explain this phenomenon to me. Please let me know if the diagram or my exposition is not enough to draw a conclusion. I have some logs which I could show you.

Peter Paluch · ‎08-31-2023

My friends,

Please allow me to join.

@penguinp , there are a few things on which we need to get to a mutual understanding.

Regarding the OSPF Router ID (RID) selection, Cisco IOS-based routers follow this algorithm:

1) Use the RID manually configured in this OSPF process. Stop here if RID has been selected, otherwise continue with the following step.

2) Among all non-shutdown loopbacks in the same VRF as the OSPF process, pick the highest IP that is not yet owned as a RID by another OSPF process. Stop here if RID has been selected, otherwise continue with the following step.

3) Among all non-shutdown non-loopback interfaces in the same VRF as the OSPF process, pick the highest IP that is not yet owned as a RID by another OSPF process.

Usually, when there are no multiple VRFs on the router and no multiple OSPF processes running, the algorithm above is explained in a simplified way: Use the manually configured RID; if not configured, use the highest IP from among non-shutdown loopbacks; if still no luck, use the highest IP from all other remaining non-shutdown interfaces.
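The three steps above can be sketched in Python. This is only an illustrative model, not IOS code: the `Interface` tuple, the `select_router_id` function, and the `rids_in_use` parameter are hypothetical names, and the sketch assumes that every interface passed in already belongs to the same VRF as the OSPF process.

```python
from ipaddress import IPv4Address
from typing import NamedTuple, Optional


class Interface(NamedTuple):
    """Hypothetical, simplified view of one router interface."""
    name: str
    ip: Optional[str]        # None if the interface has no IP address
    is_loopback: bool
    shutdown: bool = False


def select_router_id(interfaces, configured_rid=None, rids_in_use=()):
    """Return the RID as a dotted-quad string, or None if none can be chosen."""
    # Step 1: a manually configured RID always wins.
    if configured_rid is not None:
        return configured_rid
    taken = set(rids_in_use)  # RIDs already owned by other OSPF processes
    # Step 2 looks at loopbacks only; step 3 looks at all other interfaces.
    for want_loopback in (True, False):
        candidates = [
            IPv4Address(i.ip)
            for i in interfaces
            if i.ip is not None
            and not i.shutdown
            and i.is_loopback == want_loopback
            and i.ip not in taken
        ]
        if candidates:
            # Addresses are compared numerically, not as strings.
            return str(max(candidates))
    return None
```

With made-up addresses, a router with Loopback0 at 2.2.2.2 and a physical interface at 192.168.12.2 gets 2.2.2.2, while a router with no loopback and sub-interfaces at 10.0.0.9 and 10.0.0.10 gets 10.0.0.10. The same inputs always give the same result, which is the point about determinism.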

Once the OSPF process chooses its RID, this RID remains constant unless you manually configure it to a different value, or remove the manually configured value and restart the OSPF process, or remove and re-add the OSPF process completely. Removing or readdressing the interface whose IP has been used as the RID, or adding new interfaces after the OSPF process has chosen its RID, has no effect on the RID of the running OSPF process.

This algorithm is deterministic - assuming that you only start a single OSPF process on a router at any given time, it will always produce the same OSPF RID. Therefore, I would first suggest keeping a close eye on the OSPF RIDs chosen by your individual routers and checking if they are picked according to the rules above. If not, let's dig deeper into that.

Then, regarding the DR/BDR elections, keep in mind that they are performed on a per-multiaccess-segment basis (or, somewhat imprecisely, on a per-IP-subnet basis), not on a per-area or per-entire-network basis. I am pointing this out because, when talking about DR/BDR, we always have to look at the given subnet, all the routers present in that subnet, then at their OSPF interface priorities in that subnet, and finally at their RIDs.
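To illustrate the per-segment view, here is a rough Python sketch that groups OSPF interfaces by subnet and picks, for each subnet separately, the router with the highest priority and then the highest RID. The function name `best_per_segment` and the input format are hypothetical, the addresses are made up, and the sketch assumes that all routers on a segment come up at the same moment, so it deliberately ignores timing.

```python
from collections import defaultdict
from ipaddress import IPv4Address, ip_interface


def best_per_segment(ospf_interfaces):
    """ospf_interfaces: iterable of (router_id, "address/prefix", priority).

    Returns {subnet: router_id of the best DR candidate on that subnet}.
    """
    segments = defaultdict(list)
    for rid, addr, priority in ospf_interfaces:
        if priority > 0:  # priority 0 means the router is not eligible
            subnet = str(ip_interface(addr).network)
            segments[subnet].append((priority, IPv4Address(rid)))
    # Tuples compare by priority first, then by RID as the tie-breaker.
    return {subnet: str(max(cands)[1]) for subnet, cands in segments.items()}


# Hypothetical topology: R2 sits between R1 and R3.
topology = [
    ("1.1.1.1", "10.0.12.1/24", 1),
    ("2.2.2.2", "10.0.12.2/24", 1),
    ("2.2.2.2", "10.0.23.2/24", 1),
    ("3.3.3.3", "10.0.23.3/24", 1),
]
print(best_per_segment(topology))
# {'10.0.12.0/24': '2.2.2.2', '10.0.23.0/24': '3.3.3.3'}
```

Note that R2 is the best candidate on one subnet and not on the other: there is no single DR for the whole network.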

In addition, DR/BDR elections in OSPF are non-preemptive (and hence not fully deterministic - they depend on timing). Once a DR is chosen on a segment, another router coming later cannot hijack the DR role even if it has a higher priority and/or a higher RID. The same is valid for BDR.

And this particular behavior makes me wonder if it could perhaps explain what you are experiencing. If a router that has just come online on a segment sees that there is no DR or BDR elected, it will wait for the Wait interval, which always equals the Dead interval (40 seconds by default on broadcast segments), and then elect the DR and BDR from its own viewpoint. But if the DR or BDR are already elected, the new router will simply accept them.

So if you bring up one router to start sending and receiving OSPF packets on a segment, and then need more than 40 seconds to bring up a second router to start sending and receiving OSPF packets on the same segment, the first router will elect itself as the DR and the second one won't be able to win the elections.
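The effect of timing can be shown with a small Python model. This is a heavily simplified sketch, not the full OSPF election state machine: the `elect` function and its input format are hypothetical, Hello exchange delays are ignored, and every router that comes up before the first router's wait timer expires is assumed to take part in the initial election on equal terms.

```python
from ipaddress import IPv4Address

WAIT_INTERVAL = 40  # seconds; assumed equal to the default Dead interval


def elect(arrivals, wait=WAIT_INTERVAL):
    """arrivals: list of (time_up_seconds, router_id, priority) on ONE segment.

    Returns (dr, bdr) as router-ID strings; either may be None.
    """
    def rank(router):
        # Higher priority wins; higher RID breaks ties.
        return (router[2], IPv4Address(router[1]))

    eligible = sorted((a for a in arrivals if a[2] > 0), key=lambda a: a[0])
    if not eligible:
        return None, None
    # The first router's wait timer expires at first_up + wait.
    deadline = eligible[0][0] + wait
    initial = sorted((r for r in eligible if r[0] <= deadline),
                     key=rank, reverse=True)
    late = [r for r in eligible if r[0] > deadline]
    dr = initial[0]
    bdr = initial[1] if len(initial) > 1 else None
    # No preemption: latecomers accept the existing DR and can only
    # fill an empty BDR slot, regardless of their priority or RID.
    if bdr is None and late:
        bdr = late[0]
    return dr[1], (bdr[1] if bdr else None)


# R2 arrives within the wait interval: the higher RID wins.
print(elect([(0, "1.1.1.1", 1), (10, "2.2.2.2", 1)]))  # ('2.2.2.2', '1.1.1.1')
# R2 arrives after 60 seconds: R1 is already DR and keeps the role.
print(elect([(0, "1.1.1.1", 1), (60, "2.2.2.2", 1)]))  # ('1.1.1.1', '2.2.2.2')
```

The same two routers with the same RIDs and priorities end up in opposite roles, depending only on whether the second one appears before or after the first one's wait timer expires.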

Could this explain what you saw?

Best regards,
Peter

David Ruess · ‎08-30-2023

Hello,

There are rules that are followed for the DR/BDR election. Each device running OSPF selects a router ID per process in the following order:

1. Configured RID <- if you manually configure a RID in the OSPF process, this takes precedence

2. Highest loopback interface IP address in the UP/UP state

3. Highest physical interface (or subinterface) IP address in the UP/UP state.

A possible reason for the variation is the order in which you configure things, such as bringing up OSPF before all the interfaces are configured. Remember that the RID is per device/OSPF process, so it could use the loopback for one device/process and a subinterface for another device/process.

Try configuring the RID (recommended anyway to not get obscure results when rebooting).

Feel free to provide logs and ask further if needed.

-David

penguinp · ‎08-30-2023

Thank you for replying.

I made sure to bring up OSPF after I finished configuring all the other stuff each time. And I even tried clearing IP OSPF process multiple times. But the results were different depending on which VLANs I assigned to which sub-interfaces.

I should make it clear that the results were actually consistent for the same pairs. For example, if I assigned VLAN X to sub-interface X, and VLAN Y to sub-interface Y, the result would be the same regardless of how many times I tried. But once I changed the pairs, VLAN X to sub-interface Y and VLAN Y to sub-interface X for instance, the results, including the RIDs and DR/BDR, would be different. That's what I meant by "inconsistent".

I could avoid getting those obscure results by configuring the RID manually. But I am just curious to see if there are any reasons behind the phenomenon I experienced.

Could VLANs potentially change the OSPF process?

Giuseppe Larosa · ‎08-31-2023

Hello @penguinp ,

there is also a timing aspect to the DR/BDR election that can play a role. Each interface has a wait time of 40 seconds, during which the device listens for other devices; if no one is heard within the wait time, it elects itself DR.

Depending on whether you make changes slowly or quickly during your tests, this Wait timer comes into play and changes the result: the first active router on the VLAN becomes the DR, and even if better devices are added later, they cannot preempt it.

Hope this helps

Giuseppe

David Ruess · ‎08-31-2023

No, I don't believe VLANs change the OSPF process. Can you provide the output from a switch whose RID changed when you changed interface parameters?

Secondly, I will note that clearing the OSPF process DOES NOT change the RID. That only happens on a reload of the router, or on a manual configuration followed by removal of that manual configuration.
-David

Peter Paluch · ‎08-31-2023