DR&BDR

ntmanjunath · 08-04-2016

I have configured OSPF on a broadcast network, and all three routers are elected DR. The election is not using the highest interface IP address in this configuration. What could be the issue?

R2#sh ip ospf neighbor

Neighbor ID     Pri   State      Dead Time   Address       Interface
172.16.10.1       1   FULL/DR    00:00:30    10.10.30.2    Ethernet1/1
40.40.40.1        1   FULL/DR    00:00:30    10.10.20.2    Ethernet0/0
192.168.1.1       1   FULL/DR    00:00:39    10.10.10.1    Ethernet1/0

R1

interface Ethernet0/0
bandwidth 1000
ip address 10.10.20.2 255.255.255.0
full-duplex
!
interface Ethernet1/0
bandwidth 2000
ip address 10.10.40.1 255.255.255.0
full-duplex
!
interface Ethernet1/1
bandwidth 9000
ip address 40.40.40.1 255.255.255.0
full-duplex
!
interface Ethernet1/2
no ip address
shutdown
half-duplex
!
interface Ethernet1/3
no ip address
shutdown
half-duplex
!
router ospf 1
log-adjacency-changes
network 10.10.20.0 0.0.0.255 area 0
network 10.10.40.0 0.0.0.255 area 0
network 40.40.40.0 0.0.0.255 area 0
!

R2

interface Ethernet0/0
bandwidth 90000
ip address 10.10.20.1 255.255.255.0
full-duplex
!
interface Ethernet1/0
bandwidth 7000
ip address 10.10.10.2 255.255.255.0
full-duplex
!
interface Ethernet1/1
bandwidth 8000
ip address 10.10.30.1 255.255.255.0
half-duplex
!
interface Ethernet1/2
no ip address
shutdown
half-duplex
!
interface Ethernet1/3
no ip address
shutdown
half-duplex
!
router ospf 1
log-adjacency-changes
network 10.10.10.0 0.0.0.255 area 0
network 10.10.20.0 0.0.0.255 area 0
network 10.10.30.0 0.0.0.255 area 0
!

R3

interface Ethernet0/0
bandwidth 90000
ip address 10.10.40.2 255.255.255.0
full-duplex
!
interface Ethernet1/0
bandwidth 10000
ip address 10.10.30.2 255.255.255.0
full-duplex
!
interface Ethernet1/1
bandwidth 300
ip address 172.16.10.1 255.255.255.0
full-duplex
!
interface Ethernet1/2
no ip address
shutdown
half-duplex
!
interface Ethernet1/3
no ip address
shutdown
half-duplex
!
router ospf 1
log-adjacency-changes
network 10.10.30.0 0.0.0.255 area 0
network 10.10.40.0 0.0.0.255 area 0
network 172.16.10.0 0.0.0.255 area 0

R4

interface Ethernet0/0
bandwidth 6000
ip address 10.10.10.1 255.255.255.0
full-duplex
!
interface Ethernet1/0
bandwidth 5000
ip address 192.168.1.1 255.255.255.0
full-duplex
!
interface Ethernet1/1
no ip address
shutdown
half-duplex
!
interface Ethernet1/2
no ip address
shutdown
half-duplex
!
interface Ethernet1/3
no ip address
shutdown
half-duplex
!
router ospf 1
log-adjacency-changes
network 10.10.10.0 0.0.0.255 area 0
network 192.168.1.0 0.0.0.255 area 0

chrihussey · 08-04-2016

It is not the highest interface IP that decides, but the highest OSPF router ID. Take a look at those. By default the router uses the highest loopback IP; if no loopback is configured, it uses the highest IP address of any active interface on the router.

ntmanjunath · 08-05-2016

Please look at the configuration: there is no loopback IP, no manual router ID and no higher OSPF priority.

So the DR should be elected based on the highest active IP on the router, but that is not happening: according to sh ip ospf neighbor on R2, all the routers have become DR.

chrihussey · 08-05-2016

To clarify, each LAN segment has its own DR. In the absence of a loopback interface and a configured priority, the router with the highest router ID becomes the DR. The router ID is taken from the highest active interface IP when the OSPF process was started, so it is one value for the whole router and not necessarily the highest IP on a given segment.

Do a "show ip ospf" on each of the devices and compare the router IDs. I assume R2 will have the lowest.
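As a rough cross-check of this explanation, here is a minimal Python sketch that applies the default router-ID rule (highest loopback IP, otherwise highest IP of an up interface) to the addresses in the four configs above, then compares the two ends of each link. It assumes all addressed interfaces are up, every priority is 1, no manual router-id is set, and both ends of each link start OSPF at about the same time; the names INTERFACES, LINKS, router_id and dr_per_link are hypothetical, chosen only for illustration.

```python
import ipaddress

# Up-interface addresses taken from the R1-R4 configs above (no loopbacks, no router-id).
INTERFACES = {
    "R1": ["10.10.20.2", "10.10.40.1", "40.40.40.1"],
    "R2": ["10.10.20.1", "10.10.10.2", "10.10.30.1"],
    "R3": ["10.10.40.2", "10.10.30.2", "172.16.10.1"],
    "R4": ["10.10.10.1", "192.168.1.1"],
}

# Each of these /24s is a segment shared by exactly two routers.
LINKS = {
    "10.10.10.0/24": ("R2", "R4"),
    "10.10.20.0/24": ("R1", "R2"),
    "10.10.30.0/24": ("R2", "R3"),
    "10.10.40.0/24": ("R1", "R3"),
}


def router_id(interface_ips, loopback_ips=()):
    """Highest loopback IP if any exist, otherwise highest up-interface IP (compared numerically)."""
    pool = list(loopback_ips) or list(interface_ips)
    return max(pool, key=ipaddress.IPv4Address)


def router_ids(interfaces):
    return {name: router_id(ips) for name, ips in interfaces.items()}


def dr_per_link(rids, links):
    """With all priorities equal (1), the router with the higher RID wins each segment."""
    return {
        subnet: max(pair, key=lambda name: ipaddress.IPv4Address(rids[name]))
        for subnet, pair in links.items()
    }


if __name__ == "__main__":
    rids = router_ids(INTERFACES)
    print(rids)
    print(dr_per_link(rids, LINKS))
```

Under these assumptions R2's router ID comes out as 10.10.30.1, the lowest of the four, so the neighbor wins on every segment R2 shares, which is consistent with all three neighbors showing FULL/DR in R2's output.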

Prashant Sheshasayee · 08-10-2016

The router with the highest priority becomes the DR.
The router with the second-highest priority becomes the BDR.
The default priority value is 1.
In case of a tie, the router with the highest router ID becomes the DR and the router with the second-highest router ID becomes the BDR.
If a router's priority is 0, it cannot become the DR or BDR.
A router that is neither DR nor BDR is called a DROTHER.

Compare the scenario and configuration above against these rules to see how OSPF works.
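The rules listed above can be sketched as a single ranking. This is a minimal sketch assuming every router on the segment starts at the same moment, so the non-preemptive behavior discussed in the replies below is deliberately not modeled; Candidate and elect_static are hypothetical names. Router IDs are compared as numbers, not as strings.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    rid: str           # OSPF router ID (dotted quad)
    priority: int = 1  # default OSPF interface priority


def elect_static(candidates):
    """Return (dr, bdr, drothers) as router IDs, assuming everyone starts at once."""
    # A priority of 0 means the router can never be DR or BDR.
    eligible = [c for c in candidates if c.priority > 0]
    # Highest priority first; ties broken by the numerically highest router ID.
    ranked = sorted(
        eligible,
        key=lambda c: (c.priority, ipaddress.IPv4Address(c.rid)),
        reverse=True,
    )
    dr = ranked[0].rid if ranked else None
    bdr = ranked[1].rid if len(ranked) > 1 else None
    # Everyone else, including priority-0 routers, is a DROTHER.
    drothers = sorted(
        (c.rid for c in candidates if c.rid not in (dr, bdr)),
        key=ipaddress.IPv4Address,
    )
    return dr, bdr, drothers


if __name__ == "__main__":
    segment = [Candidate("1.1.1.1", priority=255), Candidate("2.2.2.2"), Candidate("3.3.3.3", priority=0)]
    print(elect_static(segment))
```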

paul driver · 08-05-2016

Hello

It isn't necessarily the router with the higher RID. The router that powers up and starts OSPF before any other will promote itself to DR, even if its RID or priority is lower than the rest.

res

Paul

Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul

paul driver · 08-04-2016

Hello

Once a DR has been elected, it is not replaced by another router, even one with a higher OSPF priority or router ID. The role only changes if you force it by clearing the OSPF process (clear ip ospf process), by a reload, or if the DR's interface goes down.
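To illustrate this non-preemptive behavior, together with the earlier point that the first router to come up takes the DR role, here is a simplified Python sketch loosely following the DR/BDR calculation in RFC 2328 section 9.4. A surviving DR or BDR is always kept, the BDR is promoted only when the DR disappears, and roles are recalculated only when a router joins or leaves. The Segment class and its methods are hypothetical; timers (Wait, RouterDeadInterval) and per-router views are not modeled.

```python
import ipaddress


class Segment:
    """Toy model of one broadcast segment.

    Roles are recomputed only when a router joins or leaves, and a surviving
    DR or BDR is always kept (non-preemptive), loosely after RFC 2328 9.4.
    """

    def __init__(self):
        self.priority = {}  # router ID -> OSPF priority
        self.dr = None
        self.bdr = None

    def _best(self, rids):
        # Highest priority wins, ties go to the highest router ID; priority 0 is never chosen.
        pool = [r for r in rids if self.priority[r] > 0]
        if not pool:
            return None
        return max(pool, key=lambda r: (self.priority[r], ipaddress.IPv4Address(r)))

    def _recalculate(self):
        if self.dr is None:
            # No DR: promote the BDR if there is one, otherwise elect from scratch.
            self.dr, self.bdr = self.bdr, None
            if self.dr is None:
                self.dr = self._best(self.priority)
        if self.bdr is None:
            # No BDR: elect one from everyone except the DR.
            self.bdr = self._best(r for r in self.priority if r != self.dr)

    def join(self, rid, priority=1):
        self.priority[rid] = priority
        self._recalculate()

    def leave(self, rid):
        # Stands in for a reload, "clear ip ospf process", or the interface going down.
        del self.priority[rid]
        if rid == self.dr:
            self.dr = None
        if rid == self.bdr:
            self.bdr = None
        self._recalculate()


if __name__ == "__main__":
    seg = Segment()
    seg.join("1.1.1.1")                # first router up becomes DR
    seg.join("2.2.2.2")                # next one becomes BDR
    seg.join("9.9.9.9", priority=255)  # higher priority, but does not preempt
    print(seg.dr, seg.bdr)
```

In this model a DR that is cleared and comes back simply rejoins as a DROTHER or BDR; it does not take the DR role back.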

res
Paul

Kind Regards
Paul

AScisco1990 · 12-31-2020

Paul/community,

I am running a Cisco Nexus 3548 switch with NX-OS 6.0(2)A1(1d). Recently we had an incident that caused the TCP connections on our servers (which run OSPF using Quagga) to fail. The servers were configured for OSPF with the default priority, while the switch was configured with a priority of 255, which should make it the DR.

After much investigation, we traced this to a DR/BDR election that occurred even though there was no problem with the DR. We determined that the DR was the switch and the BDR was one of the servers. That is expected in this subnet, since the only BDR-capable devices besides the switch are the servers running Quagga OSPF. My issue is that once the BDR is removed, in theory there should only be a new BDR election, which should not disrupt the network because a DR already exists.

During testing outside production hours, we changed the priority of one of the servers to 0 to keep it out of the election and restarted its ospfd process. A new BDR took over, another server on that subnet, which makes sense: the next available device with a priority of 1 or higher goes through the BDR election. What I didn't expect was that both a BDR and a DR election occurred at that time, which is what I saw while watching the OSPF logs on the switch.

My question is whether anyone has seen this before. I believe that once the DR is in place, the BDR going down should not affect the DR, should not trigger a DR election, and should not disrupt the network.

Is this an OS bug or expected behavior?

thanks

Richard Burts · 01-01-2021

First, a detail: this post starts by describing a problem with TCP in the network. I am puzzled that a problem with TCP would impact OSPF, since OSPF does not use TCP (or UDP); it uses its own IP protocol number, 89.

We do not know much about the environment here. There is mention of a Nexus 3548 switch and some servers. All in one VLAN? In multiple VLANs? Any other devices involved with OSPF?

As I understand the explanation, the switch was the DR and some server was the BDR. They changed the OSPF priority of the BDR server and restarted the OSPF process on that server. Is my understanding correct? Were there changes on any other server? Any changes on the switch?

What I would expect is that the switch continues as DR and a new BDR is elected. If you observed something else, I would like to see the log messages. If the switch's operation as DR was interrupted, I would not regard that as the expected behavior.

HTH

Rick

AScisco1990 · 01-01-2021

Richard,

Thanks for the reply, and sorry for any confusion in my description of the issue. The TCP problem was not the cause of the issue but its symptom. The issue was a DR re-election on the primary switch when the BDR was taken down in the server VLAN. That VLAN is the only one on the primary switch, and the servers are also connected to a backup switch on a different VLAN.

Regarding the devices participating in OSPF, there are about 30 servers, all running OSPF through Quagga, and two switches: the primary, which is the Cisco Nexus 3548, and a backup Juniper switch. All servers are dual-homed, with one NIC on a network such as 10.10.100.0/24 and the other NIC on a backup network such as 10.20.100.0/24.

A BGP session to an external network receives the advertised routes that the servers need for their TCP connections. BGP is redistributed into OSPF for the servers, and OSPF is used locally to update the servers' routes dynamically in case one NIC or the primary switch fails.

Before this issue was discovered, each server had the default priority, so that it could become BDR and take over as DR if the DR failed. However, during after-hours troubleshooting we noticed that taking the BDR down triggered both a BDR election and a DR election. This disrupted the network and prevented the servers from reaching the remote networks (over TCP) for a short time. The only change on the servers was to take that server out of the election by setting its priority to 0; nothing was changed on the switches.

Based on RFC 2328, this should not happen: once a DR exists, a BDR failure should trigger only a BDR election and cause no disruption. That is not what I see on this switch, which is why I am asking whether anyone else has seen this, as I believe it may be an OS bug.
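For reference, the expected behavior can be sketched from the calculation in RFC 2328 section 9.4: a router that already declares itself DR in its Hellos keeps that role, and only the BDR is recomputed from the eligible routers that are not declaring DR. This is a simplified single-view sketch (no Wait timer, no per-router views, router IDs used where OSPFv2 Hellos actually carry interface addresses); Neighbor and calculate are hypothetical names, and the addresses are invented because the thread does not give the real ones.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Neighbor:
    rid: str                        # OSPF router ID
    priority: int = 1               # 0 = never DR or BDR
    declares: Optional[str] = None  # role it lists for itself in Hellos: "DR", "BDR" or None


def _best(routers):
    # Highest priority wins; ties go to the highest router ID.
    return max(routers, key=lambda n: (n.priority, ipaddress.IPv4Address(n.rid)), default=None)


def calculate(routers):
    """Simplified RFC 2328 9.4 steps 2-4 from one shared view; returns (dr_rid, bdr_rid)."""
    routers = list(routers)
    while True:
        eligible = [n for n in routers if n.priority > 0]
        # Step 2: BDR comes from eligible routers not declaring DR,
        # preferring those that already declare themselves BDR.
        not_dr = [n for n in eligible if n.declares != "DR"]
        bdr = _best([n for n in not_dr if n.declares == "BDR"] or not_dr)
        # Step 3: a router already declaring DR keeps the role; otherwise the new BDR is promoted.
        dr = _best([n for n in eligible if n.declares == "DR"]) or bdr
        if dr is not None and dr is bdr:
            # Step 4 (simplified): the promoted router now declares DR, so recompute the BDR.
            routers = [replace(n, declares="DR") if n is dr else n for n in routers]
            continue
        return (dr.rid if dr else None, bdr.rid if bdr else None)


if __name__ == "__main__":
    switch = Neighbor("10.10.100.254", priority=255, declares="DR")
    # The old BDR server came back with priority 0 and no longer declares a role.
    servers = [Neighbor("10.10.100.11", priority=0), Neighbor("10.10.100.12"), Neighbor("10.10.100.13")]
    print(calculate([switch, *servers]))
```

In this sketch the switch stays DR and only the BDR moves to another server. With every server at priority 0, as suggested in the reply below, the same call returns the switch as DR and no BDR at all.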

Thanks,

Richard Burts · 01-01-2021

Thanks for the additional information. There are still some things about what you describe that I am not clear about. You continue to describe a problem with TCP. I am not clear whether this is really a problem with something in the TCP protocol or with the more general TCP/IP protocol suite. Perhaps this is just a detail of semantics, but perhaps it is something more significant, so I would like to understand it better.

I had not been clear about the reasons for running OSPF when the switch has a single VLAN with about 30 servers and another switch in the VLAN. It is helpful to understand that the real reason for running OSPF is to advertise to the servers routes that originate in BGP. But I am a bit confused when you describe the servers as dual-homed to 2 different subnets. How do switches with a single VLAN relate to servers with dual NICs and 2 different IP subnets?

From an architecture/design perspective, if there are 2 switches and 30 servers running OSPF, I would suggest that it makes good sense to configure all servers with OSPF priority 0. There is no benefit in having any server become BDR.

Going back to the important question in your post: in my understanding of the documentation, and consistent with my experience, if the BDR stops working there should be a new election for BDR and not a new election for DR. I have limited experience with Nexus (and no experience with the Nexus 3548). I know that in some respects the Nexus OS does not behave exactly like traditional IOS; perhaps this is one example of that. If you have a service agreement for this switch, I would suggest opening a case with Cisco TAC as the best way to resolve your question.

HTH

Rick

AScisco1990 · 01-01-2021

When I say the problem is TCP, I'm really talking about unicast traffic. The problem would also be present if we had applications using UDP unicast, but in this case all our applications use TCP.

The environment has one VLAN on each switch, and each server has two NICs on separate networks, one connected to each switch (each network is associated with a VLAN). If the primary switch fails, the servers dynamically learn routes that use the secondary NIC, which is set up in a bond (LACP), and go out through the secondary switch to the external network. If just a NIC fails on the primary side, the secondary NIC dynamically learns the routes through the primary switch; hence the OSPF setup.

I agree that the servers should be set to priority 0; I am struggling to get our server group to implement this, as it is not under my control. However, that does not answer the question of why a DR election was initiated when the BDR failed.

I will definitely be reaching out to TAC, but I wanted to get input from anyone who has seen this in their environment, whether with Nexus or any other switch platform.

Thanks for your input

paul driver · 01-02-2021

Hello

You say the server was shut down and that it has multiple interfaces assigned to OSPF. Was it then BDR on all of them, or were those links teamed into a single logical connection?

Are the interconnects of these servers in a vPC on the Nexus?

Kind Regards
Paul

AScisco1990 · 01-02-2021

Each server is set up with an active-backup bond. All the servers were eligible to be BDR, but only one was elected BDR. When the OSPF process was shut down on the server that was BDR, both interfaces of that server became DROTHER, while another server was elected BDR on both the primary and backup NICs.

The interconnects are not on a vPC.

thanks,

paul driver · 01-02-2021

Hello

Thanks for the feedback.

When you say active-backup bond, this is a teamed connection on the server side, correct? And I assume the server interfaces are directly connected to the same switch?

The reason I am asking is that you say the server isn't in a vPC. Or is the Nexus in a vPC domain while the server interconnects are not?

Lastly, you say “ospf is shutdown”: how do you do this, and from where?

Kind Regards
Paul
