04-24-2017 11:07 PM - edited 03-05-2019 08:24 AM
Dear All,
Yesterday I faced a major downtime because OSPF was not installing routes into the routing table.
My setup is as follows.
Main Router -------- SWT --------> R1
                      |----------> R2
                      |----------> R3
All the routers are in Area 0
Main router Loopback/router-ID = X1
R1 Loopback/router-ID = X2
R2 Loopback/router-ID = X3
R3 Loopback/router-ID = X4
All are in the same X/24 subnet.
* Suddenly, connectivity between the loopback IPs went down.
* show ip ospf database - shows the router is learning information from its neighbors, but
* show ip route ospf - shows nothing
* show ip ospf neighbor - shows all the available neighbors
After changing the Main Router's Loopback/router-ID to X10, everything went back to normal.
04-25-2017 12:34 AM
Hello,
since the issue was resolved after changing the router ID, the first thing that comes to mind is a duplicate router ID. If that happened, you should see log entries such as the one below:
%OSPF-4-DUP_RTRID1: Detected router with duplicate router ID 1.1.1.1 in area 0
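As a rough sketch of how to check for this, the commands below filter the log buffer for the message and show the router ID each router is actually using. Run the second command on every router in the area and compare the results. Note that the message can be missing if the log buffer has wrapped or was cleared, so an empty result alone does not rule out a duplicate.

```
! Search the log buffer for the duplicate router ID message
show logging | include DUP_RTRID
! Show the router ID in use by the local OSPF process
show ip ospf | include ID
```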
04-25-2017 06:45 AM
Dear Georg,
Thanks for the update........
I have checked all the logs and couldn't find any entries related to a duplicate IP.
04-25-2017 10:33 AM
I believe it was probably a software failure on the main router that most likely crashed the loopback interface in the backend. It would be interesting to set it back to its original loopback address, after making sure that no one else is using that address on the network, so you can get to the bottom of it. If no one made any changes, why would it go down and then start working again after the loopback address was changed, unless it was caused by a software failure?
04-25-2017 10:46 AM
Hello,
Check your flash for a crash file. If you issue the 'show stacks' command, there should be a 'crashinfo' file at the end. Can you try to find that file and post it here?
04-26-2017 01:59 AM
Dear Georg,
I issued the command you mentioned, and the output is below. I couldn't find any crash info.
#show stacks
Minimum process stacks:
Free/Size Name
11208/12000 ISSU Infra API Delayed Registration Process
11240/12000 CDP BLOB
23208/24000 MRIB IPv4 Init Process
23240/24000 MRIB IPv6 Init Process
9624/12000 EEM Shell Director
7768/12000 MGMT VRF Process
5240/6000 state_change
11208/12000 Autoinstall
43992/48000 Setup
23016/24000 BootP Resolver
10968/12000 DHCPv6 Bulk LQ
11240/12000 Inspect Init Msg
11208/12000 SPAN Subsystem
8664/12000 IOSXE LIIN config
11288/12000 WEBUI CONFIG SEND PROC
116728/120000 EEM Auto Registration Proc
11240/12000 SASL MAIN
5240/6000 LIM WAVL
9512/12000 IPC ISSU Versioning Process
9352/12000 IPC ISSU Receive Process
118968/120000 script background loader
10776/12000 RTTY Client Registration
11048/12000 RADIUS INITCONFIG
22472/24000 DHCP Autoinstall
38360/48000 main-thread
5208/6000 Rom Random Update Process
11176/12000 URPF stats
46488/48000 TCP Command
23208/24000 Router Init
10008/12000 BGP Open
10440/12000 BGP Accepter
43416/48000 Exec
41208/48000 Virtual Exec
9976/12000 OSPF-10 Router
9576/12000 OSPF-10 Hello
9208/12000 FTP Write Process
19000/24000 Kron CLI Process
Interrupt level stacks:
Level Called Unused/Size Name
2 0 57520/65536 Network Interrupt
3 0 52224/65536 Aux Thread
04-26-2017 02:00 AM
Hi Cofee,
Good point !!!
Is there a way to confirm ??
04-26-2017 02:45 AM
Sorry, I can't think of a way to find that out other than what Georg suggested. Just curious, did you guys reboot the main router before changing the loopback address? If it had been a software failure, a reboot should have fixed the issue without changing the IP address.
04-25-2017 01:36 AM
Hello
What do the log buffers show for this outage?
Regarding what you have mentioned, I doubt it is down to a duplicate RID, as a change to the RID would result in a new SPF calculation, and your peerings would probably drop to an INIT or 2-WAY state. But you say this hasn't occurred and that all your neighbors are up.
What OSPF network type are you running? It sounds like a broadcast network type; if so, have you specified specific DR/BDR routers?
Any change to the election process could result in a loss of routing information, but for this to happen an interface needs to be reset or the OSPF process needs to be cleared.
res
paul
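For anyone following along, the network type and the current DR/BDR on the segment can be read directly from the router. This is only a sketch: the interface name GigabitEthernet0/0/0 is assumed from the Main Router's interface list given elsewhere in this thread, and it may differ on the other routers.

```
! Shows the network type (for example BROADCAST), this router's priority,
! and which routers are currently DR and BDR on the segment
show ip ospf interface GigabitEthernet0/0/0
! Shows each neighbor's state and role (for example FULL/DR or FULL/BDR)
show ip ospf neighbor
```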
04-25-2017 06:44 AM
Dear Paul,
Thanks for the update....
"What OSPF network type are you running - sounds like a broadcast network type; if so, have you specified specific DR/BDR routers?"
We are using the broadcast network type and haven't configured a DR/BDR.
There is one more change we have made.
Main Router Interfaces
Gi0/0/0 - UP/Configured ---> Involved in OSPF process
Gi0/0/1 - UP/Configured ---> OSPF is not configured.
Gi0/0/3 - Down
Loopback0 ---> X1
We have added a port channel configuration to Gi0/0/0 with the following commands.
interface GigabitEthernet 0/0/3
channel-group 1
interface Port-channel 1
load-interval 30
interface GigabitEthernet 0/0/3
channel-group 1
04-25-2017 10:28 AM
Hello
This sounds more like a designated router re-election being initiated between the OSPF neighbors on the LAN segment due to the changes you made.
As you haven't specified a DR/BDR manually, OSPF chooses them based on each router's interface priority (highest preferred), with the highest router ID as the tie-breaker, as the routers come online in OSPF.
It also depends on which router comes up first and attains the DR or BDR role. Once they are elected, no preemption is allowed, so even if a better router comes online or has its priority increased, it would not be elected DR/BDR until a change such as the one you described caused an SPF recalculation and a new DR/BDR election.
res
Paul
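A minimal sketch of pinning the DR role, assuming the Main Router should be the DR and that GigabitEthernet0/0/0 is the OSPF-facing interface on each router (the interface names on R1 to R3 are assumptions, not taken from this thread):

```
! Main Router: highest priority on the shared segment
interface GigabitEthernet0/0/0
 ip ospf priority 255
!
! R1, R2, R3: keep a lower priority than the Main Router (1 is the default)
! (priority 0 would make a router ineligible to become DR or BDR)
interface GigabitEthernet0/0/0
 ip ospf priority 1
```

Because the election is not preemptive, new priorities only matter at the next election, for example after the current DR's interface goes down or after 'clear ip ospf process' is run in a maintenance window.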
04-26-2017 01:54 AM
Hi Paul,
Thanks for the info....
We suspected the same, but then we reset the Main Router and added the configuration from the beginning (without port channeling). We kept facing the OSPF issue until we changed that IP.
I would appreciate it if you could help me get an idea of the cause.
04-26-2017 05:42 AM
Hello
"We suspected the same but then we have reset the Main Router and added the configurations from the beginning (without port channeling). Till we changed that IP we faced the OSPF issue."
Resetting wouldn't make any difference, as it would again initiate an SPF recalculation. If this router's loopback had a lower IP address than the rest of the loopbacks in your broadcast domain before the outage and it was the DR, then it won't be the DR when it comes back online, as you haven't specifically designated it to be and this time around it won't be the first router to come online.
Getting back to your issue - Just to confirm
Without any changes whatsoever to your network, you lost routing information in OSPF, but all OSPF adjacencies were stable, and OSPF logging or the logging buffer reported no retransmissions and no resets of any interfaces, OSPF peerings, or the DR/BDR election process?
And when you changed the loopback address of the main OSPF router from a lower IP to a higher address than any of the other routers on the same LAN segment, connectivity was restored?
Do these routers all connect to the same switching domain, and if so, what STP mode are you using?
Can you post a topology diagram of your network or a readout of the OSPF topology database?
sh ip ospf database
sh ip ospf neighbor
sh ip ospf route
res
Paul