Total OSPF Outage - Suspected Issue with Loopback IP

mmendis · ‎04-24-2017

Dear All,

Yesterday I faced a major outage because OSPF was not installing routes into the routing table.

My setup is as follows.

Main Router -------- SWT ------+------> R1
                               |------> R2
                               |------> R3

All the routers are in Area 0

Main router Loopback/router-ID = X1

R1 Loopback/router-ID = X2

R2 Loopback/router-ID = X3

R3 Loopback/router-ID = X4

All of these addresses are in the same X/24 subnet.

What happened, in brief:

* Connectivity between the loopback IPs suddenly went down.

* show ip ospf database - shows that the router is still learning information from its neighbors, but

* show ip route ospf - is empty.

* show ip ospf neighbor - shows all the expected neighbors.

What we did:

After changing the Main Router's loopback/router ID to X10, everything went back to normal.

Any idea what could cause this behavior?

Georg Pauwen · ‎04-25-2017

Hello,

since the issue was resolved after changing the router ID, the first thing that comes to mind is a duplicate router ID. If that happened, you should see log entries such as the one below:

%OSPF-4-DUP_RTRID1: Detected router with duplicate router ID 1.1.1.1 in area 0
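If the logs have been exported to a text file, a search for that message can be scripted rather than done by eye. The following is only a rough Python sketch: it assumes each syslog message sits on a single line, it matches only the common `%OSPF-4-DUP_RTRID` prefix because the exact mnemonic suffix may differ between releases, and the function name and sample lines are made up for illustration.

```python
import re

# Matches the duplicate router-ID message quoted above. Only the common
# prefix is assumed; the suffix after DUP_RTRID (e.g. "1") may vary.
DUP_RID = re.compile(r"%OSPF-4-DUP_RTRID\w*:.*?(\d+\.\d+\.\d+\.\d+)")


def find_duplicate_rid_logs(log_text):
    """Return the router IDs reported as duplicates in a saved log."""
    hits = []
    for line in log_text.splitlines():
        match = DUP_RID.search(line)
        if match:
            hits.append(match.group(1))
    return hits


# Illustrative sample only, not taken from the affected router.
sample = (
    "%LINK-3-UPDOWN: Interface GigabitEthernet0/0/3, changed state to down\n"
    "%OSPF-4-DUP_RTRID1: Detected router with duplicate router ID 1.1.1.1 in area 0\n"
)
print(find_duplicate_rid_logs(sample))
```

An empty list would only mean that no such message is in the saved text; it does not by itself rule out a duplicate router ID if the log buffer has wrapped.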

mmendis · ‎04-25-2017

Dear Georg,

Thanks for the update.

I have checked all the logs and couldn't find any entries related to a duplicate IP.

cofee · ‎04-25-2017

I believe it was probably a software failure on the main router that most likely crashed the loopback interface in the back end. It would be interesting to set it back to its original loopback address, after making sure that no one else is using that address on the network, so you can get to the bottom of it. If no one made any changes, why would it go down and then start working again after the loopback address was changed, unless it was caused by a software failure?

Georg Pauwen · ‎04-25-2017

Hello,

Check your flash for a crash file. If you issue the 'show stacks' command, a 'crashinfo' file should be listed at the end of the output. Can you try to find that file and post it here?

mmendis · ‎04-26-2017

Dear Georg,

I issued the command you mentioned; the output is below. I couldn't find any crash info.

#show stacks
Minimum process stacks:
Free/Size   Name
11208/12000 ISSU Infra API Delayed Registration Process
11240/12000 CDP BLOB
23208/24000 MRIB IPv4 Init Process
23240/24000 MRIB IPv6 Init Process
9624/12000 EEM Shell Director
7768/12000 MGMT VRF Process
5240/6000   state_change
11208/12000 Autoinstall
43992/48000 Setup
23016/24000 BootP Resolver
10968/12000 DHCPv6 Bulk LQ
11240/12000 Inspect Init Msg
11208/12000 SPAN Subsystem
8664/12000 IOSXE LIIN config
11288/12000 WEBUI CONFIG SEND PROC
116728/120000 EEM Auto Registration Proc
11240/12000 SASL MAIN
5240/6000   LIM WAVL
9512/12000 IPC ISSU Versioning Process
9352/12000 IPC ISSU Receive Process
118968/120000 script background loader
10776/12000 RTTY Client Registration
11048/12000 RADIUS INITCONFIG
22472/24000 DHCP Autoinstall
38360/48000 main-thread
5208/6000   Rom Random Update Process
11176/12000 URPF stats
46488/48000 TCP Command
23208/24000 Router Init
10008/12000 BGP Open
10440/12000 BGP Accepter
43416/48000 Exec
41208/48000 Virtual Exec
9976/12000 OSPF-10 Router
9576/12000 OSPF-10 Hello
9208/12000 FTP Write Process
19000/24000 Kron CLI Process

Interrupt level stacks:
Level    Called Unused/Size Name
2           0 57520/65536 Network Interrupt
3           0 52224/65536 Aux Thread

mmendis · ‎04-26-2017

Hi Cofee,

Good point!

Is there a way to confirm it?

cofee · ‎04-26-2017

Sorry, I can't think of a way to find that out other than what Georg suggested. Just curious: did you reboot the main router before changing the loopback address? If it had been a software failure, a reboot should have fixed the issue without changing the IP address.

paul driver · ‎04-25-2017

Hello

What do the log buffers show for this outage?

Regarding what you have mentioned, I doubt it is down to a duplicate RID, as a change to this would result in a new SPF calculation, and your peerings would probably drop to an Init or 2-Way state. But you say this hasn't occurred and that all your neighbors are up.

What OSPF network type are you running? It sounds like a broadcast network type; if so, have you specified particular DR/BDR routers?

Any change to the election process could result in a loss of routing information, but for an election to happen again, an interface needs to be reset or the OSPF process needs to be cleared.

res

paul

Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul

mmendis · ‎04-25-2017

Dear Paul,

Thanks for the update.

"What OSPF network type are you running? It sounds like a broadcast network type; if so, have you specified particular DR/BDR routers?"

We are using the broadcast network type and haven't configured a DR/BDR.

There is one more change we made.

Main Router Interfaces

Gi0/0/0 - UP/Configured ---> Involved in OSPF process

Gi0/0/1 - UP/Configured ---> OSPF is not configured.

Gi0/0/3 - Down

Loopback0 ---> X1

We have added a port channel configuration to Gi0/0/0 with the following commands.

interface GigabitEthernet 0/0/3
channel-group 1

interface Port-channel 1
load-interval 30

interface GigabitEthernet 0/0/3
channel-group 1

paul driver · ‎04-25-2017

Hello

This sounds more like a designated router re-election being initiated between the OSPF neighbors on the LAN segment due to the changes you made.

As you haven't specified a DR/BDR manually, OSPF chooses them based on each router's priority and RID (highest being the most preferred) as the routers come online in OSPF.

It also depends on which router comes up first and attains the DR or BDR role. Once the roles are elected, no preemption is allowed, so even if a better router comes online or has its priority increased, it would not be elected DR/BDR until a change such as the one you described causes an SPF recalculation and a new DR/BDR election.
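To make the non-preemption point concrete, here is a deliberately simplified Python model of the election described above. It is a sketch, not the full procedure: it ignores the BDR and the wait timer, and the router IDs are hypothetical stand-ins for X1..X4 (the thread never gives the real addresses). The helper names `rid_key` and `elect_dr` are invented for this example.

```python
def rid_key(rid):
    """Turn a dotted router ID into a tuple so it compares numerically."""
    return tuple(int(octet) for octet in rid.split("."))


def elect_dr(routers, current_dr=None):
    """Pick a DR from a dict of router ID -> OSPF priority (0 = never DR).

    Simplified model: no BDR, no wait timer.
    """
    # Non-preemptive: a DR that is still present and eligible keeps the role.
    if current_dr in routers and routers[current_dr] > 0:
        return current_dr
    eligible = [rid for rid, prio in routers.items() if prio > 0]
    if not eligible:
        return None
    # Highest priority wins; the highest router ID breaks ties.
    return max(eligible, key=lambda rid: (routers[rid], rid_key(rid)))


# Hypothetical router IDs, all left at the default priority of 1.
segment = {"10.0.0.1": 1, "10.0.0.2": 1, "10.0.0.3": 1, "10.0.0.4": 1}
dr = elect_dr(segment)                    # fresh election on the segment
print(dr)
segment["10.0.0.10"] = 1                  # a "better" router joins later
print(elect_dr(segment, current_dr=dr))   # the incumbent keeps the role
print(elect_dr(segment))                  # only a new election picks the newcomer
```

The router IDs are compared as tuples of integers because a plain string comparison would rank "10.0.0.4" above "10.0.0.10".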

res

Paul


mmendis · ‎04-26-2017

Hi Paul,

Thanks for the info.

We suspected the same, so we reset the Main Router and re-applied the configuration from the beginning (without the port channel). The OSPF issue persisted until we changed that IP.

I would appreciate any help in understanding this.

paul driver · ‎04-26-2017

Hello

"We suspected the same but then we have reset the Main Router and added the configurations from the beginning (without port channeling). Till we changed that IP we faced the OSPF issue."

Resetting wouldn't make any difference, as it would just initiate another SPF recalculation. If this router's loopback had a lower IP address than the other loopbacks in your broadcast domain before the outage and it was nevertheless the DR, it won't be the DR when it comes back online: you haven't specifically designated it, and this time around it won't be the first router to come online.

Getting back to your issue - just to confirm:

Without any changes whatsoever to your network, you lost routing information in OSPF, yet all OSPF adjacencies were stable, and neither OSPF logging nor the logging buffer reported any retransmissions or any reset of an interface, an OSPF peering, or the DR/BDR election process?

And when you changed the loopback address of the main OSPF router from a lower IP to an address higher than that of any other router on the same LAN segment, connectivity was restored?

Do these routers all connect to the same switching domain, and if so, which STP mode are you using?

Can you post a topology diagram of your network, or the output of the following commands, including the OSPF topology database?

sh ip ospf database
sh ip ospf neighbor
sh ip ospf route

res
Paul
