Troubleshooting missing route in OSPF to OSPF redistribution

tommyboay
Level 1

Hi all,

Back with another question / issue on my OSPF routing

I have 2 ASBR routers, AGFR01RTR03 and AGFR02RTR03, performing OSPF-to-OSPF redistribution in both directions between the same two ASs.

They also do summarization for our private addressing scheme. It is all working just fine for that part (neighbors, summarization, redistribution).

AGDC01RTR01 --- AGDC02RTR01 (OSPF 1000 ABRs)
     |               |
     |               |
AGFR01RTR03 --- AGFR02RTR03 (OSPF 1000 / 53 ASBRs)

Let's focus on AGDC01RTR01 with a specific entry here (IP subnet is fake) :

Routing entry for 1.1.1.0/25

  Known via "ospf 1000", distance 110, metric 300, type inter area

  Last update from 10.2.244.76 on GigabitEthernet5/1, 1d03h ago

  Routing Descriptor Blocks:

  * 10.2.244.76, from 10.2.1.249, 1d03h ago, via GigabitEthernet5/1

      Route metric is 300, traffic share count is 1

Compare with AGDC02RTR01 :

Routing entry for 1.1.1.0/25

  Known via "ospf 1000", distance 110, metric 400, type inter area

  Last update from 10.2.244.121 on GigabitEthernet5/2, 1d03h ago

  Routing Descriptor Blocks:

  * 10.2.244.121, from 10.2.1.249, 1d03h ago, via GigabitEthernet5/2

      Route metric is 400, traffic share count is 1

I would expect AGFR01RTR03 (in full state on its peering with AGDC01RTR01) to learn that network from OSPF 1000, but no :

Routing entry for 1.1.1.0/25

  Known via "ospf 53", distance 110, metric 23000

  Tag 1000, type extern 2, forward metric 45

  Redistributing via ospf 1000

  Last update from 10.5.1.9 on GigabitEthernet1/2, 3d00h ago

  Routing Descriptor Blocks:

  * 10.5.1.9, from 10.5.0.134, 3d00h ago, via GigabitEthernet1/2

      Route metric is 23000, traffic share count is 1

      Route tag 1000

The route comes from OSPF 1000 being redistributed into OSPF 53 on AGFR02RTR03 (which tagged the route + adjusted metric to 23K).

Here are the OSPF configuration and route maps on AGFR01RTR03. Note that the configuration is nearly identical on AGFR02RTR03 (only the metric is 23K instead of 18K).

router ospf 1000
 log-adjacency-changes
 nsf
 summary-address 10.4.0.0 255.254.0.0 tag 33
 summary-address 172.18.0.0 255.255.0.0 tag 33
 redistribute ospf 53 metric 18000 subnets route-map EU_US
 network 10.2.244.80 0.0.0.3 area 0
!
router ospf 53
 log-adjacency-changes
 auto-cost reference-bandwidth 10000
 nsf
 network 10.5.1.8 0.0.0.7 area 0
 summary-address 10.2.0.0 255.254.0.0 tag 1000
 summary-address 172.16.0.0 255.255.0.0 tag 1000
 redistribute ospf 1000 metric 18000 subnets route-map US_EU
!
route-map US_EU deny 10
 match tag 33
!
route-map US_EU permit 20
 set tag 1000
!
route-map EU_US deny 10
 match tag 1000
!
route-map EU_US permit 20
 set tag 33

Would you have any recommendation on how I could debug this issue? I'm not sure what to verify.

Tom

15 Replies

tommyboay
Level 1

Opening TAC request.

Hello Tom,

Opening a TAC case is not necessary yet, I believe. What you are experiencing is normal.

Running two OSPF processes and redistributing between them at multiple points (multipoint bidirectional redistribution) is complicated. What you are seeing here is merely a race condition: FR01 simply learned about the offending network via OSPF process 53 sooner than via OSPF process 1000.

I suggest you first read the following document very, very carefully - it explains the common caveats when running multiple OSPF processes and redistributing between them.

http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a00801069aa.shtml

Please feel welcome to ask further after reading that document!

Best regards,

Peter
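
One way to check the race-condition explanation is to compare what each process knows about the prefix with what the routing table actually holds. The sequence below is only a sketch: it reuses the placeholder prefix 1.1.1.0/25 from the original post and assumes the prefix appears as a summary (type 3) LSA in process 1000 and as an external (type 5) LSA in process 53, as the outputs above suggest.

```text
! Which process currently owns the route in the routing table?
show ip route 1.1.1.0 255.255.255.128
! What does process 1000 know? (inter-area route, so a summary LSA is expected)
show ip ospf 1000 database summary 1.1.1.0
! What does process 53 know? (redistributed route, so an external LSA is expected)
show ip ospf 53 database external 1.1.1.0
```

If both processes hold an LSA for the prefix but only one of them appears in the routing table, the two candidates are competing on administrative distance, which is the situation the linked document describes.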

Hi Peter,

You're right once again. I modified the administrative distance in the "edge" AS to prevent external routes from being eligible when inter/intra area routes for the same destination exist.

I can close my TAC request and thank you once again for the help and knowledge

Tom

Hi all,

I have an update with an issue that doesn't make sense to me :

As shown in the original post, AGFR01RTR03 has two OSPF processes. One is 53 where the router is an ABR with several totally stubby and two NSSA areas. Focusing on one of my NSSAs, the next hop router inside that area has an NSSA E1 route.

agde04rtr03#show ip route 192.168.74.64

Routing entry for 192.168.74.64/26

  Known via "ospf 2000", distance 110, metric 464, type NSSA extern 1

  Last update from 10.5.129.3 on GigabitEthernet0/0, 2w0d ago

  Routing Descriptor Blocks:

  * 10.5.129.3, from 10.5.129.3, 2w0d ago, via GigabitEthernet0/0

      Route metric is 464, traffic share count is 1

The peering with agfr01rtr03 is in full state. Since external routes received from the other OSPF process are now set to a higher administrative distance, that router should now prefer my NSSA E1 route as long as it exists, right? Well, this is not the case:

agfr01rtr03#show ip route 192.168.74.64

Routing entry for 192.168.74.64/26

  Known via "ospf 1000", distance 115, metric 23000

  Tag 33, type extern 2, forward metric 304

  Redistributing via ospf 53

  Last update from 10.2.244.81 on Serial3/0/0, 00:01:29 ago

  Routing Descriptor Blocks:

  * 10.2.244.81, from 10.2.244.86, 00:01:29 ago, via Serial3/0/0

      Route metric is 23000, traffic share count is 1

      Route tag 33

I don't understand why the admin distance is not used as tie breaker in this scenario. If someone can explain that...

Thanks,

Tom

Hello Tom,

I hope you do not find it annoying that I am still occupying your threads here...

In order to understand your situation better, I need to see the following output from FR01:

show ip ospf database external 192.168.74.64

show ip ospf database 192.168.74.64

These commands should display one or more LSA-5s and LSA-7s. Please also check the Advertising Router ID in each of these LSAs and indicate which routers these Router IDs correspond to.

Thank you!

Best regards,

Peter

Hi Peter,

You're very welcome to camp in my threads as long as you like

#show ip ospf database external 192.168.74.64

            OSPF Router with ID (10.5.5.101) (Process ID 2000)

            OSPF Router with ID (10.5.0.133) (Process ID 53)

                Type-5 AS External Link States

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 556
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 192.168.74.64 (External Network Number )
  Advertising Router: agfr02rtr03.archon.net
  LS Seq Number: 80000027
  Checksum: 0x7058
  Length: 36
  Network Mask: /26
        Metric Type: 1 (Comparable directly to link state metric)
        MTID: 0
        Metric: 5555
        Forward Address: 10.177.98.21
        External Route Tag: 0


            OSPF Router with ID (10.2.244.82) (Process ID 1000)

                Type-5 AS External Link States

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 911
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 192.168.74.64 (External Network Number )
  Advertising Router: agfr02rtr03-vzb-s3-0-0.archon.net
  LS Seq Number: 80000F55
  Checksum: 0x36BB
  Length: 36
  Network Mask: /26
        Metric Type: 2 (Larger than any link state path)
        MTID: 0
        Metric: 23000
        Forward Address: 0.0.0.0
        External Route Tag: 33

The command show ip ospf database 192.168.74.64 is not accepted. Do you need "show ip ospf database network 192.168.74.64"?

Tom

Hello Tom,

I apologize - the command that did not work was supposed to say:

show ip ospf database nssa 192.168.74.64

Can you please post the output of that command now? Thank you!

Best regards,

Peter

No problem Peter.

Here it is :

# show ip ospf database nssa 192.168.74.64

           OSPF Router with ID (10.5.5.101) (Process ID 2000)

            OSPF Router with ID (10.5.0.133) (Process ID 53)

                Type-7 AS External Link States (Area 4)

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 379
  Options: (No TOS-capability, Type 7/5 translation, No DC)
  LS Type: AS External Link
  Link State ID: 192.168.74.64 (External Network Number )
  Advertising Router: agde04b2bfw01.archon.net
  LS Seq Number: 80002413
  Checksum: 0x538A
  Length: 36
  Network Mask: /26
        Metric Type: 1 (Comparable directly to link state metric)
        MTID: 0
        Metric: 444
        Forward Address: 10.177.98.17
        External Route Tag: 0


                Type-7 AS External Link States (Area 6)

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 681
  Options: (No TOS-capability, Type 7/5 translation, No DC)
  LS Type: AS External Link
  Link State ID: 192.168.74.64 (External Network Number )
  Advertising Router: agde06fw01.archon.net
  LS Seq Number: 80001479
  Checksum: 0x1084
  Length: 36
  Network Mask: /26
        Metric Type: 1 (Comparable directly to link state metric)
        MTID: 0
        Metric: 5555
        Forward Address: 10.177.98.21
        External Route Tag: 0

Tom,

Thank you. Which of these names displayed in the Advertising Router fields corresponds to the Router ID 10.2.244.86 please?

Best regards,

Peter

Peter,

That is the AGFR02RTR03 ID on the remote AS side (process 1000 where my local routes are redistributed)

AGDC01RTR01 --- AGDC02RTR01 (OSPF 1000 ABRs)
     |               |
     |               |
AGFR01RTR03 --- AGFR02RTR03 (OSPF 1000 / 53 ASBRs)

The AGFR01RTR03 router receives the redistributed route on the ospf 1000 process without redistributing it into 53 due to route-map blocking any tag 33 routes.

Hope this helps.

Tom

Thomas,

I am having a hard time wrapping my head around your topology, as what I am seeing still is just a couple of routers without really knowing where the individual areas are, what processes they are assigned to, where the routes are originated or redistributed, etc. A more illustrative exhibit of your topology including depiction of processes and areas would be helpful.

So far, what I am able to understand from the outputs:

1) FR01 has 4 LSAs regarding the same network:

In process 1000: one LSA:

LSA-5 generated by agfr02rtr03-vzb-s3-0-0.archon.net, tagged with tag 33 due to redistribution

In process 53: three LSAs:

LSA-5 generated by agfr02rtr03.archon.net, not tagged

LSA-7 generated by agde04b2bfw01.archon.net in area 4, not tagged

LSA-7 generated by agde06fw01.archon.net in area 6, not tagged

I assume that none of these router IDs corresponds to the FR01 itself, as if it was FR01 itself, it would in a way explain this issue.

2) For some reason, the process 1000 wins when installing the route although its AD is higher than default OSPF AD, namely, 115. If the OSPF process 53 selected its own version of the best candidate, it should have offered it to the routing table with the AD of 110 and it should have won. Depending on the version of NSSA implementation, FR01 would either prefer the N1 or E1 route (assuming it is not performing the LSA-7/LSA-5 translation itself).

I wonder whether this state is not simply a metastable remainder of a change in your network. I would personally suggest trying to use the clear ip route 192.168.74.64 255.255.255.192 to remove the route from your routing table on FR01 and let the OSPF processes compete again when installing the route. And even better, I would also suggest running the debug ip routing command before the clear ip route command - the debug should show us all changes performed to the routing table, which may yield some additional information about what is happening.

Would you mind performing those tests? As they may result in short-lived connectivity issues with the corresponding network, I suggest using a more quiet period when the intermittent connectivity won't be too harmful.

Best regards,

Peter
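
The test proposed above could be run roughly in the following order. This is a sketch rather than a verified procedure: `terminal monitor` is only needed when connected over Telnet or SSH, and the explicit mask on `show ip route` is an assumption made here for precision.

```text
! Send debug output to this terminal session (vty sessions only)
terminal monitor
! Log every change made to the routing table
debug ip routing
! Remove the route so that both OSPF processes compete to install it again
clear ip route 192.168.74.64 255.255.255.192
! Check which process won
show ip route 192.168.74.64 255.255.255.192
! Stop debugging as soon as the output has been captured
undebug all
```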

Hi Peter,

I just finished an updated schema of my OSPF area layout. I hope it will clear things a little bit.

I completely agree with all your remarks and wonder why AD is not becoming the tie breaker.

To answer your question, here is the output :

agfr01rtr03#clear ip route 192.168.74.64
agfr01rtr03#
033952: Sep 30 15:26:43 UTC: RT: del 192.168.74.64 via 10.2.244.81, ospf metric [115/23000]
033953: Sep 30 15:26:43 UTC: RT: delete subnet route to 192.168.74.64/26
033954: Sep 30 15:26:43 UTC: RT(multicast): delete subnet route to 192.168.74.64/26
033955: Sep 30 15:26:43 UTC: RT: updating ospf 192.168.74.64/26 (0x0) via 10.2.244.81 Se3/0/0
033956: Sep 30 15:26:43 UTC: RT: add 192.168.74.64/26 via 10.2.244.81, ospf metric [115/23000]
033957: Sep 30 15:26:43 UTC: RT: updating ospf 192.168.74.64/26 (0x0) via 10.5.4.26 Tu141
033958: Sep 30 15:26:43 UTC: RT: rib update return code: 17
033959: Sep 30 15:26:43 UTC: RT: updating ospf 192.168.74.64/26 (0x0) via 10.5.4.26 Tu141
033960: Sep 30 15:26:43 UTC: RT: rib update return code: 17
033961: Sep 30 15:26:43 UTC: RT: updating ospf 192.168.74.64/26 (0x0) via 10.5.4.46 Tu161
033962: Sep 30 15:26:43 UTC: RT: rib update return code: 17
033963: Sep 30 15:26:43 UTC: RT(multicast): network 192.168.74.0/24 is now subnetted
033964: Sep 30 15:26:43 UTC: RT(multicast): network 192.168.74.0 is now variably masked
033965: Sep 30 15:26:43 UTC: Replicated ndb 192.168.74.64/26 in table 0x8000 created
033966: Sep 30 15:26:43 UTC: Replicated ndb 457FDB24/4888E15C refcnt 1
                         

agfr01rtr03#show ip route 192.168.74.64
Routing entry for 192.168.74.64/26
  Known via "ospf 1000", distance 115, metric 23000
  Tag 33, type extern 2, forward metric 304
  Redistributing via ospf 53
  Last update from 10.2.244.81 on Serial3/0/0, 00:00:20 ago
  Routing Descriptor Blocks:
  * 10.2.244.81, from 10.2.244.86, 00:00:20 ago, via Serial3/0/0
      Route metric is 23000, traffic share count is 1
      Route tag 33

Hope this helps,

Tom

Tom,

So far, I am confused by the results of our experiment. None of what we are seeing makes sense to me. Would you mind performing another test?

The point of that experiment is to prohibit the OSPF process 1000 from installing the route 192.168.74.64/26 into the routing table, thereby allowing the OSPF process 53 to offer its own candidate - if it has any.

The configuration would be performed on FR01 as follows:

ip prefix-list Experiment deny 192.168.74.64/26
ip prefix-list Experiment permit 0.0.0.0/0 le 32
!
router ospf 1000
 distribute-list prefix Experiment in

After a couple of seconds, check the routing table for the network 192.168.74.64/26. I would be interested in seeing whether a replacement route is installed and where it points. Running debug ip routing during the test should be illustrative as well.

Please note that this experiment may very well result in temporary unreachability of the network 192.168.74.64/26, and should therefore be performed only in times of low volume.

Best regards,

Peter
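
Because the filter is meant as a temporary experiment, it helps to have the rollback ready before applying it. A minimal sketch, assuming the prefix list name Experiment from the post above and that no other distribute-list is configured under process 1000:

```text
router ospf 1000
 ! Stop filtering routes offered by process 1000 to the routing table
 no distribute-list prefix Experiment in
!
! Remove the now-unused prefix list
no ip prefix-list Experiment
```

Removing the distribute-list before the prefix list keeps the OSPF process from referencing a prefix list that no longer exists.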

Hi Peter,

I found out that the OSPF external route administrative distance was also set to 115 on process 53. Therefore there was no tie-breaker, and it was still "first installed wins". I removed the setting from ospf 53, and the route is now learned via process 53:

agfr01rtr03#show ip route 192.168.74.64

Routing entry for 192.168.74.64/26

  Known via "ospf 53", distance 110, metric 797, type extern 1

  Redistributing via ospf 1000

  Advertised by ospf 1000 metric 18000 subnets route-map EU_US

  Last update from 10.5.4.26 on Tunnel141, 00:06:13 ago

  Routing Descriptor Blocks:

  * 10.5.4.26, from 10.5.0.134, 00:06:13 ago, via Tunnel141

      Route metric is 797, traffic share count is 1

Not sure if I did that by mistake at some point. I'm very sorry to have misled you. Thanks very much again for the help. Each of your posts gives me some tools to help myself next time. Hope you don't mind.

PS: Before that, I tried applying your prefix-list. It did the job but caused huge debug output, with hundreds of routes being deleted from the tables.
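
The thread never shows the distance commands themselves. Judging from the outputs (distance 115 on the external route in process 1000, and the default of 110 in process 53 after the fix), the final state on AGFR01RTR03 is probably close to the sketch below; the exact lines are an assumption.

```text
router ospf 1000
 ! External routes learned by process 1000 are installed with AD 115
 distance ospf external 115
!
router ospf 53
 ! No "distance ospf" line here: external routes keep the default AD of 110
```

`show ip protocols` should list the distances in effect for each routing process, which is a quick way to catch a setting that was applied to both processes by mistake.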
