Solved: OSPF route from dead neighbor

sergeymolchanov · ‎03-30-2016

Hello! We have detected a strange situation with OSPF in our production environment. A screenshot from the router console is attached.

The neighbor with ID 172.30.255.21 is absent from the neighbor table, but the command sh ip route displays a route to the network through this neighbor:

Routing Descriptor Blocks:

* 172.16.41.12, from 172.30.255.21

IP 172.16.41.12 is also absent from the neighbor table.

Why does OSPF not delete this route if the neighbor is down?

Rolf Fischer · ‎03-30-2016

Hi,

172.30.255.21 is the Router-ID (RID) of an ASBR and it doesn't necessarily have to be a directly connected neighbor - it just has to be reachable within the OSPF domain.

Could you please share the output of the commands

show ip ospf border-routers
show ip ospf database external 10.77.0.0

An OSPF RID is not really an IP-address and it depends on your configuration whether a routing-table entry for 172.30.255.21 exists or not.

HTH
Rolf

sergeymolchanov · ‎03-30-2016

Sorry, I did not describe the topology of our network or the situation when the incident occurred.

All routers are linked to each other via a VPLS network (provided by an ISP).

The router in the screenshot is RT1 (RID 172.30.255.26, link IP 172.16.41.14); the router that advertises network 10.77.0.0 is RT2 (RID 172.30.255.21, link IP 172.16.41.12).

There are some other routers in the VPLS network that also know about network 10.77.0.0.
A failure occurred in the ISP's network, and neighbor RT2 went down on RT1. So RT1 should either install a route to network 10.77.0.0 via another neighbor (with another link IP, see the output of sh ip ospf nei) or remove the route from the routing table if no other neighbor has a route to network 10.77.0.0.
But the route through the dead neighbor still remained in the routing table. When the ISP fixed the problem in their network, the neighbors came up and everything worked correctly.

Because of that, the output of the commands below was taken on RT1 while the network was working:

#show ip ospf border-routers

OSPF Router with ID (172.30.255.26) (Process ID 999)

Base Topology (MTID 0)

Internal Router Routing Table
Codes: i - Intra-area route, I - Inter-area route

i 172.30.255.2 [20] via 172.16.41.11, GigabitEthernet0/1.906, ASBR, Area 0, SPF 1445
i 172.30.255.4 [20] via 172.16.41.1, GigabitEthernet0/1.906, ASBR, Area 0, SPF 1445
i 172.30.255.5 [20] via 172.16.41.2, GigabitEthernet0/1.906, ASBR, Area 0, SPF 1445
i 172.30.255.14 [20] via 172.16.41.5, GigabitEthernet0/1.906, ASBR, Area 0, SPF 1445
i 172.30.255.13 [20] via 172.16.41.4, GigabitEthernet0/1.906, ASBR, Area 0, SPF 1445
i 172.30.255.21 [20] via 172.16.41.12, GigabitEthernet0/1.906, ASBR, Area 0, SPF 1445
i 172.30.255.25 [20] via 172.16.41.13, GigabitEthernet0/1.906, ASBR, Area 0, SPF 1445
i 172.30.255.28 [20] via 172.16.41.6, GigabitEthernet0/1.906, ASBR, Area 0, SPF 1445
i 172.30.255.29 [20] via 172.16.41.7, GigabitEthernet0/1.906, ASBR, Area 0, SPF 1445

show ip ospf database external 10.77.0.0

OSPF Router with ID (172.30.255.26) (Process ID 999)

Type-5 AS External Link States

LS age: 1650
Options: (No TOS-capability, DC, Upward)
LS Type: AS External Link
Link State ID: 10.77.0.0 (External Network Number )
Advertising Router: 172.30.255.2
LS Seq Number: 80002942
Checksum: 0x31C
Length: 36
Network Mask: /20
Metric Type: 1 (Comparable directly to link state metric)
MTID: 0
Metric: 150
Forward Address: 0.0.0.0
External Route Tag: 0

LS age: 1388
Options: (No TOS-capability, DC, Upward)
LS Type: AS External Link
Link State ID: 10.77.0.0 (External Network Number )
Advertising Router: 172.30.255.21
LS Seq Number: 80004678
Checksum: 0xD614
Length: 36
Network Mask: /20
Metric Type: 1 (Comparable directly to link state metric)
MTID: 0
Metric: 100
Forward Address: 0.0.0.0
External Route Tag: 0

Rolf Fischer · ‎03-30-2016

Thanks for the additional information, that helps a lot.

We can see that RT1 can reach 9 ASBRs through a single multiaccess interface (Gi0/1.906), all with the same cost (20). Is 20 the OSPF cost of interface Gi0/1.906 or are there other routers between RT1 and the ASBRs?

We can also see that two external LSAs for prefix 10.77.0.0/20 are generated, one from router 172.30.255.2 with cost 150 and another from router 172.30.255.21 (RT2) with cost 100. So I would expect one route for this prefix with a cost of 120 (20+100) but your screenshot shows 115. Did you change anything in the meantime or is this part of the problem?

sergeymolchanov · ‎03-30-2016

All routers are connected to each other via VPLS (ISP network); there are no additional routers between them.

20 is the OSPF cost set on interface Gi0/1.906.

Yes, after that problem we changed the cost on interface Gi0/1.906 from 15 to 20.
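
The numbers in this exchange add up as follows: for an external type 1 route, the route metric is the internal cost to reach the ASBR plus the metric carried in the Type-5 LSA. A minimal Python sketch of that arithmetic, using the two LSAs from the database output above (the function names are illustrative, and it assumes every ASBR is reached directly over Gi0/1.906, as stated in the thread):

```python
# External LSAs for 10.77.0.0/20 from "show ip ospf database external":
# advertising router -> metric in the LSA (both are metric type 1).
external_lsas = {"172.30.255.2": 150, "172.30.255.21": 100}


def e1_metric(cost_to_asbr, external_metric):
    """Type 1 external metric: internal cost to the ASBR plus the LSA metric."""
    return cost_to_asbr + external_metric


def best_route(interface_cost):
    """Return (metric, ASBR) of the preferred route for a given interface cost.

    Assumption: every ASBR sits directly on the shared segment, so the cost
    to reach it equals the cost of the outgoing interface.
    """
    return min(
        (e1_metric(interface_cost, metric), asbr)
        for asbr, metric in external_lsas.items()
    )


print(best_route(20))  # interface cost after the change: metric 120 via RT2
print(best_route(15))  # interface cost during the incident: metric 115 via RT2
```

With the old interface cost of 15 the result is 115, which matches the screenshot; with the current cost of 20 it is 120.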

Rolf Fischer · ‎03-30-2016

Well, I have a theory ;)

A unidirectional condition in the VPLS could cause such a behavior; to test this I applied an access-list inbound on RT1's Gi0/1.906 and could see exactly what you have described:

access-list 100 deny ospf host 172.16.41.12 any ! RT2
access-list 100 permit ip any any

After the Dead interval expires, RT2 is deleted from the neighbor table, but from the LSDB's perspective RT2 is still attached to the segment in such a case, so RT1 considers it a reachable ASBR and accepts its external LSAs for route calculations.

I'll have to do some more testing and will come back later with more details.

Rolf Fischer · ‎03-30-2016

A unidirectional condition in the VPLS could cause such a behavior; to test this I applied an access-list inbound on RT1's Gi0/1.906 and could see exactly what you have described:

access-list 100 deny ospf host 172.16.41.12 any ! RT2
access-list 100 permit ip any any

After Dead timer expiry, RT2 is deleted from the neighbor-table:

%OSPF-5-ADJCHG: Process 1, Nbr 172.30.255.21 on FastEthernet0/0.906 from 2WAY to DOWN, Neighbor Down: Dead timer expired

The Type-5 LSA for prefix 10.77.0.0/20 is still in RT1's LSDB, and even if it were not, RT2 and the Designated Router (172.30.255.5) are still adjacent, so the DR periodically re-floods it to RT1 and the other routers:

OSPF: received update from 172.30.255.5, FastEthernet0/0.906
OSPF: Rcv Update Type 5, LSID 10.77.0.0, Adv rtr 172.30.255.21, age 2, seq 0x80000006
      Mask /20

Now RT1 has to check that RT2 (the ASBR) is reachable before it installs the corresponding external route for this prefix. I don't want to go too deep in OSPF theory and I hope you are a bit familiar with network-type broadcast. RT1 still receives this Type-2 LSA from the DR:

  LS Type: Network Links
  Link State ID: 172.16.41.2 (address of Designated Router)
  Advertising Router: 172.30.255.5
  LS Seq Number: 8000000E
  Checksum: 0x2B0E
  Length: 44
  Network Mask: /24
        Attached Router: 172.30.255.5
        Attached Router: 172.30.255.2
        Attached Router: 172.30.255.4
        Attached Router: 172.30.255.21
        Attached Router: 172.30.255.26

There is no connectivity problem between RT2 and the DR, so RT2 is listed as attached to the common segment. An important characteristic of Ethernet is that when RT2 can speak with the DR and RT1 can speak with the DR as well, we can assume that RT1 can also speak directly to RT2. OSPF follows the same logic when the network type of an interface is broadcast, so the route stays in the routing table even after we have lost RT2 as a neighbor:

Routing entry for 10.77.0.0/20
  Known via "ospf 1", distance 110, metric 120, type extern 1
  Last update from 172.16.41.12 on FastEthernet0/0.906, 03:17:12 ago
  Routing Descriptor Blocks:
  * 172.16.41.12, from 172.30.255.21, 03:17:12 ago, via FastEthernet0/0.906
      Route metric is 120, traffic share count is 1
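
The reachability check described above can be sketched in Python. This is a simplified model, not router code: the LSDB is reduced to the single transit network from the Type-2 LSA, the interface cost of 20 on every router is an assumption, and `spf` and `rt1_neighbors` are illustrative names.

```python
import heapq

# Routers listed in the Type-2 LSA shown above.
ATTACHED = ["172.30.255.5", "172.30.255.2", "172.30.255.4",
            "172.30.255.21", "172.30.255.26"]

# Router-LSAs (simplified): router ID -> {transit network (DR address): cost}.
# Cost 20 everywhere is an assumption for illustration.
router_lsas = {rid: {"172.16.41.2": 20} for rid in ATTACHED}

# Network-LSA originated by the DR: transit network -> attached routers.
network_lsas = {"172.16.41.2": list(ATTACHED)}

# RT1's neighbor table after the Dead timer expired: RT2 is gone.
rt1_neighbors = {"172.30.255.5", "172.30.255.2", "172.30.255.4"}


def spf(root, router_lsas, network_lsas):
    """Dijkstra over routers and transit networks; returns {router ID: cost}."""
    dist = {("router", root): 0}
    heap = [(0, "router", root)]
    while heap:
        cost, kind, node = heapq.heappop(heap)
        if cost > dist[(kind, node)]:
            continue  # stale heap entry
        if kind == "router":
            # Router -> network costs the interface cost. The network LSA
            # must list the router back (bidirectional check).
            edges = [(c, "network", net)
                     for net, c in router_lsas.get(node, {}).items()
                     if node in network_lsas.get(net, [])]
        else:
            # Network -> attached router costs 0. The router LSA must
            # list the network back (bidirectional check).
            edges = [(0, "router", rid)
                     for rid in network_lsas[node]
                     if node in router_lsas.get(rid, {})]
        for link_cost, next_kind, next_node in edges:
            new_cost = cost + link_cost
            if new_cost < dist.get((next_kind, next_node), float("inf")):
                dist[(next_kind, next_node)] = new_cost
                heapq.heappush(heap, (new_cost, next_kind, next_node))
    return {node: d for (kind, node), d in dist.items() if kind == "router"}


costs = spf("172.30.255.26", router_lsas, network_lsas)
print("172.30.255.21" in rt1_neighbors)  # False: RT2 is not a neighbor
print(costs.get("172.30.255.21"))        # 20: RT2 is still reachable in SPF

# Only if the DR stopped listing RT2 would RT2 drop out of the SPF tree.
without_rt2 = {"172.16.41.2": [r for r in ATTACHED if r != "172.30.255.21"]}
print(spf("172.30.255.26", router_lsas, without_rt2).get("172.30.255.21"))  # None
```

The neighbor table plays no part in the calculation: as long as the DR lists RT2 in its Type-2 LSA and RT2 lists the segment in its own Router-LSA, RT1 treats RT2 as reachable.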

A VPLS doesn't always have the same reliability as "real" Ethernet, and the network types broadcast and non-broadcast rely heavily on the underlying layer-2 network. That's one of the reasons why I often recommend using the Point-to-Multipoint network type over VPLS, which is generally more robust.
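
A rough Python sketch of why point-to-multipoint copes better with this kind of failure: with that network type there is no DR and no Type-2 LSA, and each Router-LSA lists a point-to-point link per adjacent neighbor, so a lost adjacency removes the link from the SPF calculation. The three-router topology, the cost of 20 per link, and the name `shortest_paths` are assumptions for illustration only.

```python
import heapq

RT1, RT2, HUB = "172.30.255.26", "172.30.255.21", "172.30.255.5"

# Hypothetical Router-LSAs under point-to-multipoint: router ID ->
# {adjacent neighbor: cost}. The RT1 <-> RT2 adjacency is down, so
# neither router lists the other; both are still adjacent to HUB.
router_lsas = {
    RT1: {HUB: 20},
    RT2: {HUB: 20},
    HUB: {RT1: 20, RT2: 20},
}


def shortest_paths(root, lsas):
    """Dijkstra over Router-LSAs; returns {router ID: (cost, first hop)}."""
    best = {root: (0, None)}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale heap entry
        first_hop = best[node][1]
        for nbr, link_cost in lsas.get(node, {}).items():
            if node not in lsas.get(nbr, {}):
                continue  # link is not advertised in both directions
            new_cost = cost + link_cost
            if nbr not in best or new_cost < best[nbr][0]:
                # Directly attached neighbors are their own first hop.
                hop = nbr if node == root else first_hop
                best[nbr] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, nbr))
    return best


paths = shortest_paths(RT1, router_lsas)
print(paths[RT2])  # (40, '172.30.255.5'): RT2 is reached through the other router
```

In this model RT1 still reaches the ASBR, but through a router it actually has an adjacency with, instead of assuming a direct path that the VPLS no longer provides.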

Was this the first occurrence of this kind of problem?

sergeymolchanov · ‎03-31-2016

Dear Rolf Fischer, thank you for the advanced troubleshooting! I tested the point-to-multipoint OSPF network type in a test lab environment; in that case, everything works OK.

Rolf Fischer · ‎03-31-2016

You're welcome!

It may interest you that we had this discussion a few months ago, where I tried to summarize the pros and cons of point-to-multipoint over a VPLS.

HTH
Rolf

OSPF route from dead neighbor