OSPF P2MP Issue

terrygwazdosky
Level 1

We're running OSPF point-to-multipoint on two different asynchronous CES clouds so that we can use neighbor statements to define the bandwidth of slower neighbors.  Up until now it has been working great, but now I'm experiencing an odd issue with one node on one of the clouds.

Recently I noticed that one node shows the dead timer expiring for all of its other neighbors at random times.  OSPF is never down for more than a second or two before the neighbor adjacencies re-establish.  The other neighbors do not log dead timer expirations, but they do show OSPF going FULL with the problematic neighbor:  %OSPF-5-ADJCHG: Process 1, Nbr 10.226.1.56 on GigabitEthernet0/1 from LOADING to FULL, Loading Done.
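
For anyone trying to correlate flaps like these across nodes, a minimal Python sketch that tallies adjacency transitions per neighbor from saved syslog lines might look like the following. The regular expression is modeled only on the single message quoted above, so other message variants or IOS releases may need adjustments, and the function name `count_transitions` is hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the ADJCHG message quoted in the post (an assumption:
# other releases or message variants may be formatted differently).
ADJCHG = re.compile(
    r"%OSPF-5-ADJCHG: Process (?P<pid>\d+), Nbr (?P<nbr>\S+) "
    r"on (?P<intf>\S+) from (?P<old>\w+) to (?P<new>\w+), (?P<reason>.+)"
)


def count_transitions(lines):
    """Count (neighbor, new_state) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ADJCHG.search(line)
        if match:
            counts[(match["nbr"], match["new"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "%OSPF-5-ADJCHG: Process 1, Nbr 10.226.1.56 on GigabitEthernet0/1 "
        "from LOADING to FULL, Loading Done",
    ]
    for (nbr, state), n in count_transitions(sample).items():
        print(f"{nbr} -> {state}: {n}")
```

Running the same tally over the logs of several nodes would show whether the transitions always involve the same neighbor, which is what the symptoms described here suggest.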

The problematic node is an ASR1006.  The interface does not go down and is not showing any errors. 

Here are the things I've tried so far that have not helped:

  • Opened a trouble ticket with our CES service provider, but they have not been able to find an issue on their end
  • Increased the dead interval from 3 to 5 seconds on all nodes
  • Removed a shaping service policy from the interface
  • Replaced all the cables involved
  • Upgraded the IOS from asr1000rp1-adventerprisek9.03.10.00.S.153-3.S to asr1000rp1-adventerprisek9.03.16.01a.S.155-3.

Here is the relevant config:

interface GigabitEthernet1/0/0
 bandwidth 50000
 ip address 10.226.126.1 255.255.255.224
 no ip redirects
 no ip proxy-arp
 ip flow monitor my-monitor input
 ip ospf authentication message-digest
 ip ospf message-digest-key 1 md5 abcdefg12345
 ip ospf network point-to-multipoint
 ip ospf dead-interval 5
 ip ospf hello-interval 1
 load-interval 30
 negotiation auto

!

router ospf 1
 router-id 10.226.1.56
 ispf
 log-adjacency-changes detail
 auto-cost reference-bandwidth 10000
 timers lsa arrival 80
 passive-interface default
 no passive-interface GigabitEthernet1/0/0

 network 10.226.126.0 0.0.0.31 area 0

 neighbor 10.226.126.13 cost 2000
 neighbor 10.226.126.12 cost 3333
 neighbor 10.226.126.30 cost 200
 neighbor 10.226.126.11 cost 2000
 neighbor 10.226.126.2 cost 5000
 neighbor 10.226.126.3 cost 3333
 neighbor 10.226.126.4 cost 5000
 neighbor 10.226.126.5 cost 3333
 neighbor 10.226.126.6 cost 1000
 neighbor 10.226.126.7 cost 5000
 neighbor 10.226.126.8 cost 5000
 neighbor 10.226.126.9 cost 5000
 neighbor 10.226.126.10 cost 5000
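
The interface cost of 200 is consistent with the usual OSPF cost arithmetic: reference bandwidth divided by link bandwidth, truncated to an integer. Below is a small Python sketch of that calculation, assuming this formula and using the `auto-cost reference-bandwidth 10000` (Mbps) and `bandwidth 50000` (kbps) values from the config. The function name `ospf_cost` is hypothetical, and the link speeds other than 50 Mbps are inferred from the neighbor costs rather than stated in the post.

```python
def ospf_cost(reference_mbps: int, bandwidth_kbps: int) -> int:
    """Reference bandwidth / link bandwidth, truncated and clamped to 1..65535."""
    cost = (reference_mbps * 1000) // bandwidth_kbps
    return max(1, min(cost, 65535))


if __name__ == "__main__":
    reference = 10000  # Mbps, from "auto-cost reference-bandwidth 10000"
    # 50000 kbps comes from the interface config; the other speeds are
    # assumptions chosen only to show which bandwidth each cost would imply.
    for kbps in (50000, 10000, 5000, 3000, 2000):
        print(f"{kbps} kbps -> cost {ospf_cost(reference, kbps)}")
    # 50000 kbps -> cost 200
    # 10000 kbps -> cost 1000
    # 5000 kbps -> cost 2000
    # 3000 kbps -> cost 3333
    # 2000 kbps -> cost 5000
```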

Thank you for any input/insight you can provide.
 

7 Replies

Rolf Fischer
Level 9

Hi,

I don't see the 'non-broadcast' keyword in the 'ip ospf network point-to-multipoint' line. Have you configured this interface to use multicast hellos?

Rolf 

Rolf - yes, it is using multicast hellos.

I remembered this older post, in which you used unicast: https://supportforums.cisco.com/discussion/12279701/issue-ospf-point-multipoint-over-ces-cloud Have you changed that on all routers on that segment?

Communication between the routers should work in either case but the per-neighbor cost assignment normally requires the non-broadcast type.

Wow, good memory.  :)  At the time I had tried both to get around the issue I was having.  Once you helped me with the proxy arp fix (thanks again!) I tried both and settled on multicast hellos so that I didn't have to define all neighbors on each node, just the ones that have lower bandwidth.  This particular node is one of two with 50Mb connections and the rest vary.

Here's the "show ip ospf interface" output which shows the neighbor costs are correct:

GigabitEthernet1/0/0 is up, line protocol is up
  Internet Address 10.226.126.1/27, Area 0, Attached via Network Statement
  Process ID 1, Router ID 10.226.1.56, Network Type POINT_TO_MULTIPOINT, Cost: 200
  Topology-MTID    Cost    Disabled    Shutdown      Topology Name
        0           200       no          no            Base
  Transmit Delay is 1 sec, State POINT_TO_MULTIPOINT
  Timer intervals configured, Hello 1, Dead 5, Wait 5, Retransmit 5
    oob-resync timeout 40
    Hello due in 00:00:00
  Supports Link-local Signaling (LLS)
  Cisco NSF helper support enabled
  IETF NSF helper support enabled
  Can be protected by per-prefix Loop-Free FastReroute
  Can be used for per-prefix Loop-Free FastReroute repair paths
  Index 1/6/6, flood queue length 0
  Next 0x0(0)/0x0(0)/0x0(0)
  Last flood scan length is 1, maximum is 37
  Last flood scan time is 0 msec, maximum is 3 msec
  Neighbor Count is 13, Adjacent neighbor count is 13
    Adjacent with neighbor 192.168.255.209
     Cost in topology Base, MTID-0 is 5000
    Adjacent with neighbor 10.21.255.255
     Cost in topology Base, MTID-0 is 5000
    Adjacent with neighbor 10.20.255.255
     Cost in topology Base, MTID-0 is 5000
    Adjacent with neighbor 10.12.255.255
     Cost in topology Base, MTID-0 is 5000
    Adjacent with neighbor 192.168.255.204
     Cost in topology Base, MTID-0 is 1000
    Adjacent with neighbor 192.168.255.197
     Cost in topology Base, MTID-0 is 3333
    Adjacent with neighbor 192.168.255.206
     Cost in topology Base, MTID-0 is 5000
    Adjacent with neighbor 192.168.255.205
     Cost in topology Base, MTID-0 is 3333
    Adjacent with neighbor 192.168.255.203
     Cost in topology Base, MTID-0 is 5000
    Adjacent with neighbor 10.122.255.255
     Cost in topology Base, MTID-0 is 2000
    Adjacent with neighbor 10.226.1.9
     Cost in topology Base, MTID-0 is 200
    Adjacent with neighbor 10.6.255.255
     Cost in topology Base, MTID-0 is 3333
    Adjacent with neighbor 10.102.255.255
     Cost in topology Base, MTID-0 is 2000
  Suppress hello for 0 neighbor(s)
  Cryptographic authentication enabled
    Youngest key id is 1

Thanks, I just wanted to be sure that the network types match.

I always thought (for whatever reason) that the neighbor cost command doesn't work on broadcast interfaces. Obviously that's not true, and the documentation is clear on this point.

Unfortunately, I don't have a good idea of how to troubleshoot this issue. The hello interval is only 1 second, so debugging OSPF packets from a particular neighbor to the buffer is probably not advisable.

I got the config from reading this article: http://www.netcraftsmen.com/using-ospf-point-to-multipoint-on-ethernet/. Since implementing it, I've found conflicting documentation from Cisco, most of it saying this should only work with unicast.

As to the issue, I tried another router and got the same result.  After I told the service provider this they looked again and found errors on one of their OC rings.  So it looks like this mystery is solved.

Thanks for coming back and telling us how you solved the issue.

And thanks for teaching me something new about the IOS OSPF implementation ;)

Your configuration looked good, and after your troubleshooting steps I would have focused on the SP network as well. I didn't want to recommend something like IP SLA tracking because your outages were very short and I always try to avoid probes with such short intervals in production environments. Another idea was to set up an additional peering between the ASR1006 and another router on this segment with another routing protocol (for instance iBGP) and let BFD do the link monitoring, but I've never done this over a VPLS.
