01-18-2016 11:12 AM - edited 03-05-2019 03:08 AM
We're running OSPF point-to-multipoint on two different asynchronous CES clouds so that we can use neighbor statements to assign costs that reflect the bandwidth of slower neighbors. Up until now it has been working great, but now I'm experiencing an odd issue with one node on one of the clouds.
Recently I noticed that one node logs the dead timer expiring for all of its neighbors at random times. OSPF is never down for more than a second or two before the adjacencies re-establish. The other neighbors do not log dead timer expirations, but they do log OSPF going to FULL with the problematic neighbor: %OSPF-5-ADJCHG: Process 1, Nbr 10.226.1.56 on GigabitEthernet0/1 from LOADING to FULL, Loading Done.
The problematic node is an ASR1006. The interface does not go down and is not showing any errors.
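For reference, here is a minimal Python sketch of the dead-timer rule behind these log messages: every hello received from a neighbor restarts its dead timer, and the timer expires when no hello arrives within the dead interval. It assumes the hello-interval 1 / dead-interval 5 values from the interface config and uses made-up hello arrival times; it models only the timer, not the rest of the OSPF neighbor state machine.

```python
"""Minimal model of the OSPF dead timer (timer logic only, no state machine)."""

from typing import List

HELLO_INTERVAL = 1.0  # seconds ("ip ospf hello-interval 1")
DEAD_INTERVAL = 5.0   # seconds ("ip ospf dead-interval 5")


def dead_timer_expiries(hello_times: List[float],
                        dead_interval: float = DEAD_INTERVAL) -> List[float]:
    """Return the times at which the dead timer would expire.

    Each received hello restarts the timer. If the next hello arrives
    later than last_hello + dead_interval, the timer expired at that deadline.
    """
    expiries = []
    for prev, nxt in zip(hello_times, hello_times[1:]):
        deadline = prev + dead_interval
        if nxt > deadline:
            expiries.append(deadline)
    return expiries


if __name__ == "__main__":
    # Hypothetical arrivals: a hello every second from 0 to 9 s, then nothing
    # until 15 s (e.g. frames lost in the transport), then hellos resume.
    before = [i * HELLO_INTERVAL for i in range(10)]       # 0.0 .. 9.0
    after = [15.0 + i * HELLO_INTERVAL for i in range(5)]  # 15.0 .. 19.0
    print(dead_timer_expiries(before + after))  # [14.0]
```

In this idealized model up to four consecutive hellos can be lost without an expiry; losing a fifth pushes the gap past five seconds and the neighbor is declared down at last_hello + 5 s, which fits short, random adjacency drops caused by brief loss in the transport network.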
Here are the things I've tried so far that have not helped:
Here is the relevant config:
interface GigabitEthernet1/0/0
 bandwidth 50000
 ip address 10.226.126.1 255.255.255.224
 no ip redirects
 no ip proxy-arp
 ip flow monitor my-monitor input
 ip ospf authentication message-digest
 ip ospf message-digest-key 1 md5 abcdefg12345
 ip ospf network point-to-multipoint
 ip ospf dead-interval 5
 ip ospf hello-interval 1
 load-interval 30
 negotiation auto
!
router ospf 1
 router-id 10.226.1.56
 ispf
 log-adjacency-changes detail
 auto-cost reference-bandwidth 10000
 timers lsa arrival 80
 passive-interface default
 no passive-interface GigabitEthernet1/0/0
 network 10.226.126.0 0.0.0.31 area 0
 neighbor 10.226.126.13 cost 2000
 neighbor 10.226.126.12 cost 3333
 neighbor 10.226.126.30 cost 200
 neighbor 10.226.126.11 cost 2000
 neighbor 10.226.126.2 cost 5000
 neighbor 10.226.126.3 cost 3333
 neighbor 10.226.126.4 cost 5000
 neighbor 10.226.126.5 cost 3333
 neighbor 10.226.126.6 cost 1000
 neighbor 10.226.126.7 cost 5000
 neighbor 10.226.126.8 cost 5000
 neighbor 10.226.126.9 cost 5000
 neighbor 10.226.126.10 cost 5000
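The costs in this config appear to follow the usual IOS auto-cost formula, cost = reference bandwidth / interface bandwidth. A quick Python sketch of that arithmetic: with auto-cost reference-bandwidth 10000 (Mbps) and bandwidth 50000 (kbps) it gives 200, which is the interface cost shown in the "show ip ospf interface" output later in the thread. The neighbor speeds in the loop are an assumption (2, 3, 5, 10 and 50 Mbps happen to reproduce the 5000, 3333, 2000, 1000 and 200 values in the neighbor statements); the actual access rates aren't stated.

```python
"""Sketch: derive OSPF costs the way IOS auto-cost does (ref_bw / link_bw)."""

REFERENCE_BW_MBPS = 10_000  # "auto-cost reference-bandwidth 10000" (Mbps)


def ospf_cost(link_bw_kbps: int, reference_bw_mbps: int = REFERENCE_BW_MBPS) -> int:
    """Integer OSPF cost: truncated division, clamped to the 16-bit range 1..65535."""
    cost = (reference_bw_mbps * 1000) // link_bw_kbps
    return min(65535, max(1, cost))


if __name__ == "__main__":
    # Interface bandwidth from "bandwidth 50000" (kbps).
    print(ospf_cost(50_000))  # 200
    # Hypothetical neighbor access speeds (Mbps), chosen only because they
    # reproduce the per-neighbor costs in the "neighbor ... cost" statements.
    for mbps in (2, 3, 5, 10, 50):
        print(mbps, "Mbps ->", ospf_cost(mbps * 1000))
    # 2 -> 5000, 3 -> 3333, 5 -> 2000, 10 -> 1000, 50 -> 200
```

Note that 3333 comes from truncating 10,000,000 / 3,000, so a 3 Mbps link and anything slightly faster end up with the same cost.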
Thank you for any input/insight you can provide.
01-18-2016 11:43 AM
Hi,
I don't see the 'non-broadcast' keyword in the 'ip ospf network point-to-multipoint' line. Have you configured this interface to use multicast hellos?
Rolf
01-18-2016 11:45 AM
Rolf - yes, it is using multicast hellos.
01-18-2016 11:52 AM
I remembered this older post: https://supportforums.cisco.com/discussion/12279701/issue-ospf-point-multipoint-over-ces-cloud. There you used unicast. Have you changed that on all routers on that segment?
Communication between the routers should work in either case, but the per-neighbor cost assignment normally requires the non-broadcast type.
01-18-2016 12:02 PM
Wow, good memory. :) At the time I had tried both to get around the issue I was having. Once you helped me with the proxy ARP fix (thanks again!), I tried both and settled on multicast hellos so that I didn't have to define all neighbors on each node, just the ones that have lower bandwidth. This particular node is one of two with 50 Mbps connections; the rest vary.
Here's the "show ip ospf interface" output which shows the neighbor costs are correct:
GigabitEthernet1/0/0 is up, line protocol is up
Internet Address 10.226.126.1/27, Area 0, Attached via Network Statement
Process ID 1, Router ID 10.226.1.56, Network Type POINT_TO_MULTIPOINT, Cost: 200
Topology-MTID    Cost    Disabled    Shutdown    Topology Name
      0           200       no          no          Base
Transmit Delay is 1 sec, State POINT_TO_MULTIPOINT
Timer intervals configured, Hello 1, Dead 5, Wait 5, Retransmit 5
oob-resync timeout 40
Hello due in 00:00:00
Supports Link-local Signaling (LLS)
Cisco NSF helper support enabled
IETF NSF helper support enabled
Can be protected by per-prefix Loop-Free FastReroute
Can be used for per-prefix Loop-Free FastReroute repair paths
Index 1/6/6, flood queue length 0
Next 0x0(0)/0x0(0)/0x0(0)
Last flood scan length is 1, maximum is 37
Last flood scan time is 0 msec, maximum is 3 msec
Neighbor Count is 13, Adjacent neighbor count is 13
Adjacent with neighbor 192.168.255.209
Cost in topology Base, MTID-0 is 5000
Adjacent with neighbor 10.21.255.255
Cost in topology Base, MTID-0 is 5000
Adjacent with neighbor 10.20.255.255
Cost in topology Base, MTID-0 is 5000
Adjacent with neighbor 10.12.255.255
Cost in topology Base, MTID-0 is 5000
Adjacent with neighbor 192.168.255.204
Cost in topology Base, MTID-0 is 1000
Adjacent with neighbor 192.168.255.197
Cost in topology Base, MTID-0 is 3333
Adjacent with neighbor 192.168.255.206
Cost in topology Base, MTID-0 is 5000
Adjacent with neighbor 192.168.255.205
Cost in topology Base, MTID-0 is 3333
Adjacent with neighbor 192.168.255.203
Cost in topology Base, MTID-0 is 5000
Adjacent with neighbor 10.122.255.255
Cost in topology Base, MTID-0 is 2000
Adjacent with neighbor 10.226.1.9
Cost in topology Base, MTID-0 is 200
Adjacent with neighbor 10.6.255.255
Cost in topology Base, MTID-0 is 3333
Adjacent with neighbor 10.102.255.255
Cost in topology Base, MTID-0 is 2000
Suppress hello for 0 neighbor(s)
Cryptographic authentication enabled
Youngest key id is 1
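One way to double-check the per-neighbor costs is to compare this output against the neighbor statements. The sketch below is a hypothetical helper (not a Cisco tool) that parses both as plain text with Python regular expressions. Because the show output lists neighbor router IDs while the config lists interface addresses, it only compares the mix of costs. Here that mix matches: six neighbors at 5000, three at 3333, two at 2000 and one each at 1000 and 200, in both the config and the output.

```python
"""Sketch: check that per-neighbor costs in "show ip ospf interface" output
match the costs configured with "neighbor <ip> cost <n>"."""

import re
from collections import Counter

ADJ_RE = re.compile(r"Adjacent with neighbor (\S+)")
COST_RE = re.compile(r"Cost in topology \S+, MTID-\d+ is (\d+)")
NEIGHBOR_CFG_RE = re.compile(r"^\s*neighbor (\S+) cost (\d+)", re.MULTILINE)


def costs_from_show(output: str) -> dict:
    """Map neighbor router ID -> cost from 'show ip ospf interface' text."""
    costs = {}
    current = None
    for line in output.splitlines():
        adj = ADJ_RE.search(line)
        if adj:
            current = adj.group(1)
            continue
        cost = COST_RE.search(line)
        if cost and current is not None:
            costs[current] = int(cost.group(1))
            current = None
    return costs


def costs_from_config(config: str) -> dict:
    """Map neighbor interface address -> cost from 'neighbor ... cost' lines."""
    return {ip: int(c) for ip, c in NEIGHBOR_CFG_RE.findall(config)}


def same_cost_mix(show_output: str, config: str) -> bool:
    """True if both sources contain the same multiset of neighbor costs.

    Router IDs (show output) and interface addresses (config) differ,
    so only the cost values themselves are compared.
    """
    return Counter(costs_from_show(show_output).values()) == Counter(
        costs_from_config(config).values()
    )


if __name__ == "__main__":
    show = """Adjacent with neighbor 192.168.255.209
 Cost in topology Base, MTID-0 is 5000
Adjacent with neighbor 10.226.1.9
 Cost in topology Base, MTID-0 is 200"""
    cfg = """router ospf 1
 neighbor 10.226.126.30 cost 200
 neighbor 10.226.126.2 cost 5000"""
    print(costs_from_show(show))  # {'192.168.255.209': 5000, '10.226.1.9': 200}
    print(same_cost_mix(show, cfg))  # True
```

Mapping individual router IDs to interface addresses would additionally need "show ip ospf neighbor", which this sketch does not attempt.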
01-18-2016 01:19 PM
Thanks, I just wanted to be sure that the network types match.
I always thought (for whatever reason) that the neighbor cost command doesn't work on broadcast interfaces. Obviously that's not true, and the documentation is clear on this point.
Unfortunately I don't have a good idea how to troubleshoot this issue. The hello interval is only 1 second, so debugging OSPF packets from a particular neighbor to the buffer is probably not advisable.
01-21-2016 12:18 PM
I got the config from reading this article: http://www.netcraftsmen.com/using-ospf-point-to-multipoint-on-ethernet/. Since implementing it I've found conflicting documentation from Cisco, most of it saying this should only work with unicast.
As for the issue, I tried another router and got the same result. After I told the service provider this, they looked again and found errors on one of their OC rings. So it looks like this mystery is solved.
01-21-2016 10:53 PM
Thanks for coming back and telling us how you solved the issue.
And thanks for teaching me something new about the IOS OSPF implementation ;)
Your configuration looked good, and after your troubleshooting steps I would have focused on the SP network as well. I didn't want to recommend something like IP SLA tracking, because your outages were very short and I always try to avoid probes with such short intervals in production environments. Another idea was to set up an additional peering between the ASR1006 and another router on this segment with another routing protocol (for instance iBGP) and let BFD do the link monitoring, but I've never done this over a VPLS.