Solved: Re: eigrp neighbors are flapping

nani · ‎12-28-2018

EIGRP neighbors are flapping sporadically.

Here is the config.

R1>>>>>>>>

crypto isakmp policy 10
encr aes
authentication pre-share
group 2
crypto isakmp key 6 USER address 192.168.1.2 255.255.255.252
!
!
crypto ipsec transform-set AES-SHA esp-aes esp-sha-hmac
mode transport
!
crypto map WARNE 10 ipsec-isakmp
set peer 192.168.1.2
set transform-set AES-SHA
match address 100

interface GigabitEthernet0/0
ip address 192.168.1.1 255.255.255.252
duplex full
speed 100
crypto map WARNE

router eigrp 12

network 192.168.1.1 0.0.0.0

neighbor 192.168.1.2 g0/0

xxxxxx

xxxxx

redistribute static

ip route 0.0.0.0 0.0.0.0 192.168.1.2

R2>>>>>>>>

crypto isakmp policy 10
encr aes
authentication pre-share
group 2
crypto isakmp key 6 USER address 192.168.1.1 255.255.255.252
!
!
crypto ipsec transform-set AES-SHA esp-aes esp-sha-hmac
mode transport
!
crypto map WARNE 10 ipsec-isakmp
set peer 192.168.1.1
set transform-set AES-SHA
match address 100

interface GigabitEthernet0/0
ip address 192.168.1.2 255.255.255.252
duplex full
speed 100
crypto map WARNE

router eigrp 12

network 192.168.1.2 0.0.0.0

neighbor 192.168.1.1 g0/0

xxxxxx

xxxxx

We are currently running static neighbors with IPsec and are seeing the logs below:

R1> "interface peer termination received"

R2> "Hold time expired"

"retries exceed"

We configured an IP SLA ICMP echo probe and see occasional failures at random times. We opened a ticket with the ISP, and they say they see no issues on their circuit (VPLS).

Has anyone experienced the same? Can anyone help solve this?

ip sla 1

icmp-echo 192.168.1.1

threshold 5

timeout 1500

frequency 5

ip sla schedule 1 life forever start-time now


Georg Pauwen · ‎12-28-2018

Hello,

post the output of:

sh ip eigrp neighbors detail

from both sides.

That said, you have multicast (via the 'network' command) and unicast (via the 'neighbor' command) configured to establish the neighbors. Try and use just one (start with only the network):

So both sides should look like this:

R1

!

router eigrp 12

network 192.168.1.1 0.0.0.0

--> no neighbor 192.168.1.2 g0/0

R2

!

router eigrp 12

network 192.168.1.2 0.0.0.0

--> no neighbor 192.168.1.1 g0/0

nani · ‎12-28-2018

Thank you, Georg, for the quick reply. If we use only the network statement, the neighbors do not form, because we are running over IPsec and I believe IPsec does not allow multicast traffic.

I have no access to those systems, and I did not see any Q count on those neighbors at any time during the neighbor relationship, if that is what you want to see.

Peter Paluch · ‎12-28-2018

Georg, Nani,

Please allow me to join.

Nani, the messages you are seeing suggest that the EIGRP traffic is being lost between your R1 and R2. The meaning of the messages is:

"interface peer termination received": The neighbor sent us a Hello packet with an indication that the neighbor is tearing down the adjacency with us.
"Hold timer expired": We have not received any valid Hello packet from the neighbor for the last Hold seconds (by default 15).
"retries exceeded": We tried to retransmit a reliable EIGRP packet (Update, Query, Reply, SIA-Query, SIA-Reply) too many times without receiving any Acknowledgment from the neighbor.

You wrote that the "peer termination" message appears on R1, and "Hold timer expired" / "retries exceeded" appear on R2. Is this always the case, or do the R1 and R2 also report other (perhaps mixed) reasons for the adjacency flaps?

Either way, we are definitely looking at a connectivity issue, and in your case, it is unicast connectivity since you are using the neighbor commands. One question, though: You are using IPsec, yes, but I assume that the IPsec only encrypts the transit traffic between R1 and R2, not the traffic originated and terminated on 192.168.1.1 and 192.168.1.2, respectively - because that would prevent IPsec from even coming up in the first place. Therefore, I suspect more that it is your local policy, rather than IPsec, that forces you to go for purely unicast EIGRP connectivity. Would you agree? In any case, if you share the ACL 100, we will be able to tell for sure.

Therefore, we first need to identify the reason for the unicast connectivity flaps between R1 and R2.

I have noted that you are already using an IP SLA probe to ping 192.168.1.2. What is the probe used for? My suggestion is to use the probe (or create a new one) to have one router ping the other continuously, and find out how often the outage occurs and how long it lasts. For this, it would be nice to send the ping every 1s and use a timeout of 2s, and refer to the IP SLA probe in a phony track object such as:

track 1 ip sla 1

You do not need to use it anywhere, but every time the IP SLA probe 1 fails or recovers, the track object will go down or back up, and this event will be logged. This way, you can track when the ping failed, how long it failed, and when it came back up.
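A minimal sketch of such a probe, as it might be entered on R1, is shown below. It is only an illustration under some assumptions: probe and track number 2 are hypothetical (chosen so the existing probe 1 is left alone), and the timeout is set to 1000 ms rather than 2 s because IOS normally expects threshold <= timeout <= frequency, so a 2 s timeout with a 1 s frequency may be rejected. Adjust the values to whatever your release accepts.

```
! Timestamp log messages so each track up/down event can be timed.
service timestamps log datetime msec
!
! Hypothetical new probe 2 on R1, pinging R2 once per second.
ip sla 2
 icmp-echo 192.168.1.2 source-interface GigabitEthernet0/0
 ! threshold is lowered first so that it does not exceed the timeout
 threshold 1000
 timeout 1000
 frequency 1
ip sla schedule 2 life forever start-time now
!
! Dummy track object; it is not referenced anywhere and exists only
! so that state changes of probe 2 appear in the log.
track 2 ip sla 2
```

On R2 the same sketch would apply with 192.168.1.1 as the target. The track state changes can then be compared against the timestamps of the EIGRP neighbor-down messages.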

Sorry for not being able to suggest more here but you see, we are hunting down an issue that may either be caused by the IPsec, or by the elementary IP connectivity. EIGRP is very likely only a victim here.

Best regards,
Peter

nani · ‎12-28-2018

Hi Peter!

Thank you for the response. The ACL is permit any any, and IPsec is up and active; we are able to encrypt the traffic. However, we have the same setup running at a different location, which seems to be working fine. I would like to add the track statement that you sent and will see if it gives any more details.

For further troubleshooting, we configured BFD and saw an "RX down" log at a particular time, so we assume there is a problem in the circuit.

Richard Burts · ‎12-28-2018

It is helpful to know that your ACL for crypto is permit ip any any. Cisco advises against this, as it may cause problems. I suggest that you create a new ACL and, as Peter has suggested, specify in that ACL the transit traffic (and perhaps the EIGRP traffic).
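As a rough illustration only, a narrower crypto ACL on R1 might look something like the sketch below. The ACL number 101 and the LAN prefixes 10.1.0.0/24 (behind R1) and 10.2.0.0/24 (behind R2) are hypothetical placeholders, since the real transit networks are not shown in this thread.

```
! R1: new crypto ACL (hypothetical number 101).
! Protect EIGRP exchanged directly between the two routers...
access-list 101 permit eigrp host 192.168.1.1 host 192.168.1.2
! ...and transit traffic between the hypothetical LANs behind R1 and R2.
access-list 101 permit ip 10.1.0.0 0.0.0.255 10.2.0.0 0.0.0.255
!
! Point the existing crypto map entry at the new ACL.
crypto map WARNE 10 ipsec-isakmp
 match address 101
```

R2 would need the mirror image of this ACL (source and destination swapped). Changing the crypto ACL is likely to disturb the existing IPsec session, so it is probably best done in a maintenance window.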

HTH

Rick


nani · ‎12-29-2018

Thank you Richard, I will make those changes.

nani · ‎01-02-2019

We are making those changes and will let you know the result.

Thank you.

nani · ‎01-03-2019

Hello,

We brought the GRE tunnel back up on Dec 31 and are running on multicast routing. We experienced neighbor drops on the 1st, and since then the neighbors have been stable. However, one of our servers lost visibility to a PC connected to that network, and IP SLA has some failures. Is that good enough to say it is a transport issue?

Thank you

Richard Burts · ‎01-03-2019

It is good to know that you brought back the GRE tunnel. That is a better environment for running EIGRP over an IPsec VPN. It is interesting that the neighbors have been stable but that a server lost visibility to some PC. It is probably more significant that IP SLA is reporting some failures. Can you provide some detail of how you set up IP SLA (what address it is monitoring, what threshold, etc.)?
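For readers following along, GRE over IPsec helps here because the EIGRP multicast hellos are carried inside the GRE tunnel, so static neighbor statements are not needed. The sketch below shows one common way to build such a tunnel on R1. It is not the configuration actually used in this thread (which was not posted); the profile name GRE-PROT and the tunnel subnet 172.16.0.0/30 are hypothetical.

```
! R1: IPsec profile reusing the transform set shown earlier in the thread.
crypto ipsec profile GRE-PROT
 set transform-set AES-SHA
!
! GRE tunnel between the two routers (hypothetical addressing).
interface Tunnel0
 ip address 172.16.0.1 255.255.255.252
 tunnel source GigabitEthernet0/0
 tunnel destination 192.168.1.2
 tunnel protection ipsec profile GRE-PROT
!
! EIGRP runs on the tunnel interface and uses normal multicast hellos.
router eigrp 12
 network 172.16.0.1 0.0.0.0
```

R2 would mirror this with 172.16.0.2 and a tunnel destination of 192.168.1.1. With tunnel protection in place, the crypto map on GigabitEthernet0/0 and the EIGRP neighbor statements would normally be removed.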

HTH

Rick


nani · ‎01-03-2019

Hi Richard,

Here is the SLA configuration that we applied. We applied the SLA on both sides of the circuit, and both have failures.

ip sla 1

icmp-echo 192.168.1.1

threshold 5

timeout 1500

frequency 5

ip sla schedule 1 life forever start-time now

track 1 ip sla 1

Here is a partial display of the SLA statistics:

Return code: Over threshold

No. of successes: 564

No. of failures: 9

Thank you

Richard Burts · ‎01-03-2019

Thanks for the information. Just trying to be sure that my understanding is correct: the address 192.168.1.1 is directly connected (in the same subnet as your interface), and the ICMP does not go through the encrypted tunnel. In that case, I believe that we can reasonably say that the failures do represent a problem with transport.

HTH

Rick


nani · ‎01-03-2019

Yes, it is a directly connected subnet. We have been working with the ISP since the day the issue started, but they are unable to find any problem on their circuit.