
ACI Monitoring for endpoint tunnel flapping

vv0bbLeS
Level 1

Hello all,

 

We recently had an issue where an endpoint was flapping between two different tunnels (TEPs) on an ACI leaf: the leaf was learning the endpoint from two other leaves at the same time, so the next-hop leaf used to reach that endpoint kept changing.

 

For example, Leaf A learned endpoint 10.0.0.1 from Leaf D, but also from Leaf F (because of some downstream Flow Forwarding coming up through Leaf F). As a result, Leaf A kept changing which leaf it used as the next hop for traffic to 10.0.0.1. If you ran vsh_lc -c "show system internal epmc endpoint ip 10.0.0.1" on Leaf A, the output might show that 10.0.0.1 was reached through Leaf A's tunnel interface pointing to Leaf D, while the same command 2 seconds later might show Leaf A's tunnel pointing to Leaf F instead. This constant flapping of the next-hop leaf on Leaf A caused packet loss for hosts connected to Leaf A that were trying to reach 10.0.0.1.
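
To make the symptom concrete, here is a minimal Python sketch of how repeated observations of the next-hop tunnel could be checked for flapping. It does not parse real command output: the `Sample` type, the tunnel names, and the `max_changes` threshold are all hypothetical, and collecting the samples from the leaf is assumed to happen elsewhere.

```python
from typing import Iterable, NamedTuple


class Sample(NamedTuple):
    """One observation of the next hop for an endpoint (hypothetical shape)."""
    timestamp: float  # seconds, any consistent clock
    tunnel: str       # tunnel interface name as seen on the leaf (illustrative)


def count_next_hop_changes(samples: Iterable[Sample]) -> int:
    """Count how often the tunnel differs between consecutive samples."""
    changes = 0
    previous = None
    for sample in sorted(samples, key=lambda s: s.timestamp):
        if previous is not None and sample.tunnel != previous:
            changes += 1
        previous = sample.tunnel
    return changes


def is_flapping(samples: Iterable[Sample], max_changes: int = 1) -> bool:
    """Treat more than max_changes next-hop changes as flapping (assumed threshold)."""
    return count_next_hop_changes(samples) > max_changes


# Illustrative data only: the next hop alternates every 2 seconds.
observed = [
    Sample(0.0, "tunnel-to-leaf-D"),
    Sample(2.0, "tunnel-to-leaf-F"),
    Sample(4.0, "tunnel-to-leaf-D"),
    Sample(6.0, "tunnel-to-leaf-F"),
]
flapping = is_flapping(observed)
```

A single change is tolerated by default because one legitimate endpoint move would also change the next hop once.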

 

Is there any kind of monitoring in ACI that would check for this type of behavior?

Accepted Solutions

Sergiu.Daniluk
VIP Alumni

Hi @vv0bbLeS 

The feature you are looking for is Rogue EP Control.

System > System Settings > Endpoint Controls > Rogue EP Control

 

Here is what the feature does, as described in the ACI Endpoint Learning white paper:

"""

Rogue EP Control is meant to protect the ACI fabric against issues such as a specific flapping endpoint due to inappropriate configurations or designs. 

...

With the Rogue EP Control enabled, once the endpoint is marked as rogue, a fault is raised and learning is disabled for the endpoint only, which allows other endpoints in the same bridge domain to function as usual.

"""

 

Hope it helps.

 

Stay safe,

Sergiu

Just as a side note: if you have endpoints such as Active/Standby L4-7 devices that may share virtual IPs/MACs, Rogue EP detection may kick in against them if they violate the move rate (default of 4 moves in 60 seconds). This puts the endpoint into a frozen state for 30 minutes, which may cause issues. You can now exempt specific endpoints, where MAC/IP moves are expected, from being impacted by Rogue EP detection. The exemption doesn't completely disable COOP dampening, but it does relax it (3000 moves within 10 minutes).
From 5.2(3) you can add exempted MACs to the Bridge Domains. See attached.
rogue Excemption List.png

Robert
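
The thresholds mentioned above can be illustrated with a small sliding-window sketch in Python. This is not how the fabric implements Rogue EP detection. It only models the numbers quoted in this thread (4 moves in 60 seconds, a 30-minute hold, and the relaxed 3000 moves in 10 minutes for exempted endpoints). The class name and the assumption that the limit triggers when the move count reaches the threshold are mine.

```python
from collections import deque


class MoveRateDetector:
    """Toy model of a move-rate limit with a hold (freeze) period."""

    def __init__(self, max_moves=4, window_s=60.0, hold_s=1800.0):
        self.max_moves = max_moves  # moves allowed per window (assumed inclusive trigger)
        self.window_s = window_s    # sliding window length in seconds
        self.hold_s = hold_s        # freeze duration in seconds
        self._moves = deque()       # timestamps of recent moves
        self._frozen_until = None

    def record_move(self, now: float) -> bool:
        """Record a move at time `now`; return True if the endpoint is frozen."""
        if self._frozen_until is not None:
            if now < self._frozen_until:
                return True
            # Hold period has expired: start counting from scratch.
            self._frozen_until = None
            self._moves.clear()
        self._moves.append(now)
        # Drop moves that have fallen out of the sliding window.
        while self._moves and now - self._moves[0] > self.window_s:
            self._moves.popleft()
        if len(self._moves) >= self.max_moves:
            self._frozen_until = now + self.hold_s
            return True
        return False


# Default thresholds from the thread: frozen on the 4th move within 60 seconds.
default = MoveRateDetector()
default_results = [default.record_move(t) for t in (0, 10, 20, 30)]

# Relaxed thresholds for an exempted endpoint: the same moves stay under the limit.
exempt = MoveRateDetector(max_moves=3000, window_s=600.0)
exempt_results = [exempt.record_move(t) for t in (0, 10, 20, 30)]
```

With the default values, an Active/Standby pair that fails over back and forth a few times within a minute would hit the limit in this model, which is the situation the exemption list is meant to avoid.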

@Sergiu.Daniluk and @Robert Burns thank you so much for your replies! This looks like just what I'm looking for. And yes, we will soon be upgraded to 5.2(3), so we can make use of that exemption feature. Thanks again!
