
ACI Rogue Endpoint Control - Reduce Hold Timer

waschminator
Level 1

Hello,

Is there anybody out there who has experience with reducing the hold timer in ACI for rogue endpoints? The default is 1800 seconds, and we want to reduce it to the minimum (300). The reason is our architecture (routing is outside ACI and we have issues with HSRP flaps), so we want to mitigate that until we have changed the whole architecture.

BTW, the exception list is not an option because we need 700 entries and the maximum number is 100.

I am wondering if the reduction could cause any issues, or if anybody has any experience with that.

 

br + thx in advance

 


Remi-Astruc
Cisco Employee

Hi @waschminator ,

Based on what you describe, reducing the Hold timer (time the EP learning is frozen) would just reduce your outage down to 5 minutes... I would recommend trying to solve your issue instead.

Depending on how many flaps occur during your HSRP failover, you can try to increase the Multiplication Factor up to 10. If Rogue still kicks in, try to also decrease the Detection Interval down to 30 seconds. These will make the Rogue protection more permissive.
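To illustrate how these two knobs interact, here is a minimal Python sketch of a sliding-window move counter. It is a toy model, not the leaf's actual implementation: the class `RogueDetector`, the helper `flap`, and the "more moves than the factor within the interval" rule are assumptions made for illustration, as is the flap pattern of one move every 10 seconds. Only the 1800-second hold default comes from this thread.

```python
from collections import deque


class RogueDetector:
    """Toy model of rogue endpoint detection (hypothetical, for illustration only)."""

    def __init__(self, detect_interval, mult_factor, hold_interval=1800):
        self.detect_interval = detect_interval  # seconds in the counting window
        self.mult_factor = mult_factor          # moves tolerated inside the window
        self.hold_interval = hold_interval      # seconds learning stays frozen
        self.moves = deque()                    # timestamps of recent moves
        self.hold_until = None                  # end of the current hold, if any

    def is_rogue(self, now):
        """True while the endpoint is inside its hold interval."""
        return self.hold_until is not None and now < self.hold_until

    def record_move(self, now):
        """Record a move at time `now` (seconds); return True if the endpoint is rogue."""
        if self.is_rogue(now):
            return True  # learning is frozen, so the move is not counted
        self.moves.append(now)
        # Drop moves that have aged out of the detection window.
        while self.moves and now - self.moves[0] >= self.detect_interval:
            self.moves.popleft()
        # Assumed rule: more moves than the factor inside the window means rogue.
        if len(self.moves) > self.mult_factor:
            self.hold_until = now + self.hold_interval
            self.moves.clear()
            return True
        return False


def flap(detector, period=10, duration=120):
    """Simulate one move every `period` seconds for `duration` seconds."""
    return any(detector.record_move(t) for t in range(0, duration, period))


strict = RogueDetector(detect_interval=60, mult_factor=4)
permissive = RogueDetector(detect_interval=30, mult_factor=10)
print(flap(strict), flap(permissive))
```

In this model, a higher factor and a shorter interval both mean that more moves have to be packed into less time before the endpoint is frozen, which is why the suggested settings are more permissive.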

Side note: ACI is not really designed for simple L2 switching, and if you have 700+ BDs, I would strongly suggest moving to L3 with the gateway on ACI to get many more benefits from the product.

Regards

Remi Astruc

I totally agree with you that having the L3 gateway in ACI would be beneficial, but for historical reasons that is not the case, and this setup was already in place when I joined the company. We are planning to change it, but it will take some time, and we have to mitigate the risk now anyway.

Solving the issue would also be fine, but it happens every 6-18 months, and then we have a full datacenter outage due to a 2-minute HSRP flap (the root cause could not be found by Cisco or by us).

So we are where we are... Anyway, the idea of being less aggressive is an interesting one. I will think about it.

thx for your ideas

 

I understand. I meant trying to solve the issue by changing the Rogue settings I mentioned.

Remi Astruc

I will let you know the outcome.

Last question: the maximum multiplication factor is 10 according to the documentation, but I am able to configure 65535. Any idea why, and which is correct?

Any value between 10 and 65535 will fall back to 10 in the hardware. It is a cosmetic bug (CSCwc61314).
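Put differently, the value that takes effect can be thought of as the configured value capped at 10. A tiny Python sketch of that behaviour, where the function and constant names are hypothetical and only the cap of 10 comes from the statement above:

```python
HW_MAX_MULT_FACTOR = 10  # hardware cap described above


def effective_mult_factor(configured: int) -> int:
    """Return the multiplication factor the hardware would actually use (illustrative)."""
    return min(configured, HW_MAX_MULT_FACTOR)


print(effective_mult_factor(65535))
```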

Remi Astruc

Ah, great. Last question: if an endpoint is rogue, is only the learning disabled, or is the traffic also dropped? The documentation says it is dropped, but I think traffic is not dropped.

Hold Interval (sec): Interval in seconds after the endpoint is declared rogue, where it is kept static so learning is prevented and the traffic to and from the rogue endpoint is dropped. 

Cisco APIC Basic Configuration Guide, Release 6.0(x) - Provisioning Core ACI Fabric Services [Cisco Application Policy Infrastructure Controller (APIC)] - Cisco

Right, traffic is not dropped by the Leaf itself during the Hold interval. Switching still occurs to and from the endpoint. However, if the learning happens to be stuck on one of the "flapping" sides, some traffic disruption is likely.

Remi Astruc
