05-05-2025 05:40 AM
Hello,
Is there anybody out there with experience reducing the hold timer for rogue endpoints in ACI? The default is 1800 seconds, and we want to reduce it to the minimum (300). The reason lies in our architecture: routing is outside ACI and we have issues with HSRP flaps, so we want to mitigate that until we have changed the whole architecture.
By the way, the exception list is not an option because we need 700 entries and the maximum is 100.
I am wondering whether the reduction could cause any issues, or whether anybody has experience with that.
br + thx in advance
05-05-2025 09:44 AM
Hi @waschminator ,
Based on what you describe, reducing the Hold timer (time the EP learning is frozen) would just reduce your outage down to 5 minutes... I would recommend trying to solve your issue instead.
Depending on how many flaps occur during your HSRP failover, you can try to increase the Multiplication Factor up to 10. If Rogue still kicks in, try to also decrease the Detection Interval down to 30 seconds. These will make the Rogue protection more permissive.
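To illustrate why both changes make the protection more permissive, here is a rough Python sketch of the detection logic. It is a simplified model, not the actual leaf implementation: the function name, the sliding-window behavior, and the "more moves than the factor" comparison are assumptions, and the numbers are illustrative only.

```python
from collections import deque


def is_declared_rogue(move_times, detection_interval, multiplication_factor):
    """Simplified model (assumption): an endpoint is declared rogue when it
    moves more than `multiplication_factor` times within any window of
    `detection_interval` seconds."""
    window = deque()
    for t in sorted(move_times):
        window.append(t)
        # Drop moves that are older than the detection interval.
        while t - window[0] > detection_interval:
            window.popleft()
        if len(window) > multiplication_factor:
            return True
    return False


# Hypothetical flap: the endpoint moves every 5 seconds for about a minute.
flap = list(range(0, 60, 5))  # 12 moves at t = 0, 5, ..., 55

print(is_declared_rogue(flap, detection_interval=60, multiplication_factor=4))
print(is_declared_rogue(flap, detection_interval=60, multiplication_factor=10))
print(is_declared_rogue(flap, detection_interval=30, multiplication_factor=10))
```

In this model, raising the factor alone is not enough for a flap with 12 moves in one minute, but also shortening the interval to 30 seconds leaves at most 7 moves in any window, so the endpoint is no longer flagged.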
Side note: ACI is not really designed for simple L2 switching, and if you have 700+ BDs, I would strongly suggest moving to L3 with the gateway on ACI to get a lot more benefit from the product.
Regards
05-05-2025 02:36 PM
I totally agree that having the L3 gateway in ACI would be beneficial, but for historical reasons that is not the case; this setup was already in place when I joined the company. We are planning to change it, but it will take some time, and we have to mitigate the risk now anyway.
Solving the issue would also be fine, but it happens only every 6-18 months, and then we have a full datacenter outage due to a 2-minute HSRP flap (neither Cisco nor we can find the root cause).
So we are where we are... Anyway, the idea of being less aggressive is an interesting one. I will think about it.
thx for your ideas
05-05-2025 11:40 PM
I understand. What I meant was solving the issue by changing the Rogue settings I mentioned.
05-07-2025 05:57 AM
I will let you know the outcome.
Last question: the multiplication factor. According to the documentation the maximum is 10, but I am able to configure 65535. Any idea why, and which is correct?
05-08-2025 02:52 AM
Any value between 10 and 65535 will fall back to 10 in the hardware. It is a cosmetic bug (CSCwc61314).
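As a tiny sketch of what that fallback means for the value that actually takes effect (the function and constant names are hypothetical, and this only models the behavior described in this reply):

```python
HARDWARE_MAX_FACTOR = 10  # upper limit applied in hardware, per this reply


def effective_multiplication_factor(configured):
    """Model (assumption): anything configured above 10 behaves as 10."""
    return min(configured, HARDWARE_MAX_FACTOR)


print(effective_multiplication_factor(65535))
print(effective_multiplication_factor(6))
```

So configuring 65535 should behave no differently from configuring 10.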
05-08-2025 03:01 AM - edited 05-08-2025 03:02 AM
Ah, great. Last question: if an endpoint is rogue, is just the learning disabled, or is the traffic also dropped? The documentation (quoted below) says it is dropped, but I think traffic is not dropped.
Hold Interval (sec): Interval in seconds after the endpoint is declared rogue, where it is kept static so learning is prevented and the traffic to and from the rogue endpoint is dropped.
05-08-2025 06:58 AM
Right, traffic is not dropped by the Leaf itself during the Hold interval. Switching still occurs to and from the endpoint. However, if the learning happens to be stuck on one of the "flapping" sides, some traffic disruption is likely.