
F5 HA failover issue with ACI design

We have an ACI (Multi-Pod) and F5 (active/standby) setup in our environment, built in network-centric mode. A strange issue occurred during the last migration: when we failed over from the active F5 to the standby and then back again, traffic froze and the Rogue Endpoint policy kicked in, marking some of the VIP traffic as rogue.

 

There is an article from F5 that talks about disabling IP data plane learning on the VRF. I would like to hear from the experts here whether anyone has been through this sort of design and issue, and what best practices are followed in this scenario on either the ACI or the F5 side.

https://support.f5.com/csp/article/K44023455

3 REPLIES
Robert Burns
Cisco Employee

Ravindra,

You have two options (depending on the version you're running).  

If you're running a release prior to 5.2, then your only option is to disable IP Dataplane Learning on the VRF used by the F5 devices. Depending on your VRF design, this may or may not be desirable. Where customers share a single VRF across all devices, I've seen them instead create a new VRF for the F5 devices, then disable IP learning only on that VRF. This way you keep the benefits of IP Dataplane Learning for your regular endpoints.
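As a rough illustration of the VRF-level option, here is a minimal Python sketch that only builds the REST payload you would POST to the APIC; it does not contact any controller. It assumes the VRF object is `fvCtx` and that the knob is exposed as the `ipDataPlaneLearning` attribute, so please confirm both against your own APIC release (for example with the API Inspector) before using it. The tenant and VRF names are hypothetical.

```python
import json


def vrf_ip_dp_learning_payload(tenant: str, vrf: str, enabled: bool = False):
    """Return (url_path, body) for setting IP dataplane learning on a VRF.

    Assumption: the VRF is modeled as fvCtx with an ipDataPlaneLearning
    attribute taking "enabled" or "disabled". Verify on your APIC version.
    """
    path = f"/api/mo/uni/tn-{tenant}/ctx-{vrf}.json"
    body = {
        "fvCtx": {
            "attributes": {
                "name": vrf,
                "ipDataPlaneLearning": "enabled" if enabled else "disabled",
            }
        }
    }
    return path, body


# Hypothetical names: a dedicated VRF that holds only the F5 devices.
path, body = vrf_ip_dp_learning_payload("TENANT1", "F5-VRF")
print(path)
print(json.dumps(body, indent=2))
```

Keeping the F5 devices in their own VRF, as described above, means this change touches only that VRF and leaves learning enabled everywhere else.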

If you're running 5.2 or later, then you can take advantage of the endpoint-level IP learning disable feature. This can be applied to a /32 endpoint (like your F5 VIP), a subnet, or an entire BD. More details on this feature are in the link below; this would be your best option if you're at the required version or later. https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739989.html#IPDataplaneLearningperhost

Robert 

Thanks, Robert, for the swift reply. I have a TAC case raised as well. I will request a downtime window, apply one of the solutions, and report back on the results.

STEPAN JANKOVIC
Beginner

Hello Ravindra,

Let me add our experience; it may help. We are facing issues with HA systems that use a single MAC address and host many virtual IPs (all sharing the same MAC). When Rogue Endpoint Detection is turned on and the HA system does a switchover, multiple IPs sharing the same MAC move at once. ACI treats this as a single endpoint moving multiple times and freezes learning, causing a severe outage. As a workaround we have R.E.D. turned off (which is undesirable). We raised a TAC case, and it was classified as a new bug.
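To make the mechanism above easier to picture, here is a toy Python model of it. This is only a sketch of the behavior as we observed it, not ACI's actual algorithm: the class, the function, and the threshold values (a 60-second window and 4 moves) are all assumptions chosen for illustration. The point is that every VIP that moves is counted against the one shared MAC.

```python
from collections import defaultdict, deque


class RogueDetector:
    """Toy model: freeze a MAC that moves too often within a time window."""

    def __init__(self, interval_s=60.0, max_moves=4):
        # Illustrative thresholds, not taken from any ACI configuration.
        self.interval_s = interval_s
        self.max_moves = max_moves
        self.moves = defaultdict(deque)  # MAC -> timestamps of recent moves
        self.frozen = set()

    def record_move(self, mac, timestamp):
        """Record one move and return True if the MAC is now frozen."""
        if mac in self.frozen:
            return True
        window = self.moves[mac]
        window.append(timestamp)
        # Drop moves that fell out of the detection window.
        while window and timestamp - window[0] > self.interval_s:
            window.popleft()
        if len(window) > self.max_moves:
            self.frozen.add(mac)
        return mac in self.frozen


def simulate_failover(detector, mac, vips, start=0.0, step=0.01):
    """Each VIP moving to the other unit counts as a move of the shared MAC."""
    for i, _vip in enumerate(vips):
        detector.record_move(mac, start + i * step)
    return mac in detector.frozen


vips = [f"10.0.0.{i}" for i in range(1, 11)]  # hypothetical VIP addresses
detector = RogueDetector()
print(simulate_failover(detector, "00:11:22:33:44:55", vips))
```

In this model a single switchover with ten VIPs is enough to freeze the shared MAC, and even a small number of VIPs can cross the threshold when a failover is followed quickly by a failback.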

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvw91341

The bug is probably already fixed in 14.2(7f) and in the latest 15.2. We have had no chance to test it, as we are on 15.1.

Good Luck!

Stepan