CSR1000v High Availability capability with AWS

jellybeanshiba · 11-20-2018

Dear community,

Currently I'm trying to implement what is described in the following documentation: letting a CSR modify the VPC Route Table so that it takes over a route normally forwarded to another CSR if that CSR fails.

I have already configured what the documentation describes. However, when two CSRs are each actively forwarding traffic for a different Availability Zone (AZ) and a failover occurs (one CSR modifies the VPC Route Table), nothing happens after the affected CSR returns to normal. When the affected CSR is back, I'd like the VPC Route Table to be modified again so that its original entry is restored. Is there any way to do that?
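To make the behaviour I'm after concrete, here is a minimal Python sketch. It only models the route table as a dictionary; it is not the CSR's actual HA logic and makes no AWS API calls. The prefixes, the ENI identifiers and the function names are all hypothetical.

```python
# Hypothetical model of a VPC Route Table: destination prefix -> target ENI.
# All identifiers below are made up for illustration.
ORIGINAL_ROUTES = {
    "10.1.0.0/16": "eni-csr1",  # normally forwarded by CSR1
    "10.2.0.0/16": "eni-csr2",  # normally forwarded by CSR2
}


def fail_over(routes, failed_eni, surviving_eni):
    """Return a copy of routes with the failed CSR's entries pointed at the survivor."""
    return {
        prefix: (surviving_eni if target == failed_eni else target)
        for prefix, target in routes.items()
    }


def fail_back(routes, original_routes, recovered_eni):
    """Return a copy of routes with the recovered CSR's original entries restored."""
    restored = dict(routes)
    for prefix, target in original_routes.items():
        if target == recovered_eni:
            restored[prefix] = recovered_eni
    return restored


# CSR1 fails: CSR2 takes over CSR1's route (this part already works for me).
after_failover = fail_over(ORIGINAL_ROUTES, "eni-csr1", "eni-csr2")

# CSR1 recovers: this is the step I'd like to happen automatically.
after_failback = fail_back(after_failover, ORIGINAL_ROUTES, "eni-csr1")
```

In other words, the `fail_over` step matches what I see today, and the `fail_back` step is what I'm missing.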

Also, how do the two CSRs know which of them should take over? If both are running normally but BFD goes down, shouldn't both try to take over (modify the VPC Route Table)? Yet only one CSR actually attempts the takeover, as shown below.

CSR1

*Nov 20 14:39:18.057: %BFDFSM-6-BFD_SESS_DOWN: BFD-SYSLOG: BFD session ld:4097 handle:1,is going Down Reason: ECHO FAILURE
*Nov 20 14:39:18.058: %VXE_CLOUD_HA-6-BFDEVENT: VXE BFD peer 172.17.1.2 interface Tunnel1 transitioned to down
*Nov 20 14:39:18.058: %BFD-6-BFD_SESS_DESTROYED: BFD-SYSLOG: bfd_session_destroyed,  ld:4097 neigh proc:EIGRP, handle:1 act
*Nov 20 14:39:18.119: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 172.17.1.2 (Tunnel1) is down: BFD peer down notified
*Nov 20 14:39:19.740: %VXE_CLOUD_HA-6-SUCCESS: VXE Cloud HA BFD state transitioned, HA node 1 event route update successful

CSR2

*Nov 20 14:39:16.512: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 172.17.1.1 (Tunnel1) is down: interface down
*Nov 20 14:39:16.512: %BFD-6-BFD_SESS_DESTROYED: BFD-SYSLOG: bfd_session_destroyed,  ld:4097 neigh proc:EIGRP, handle:1 act
*Nov 20 14:39:17.920: %BFDFSM-6-BFD_SESS_DOWN: BFD-SYSLOG: BFD session ld:4097 handle:1,is going Down Reason: ECHO FAILURE
*Nov 20 14:39:17.920: %VXE_CLOUD_HA-6-BFDEVENT: VXE BFD peer 172.17.1.1 interface Tunnel1 transitioned to down
*Nov 20 14:39:18.511: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to down
*Nov 20 14:39:18.512: %LINK-5-CHANGED: Interface Tunnel1, changed state to administratively down

Does anybody have experience with this kind of configuration? Any advice would be much appreciated.

Regards,

jellybeanshiba