Re: How to connect firewalls in VXLAN BGP EVPN data center network

enginer SNOS · 03-01-2020

Hello everyone!

Please help with a problem: how do we connect two firewalls (ASA) in active-active routed mode to a data center network built on VXLAN BGP EVPN? For fault tolerance, the firewalls must be located at different sites.
We tried connecting them to different border leaves (each a pair of N9K switches), with each firewall node connected using vPC, but this scheme worked poorly.

I'll be very grateful for any help.

Francesco Molino · 03-02-2020

Hi
Can you share how you interconnected them? Were they doing dynamic or static routing?

Here are 2 docs showing a dual-attached firewall:
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/92x/vxlan-92x/configuration/guide/b-cisco-nexus-9000-series-nx-os-vxlan-configuration-guide-92x/b_Cisco_Nexus_9000_Series_NX-OS_VXLAN_Configuration_Guide_9x_appendix_010110.html

https://www.cisco.com/c/dam/en/us/products/collateral/switches/nexus-7000-series-switches/white-paper-c11-736585.pdf

You can also have them single-attached; since they're active/active, your design will be redundant anyway. If you can share a quick sketch, maybe there's something I'm missing in your design.

Thanks
Francesco
PS: Please don't forget to rate and select as validated answer if this answered your question

enginer SNOS · 03-03-2020

Hi!

The existing design is in the attached schem.jpg.

I think I was mistaken in my description at the beginning. The two ASAs are in a cluster, in routed mode, and use static routing. We have several VRFs, and we need to filter traffic between them, as well as between these VRFs and the outside network. Each ASA is connected using vPC.

Francesco Molino · 03-03-2020

OK, so this is 1 cluster with 2 members, 1 member on each site.
To interconnect them, you need to respect some requirements:
https://www.cisco.com/c/en/us/td/docs/security/asa/asa99/configuration/general/asa-99-general-config/ha-cluster.html#ID-2170-0000038b

If you respect all the requirements, I recommend opening a TAC case to investigate the issues you were having.
By the way, what kind of issues did you run into?

The design looks OK based on the sketch you uploaded.

Thanks
Francesco

enginer SNOS · 03-03-2020

For firewalls, we have two ASA 5585-X SSP-20s. Each one is connected with two 10G interfaces in a vPC as the data link. We also have an ASA5585-NM-20-1GE module in each ASA. Four 10/100/1000BASE-T interfaces on these modules and four of the same interfaces on the ASA itself form the Cluster Control Link, connected in vPC. The module's 12 optical interfaces are free.

This seems to be the only difference from the recommendations. Is that OK, or is it worth redoing the connections?

I will try to describe the problems. Suppose we have VLAN 5 for hosts, in VRF XXXX. The configuration of VLAN 5 and its interface looks like this:

vlan 5
  vn-segment 5000

interface Vlan5
  no shutdown
  vrf member XXXX
  no ip redirects
  ip address 192.168.1.254/24
  no ipv6 redirects
  fabric forwarding mode anycast-gateway

We also have VLAN 111 and its interface, which is a point-to-point connection to the ASA:

vlan 111
  vn-segment 111000
interface Vlan111
  no shutdown
  vrf member XXXX
  no ip redirects
  ip address 10.10.10.6/29
  no ipv6 redirects
  fabric forwarding mode anycast-gateway

Suppose 10.10.10.1 is the ASA cluster IP, and .2 and .3 are the cluster members' IPs. Finally, there is VLAN 555, which is used as the L3 VNI for routing in this VRF.

vlan 555
  vn-segment 555000
interface Vlan555
  no shutdown
  vrf member XXXX
  no ip redirects
  no ipv6 redirects

And VRF XXXX is configured like this:

vrf context XXXX
  vni 555000
  ip route 0.0.0.0/0 10.10.10.1
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
router bgp 65500
  vrf XXXX
    address-family ipv4 unicast
      advertise l2vpn evpn
      redistribute static route-map permitall
      default-information originate

With these settings, the hosts in VLAN 5 work, but sometimes they drop off at random and stop responding to ping. Outages ranged from 2-3 lost pings up to an hour of unreachability. And the most interesting thing is what we see in the logs:

20:57:55 %USER-2-SYSTEM_MSG: Detected duplicate host 0001.0000.000c, topology 111, during Local update, with host located at remote VTEP 192.168.200.101, VNI 111000 - l2rib
20:58:57 %USER-2-SYSTEM_MSG: Detected duplicate host 0001.0000.000c, topology 111, during Local update, with host located at remote VTEP 192.168.200.101, VNI 111000 - l2rib
21:00:15 %USER-2-SYSTEM_MSG: Detected duplicate host 0001.0000.000c, topology 111, during Local update, with host located at remote VTEP Po1 - l2rib
21:00:46 %USER-2-SYSTEM_MSG: Detected duplicate host 0001.0000.000c, topology 111, during Remote update, from VTEP 192.168.200.101, VNI 111000 - l2rib
(the same messages repeat every minute or two)
21:25:36 %USER-2-SYSTEM_MSG: Unfreeze limit (3) hit, MAC 0001.0000.000c in topo: 111 is permanently frozen - l2rib
(and so on, endlessly)

This duplicated MAC address is the ASA cluster's interface MAC in VLAN 111, which is connected to the data interfaces on both sides:

SW1# sh mac address-table | i 0001.0000.000c
* 111     0001.0000.000c   dynamic  0         F      F    Po1

SW2# sh mac address-table | i 0001.0000.000c
+ 111     0001.0000.000c   dynamic  0         F      F    Po1

SW1 and SW2 are a vPC pair of N9K switches, as are SW3 and SW4.

SW3# sh mac address-table | i 0001.0000.000c
+ 111     0001.0000.000c   dynamic  0         F      F    Po1

SW4# sh mac address-table | i 0001.0000.000c
* 111     0001.0000.000c   dynamic  0         F      F    Po1
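The same flapping can also be watched from the EVPN control plane, not just the MAC table. A minimal set of NX-OS show commands for this, assuming a 9.x release (the MAC is the ASA vMAC from the output above; check the exact syntax for your release):

```
! Where is the ASA vMAC learned: locally (Po1) or from a remote VTEP?
show mac address-table address 0001.0000.000c
show l2route evpn mac all | include 0001.0000.000c

! Duplicate-host detection thresholds currently in effect
show running-config all | include dup-host
```

If both vPC pairs keep learning the vMAC locally on Po1 while also receiving it from the other site over BGP EVPN, l2rib counts each change as a host move, which matches the "Detected duplicate host" and "permanently frozen" messages above.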

For a whole year we had no idea how to deal with this. We opened many TAC cases. They examined either the firewall or the switches separately, and found nothing. In the end, we decided the problem was a wrong design. Then we started searching for the best design, and here I am :)

Sorry for such a long comment.

Francesco Molino · 03-04-2020

OK, thanks for the great explanation.
Honestly, in terms of design/links, it looks OK for now, and if there were an issue there, TAC would surely have pointed it out.
As for your issue, it reminds me of the same issues I encountered with OTV as a DCI.
What I would suggest is filtering the ASA virtual MAC with a route-map so that it is not exchanged between the 2 sites.
Take a look here on how to implement MAC (L2) filtering:
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/93x/vxlan/configuration/guide/b-cisco-nexus-9000-series-nx-os-vxlan-configuration-guide-93x/b-cisco-nexus-9000-series-nx-os-vxlan-configuration-guide-93x_chapter_010100.html#id_10...

Be careful: you want to deny only your ASA vMAC from being advertised, and keep everything else.
To be honest, I've never had this design in a VXLAN environment, but I had to deal with it in OTV. That's how I solved my issue there: by filtering this vMAC on the OTV edge devices.
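A minimal sketch of such a filter on NX-OS, assuming a release that supports BGP EVPN route-map filtering by MAC (as in the 9.3 guide linked above). The names ASA-VMAC and FILTER-ASA-VMAC and the neighbor 192.0.2.1 (standing in for the spine/route-reflector peer) are hypothetical, and applying it outbound on each border leaf is an assumption:

```
! Match only the ASA cluster vMAC (exact match with an all-ones mask)
mac-list ASA-VMAC seq 10 permit 0001.0000.000c ffff.ffff.ffff

! Drop EVPN routes carrying that MAC, advertise everything else unchanged
route-map FILTER-ASA-VMAC deny 10
  match mac-list ASA-VMAC
route-map FILTER-ASA-VMAC permit 20

router bgp 65500
  ! 192.0.2.1 is a placeholder for the spine/route-reflector peer
  neighbor 192.0.2.1
    address-family l2vpn evpn
      route-map FILTER-ASA-VMAC out
```

Using a deny entry followed by a catch-all permit keeps every route that does not carry this MAC advertised, rather than relying on what reaches the route-map's implicit deny.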

Thanks
Francesco

enginer SNOS · 03-04-2020

Filtering is an interesting option, but then a question arises: if the ASA's MAC address is not advertised to the other site, the hosts on each site will reach the MAC address of the locally connected ASA. But if the directly connected ASA fails, where will they go?

Regarding the design, we considered the following options:
1) Do not change the design, but use failover instead of a cluster: one firewall active, the other standby. I guess there will be one MAC address on one side, and if that ASA fails, the other takes over on the other side. In theory, this should solve the duplicated MAC address problem. Correct me if I am wrong. On the other hand, all traffic between the VRFs will go through one ASA, and it will become a bottleneck.
2) Connect both firewalls using vPC to one pair of N9K switches, which will be a single border leaf in the network instead of the current two. In this case, the switches will see these MAC addresses behind the vPC, and in theory we will get rid of the duplicates. The difficulty here is that the two switches in the pair, and the two firewalls, must each be physically located at different sites for fault tolerance. I attached a diagram of the planned connections; the dashed line indicates different buildings. And of course, the other difficulty is that a lot of reconfiguration is needed.
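For option 2, the switch side could look roughly like this on each of the two N9Ks. This sketch assumes the cluster's data interfaces form one spanned EtherChannel, so the switch pair treats the links from both ASAs as a single vPC; port-channel 10, vPC 10 and the Ethernet ports are hypothetical, and only VLAN 111 is shown:

```
feature lacp
feature vpc

! One vPC for the ASA cluster's spanned EtherChannel (data)
interface port-channel10
  description ASA-cluster-data
  switchport mode trunk
  switchport trunk allowed vlan 111
  vpc 10

! One link to each ASA from this switch; mirror this on the vPC peer
interface Ethernet1/1
  description to-ASA-1
  switchport mode trunk
  switchport trunk allowed vlan 111
  channel-group 10 mode active

interface Ethernet1/2
  description to-ASA-2
  switchport mode trunk
  switchport trunk allowed vlan 111
  channel-group 10 mode active
```

With all data links in one vPC, the vMAC is learned on a single port-channel behind one vPC pair (one shared VTEP address), so l2rib no longer sees it at two VTEPs. The cluster control link is separate from this and still uses its own EtherChannel per unit, as described in the ASA clustering guide linked earlier.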

Maybe there are other options?

Francesco Molino · 03-05-2020

MAC address filtering will remove the duplicates, but it won't work in a VXLAN environment where the default gateway will always be local, even if the ASA fails. I need to think about it. There aren't many validated designs for this specific environment. There may be a quick workaround using scripts to remove the config when something fails.
I don't have a lab available right now, but I can try later next week.
I find it hard to believe TAC didn't give you any answers on this. Can you share your Nexus configurations?

As for alternative designs, I would go with the schem2 you shared. In the end, it makes more sense, and you'll avoid any duplicates.

Thanks
Francesco

enginer SNOS · 03-10-2020

I can't post the configuration publicly. Maybe only some specific parts of the config, if you tell me which ones. You hardly need the AAA or FEX settings )))

I thought I should first move the firewalls to one vPC switch pair and check right away whether the duplicates disappear. Then I plan to send some traffic through the firewall to check how it works. If everything works, I can try putting load on it. Then all that remains is to move the connections to external networks to these switches with the firewalls connected. Although, does it make sense to move them there, or should I leave them as they are?

If you still decide to try this design in the lab, I would be interested to know what happens ))

Francesco Molino · 03-11-2020

You can PM me to exchange configs (without any AAA or confidential information). After that, I can try to build a lab and test it.

Thanks
Francesco

Alexander09 · 05-27-2020

Hi,

I had similar messages popping up with a firewall (pfSense using CARP, which is similar to VRRP): failover wasn't working, both firewalls were active at the same time, and the virtual MAC was advertised from each firewall to the leaf it was connected to.

My problem was with a certain mcast group (PIM BiDir with a phantom RP, used active-active with 2 different mcast ranges): on the devices configured as the RP (NCS5001), I had a typo in the list of interfaces participating in multicast routing (i.e. bundle-ethernet1.200 instead of bundle-ethernet1.2000).

Not saying this will solve your issue, but it might give you an idea: make sure the mcast config is accurate (if you're not using ingress replication ;) )

Good hunting!

--
Alexander Deca

enginer SNOS · 05-29-2020

Hi,

Thank you for sharing your experience. We are using ingress replication. So in this case, your solution won't help me?
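For reference: with BGP-based ingress replication, BUM traffic for each L2 VNI is replicated as unicast to the peer VTEPs learned from EVPN type-3 routes, so the underlay multicast/RP configuration Alexander describes is not involved. A typical NVE configuration for the VNIs in this thread looks roughly like this (loopback1 as the source interface is an assumption), followed by commands to confirm the replication mode and peers:

```
interface nve1
  no shutdown
  host-reachability protocol bgp
  ! Hypothetical source loopback carrying the VTEP address
  source-interface loopback1
  member vni 5000
    ingress-replication protocol bgp
  member vni 111000
    ingress-replication protocol bgp
  ! L3 VNI for VRF XXXX
  member vni 555000 associate-vrf

! Confirm each VNI's replication mode and the discovered peer VTEPs
show nve vni
show nve peers
```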

By the way, out of necessity, one of the ASAs had to be temporarily removed from the cluster and disconnected from the Nexus switches. Only one ASA is left, and there are no more such logs.

It seems to me this shows that an active-active cluster should most likely be connected to one pair of switches in vPC, not to different ones; or, if it is connected to different ones, it should not be active-active, and one ASA should be standby.

asiergd · 06-14-2020

Hi,

What is the latency between sites?

Is HMM tracking configured for the static routes?
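For context, this refers to tracking the static route's next hop (the ASA cluster IP) through the host routes learned by HMM, so the static route is removed when that next hop disappears. A rough sketch, assuming an NX-OS release that supports the "reachability hmm" option in object tracking (check the object tracking section of the unicast routing guide for your release); track object 1 is hypothetical:

```
! Track the ASA cluster IP as a /32 host route learned via HMM in VRF XXXX
track 1 ip route 10.10.10.1/32 reachability hmm
  vrf member XXXX

vrf context XXXX
  ! Default route is only installed while track 1 is up
  ip route 0.0.0.0/0 10.10.10.1 track 1
```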

Thanks

enginer SNOS · 06-19-2020

Hi!
I would be very grateful if you could tell me how to measure the latency, and how to check whether HMM tracking is configured.
Thanks.
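A minimal sketch for both questions, assuming NX-OS. For latency, pinging the remote VTEP loopback from a local one gives a rough round-trip figure (pings to the switch CPU can be rate-limited by CoPP, so treat the result as an approximation). 192.168.200.101 is the remote VTEP from the logs earlier in the thread; the source address 192.168.200.100 is a placeholder for the local VTEP loopback:

```
! Round-trip time between sites (min/avg/max is printed at the end)
ping 192.168.200.101 source 192.168.200.100 count 100

! Is any object tracking configured, and what state is it in?
show running-config | include track
show track
```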