I am having an issue where routes leaked between VRFs in EVPN are getting blackholed. I swear I had this working in the lab the other day, and now it's not.
The example setup is two leafs and one spine.
L2VNI 1000 L3VNI 1001
L2VNI 2000 L3VNI 2001
Using auto RT/RD, except for the following:
route-target export 123:1
route-target export 123:1 evpn
route-target import 123:2
route-target import 123:2 evpn
route-target export 123:2
route-target export 123:2 evpn
route-target import 123:1
route-target import 123:1 evpn
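To sanity-check the RT scheme above, here is a small Python sketch of import/export matching. It assumes (the config above doesn't say) that the first block of four lines is on VRF Test1 and the second block is on VRF Test2, and it ignores the auto-generated RTs. A VRF imports a route if any of its import RTs appears on the route.

```python
from dataclasses import dataclass, field


@dataclass
class Vrf:
    name: str
    export_rts: set[str] = field(default_factory=set)
    import_rts: set[str] = field(default_factory=set)


def importing_vrfs(route_rts: set[str], vrfs: list[Vrf]) -> list[str]:
    """Return the names of VRFs whose import RTs overlap the route's RTs."""
    return [v.name for v in vrfs if v.import_rts & route_rts]


# ASSUMPTION: which RT block belongs to which VRF; auto RTs are left out.
test1 = Vrf("Test1", export_rts={"123:1"}, import_rts={"123:2"})
test2 = Vrf("Test2", export_rts={"123:2"}, import_rts={"123:1"})

# A route exported by Test1 carries 123:1, so only Test2 imports it (and vice versa).
leaked_from_test1 = importing_vrfs(test1.export_rts, [test1, test2])
leaked_from_test2 = importing_vrfs(test2.export_rts, [test1, test2])
print(leaked_from_test1, leaked_from_test2)  # ['Test2'] ['Test1']
```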
nve1 on leaf A has VNIs 1000, 1001, and 2001.
nve1 on leaf B has VNIs 2000, 2001, and 1001.
BGP is redistributing static and direct routes.
Everything looks perfectly fine in show forwarding, show bgp, etc.
In fact, it all works if I configure a loopback interface on leaf A and leaf B.
For example, I can give loopback 10 an IP address on each leaf, and traffic between them from the CLI works,
e.g. ping 220.127.116.11 vrf Test2 source 18.104.22.168 (where 22.214.171.124 is on leaf A's loopback and 126.96.36.199 is on leaf B's loopback).
When I sniff the traffic, it looks right.
THE PROBLEM IS: it doesn't work in hardware. If you look in bcm-shell, it's blackholed in hardware:
2112 4 188.8.131.52/30 00:00:00:00:00:00 100017 0 0 0 0 y
This is the 'defip' (ALPM) route, which points to egress interface 100017 in the BCM shell.
If you look at the egress interface object table:
Entry Mac Vlan INTF PORT MOD MPLS_LABEL ToCpu Drop RefCount L3MC
100017 00:00:00:00:00:00 4095 4095 1 110 -1 no yes no
It's set to null, i.e. drop.
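A minimal Python model of the two-stage lookup in the bcm-shell output: the defip route points at an egress object ID, and the egress object decides between rewriting and dropping. This is illustrative only, not Broadcom SDK code. Entry 100017 uses the values shown above; the 100016 tunnel entry and its MAC are hypothetical placeholders, assuming the object IDs follow the 1000xx numbering seen here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressObject:
    mac: str
    drop: bool


# Illustrative model of the egress object table, not Broadcom SDK code.
EGRESS_TABLE = {
    # From the egress object table above: null MAC, Drop = yes.
    100017: EgressObject(mac="00:00:00:00:00:00", drop=True),
    # HYPOTHETICAL stand-in for the remote VTEP tunnel/RMAC egress entry.
    100016: EgressObject(mac="hypothetical-remote-rmac", drop=False),
}


def forward(egress_id: int) -> str:
    """Resolve a defip entry's egress object ID to a forwarding action."""
    obj = EGRESS_TABLE[egress_id]
    return "drop" if obj.drop else f"rewrite dmac={obj.mac}"


print(forward(100017))  # drop
print(forward(100016))  # rewrite dmac=hypothetical-remote-rmac
```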
If you look at show system internal forwarding vrf Test1 detail, it has this entry, with nothing after it:
Dev| Prefix | PfxIndex | AdjIndex | LIF
0 184.108.40.206/30 0xcdc967fc 0x186b1 0xfff
The LIF is set to 0xfff, and AdjIndex 0x186b1 is 100017 (the drop entry), when they should be set to the AdjIndex and LIF of the remote VTEP's RMAC.
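Decoding the two hex fields makes the link to the BCM output explicit: 0x186b1 is the egress object 100017 that has Drop = yes, and 0xfff is 4095, which appears to be the same value shown in that object's Vlan and INTF columns.

```python
# Decode the hex fields from the 'show system internal forwarding' entry.
adj_index = int("0x186b1", 16)  # AdjIndex column
lif = int("0xfff", 16)          # LIF column

print(adj_index)  # 100017: same ID as the drop egress object in bcm-shell
print(lif)        # 4095: same value as that object's Vlan/INTF columns
```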
The entries that work have VLAN IDs, adjacencies, etc.
So is this a bug setting it to null for some strange reason?
The switch is programming the wrong egress object; it should point to 100016 in my list, which is the tunnel MAC for the other switch.
Why is it doing this? I've spent days trying to figure this out, and I'm utterly frustrated at this point.
Like I said, traffic sourced from the NX-OS CLI uses the right interface, but traffic from a server or other device connected to a port is blackholed. This has to be a bug, although it looks intentional? So far I've tried 7.0.3.I7.8 and 9.3.4.
I would appreciate it if someone else could test this in a lab.
I've read that, and all the documentation, many times.
The config I have is very similar to the one under the title "Route Leak between VRFs on different VTEPs" in that document, except that I DON'T have one leaf's VNI, VRF, or VLAN configured on the other leaf. There is no need for it: the routes get imported, the label gets set properly, and it works perfectly in software, but the hardware is programmed incorrectly. Try it out, please!
After some more lab testing: the device will actually install it in hardware if the destination VNI and VRF are configured. For example, if I want to send traffic to VNI 1000 (VRF Test1) from a leaf that has imported the route for VNI 1000, I have to configure a VLAN, map that VLAN to VNI 1000, create a VRF (any name), create an SVI for that VLAN, put the SVI in the VRF, and enable ip forward.
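The dependency observed in that test can be written as a simple check. This is a hypothetical Python model of the observed behavior, not NX-OS code: the field names and VLAN 10 are made up, and it only encodes the steps listed above (a VLAN mapped to the destination VNI, a VRF of any name, and an SVI in that VRF with ip forward).

```python
from dataclasses import dataclass, field


@dataclass
class LeafConfig:
    # HYPOTHETICAL field names modeling the lab workaround, not NX-OS objects.
    vlan_to_vni: dict[int, int] = field(default_factory=dict)
    vrfs: set[str] = field(default_factory=set)
    # SVI VLAN ID -> (VRF name, 'ip forward' enabled)
    svis: dict[int, tuple[str, bool]] = field(default_factory=dict)


def hw_programmed(leaf: LeafConfig, dest_vni: int) -> bool:
    """True if every prerequisite seen in the lab test exists for dest_vni."""
    for vlan, vni in leaf.vlan_to_vni.items():
        if vni != dest_vni:
            continue
        svi = leaf.svis.get(vlan)
        if svi is not None and svi[0] in leaf.vrfs and svi[1]:
            return True
    return False


bare = LeafConfig()  # route imported, but nothing for VNI 1000 configured locally
workaround = LeafConfig(
    vlan_to_vni={10: 1000},        # hypothetical VLAN 10 mapped to VNI 1000
    vrfs={"AnyName"},              # the VRF name does not matter
    svis={10: ("AnyName", True)},  # SVI in that VRF with ip forward
)
print(hw_programmed(bare, 1000), hw_programmed(workaround, 1000))  # False True
```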
That works; however, it defeats the purpose of what I was trying to do, which was to send traffic to a VRF that doesn't exist on the device I am on. Also, every VRF I'd want to leak routes to would have to be on every device, which is ridiculous. It doesn't scale well. And the shared-services or internet VRF device would need every customer's VNI configured, along with the associated VLAN and VRF (which it doesn't NEED).
This is obviously a bug or a limitation of NX-OS, where it purposely blackholes the destination if the VNI/VRF/SVI isn't configured on the local device. I don't know why it would do this. Maybe it really is a bug, but since it shows up in several OS versions I've tried, I'm leaning more toward a deliberate neutering of what it can do than a bug, especially since it works in NXOSv and from the CLI on the physical hardware.
I'm testing some other vendors' devices to see if they have the same restriction and will post back in case someone else runs into this.
I tested this on another vendor's device, and it allows this (it dynamically creates a VLAN and associates it with the VNI when a type-5 route comes in). There's no need to create a VRF or SVI. More testing shows this platform also supports a centralized gateway on EVPN, or a centralized anycast gateway on EVPN (using an MLAG pair).
It does this by advertising the system MAC or virtual router MAC into EVPN as a type-2 route, which is exactly what I was trying to recreate on the Nexus devices in another thread.
So at the end of the day, I suppose we are missing a few tunables here. One is a tunable to advertise the system MAC or virtual MAC into EVPN as a type-2 route for the L2VNI, so that a centralized gateway works. The other is installing a proper adjacency for the RMAC, either by dynamically assigning a VLAN from a pool and putting the RMAC in it, which uses resources, or by installing the RMAC in the router VNI of the importing VRF (possibly using more resources, but a cleaner solution).
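The "dynamic VLAN from a pool" option could look roughly like this Python sketch. It's hypothetical, not any vendor's implementation, and the pool range 3900-3902 is an assumption: the first type-5 route for an L3VNI with no local VLAN reserves the next free VLAN, and later routes for the same VNI reuse it. It also makes the resource cost visible, since every leaked VNI consumes one VLAN until the pool runs out.

```python
class VlanPool:
    """HYPOTHETICAL per-VNI VLAN allocator, not any vendor's code."""

    def __init__(self, first: int, last: int) -> None:
        self.free = list(range(first, last + 1))
        self.vni_to_vlan: dict[int, int] = {}

    def on_type5_route(self, l3vni: int) -> int:
        """Return the VLAN mapped to l3vni, allocating one if needed."""
        if l3vni in self.vni_to_vlan:
            return self.vni_to_vlan[l3vni]
        if not self.free:
            raise RuntimeError("VLAN pool exhausted")
        vlan = self.free.pop(0)
        self.vni_to_vlan[l3vni] = vlan
        return vlan


pool = VlanPool(3900, 3902)       # ASSUMED pool range
print(pool.on_type5_route(1001))  # 3900
print(pool.on_type5_route(2001))  # 3901
print(pool.on_type5_route(1001))  # 3900 (mapping reused)
```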
I think it's more about how the devices handle tunnel creation. Needing a VLAN on the device per VNI just to send to a remote VTEP is very strange, and so is needing the SVI and VRF created before the route is installed, especially since the route is imported into another VRF. From reading the IETF documentation on EVPN, this does not seem necessary; it was possibly put in by Cisco for one reason or another, maybe security. Since there are no tunables to make it work the way I want (similar to MPLS L3VPN), it is rather frustrating.
Centralized route leaking is working, sort of. It doesn't look like any of this is a hardware limitation, just the software implementation.
The goal is one thing:
Leak a route from VRF A (switch1) into VRF B (switch2), and have VRF B be able to send packets to VRF A, without having VRF A or its VLAN/SVI configured on the device VRF B is on. It seems that to do this, I need to configure VRF A on switch2, along with the associated VNI/VLAN/SVI, and also import at least one route into VRF A on switch2. None of that should be necessary, because VRF B already has the route imported. It's just the way the software implements it.
The other vendor does it a little differently. It's still not how I'd like, but it automatically creates a VLAN/VNI mapping for VRF A on switch2 without me doing anything, and it works. I don't think a VLAN-to-VNI mapping should be needed at all, since it takes up a VLAN slot on switch2 unnecessarily.
On top of this, switch1 won't accept packets from VRF B on switch2 if switch1 doesn't know about switch2 (i.e. switch2 is not in its NVE peer list). That's a security measure, but there's no tunable to bypass it. What if I wanted VRF B to be able to send packets to VRF A only, without VRF A being able to respond? Then switch1 wouldn't even need to know switch2 existed and would accept the VXLAN packet for VRF A. It won't, though: there seems to be some sort of built-in ACL that only sends packets to decapsulation if they come from a peer in the peer list. I get why, but I wish there was a way to specify manually which peers to allow.
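The decapsulation check described above, plus the manual peer list I'd like, as a hypothetical Python sketch. The allowed_static argument models the wished-for tunable and is not meant to represent any existing NX-OS command. The addresses are RFC 5737 documentation addresses, not the ones from my lab.

```python
from ipaddress import IPv4Address


def should_decap(
    outer_src: IPv4Address,
    nve_peers: set[IPv4Address],
    allowed_static: frozenset[IPv4Address] = frozenset(),
) -> bool:
    """Accept VXLAN traffic from learned NVE peers or manually allowed VTEPs.

    allowed_static is a HYPOTHETICAL tunable, not an existing NX-OS feature.
    """
    return outer_src in nve_peers or outer_src in allowed_static


switch2 = IPv4Address("192.0.2.2")             # documentation address (RFC 5737)
peers_on_switch1: set[IPv4Address] = set()     # switch1 has never learned switch2

print(should_decap(switch2, peers_on_switch1))                        # False today
print(should_decap(switch2, peers_on_switch1, frozenset({switch2})))  # True with the tunable
```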