I'm having an issue where routes leaked between VRFs in EVPN are getting blackholed. I swear I had this working in the lab the other day, and now it's not.
Example setup: two leaves and one spine.
Leaf A:
  vrf Test1
  L2VNI 1000, L3VNI 1001
Leaf B:
  vrf Test2
  L2VNI 2000, L3VNI 2001
I'm using auto RT/RD, except for the following:
vrf Test1
  route-target export 123:1
  route-target export 123:1 evpn
  route-target import 123:2
  route-target import 123:2 evpn
vrf Test2
  route-target export 123:2
  route-target export 123:2 evpn
  route-target import 123:1
  route-target import 123:1 evpn
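As a sanity check on the RT logic, here's a quick Python sketch. It's my own model of the config above, not anything NX-OS actually runs. It treats each VRF's import and export RTs as sets and reports which VRF's routes should land in which other VRF. It only covers the manually set RTs (the auto-generated ones are ignored), and since the plain and evpn RTs use the same values here, they're modeled as a single set.

```python
# Hypothetical model of the manually configured RTs above.
# The plain and 'evpn' RTs have the same values, so they share one set.
VRF_RTS = {
    "Test1": {"export": {"123:1"}, "import": {"123:2"}},
    "Test2": {"export": {"123:2"}, "import": {"123:1"}},
}


def leak_targets(vrfs: dict) -> set:
    """Return (source_vrf, importing_vrf) pairs where an export RT matches an import RT."""
    pairs = set()
    for src, src_rts in vrfs.items():
        for dst, dst_rts in vrfs.items():
            # A route leaks from src into dst if any exported RT is imported by dst.
            if src != dst and src_rts["export"] & dst_rts["import"]:
                pairs.add((src, dst))
    return pairs


if __name__ == "__main__":
    for src, dst in sorted(leak_targets(VRF_RTS)):
        print(f"routes exported by {src} are imported into {dst}")
```

With this config both directions match, so at the control-plane level the leak should be symmetric.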
nve1 on Leaf A has member VNIs 1000, 1001, and 2001.
nve1 on Leaf B has member VNIs 2000, 2001, and 1001.
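The idea behind those NVE member lists is that each leaf carries its own L3VNI plus the L3VNI of the VRF it imports from. A small Python check of that, using the values above. The "needed" rule is just my assumption about what this setup requires, not a documented NX-OS rule:

```python
# L3VNI per VRF, from the setup above.
L3VNI = {"Test1": 1001, "Test2": 2001}

# Per-leaf view. 'imports_from' follows the RT config; the rule checked below
# (local L3VNI + L3VNI of every imported VRF must be on nve1) is an assumption.
LEAVES = {
    "Leaf A": {"local_vrf": "Test1", "imports_from": ["Test2"], "nve_members": {1000, 1001, 2001}},
    "Leaf B": {"local_vrf": "Test2", "imports_from": ["Test1"], "nve_members": {2000, 2001, 1001}},
}


def missing_l3vnis(leaf: dict) -> set:
    """Return the L3VNIs the leaf should carry that are not members of nve1."""
    needed = {L3VNI[leaf["local_vrf"]]} | {L3VNI[v] for v in leaf["imports_from"]}
    return needed - leaf["nve_members"]


if __name__ == "__main__":
    for name, leaf in LEAVES.items():
        missing = missing_l3vnis(leaf)
        status = "OK" if not missing else f"missing {sorted(missing)}"
        print(f"{name}: {status}")
```

Both leaves come back OK with the member lists above.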
BGP is redistributing static and direct routes.
Everything looks perfectly fine in show forwarding, show bgp, etc.
In fact, it all works if I configure a loopback interface on Leaf A and Leaf B.
For example, I can assign an IP address to loopback 10 on each leaf, and pinging between them from the CLI works,
e.g. ping 5.5.5.5 vrf Test2 source 6.6.6.6 (where 5.5.5.5 is on the Leaf A loopback and 6.6.6.6 is on the Leaf B loopback).
Sniffing the traffic, it all looks right.
THE PROBLEM IS: it doesn't work in hardware. The traffic is blackholed in hardware, as you can see in bcm-shell:
2112 4 4.4.4.4/30 00:00:00:00:00:00 100017 0 0 0 0 y
This is the 'defip' (ALPM) route entry, which points to egress interface 100017 in the BCM shell.
If you look at the egress interface object table:
Entry Mac Vlan INTF PORT MOD MPLS_LABEL ToCpu Drop RefCount L3MC
100017 00:00:00:00:00:00 4095 4095 1 110 -1 no yes no
It's set to Null, drop.
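To spell out the lookup chain: the hardware matches the destination against the defip/ALPM table, follows the egress object ID stored there, and the drop flag on that egress object is what kills the traffic. Here's a rough Python model of that chain using only the values from the output above. The EgressObject fields are my own simplification, not the real BCM SDK structure, and the toy table only contains the one prefix and one egress entry shown:

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class EgressObject:
    """Simplified stand-in for a BCM L3 egress object (not the real SDK structure)."""
    entry: int
    mac: str
    drop: bool


# Egress object table: only the entry shown in the bcm-shell output above.
EGRESS = {100017: EgressObject(entry=100017, mac="00:00:00:00:00:00", drop=True)}

# 'defip' (ALPM) table: prefix -> egress object ID, from the defip line above.
DEFIP = {ipaddress.ip_network("4.4.4.4/30"): 100017}


def resolve(dst: str) -> Optional[EgressObject]:
    """Longest-prefix match dst in DEFIP, then follow the egress object ID."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in DEFIP if addr in net]
    if not matches:
        return None  # no route in this toy table
    best = max(matches, key=lambda net: net.prefixlen)
    return EGRESS.get(DEFIP[best])


if __name__ == "__main__":
    for ip in ("4.4.4.5", "4.4.4.9"):
        obj = resolve(ip)
        if obj is None:
            print(f"{ip}: no matching prefix")
        else:
            print(f"{ip}: egress {obj.entry}, drop={obj.drop}")
```

Any destination from 4.4.4.4 to 4.4.4.7 resolves to egress 100017 with drop=True, which matches what the hardware is doing. What I'd expect instead is for the prefix to point at the egress object for the remote VTEP.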
If you look at: show system internal forwarding vrf Test1 detail
it has an entry:
4.4.4.4/30 ,
with nothing after it.
Dev| Prefix | PfxIndex | AdjIndex | LIF
0 4.4.4.4/30 0xcdc967fc 0x186b1 0xfff
The LIF is set to 0xfff, and AdjIndex 0x186b1 is 100017 in decimal, when they should be set to the AdjIndex and LIF of the remote VTEP's RMAC.
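Decoding the two hex fields from that row shows how they line up with the bcm-shell output: 0x186b1 is 100017 (the dropping egress object), and LIF 0xfff is 4095, possibly the same 4095 that appears in the egress row. A trivial Python helper for the conversion (the field names are just the column headers from the output):

```python
# Values copied from the 'show system internal forwarding vrf Test1 detail' row above.
FIELDS = {"AdjIndex": "0x186b1", "LIF": "0xfff"}


def decode(fields: dict) -> dict:
    """Convert hex strings like '0x186b1' to integers (int() accepts the 0x prefix with base 16)."""
    return {name: int(value, 16) for name, value in fields.items()}


if __name__ == "__main__":
    for name, value in decode(FIELDS).items():
        print(f"{name}: {FIELDS[name]} = {value}")  # e.g. AdjIndex: 0x186b1 = 100017
```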
The entries that work have VLAN IDs, adjacencies, etc.
So is a bug setting it to null for some strange reason?
The switch is programming the wrong egress object; it should be set to 100016 in my list, which is the tunnel MAC for the other switch.
Why is it doing this? I've spent days trying to figure this out, and I'm utterly frustrated at this point.
Like I said, it uses the right interface from the NX-OS CLI, but if a server or other device is connected to a port, the traffic gets blackholed. This has to be a bug, although it looks intentional? So far I've tried 7.0.3.I7.8 and 9.3.4.
I'd appreciate it if someone else could test this in a lab.