N9K-C9396PX and N3K-C3164Q Strange VXLAN behavior

ss1
Level 1

Hi

We recently built a new VXLAN topology in our datacenter, essentially a leaf-and-spine design. We are seeing strange issues during lab testing, so I would like to explain what we did in the hope that somebody here has a clue as to what might be wrong.

A total of 8 N9K-C9396PX switches are used as leafs for customer access. They are all connected to 2 N3K-C3164Q switches, say spine 1 and spine 2. One VLAN runs from spine 1 to all leafs [1-8], and another VLAN runs from spine 2 to the same leafs [1-8].

Spines and leafs see each other through switchports rather than routed ports. A VLAN interface is present on each side, with OSPF and BFD running on it. Loopback addresses are redistributed into OSPF as /32s. Finally, BGP in the L2VPN EVPN address family peers the spine loopbacks with the leaf loopbacks. We have 'ip pim sparse-mode' on both the VLAN interface that brings up OSPF and the loopback interface used as the BGP and NVE source. A static rp-address is also configured for the multicast group range: we selected 225.X.AB.CD (so 225.0.0.0/8) in order to be able to configure a different multicast group for every vn-segment. The rp-address is up on both spines on a separate loopback and is also redistributed through OSPF.
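To illustrate the one-group-per-vn-segment idea, here is a minimal Python sketch. The mapping is an assumption made for illustration only and is not the exact 225.X.AB.CD scheme we use: it simply places the 24-bit VNI into the lower three octets under 225.0.0.0/8. The function name `vni_to_mcast_group` is hypothetical.

```python
import ipaddress

# Base of the multicast range mentioned above (225.0.0.0/8).
MCAST_BASE = int(ipaddress.IPv4Address("225.0.0.0"))


def vni_to_mcast_group(vni: int) -> str:
    """Map a VNI to a unique group inside 225.0.0.0/8.

    Assumed scheme (illustrative only): the 24-bit VNI becomes the
    lower three octets of the group address.
    """
    if not 0 < vni < 2**24:
        raise ValueError(f"VNI out of range: {vni}")
    return str(ipaddress.IPv4Address(MCAST_BASE | vni))


print(vni_to_mcast_group(9003335))  # 225.137.97.71
```

Because a VNI is 24 bits wide, such a mapping is one-to-one, so no two vn-segments would share a group.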

The NX-OS versions used are 7.0.3.I7.7 on the leafs and 7.0.3.I6.1 on the spine layer.
Unfortunately this is only partially successful, and our efforts over the last 2 weeks to get this configuration working reliably have brought no result.
All MAC addresses are successfully seen on the NVE interfaces. However, only broadcast traffic flows; unicast traffic does not. We connected 2 servers to 2 different leafs and could not pass data traffic or ping between them, although ARP traffic passes successfully.

The weird part of the story is that the issue is not consistent. For example, I got exactly 12 successful pings every time I did shutdown/no shutdown on interface nve1 of the leaf, and I repeated this test several times. Two hours later it suddenly started to work perfectly without any configuration change (!), and it stayed okay until I did shutdown/no shutdown on the nve1 interface again. After that, nothing helped to restore the transport.

I have inspected the transport part concerning the overlay VLAN and it looks okay: no STP topology changes, blocked ports, etc. Jumbo MTU is enabled leaf-to-leaf, with 9000 bytes on the VLAN interface, verified with a df-bit ping from loopback to loopback. We also checked 'show ip mroute' to see whether the required multicast groups are registered, and they appear to be. Unfortunately, the NVE Rx counters stay at zero on both sides.
# show nve vni 9003335 counters  
VNI: 9003335
TX
       128 unicast packets 30700 unicast bytes
       4771 multicast packets 402876 multicast bytes
RX
       0 unicast packets 0 unicast bytes
       0 multicast packets 0 multicast bytes
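
In case it helps anyone repeat the same check across several VNIs, below is a rough Python sketch that parses counter output in the format shown above and flags the "Tx increasing, Rx zero" pattern. The function names and the regular expression are my own assumptions based only on this one sample; other NX-OS releases may print the counters differently.

```python
import re

# Sample taken verbatim from the output above.
SAMPLE = """\
VNI: 9003335
TX
       128 unicast packets 30700 unicast bytes
       4771 multicast packets 402876 multicast bytes
RX
       0 unicast packets 0 unicast bytes
       0 multicast packets 0 multicast bytes
"""

# Matches lines such as "128 unicast packets 30700 unicast bytes".
LINE_RE = re.compile(
    r"^\s*(\d+) (unicast|multicast) packets (\d+) (?:unicast|multicast) bytes\s*$"
)


def parse_nve_counters(text):
    """Return {"TX": {...}, "RX": {...}} with packet and byte counts."""
    counters = {}
    direction = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in ("TX", "RX"):
            direction = stripped
            counters[direction] = {}
            continue
        match = LINE_RE.match(line)
        if match and direction:
            packets, kind, nbytes = match.groups()
            counters[direction][kind] = {
                "packets": int(packets),
                "bytes": int(nbytes),
            }
    return counters


def rx_is_silent(counters):
    """True when something was transmitted but nothing was received."""
    tx_total = sum(c["packets"] for c in counters.get("TX", {}).values())
    rx_total = sum(c["packets"] for c in counters.get("RX", {}).values())
    return tx_total > 0 and rx_total == 0


print(rx_is_silent(parse_nve_counters(SAMPLE)))  # True
```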

 

Apologies for the long message. I really hope somebody has a clue as to what might be wrong here. If needed, I will of course paste some actual configs. I would be very thankful for any shared experience.

Thank you so much!

 

1 Reply 1

ss1
Level 1

Not sure whether this is related to CSCvs90075, but I should mention that last night I upgraded to NX-OS 7.0.3.I7.8 and it didn't help either. Today I will also try NX-OS 9.3.3.
Any help or clue would be greatly appreciated, because we intend to deploy this in a large-scale environment that currently consists of approx. 900 VLANs on pure Layer 2 switching between all devices. A VXLAN solution would improve our processes a great deal.
Thank you.
