We are trying to attach Windows and Red Hat servers via vPC port-channels to FEXes (N2248 hooked to N9K-C9372PX-E), with the servers set to active-active/LACP. The switch side looks fine: LACP neighbor info is correct and MAC addresses are learnt on the port-channels (all L2 access). Still, some servers are not able to ping each other even though they are all in the same VLAN, and they don't even learn ARP entries for the others.
For troubleshooting purposes we switched the servers to active-standby and configured the switchports on the FEXes as standard access ports without a port-channel, so they effectively all became orphan ports in the VLAN. With this setup servers can ping each other only if all active NICs are on the same switch. Once we move an active NIC to the other switch, the servers are no longer able to ping each other. In all cases the MAC address table looks correct and MAC addresses are learnt where they should be (including entries for the vPC peer-link). We use NX-OS 7.0(3)I5(1).
When vPC was not working we wanted to have at least some resiliency, so we went to active-standby. The diagram shows what doesn't work: when both servers have their active NICs on one FEX, they can reach each other; if the active NICs are on different switches, they can't. All four FEX ports are simple access ports in the same VLAN. The VLAN is allowed on the peer-link and the MAC addresses of the servers are seen on the switches.
leaf03# sh run vpc
vpc domain 2
peer-keepalive destination 10.0.0.4 source 10.0.0.3
no layer3 peer-router syslog
ipv6 nd synchronize
ip arp synchronize
...."show vpc" doesn't show the related VLAN because we have many VLANs in the campus: it shows the first 6 rows of VLANs and then "...". But "show int trunk" displays the VLAN in forwarding, not-pruned state on the vPC peer-link. MAC address learning across the peer-link also works fine for that VLAN.
You don't have any orphan ports configured in the vPC domain.
Your configuration should look like the following. The port will only be active on one N9K switch at a time, and once the primary vPC switch fails the other port will kick in.
configure terminal
switch(config)# interface ethernet 3/1
switch(config-if)# vpc orphan-ports suspend
switch(config-if)# exit
Not sure why you want to set it up this way. You would be better off setting it up like the image I attached.
So basically it doesn't know that it's an orphan port unless you configure it that way.
Also check that CFS is running on the switches with the "show cfs status" command.
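A minimal sketch of the checks mentioned above, to be run on each vPC peer. Only "show cfs status" comes from this thread; the other show commands are standard NX-OS commands that I'm assuming are available on this release.

```
! confirm CFS is enabled (command from the post above)
show cfs status
! list the ports the switch currently treats as orphan ports
show vpc orphan-ports
! check vPC role, peer-link and keepalive state
show vpc role
show vpc brief
```

If the FEX host ports do not show up as orphan ports here, that would be worth comparing against what the MAC address table says.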
These are pretty good links...
If you configured port-channel 106 as a vPC member port, I think you have a design and configuration issue. In that case traffic between vPC member ports that crosses the peer-link is dropped, because in a correct vPC design it would be duplicate traffic. Also, it is not correct to configure the same port-channel towards two independent devices unless they are running vPC (the access switches in your diagram aren't running vPC).
The recommended design is to connect both access switches to each leaf switch, using a different port-channel per access switch. Both port-channels have to be configured on the leaves as vPC member ports.
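As a rough sketch of that design, the same configuration would go on both leaf switches. The port-channel numbers 101 and 102, the vPC numbers, the uplink interfaces and the trunk mode are all hypothetical values chosen for illustration; none of them come from this thread.

```
! requires "feature lacp" and "feature vpc" to be enabled already
! access switch 1: its own port-channel and vPC number (hypothetical values)
interface port-channel101
  switchport
  switchport mode trunk
  vpc 101
interface Ethernet1/1
  switchport
  switchport mode trunk
  channel-group 101 mode active

! access switch 2: a separate port-channel and vPC number
interface port-channel102
  switchport
  switchport mode trunk
  vpc 102
interface Ethernet1/2
  switchport
  switchport mode trunk
  channel-group 102 mode active
```

The point of the sketch is only that each access switch gets its own vPC number, and that number matches on both leaves.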
Short update - it seems to be caused by Cisco bug CSCvc12950.
This log entry pointed us to that:
%DAEMON-2-SYSTEM_MSG: Table Full/Hash collision Vxlan VP 4885, vlan 829, Please reduce the number of Vxlan VPs - pixmc
We upgraded to 7.0.3.I7.3 and are testing; it seems to be fixed.
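For anyone wanting to check whether they are hitting the same message, something along these lines could be used. This is a sketch assuming the standard NX-OS "show version" and "show logging logfile" commands; the filter word is taken from the log entry above.

```
! confirm the running release
show version | include NXOS
! look for the hash-collision message in the local log
show logging logfile | include collision
```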