Overlapping VLAN Pools Lead to Intermittent Packet Drops to VPC Endpoints and Spanning-Tree Loops

Symptoms

We have seen a considerable number of cases of intermittent packet drops when the destination sits behind a VPC. The most common cause is that multiple domains associated with an EPG contain overlapping VLAN blocks. Each VLAN pool has a dedicated range of vxlan-ids pre-allocated by the APIC, which is why the same VLAN taken from different pools ends up with different vxlan-ids. Here are the two scenarios we usually see in the field.

 

1. EPGs deployed on VPC links contain two domains associated with overlapping VLAN pools

Both domains contain the same access-encap vlan-100 (as in the figure below), but the vxlan-id allocated on leaf101 is vxlan-8292 while on leaf102 it is vxlan-8293. As a result, the endpoint manager (EPM) process (an NX-OS process running on the leaf that, among other things, syncs endpoints behind the VPC to the peer leaf) removes the endpoint information (MAC and IP) from hardware, so the leaf no longer knows where to forward the packet. The removal follows from the rule that the same access-encap VLAN deployed on a VPC link must have the same vxlan-id on both peers.

vlan-pool-overlap.PNG

2. EPGs deployed on individual links contain two domains associated with overlapping VLAN pools

Both domains contain the same access-encap vlan-100 (as in the figure below), but the vxlan-id allocated on leaf101 is vxlan-8292 while on leaf102 it is vxlan-8293. As a result, a BPDU received on leaf101 in vlan-100 is dropped on leaf102: BPDU frames are flooded strictly within the receiving VLAN, encapsulated with vxlan-8292, but leaf102 maps vlan-100 to vxlan-8293 rather than vxlan-8292. Because the BPDUs never reach the switches attached to leaf102, spanning tree on those switches may fail to block a redundant path, which can result in a loop.

vlan-pool-overlap-bpdu.PNG

Diagnosis

1. How to check vxlan-id consistency

 

Here is the quickest way to verify that the vxlan-id matches between the two leaf switches. Issue the command below on both leaf101 and leaf102 and compare the fabric-encap values.

diagnost.PNG
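Once the fabric-encap values have been noted for each leaf, the comparison itself can be sketched in a few lines of Python. This is only an illustration: the function name and the dictionaries are hypothetical, the values are assumed to be copied by hand from the command output on each leaf, and the vlan-200 entry is invented to show a matching case.

```python
def find_fabric_encap_mismatches(leaf_a, leaf_b):
    """Return access-encaps whose fabric-encap differs between two leaves.

    Each argument maps an access-encap (e.g. "vlan-100") to the fabric-encap
    seen on that leaf (e.g. "vxlan-8292"). Only access-encaps present on both
    leaves are compared.
    """
    mismatches = {}
    for access_encap in sorted(set(leaf_a) & set(leaf_b)):
        if leaf_a[access_encap] != leaf_b[access_encap]:
            mismatches[access_encap] = (leaf_a[access_encap], leaf_b[access_encap])
    return mismatches


# vlan-100 values are taken from the scenarios above; vlan-200 is made up.
leaf101 = {"vlan-100": "vxlan-8292", "vlan-200": "vxlan-8400"}
leaf102 = {"vlan-100": "vxlan-8293", "vlan-200": "vxlan-8400"}

print(find_fabric_encap_mismatches(leaf101, leaf102))
# {'vlan-100': ('vxlan-8292', 'vxlan-8293')}
```

Any access-encap reported here is a candidate for the overlapping VLAN pool problem described in the Symptoms section.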

2. How to confirm that the EP info was removed because of a mismatched vxlan-id

leaf101# less /var/log/dme/log/epmc-trace.txt | grep -A 15 "Unknown FD"
[2017 Nov 4 23:32:37.280637631:295753369:epm_mcec_pre_process_ep_req:807:E] Unknown FD vlan/vxlan 8997 bd_vnid 14909413 ... ignoring EP req; ep_flags local|vPC|MAC|sclass|
[2017 Nov 4 23:32:37.280638877:295753370:epm_send_ep_del_ack_to_peer:1174:t] EP req for EP for which FD/BD/VRF/Tun doesn't exist, deleting EP from EP Db, if it exists
[2017 Nov 4 23:32:37.280640484:295753371:epm_process_ep_del:2300:t] Delete req rcvd for EP:
[2017 Nov 4 23:32:37.280642648:295753372:epm_debug_dump_epm_ep:398:t] log_collect_ep_event EP entry: mac = 0000.1111.2222; num_ips = 0 vlan = 21; epg_vnid = 8599; bd_vnid = 14909414; vrf_vnid = 2195457 ifindex = 0x16000000; tun_ifindex = 0; vtep_tun_ifindex = 0 sclass = 32779; ref_cnt = 4 flags = local|vPC|MAC|sclass|timer| create_ts = 11/04/2017 15:32:59.046167 upd_ts = 11/04/2017 15:32:59.046167
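If the trace file has been copied off the leaf, the "Unknown FD" entries can also be pulled out with a short script. The sketch below is a minimal illustration, assuming the log lines keep the format shown above; the function name and the regular expression are not part of any Cisco tooling.

```python
import re

# Matches the "Unknown FD vlan/vxlan <id> bd_vnid <id>" part of an epmc-trace line.
UNKNOWN_FD = re.compile(r"Unknown FD vlan/vxlan (\d+) bd_vnid (\d+)")


def unknown_fd_events(lines):
    """Return (vxlan_id, bd_vnid) tuples for every 'Unknown FD' log line."""
    events = []
    for line in lines:
        match = UNKNOWN_FD.search(line)
        if match:
            events.append((int(match.group(1)), int(match.group(2))))
    return events


# Sample lines copied from the trace output above.
sample = [
    "[2017 Nov 4 23:32:37.280637631:295753369:epm_mcec_pre_process_ep_req:807:E] "
    "Unknown FD vlan/vxlan 8997 bd_vnid 14909413 ... ignoring EP req; "
    "ep_flags local|vPC|MAC|sclass|",
    "[2017 Nov 4 23:32:37.280640484:295753371:epm_process_ep_del:2300:t] "
    "Delete req rcvd for EP:",
]

print(unknown_fd_events(sample))
# [(8997, 14909413)]
```

The vxlan-id reported in each event is the one the peer leaf sent and the local leaf does not recognize, so it can be compared against the local fabric-encap for the same access-encap VLAN.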

Solution

  • A VLAN pool is a container for VLAN IDs and their corresponding vxlan-ids.
  • Each EPG/AEP can be associated with multiple domains, but each domain must be associated with a VLAN pool whose VLAN blocks do not overlap with those of any other VLAN pool. This ensures a globally consistent VLAN-to-VXLAN mapping.
  • If the design is for the port-local VLAN use case, different considerations apply.
  • Further information can be found in "Understand VLAN-Based EPG" on the Cisco Learning Network.
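
As a rough aid for auditing a design against the non-overlap rule above, the check can be sketched in Python. This is a minimal sketch under stated assumptions: the pool names and block ranges are hypothetical, and the pools are assumed to have been exported by hand into a dictionary of inclusive (start, end) VLAN blocks rather than read from the APIC.

```python
from itertools import combinations


def overlapping_blocks(pools):
    """Find VLAN ranges shared by two different pools.

    pools maps a pool name to a list of inclusive (start, end) VLAN blocks.
    Returns a list of (pool_a, pool_b, first_vlan, last_vlan) tuples.
    """
    overlaps = []
    for (name_a, blocks_a), (name_b, blocks_b) in combinations(sorted(pools.items()), 2):
        for a_start, a_end in blocks_a:
            for b_start, b_end in blocks_b:
                first, last = max(a_start, b_start), min(a_end, b_end)
                if first <= last:  # the two blocks share at least one VLAN
                    overlaps.append((name_a, name_b, first, last))
    return overlaps


# Hypothetical pools; vlan-100 appears in two of them.
pools = {
    "pool-phys": [(100, 199)],
    "pool-l2ext": [(100, 100), (300, 310)],
    "pool-vmm": [(200, 299)],
}

print(overlapping_blocks(pools))
# [('pool-l2ext', 'pool-phys', 100, 100)]
```

An empty result means no two pools share a VLAN. Any tuple returned points at a pair of pools that should not both be reachable from the same EPG through its domains.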
Comments
Thank you for the info.