We are planning a pilot to add whiteboarding sessions to the upcoming ACI webinars. However, we do not yet know your top interests. Please join this discussion and let us know which scenario you would like to learn more about, so that we can prioritize it in the next session!
Welkin Tiantang He
Technical Leader, Services
ACI uses a shared L3Out to provide shared services to multiple consumer VRFs. This is achieved by leaking the external subnet from the provider VRF. When OSPF is used to exchange routes between the border leaf and the external router in the consumer VRF, and the provider VRF is also deployed on the same leaf, the leaked external subnet is treated as eBGP from the consumer leaf's perspective. Since eBGP has an administrative distance of 20, it is preferred over OSPF (administrative distance 110) even when the same subnet is learned over OSPF from a different source. The route leaked by the shared L3Out therefore creates an internal backdoor that bypasses the externally implemented path.
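To see which source actually won in the consumer VRF, compare the [AD/metric] pair of the installed route against the OSPF copy. A minimal check from the leaf (VRF and prefix names follow the example later in this post):

leaf103# vsh -c "show ip route 192.168.103.0 vrf user:user_vrf"
leaf103# vsh -c "show ip route 192.168.103.0 vrf common:common_vrf"

An entry marked bgp-65000, external with [20/0] in the consumer VRF means the leaked route has beaten the OSPF-learned path (AD 110).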
Further, it can cause OSPF flapping due to LSA flushes via max-age in the use case below. When 192.168.103.1 becomes the DR of the 192.168.103.0/24 subnet in common_vrf, it generates a type-2 (network) LSA toward the firewall. In parallel, because of the shared L3Out, 192.168.103.1/32 is also leaked into user_vrf; and because 192.168.103.1/32 is reachable locally via another VRF, URIB tags this particular route as local. The OSPF instance in user_vrf therefore believes it is the originator of the type-2 LSA that was actually generated in common_vrf. However, because the type-2 LSA received from the north is more recent than anything user_vrf thinks it originated, user_vrf flushes the type-2 LSA via max-age and floods it back. That triggers the firewall to remove all paths to the DR and all routes learned via the DR.
1. What the type-2 LSA looks like:
leaf103# show ip ospf database network 192.168.103.1 detail vrf common:common_vrf
OSPF Router with ID (22.214.171.124) (Process ID default VRF welkin:inside)
                Network Link States (Area 0.0.0.15)
LS age: 1369
Options: 0x2 (No TOS-capability, No DC)
LS Type: Network Links
Link State ID: 192.168.103.1 (Designated Router address)
Advertising Router: 126.96.36.199
LS Seq Number: 0x8000002e
Checksum: 0x2029
Length: 36
Network Mask: /24
     Attached Router: 188.8.131.52
     Attached Router: 184.108.40.206
     Attached Router: 192.168.103.3
2. user_vrf is max-aging the LSA
The type-2 LSA originated in common_vrf is treated as self-originated by user_vrf, so it is max-aged because the received copy is more recent; this is wrong.
2018 Jan 5 09:55:28.921944 ospf default : TID 11119:ospf_ha_lsdb_update:4144:(user:user_vrf-base) ObjStore entry for LSA 192.168.103.1(0x2)220.127.116.11 (0x80000111) (0x570e) (3600) area 0.0.0.15 (if-none) updated
2018 Jan 5 09:55:28.921876 ospf default : TID 11119:ospf_ha_lsdb_update:4144:(user:user_vrf-base) ObjStore entry for LSA 192.168.103.1(0x2)18.104.22.168 (0x80000111) (0x570e) (3600) area 0.0.0.15 (if-none) updated
2018 Jan 5 09:55:28.921796 ospf default : TID 11119:ospf_ha_lsdb_update:4144:(user:user_vrf-base) ObjStore entry for LSA 192.168.103.1(0x2)22.214.171.124 (0x80000111) (0x570e) (7) area 0.0.0.15 (if-none) updated
2018 Jan 5 09:55:28.921722 ospf default : TID 11119:ospf_update_lsdb_entry:1661:(user:user_vrf-base) LSA already exists, updating
2018 Jan 5 09:55:28.921713 ospf default : TID 11119:ospfv2_overrule_stale_self_originated_lsa:1292:(user:user_vrf-base) Received self-originated LSA
2018 Jan 5 09:55:28.921673 ospf default : TID 11119:ospfv2_process_newer_lsa:1922:(user:user_vrf-base) LSA is more recent
2018 Jan 5 09:55:28.921665 ospf default : TID 11119:ospfv2_process_incoming_lsa:2330:(user:user_vrf-base) LSA 192.168.103.1(0x2)126.96.36.199 (0x80000111) (0x570e) (7) from 192.168.105.3
The problem is caused by 192.168.103.1/32 being local in user:user_vrf, which misleads OSPF in that VRF into believing it is the originator.
leaf103# vsh -c "show ip route 192.168.103.0 vrf user:user_vrf"
192.168.103.0/24, ubest/mbest: 1/0, attached, direct
    *via 192.168.103.1%common:common_vrf, Vlan5, [20/0], 01:18:24, bgp-65000, external, tag 65000
leaf103# vsh -c "show ip route 192.168.103.1/32 vrf user:user_vrf"
192.168.103.1/32, ubest/mbest: 1/0, attached
    *via 192.168.103.1%common:common_vrf, Vlan5, [1/0], 01:18:50, local, attached-export, local
3. Why OSPF in user_vrf believes it is the originator
If the LSA header's Link State ID (here 192.168.103.1) matches the router ID or any local address, OSPF considers itself the originator and flushes the LSA. Because 192.168.103.1/32 is a local route in user_vrf, the OSPF instance there believed it was the originator and kicked off the LSA flush.
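To watch this happening, the same database query from step 1 can be run against both VRFs (a sketch reusing the command above):

leaf103# show ip ospf database network 192.168.103.1 detail vrf common:common_vrf
leaf103# show ip ospf database network 192.168.103.1 detail vrf user:user_vrf

If user_vrf shows the LSA at LS age 3600 (max-age) while common_vrf keeps re-originating it with a growing sequence number, the flush-and-reflood cycle described above is in progress.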
We need to prevent ACI from leaking the OSPF SVI subnet into the user VRF via the shared L3Out. This is an unexpected configuration, and it can be addressed by the following (a quick audit sketch follows the list):
1. Stop aggregating the shared subnet as 0.0.0.0/0.
2. Limit the contract provider and consumer scope.
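As a quick audit of what the shared L3Out exports, the external subnets and their scope flags can be listed from the APIC. This is a sketch: l3extSubnet is the standard external-subnet class, but verify the field names against your version.

apic1# moquery -c l3extSubnet | egrep "^(dn|ip|scope)"

A 0.0.0.0/0 entry whose scope includes shared-rtctrl/shared-security is what aggregates and leaks everything, including the OSPF SVI subnet; replace it with only the specific external prefixes that really need to be shared.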
An ACI upgrade involves both an APIC software update and a switch update. The switch upgrade is usually very straightforward; the APIC upgrade, however, can run into cluster issues. Here is a pre-check list we usually recommend customers work through before an upgrade gets started.
Preparations for APICs Before Upgrade:
0. Clear all faults and overlapping VLAN blocks
Faults in the ACI fabric indicate invalid or conflicting policies, or even disconnected interfaces; please understand the trigger of each and clear them before kicking off the upgrade. Be aware that conflicting policies such as "encap already been used" or "Routed port is in L2 mode" can result in an unexpected outage, because an ACI switch upgrade fetches all policies from the APIC from scratch and follows "first come, first served" behavior. As a result, the unexpected policies could very possibly take over from the expected ones.
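For a quick inventory of outstanding faults before the change window, they can also be pulled from the CLI (a sketch; faultInst is the standard fault class):

apic1# moquery -c faultInst | egrep "^(code|severity|descr)"

Work through the triggers until the list is empty, or until every remaining entry is understood and accepted.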
Overlapping VLAN blocks across different VLAN pools can result in intermittent packet drops as well as spanning-tree loops caused by BPDU drops; please refer to this document: https://supportforums.cisco.com/t5/data-center-documents/overlap-vlan-pool-lead-intermittent-packet-drop-to-vpc-endpoints/ta-p/3211107 . The impact of an overlapping VLAN pool can become even more pronounced after an upgrade, since all policies are fetched from scratch.
1. Make sure the upgrade path is supported. Data conversion is involved during the upgrade; following a supported upgrade path ensures the database is converted properly.
*** Very important: read the release notes of the target APIC version.
2. Back up the APIC configuration to an external server. If we ever have to re-import the configuration, this is the only data from which we can restore the same configuration. If backup encryption is enabled, make sure the encryption key is saved; otherwise none of the passwords, including the admin password, will be imported properly, and we will have to reset the admin password from the CLI (logging in locally as admin, e.g. from the console, or via USB).
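For reference, an on-demand export can also be triggered through the REST API. This is a hedged sketch: configExportP is the standard export-policy class, but the policy name preUpgrade is hypothetical, the policy must already point at a remote location, and $TOKEN is a session cookie obtained from aaaLogin.

curl -sk -X POST https://apic1/api/mo/uni/fabric/configexp-preUpgrade.json \
  -H "Cookie: APIC-cookie=$TOKEN" \
  -d '{"configExportP":{"attributes":{"name":"preUpgrade","adminSt":"triggered","format":"json"}}}'

Setting adminSt to triggered is what kicks off the export run.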
3. Make sure the CIMC of each APIC is accessible. This avoids two risks:
a. CIMC 1.5(4e) has a memory-leak defect that can prevent the impacted APIC (usually APIC2 and above) from kicking off the upgrade. It can also cause APIC1 process crashes after the upgrade. You can tell the CIMC has reached the bad state when it becomes unreachable from both the GUI and SSH; it is very important to restore it by resetting the CIMC: disconnect the server's power cords, wait 3 minutes, and reconnect them. Upgrading the CIMC before the APIC upgrade is highly recommended.
b. Without CIMC access, we will not be able to reach the APIC console remotely if something goes wrong; getting all of this access ready before the upgrade is critical.
4. Make sure the appliance element process is not locked by the IPMI defect
We have seen a few cases where a CentOS defect (related to IPMI) locks the AE thread. The AE (appliance element) is in charge of calling the upgrade utility (installer.py); if the AE is locked, the upgrade will not kick in. We can confirm whether the AE is impacted by IPMI from the CLI:
grep "ipmi" /var/log/dme/log/svc_ifc_ae.bin.log | tail -5
If there is no hit in the IPMI output, or the last IPMI query to the chassis was more than 10 seconds before the current system time (obtained with date), you may want to reboot the APIC OS before triggering the upgrade. Please do not reboot two or more APICs at the same time.
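A small helper to put the two timestamps side by side (it simply reuses the log path above):

apic1# date; grep "ipmi" /var/log/dme/log/svc_ifc_ae.bin.log | tail -1

If the last IPMI log entry lags the output of date by more than about 10 seconds, treat the AE as suspect.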
5. Make sure the NTP servers are reachable
This avoids hitting a known issue that may leave APIC2-3 stuck in a waiting state. Details can be found in the troubleshooting case study below.
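A quick reachability check (hedged; exact commands vary by version): on a leaf, show ntp peer-status lists the configured servers, with the selected one marked by *; on the APIC, the configured providers can be listed with moquery.

leaf101# show ntp peer-status
apic1# moquery -c datetimeNtpProv | egrep "^(name|dn)"

If no server is selected, fix NTP before starting the upgrade.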
6. Review behavior changes in the new version and evaluate the potential impact. One example: if route control enforcement (for the L3Out) was turned on for OSPF before ACI version 2.0 (it existed for BGP and was not greyed out for OSPF), it starts working as soon as the leaf is upgraded to 2.0, so all OSPF routes get filtered by the L3Out, which causes an outage.
7. Stage the upgrade in a lab before applying the change in production. It is always good to get familiar with the newer version by upgrading the lab first, and to run at least a minimal test of the applications.
Preparations for Switches Before Upgrade:
1. Place vPC/redundant pairs into different maintenance groups.
From a certain version onward, the APIC will not allow both members of a vPC pair to upgrade at the same time; still, it is best practice to put vPC pairs into different maintenance groups. Non-vPC switches that back each other up, such as border leaf switches, also need to be put into different groups, so that only one member reboots while the other stays online (a quick audit sketch follows below).
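To double-check which nodes landed in which group before kicking off, the maintenance groups and their node blocks can be dumped from the APIC (a sketch; maintMaintGrp and fabricNodeBlk are the standard classes, and note that the node-range attributes carry a trailing underscore):

apic1# moquery -c maintMaintGrp | egrep "^(name|dn)"
apic1# moquery -c fabricNodeBlk | egrep "^(dn|from_|to_)"

Verify that the two members of each vPC or redundant pair appear in different groups.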
In case the upgrade fails and troubleshooting is required, always start with APIC1. If APIC1 did not finish upgrading, please do not touch APIC2. If APIC1 is done but APIC2 did not complete, please do not touch APIC3. Violating this rule can leave the cluster database broken and force a cluster rebuild.
1. APIC2 or above stuck at 75% even though APIC1 has completed.
This problem can happen because APIC1's upgraded version information is not propagated to APIC2 or above. Be aware that svc_ifc_appliance_director is in charge of syncing version information between APICs and storing it into a framework that the upgrade utility (and other processes) can read.
First, make sure APIC1 can ping the rest of the APICs; this determines whether we need to troubleshoot from the leaf switches or continue on the APIC itself. If APIC1 cannot ping APIC2, you may want to call TAC to troubleshoot the switch. If APIC1 can ping APIC2, move on to the second step.
Second, since the APICs can talk to each other, APIC1's version info should have been replicated to its peers but was somehow not accepted. The version info is identified by the timestamp that follows it. We can run the CLI below to confirm APIC1's version timestamp, both from APIC1 itself and from the APIC2 that is waiting at 75%.
apic1# acidiag avread | grep id=1 | cut -d ' ' -f20-21
version=2.0(2f) lm(t):1(2017-10-25T18:01:04.907+11:00)

apic1# acidiag avread | grep common= | cut -d ' ' -f2
common=2017-10-25T18:01:04.907+11:00

apic2# acidiag avread | grep id=1 | cut -d ' ' -f20-21
version=2.0(1m) lm(t):1(2017-10-25T18:20:04.907+11:00)
As shown above, on APIC2 the timestamp attached to APIC1's (old) version 2.0(1m) is even later than the timestamp of APIC1's new version 2.0(2f). This prevents APIC2 from accepting APIC1's newer version propagation, so the installer on APIC2 believes APIC1 has not completed its upgrade yet. Instead of moving to the data-conversion stage, APIC2 keeps waiting for APIC1. There is a workaround, but it must be run from APIC1, and only after APIC1 has completed the upgrade successfully and booted into the new version; never run it from any APIC that is waiting at 75%, as that would make a complete mess of things. Considering the risk, I would suggest you call TAC instead of doing it yourself.
We have seen a considerable number of intermittent packet drops when the destination sits behind a vPC. The most common cause is that multiple domains associated with an EPG contain overlapping VLAN blocks. Each VLAN pool has a dedicated range of VXLAN IDs pre-allocated by the APIC, which is why the same VLAN from different pools ends up with different VXLAN IDs. Here are two scenarios we usually see in the field.
1. EPGs deployed on vPC links contain two domains associated with overlapping VLAN pools
Both domains contain the same access encap, vlan-100, yet the allocated VXLAN ID is vxlan-8292 on leaf101 but vxlan-8293 on leaf102. This causes the endpoint manager (EPM) process (an NX-OS process running on the leaf that, among other things, syncs endpoints behind the vPC to the peer leaf) to remove the endpoint info (MAC and IP) from hardware, so the leaf no longer knows how to forward the packet. The removal is based on the logic that the same access encap VLAN deployed on a vPC link must have the same VXLAN ID.
2. EPGs deployed on individual links contain two domains associated with overlapping VLAN pools
Again, both domains contain the same access encap, vlan-100, but the allocated VXLAN ID is vxlan-8292 on leaf101 and vxlan-8293 on leaf102. As a result, a BPDU received on leaf101 in VLAN-100 is dropped by leaf102: the BPDU frame is flooded strictly within the receiving VLAN, encapsulated with vxlan-8292, but leaf102 uses vxlan-8293, not vxlan-8292, for vlan-100.
1. How to check the vxlan-id consistency
Here is the quickest way to verify whether the VXLAN ID matches between two leaf switches. Issue the command below from both leaf101 and leaf102 and compare the fabric encap.
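One way to pull the mapping (hedged; output format varies by release) is the EPM VLAN table on each leaf, which shows the access encap alongside the fabric encap:

leaf101# show system internal epm vlan all
leaf102# show system internal epm vlan all

Compare the fabric encap (VXLAN ID) reported for the same access encap (e.g., vlan-100) on both switches; the two values must match.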
2. How to confirm the EP info was removed due to a mismatched vxlan-id
leaf101# less /var/log/dme/log/epmc-trace.txt | grep -A 15 "Unknown FD"
[2017 Nov 4 23:32:37.280637631:295753369:epm_mcec_pre_process_ep_req:807:E] Unknown FD vlan/vxlan 8997 bd_vnid 14909413 ... ignoring EP req; ep_flags local|vPC|MAC|sclass|
[2017 Nov 4 23:32:37.280638877:295753370:epm_send_ep_del_ack_to_peer:1174:t] EP req for EP for which FD/BD/VRF/Tun doesn't exist, deleting EP from EP Db, if it exists
[2017 Nov 4 23:32:37.280640484:295753371:epm_process_ep_del:2300:t] Delete req rcvd for EP:
[2017 Nov 4 23:32:37.280642648:295753372:epm_debug_dump_epm_ep:398:t] log_collect_ep_event
mac = 0000.1111.2222; num_ips = 0
vlan = 21; epg_vnid = 8599; bd_vnid = 14909414; vrf_vnid = 2195457
ifindex = 0x16000000; tun_ifindex = 0; vtep_tun_ifindex = 0
sclass = 32779; ref_cnt = 4
flags = local|vPC|MAC|sclass|timer|
create_ts = 11/04/2017 15:32:59.046167
upd_ts = 11/04/2017 15:32:59.046167
A VLAN pool is a bucket of VLAN IDs and VXLAN IDs.
Each EPG/AEP can be associated with multiple domains, but each domain must be associated with a VLAN pool whose VLAN blocks do not overlap with those of any other VLAN pool. This ensures a globally consistent VLAN-to-VXLAN mapping.
If the design is for the port-local VLAN use case, that is a different story.
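To audit the pools for overlaps from the APIC, every encap block can be dumped in one shot (a sketch; fvnsEncapBlk is the standard encap-block class, and the owning pool name is embedded in each dn):

apic1# moquery -c fvnsEncapBlk | egrep "^(dn|from|to)"

Any two blocks that belong to different pools and whose from/to ranges intersect are candidates for the problems described above.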
Further information can be found in "Understand VLAN-Based EPG" from Cisco Learning Network