05-22-2017 12:33 AM - edited 03-01-2019 05:14 AM
Hi
I am troubleshooting an issue with bringing up an ACI Multi-Pod fabric.
I have a 'seed' pod (pod1) ready. The IPN (an N5K) is up. The APIC in pod1 is able to learn the DHCP Discover originated by the 1st spine (pod2-sp1) of the remote pod (pod2), and the APIC GUI allows me to register that switch. After I explicitly register it in the GUI, a series of DHCP Offer, DHCP Request and DHCP ACK messages is exchanged between pod2-sp1 and the APIC via the DHCP relay on the IPN switch port connected to pod2-sp1.
However, strangely, after the DHCP ACK reaches pod2-sp1 there is no further communication between pod2-sp1 and the APIC, which I believe explains why no VTEP IP was assigned to pod2-sp1. I have no idea why. Please kindly help.
Ethanalyzer trace capture on the IPN n5k's inbound-hi shows the details:
2017-05-22 06:53:46.075475 0.0.0.0 -> 255.255.255.255 DHCP DHCP Discover - Transaction ID 0x720b5f3f
2017-05-22 06:53:46.076422 70.0.0.38 -> 50.0.128.1 DHCP DHCP Discover - Transaction ID 0x720b5f3f <--- 50.0.128.1 is APIC1 TEP IP
2017-05-22 06:53:46.076535 70.0.0.38 -> 50.0.128.2 DHCP DHCP Discover - Transaction ID 0x720b5f3f <--- 50.0.128.2 is APIC2 TEP IP
2017-05-22 06:53:46.076648 50.0.128.1 -> 70.0.0.38 DHCP DHCP Offer - Transaction ID 0x720b5f3f
2017-05-22 06:53:46.076730 70.0.0.38 -> 50.0.128.3 DHCP DHCP Discover - Transaction ID 0x720b5f3f <--- 50.0.128.3 is APIC3 TEP IP
2017-05-22 06:53:46.076811 50.0.128.2 -> 70.0.0.38 DHCP DHCP Offer - Transaction ID 0x720b5f3f
2017-05-22 06:53:46.076974 50.0.128.3 -> 70.0.0.38 DHCP DHCP Offer - Transaction ID 0x720b5f3f
2017-05-22 06:53:46.077630 70.0.0.38 -> 255.255.255.255 DHCP DHCP Offer - Transaction ID 0x720b5f3f
2017-05-22 06:53:46.078130 70.0.0.38 -> 255.255.255.255 DHCP DHCP Offer - Transaction ID 0x720b5f3f
2017-05-22 06:53:46.078635 70.0.0.38 -> 255.255.255.255 DHCP DHCP Offer - Transaction ID 0x720b5f3f
2017-05-22 06:53:57.078391 0.0.0.0 -> 255.255.255.255 DHCP DHCP Request - Transaction ID 0x720b5f3f
2017-05-22 06:53:57.079344 70.0.0.38 -> 50.0.128.1 DHCP DHCP Request - Transaction ID 0x720b5f3f
2017-05-22 06:53:57.079454 70.0.0.38 -> 50.0.128.2 DHCP DHCP Request - Transaction ID 0x720b5f3f
2017-05-22 06:53:57.079591 50.0.128.1 -> 70.0.0.38 DHCP DHCP ACK - Transaction ID 0x720b5f3f
2017-05-22 06:53:57.079600 70.0.0.38 -> 50.0.128.3 DHCP DHCP Request - Transaction ID 0x720b5f3f
2017-05-22 06:53:57.079816 50.0.128.2 -> 70.0.0.38 DHCP DHCP ACK - Transaction ID 0x720b5f3f
2017-05-22 06:53:57.079876 50.0.128.3 -> 70.0.0.38 DHCP DHCP ACK - Transaction ID 0x720b5f3f
2017-05-22 06:53:57.080351 70.0.0.38 -> 255.255.255.255 DHCP DHCP ACK - Transaction ID 0x720b5f3f
2017-05-22 06:53:57.080857 70.0.0.38 -> 255.255.255.255 DHCP DHCP ACK - Transaction ID 0x720b5f3f
2017-05-22 06:53:57.081378 70.0.0.38 -> 255.255.255.255 DHCP DHCP ACK - Transaction ID 0x720b5f3f
... no more traffic between pod2-sp1 and APIC....
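(For reference, the capture above can be taken on the N5K with something along these lines; exact Ethanalyzer syntax varies by NX-OS platform and release, so treat this as a sketch:)

```
N5K-IPN# ethanalyzer local interface inbound-hi display-filter "bootp" limit-captured-frames 0
```

The "bootp" display filter restricts the output to DHCP packets, and limit-captured-frames 0 removes the default frame cap so the whole exchange is visible.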
The IPN n5k config is as below:
policy-map type network-qos jumbo
class type network-qos class-default
mtu 9216
system qos
service-policy type network-qos jumbo
ip pim ssm range 232.0.0.0/8
vlan 4
name infra
service dhcp
ip dhcp relay
vrf context fabric-mpod
ip pim rp-address 12.1.1.1 group-list 225.0.0.0/8 bidir
ip pim rp-address 12.1.1.1 group-list 239.255.255.240/28 bidir
ip pim ssm range 232.0.0.0/8
interface Ethernet1/1
no switchport
mtu 9150
interface Ethernet1/1.4
description pod1-spine1
mtu 9150
encapsulation dot1Q 4
vrf member fabric-mpod
ip address 70.0.0.34/30
ip ospf network point-to-point
ip ospf mtu-ignore
ip router ospf a1 area 0.0.0.0
ip pim sparse-mode
ip dhcp relay address 50.0.128.1
ip dhcp relay address 50.0.128.2
ip dhcp relay address 50.0.128.3
interface Ethernet1/2
interface Ethernet1/3
interface Ethernet1/4
interface Ethernet1/5
interface Ethernet1/6
interface Ethernet1/7
interface Ethernet1/8
interface Ethernet1/9
interface Ethernet1/10
interface Ethernet1/11
interface Ethernet1/12
interface Ethernet2/1
no switchport
mtu 9150
interface Ethernet2/1.4
description pod2-spine1
mtu 9150
encapsulation dot1Q 4
vrf member fabric-mpod
ip address 70.0.0.38/30
ip ospf network point-to-point
ip ospf mtu-ignore
ip router ospf a1 area 0.0.0.0
ip pim sparse-mode
ip dhcp relay address 50.0.128.1
ip dhcp relay address 50.0.128.2
ip dhcp relay address 50.0.128.3
interface Ethernet2/2
interface Ethernet2/3
interface Ethernet2/4
interface Ethernet2/5
interface Ethernet2/6
interface Ethernet2/7
interface Ethernet2/8
interface Ethernet2/9
interface Ethernet2/10
interface Ethernet2/11
interface Ethernet2/12
interface loopback29
vrf member fabric-mpod
ip address 12.1.1.1/32
ip ospf network point-to-point
ip router ospf a1 area 0.0.0.0
ip pim sparse-mode
line console
line vty
boot kickstart bootflash:/n6000-uk9-kickstart.7.1.1.N1.1.bin
boot system bootflash:/n6000-uk9.7.1.1.N1.1.bin
router ospf a1
vrf fabric-mpod
router-id 29.29.29.29
05-22-2017 07:08 AM
Before the new spine receives a TEP address from DHCP, it must first receive its L3 out IP via DHCP (the IP used for OSPF between spine and IPN). Try verifying the following:
1. Sub-interface IP address for pod2-spine is configured in the APIC GUI (Check the interface profile of the Infra tenant L3 out). If no IP is configured, then add one.
2. Log into the console of the spine and run show ip interface brief vrf overlay-1. You should see the L3 out IP assigned to the sub-interface.
3. If the sub-interface has been assigned an IP address, then check the pod2-spine routing table for VRF overlay-1. If the spine is at the point where it has a sub-int IP but no TEP address (the IP for loopback0), then it should either
A) Have a static route to the APIC TEP (the APIC which assigned the sub-int IP to the spine)
or
B) Have OSPF up and be learning routes via OSPF
Let's check the parts above and see where the discovery process is stuck.
Jason
05-22-2017 10:16 PM
Hi Jason
1. Sub-interface IP address (70.0.0.37/32) for pod2-spine has been configured in the APIC GUI.
2. Pod2-Spine1# show ip interface brief vrf overlay-1
IP Interface Status for VRF "overlay-1"(4)
Interface Address Interface Status
eth1/1 unassigned protocol-down/link-down/admin-up
eth1/2 unassigned protocol-down/link-down/admin-up
eth1/3 unassigned protocol-down/link-down/admin-up
eth1/4 unassigned protocol-down/link-down/admin-up
eth1/5 unassigned protocol-down/link-down/admin-up
eth1/6 unassigned protocol-down/link-down/admin-up
eth1/7 unassigned protocol-down/link-down/admin-up
eth1/8 unassigned protocol-down/link-down/admin-up
eth1/9 unassigned protocol-down/link-down/admin-up
eth1/10 unassigned protocol-down/link-down/admin-up
eth1/11 unassigned protocol-down/link-down/admin-up
eth1/12 unassigned protocol-down/link-down/admin-up
eth1/13 unassigned protocol-down/link-down/admin-up
eth1/14 unassigned protocol-down/link-down/admin-up
eth1/15 unassigned protocol-down/link-down/admin-up
eth1/16 unassigned protocol-down/link-down/admin-up
eth1/17 unassigned protocol-down/link-down/admin-up
eth1/18 unassigned protocol-down/link-down/admin-up
eth1/19 unassigned protocol-down/link-down/admin-up
eth1/20 unassigned protocol-down/link-down/admin-up
eth1/21 unassigned protocol-down/link-down/admin-up
eth1/22 unassigned protocol-down/link-down/admin-up
eth1/23 unassigned protocol-down/link-down/admin-up
eth1/24 unassigned protocol-down/link-down/admin-up
eth1/25 unassigned protocol-down/link-down/admin-up
eth1/26 unassigned protocol-down/link-down/admin-up
eth1/27 unassigned protocol-down/link-down/admin-up
eth1/28 unassigned protocol-down/link-down/admin-up
eth1/29 unassigned protocol-down/link-down/admin-up
eth1/30 unassigned protocol-down/link-down/admin-up
eth1/31 unassigned protocol-down/link-down/admin-up
eth1/32 unassigned protocol-up/link-up/admin-up
eth1/32.32 70.0.0.37/30 protocol-up/link-up/admin-up
3)
Pod2-Spine1# show ip route vrf overlay-1
IP Route Table for VRF "overlay-1"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
50.0.128.1/32, ubest/mbest: 1/0
*via 70.0.0.38, eth1/32.32, [255/0], 00:01:13, static, tag 4294967295
70.0.0.36/30, ubest/mbest: 1/0, attached, direct
*via 70.0.0.37, eth1/32.32, [1/0], 00:01:13, direct
70.0.0.37/32, ubest/mbest: 1/0, attached
*via 70.0.0.37, eth1/32.32, [1/0], 00:01:13, local, local
Regards
Clement
05-23-2017 08:42 AM
Hi Clement,
The spine did receive the sub-int IP via DHCP, but OSPF doesn't appear to be up. For OSPF to come up, pod2-spine must receive some configuration from the APIC. To do this, it uses the static route to download a bootstrap file from the APIC, which contains the L3 out configuration.
To see if the bootstrap configuration download has completed, log into pod2-spine and run cat /mit/sys/summary
spine# cat /mit/sys/summary
# System
address : x.x.x.x
bootstrapState : done / downloading-bootstrap-config / none
childAction :
configIssues :
...
After executing the command you'll want to focus on bootstrapState. If the field is marked done, then check the OSPF process and look for OSPF neighbors in VRF overlay-1.
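(If bootstrapState is done, an OSPF check along these lines on the spine should show whether a neighbor formed with the IPN; command form is a sketch:)

```
pod2-spine# show ip ospf neighbors vrf overlay-1
```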
If it says downloading-bootstrap-config, then it is likely failing to download the bootstrap file from the APIC. If this is the case, then:
1) Verify the bootstrap file exists on APIC1. The file is located under the /firmware/fwrepos/fwrepo/boot/ directory.
admin@apic1:~> ls -l /firmware/fwrepos/fwrepo/boot/ | grep boot
-rw-r--r-- 1 root root 17344 May 19 11:47 bootstrap-201.xml
-rw-r--r-- 1 root root 17344 May 19 11:47 bootstrap-401.xml
-rw-r--r-- 1 root root 17345 May 19 11:47 bootstrap-601.xml
If this exists, then verify that APIC 1 and pod2-spine have IP reachability.
2) Test with ICMP between APIC 1 and pod2-spine. Be sure to source the ping from interface bond0.XXXX (XXXX = infra VLAN) when pinging from the APIC to the spine.
Example: admin@apic1:~> ping -I bond0.4000 70.0.0.37
If this fails, then you can verify that the spine receives the ICMP request from the APIC and replies by running tcpdump on the spine. Be sure to specify the interface and filter for ICMP traffic.
pod2-spine# tcpdump -i kpm_inb icmp
Jason
05-23-2017 07:58 PM
Hi Jason
On the spine, the bootstrapState is downloading-bootstrap-config.
Ping from the APIC towards the spine's sub-interface IP failed. But when running tcpdump -i kpm_inb icmp on the spine, I can see both the ping requests and the replies.
Pod2-Spine# tcpdump -i kpm_inb icmp
tcpdump: WARNING: kpm_inb: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on kpm_inb, link-type EN10MB (Ethernet), capture size 65535 bytes
10:26:16.316628 IP 50.0.128.1 > 70.0.0.37: ICMP echo request, id 13455, seq 156, length 64
10:26:16.316718 IP 70.0.0.37 > 50.0.128.1: ICMP echo reply, id 13455, seq 156, length 64
10:26:17.320534 IP 50.0.128.1 > 70.0.0.37: ICMP echo request, id 13455, seq 157, length 64
10:26:17.320604 IP 70.0.0.37 > 50.0.128.1: ICMP echo reply, id 13455, seq 157, length 64
10:26:18.320586 IP 50.0.128.1 > 70.0.0.37: ICMP echo request, id 13455, seq 158, length 64
10:26:18.320674 IP 70.0.0.37 > 50.0.128.1: ICMP echo reply, id 13455, seq 158, length 64
But when I do show interface eth1/32 on the spine, the TX packet count does not increment, while the RX packet count keeps incrementing due to the ICMP requests. I have no idea why the ping replies are not sent down the wire.
Ethernet1/32 is up
admin state is up, Dedicated Interface
Hardware: 10000/100000/40000 Ethernet, address: 0000.0000.0000 (bia 00f6.639d.d2f7)
MTU 9150 bytes, BW 40000000 Kbit, DLY 1 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, medium is broadcast
Port mode is routed
full-duplex, 40 Gb/s, media type is 40G
FEC (forward-error-correction) : disable-fec
Beacon is turned off
Auto-Negotiation is turned on
Input flow-control is off, output flow-control is off
Auto-mdix is turned off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
EEE (efficient-ethernet) : n/a
Last link flapped 21:57:18
Last clearing of "show interface" counters 00:00:23
1 interface resets
30 seconds input rate 0 bits/sec, 0 packets/sec
30 seconds output rate 0 bits/sec, 0 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 0 bps, 0 pps; output rate 0 bps, 0 pps
L3 in Switched:
ucast: 0 pkts, 0 bytes - mcast: 0 pkts, 0 bytes
L3 out Switched:
ucast: 0 pkts, 0 bytes - mcast: 0 pkts, 0 bytes
RX
32 unicast packets 6 multicast packets 0 broadcast packets
38 input packets 4354 bytes
0 jumbo packets 0 storm suppression bytes
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
0 unicast packets 1 multicast packets 0 broadcast packets
1 output packets 291 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
05-24-2017 07:10 PM
Hello,
Would you mind checking one thing? Go to...
-Fabric>Topology
-Under the "inter-pod network" box select "config"
-verify that there are two subnets defined under your "Fabric External Routing Profile". These subnets should exactly match the networks that your IPN devices are using to peer with your pod1 and pod2 spines. You can't use a single subnet that encompasses all of your IPN peerings.
EDIT:
I'm looking at the pic you submitted that includes the Fabric External Routing Profile. It looks like you only have one subnet defined here (a /16) rather than two subnets exactly matching your IPN peerings. Once this is changed you should at least have one problem resolved :)
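(For what it's worth, based on the IPN config posted earlier in this thread, the two subnets would presumably be the /30s of the two peering links; verify against your own addressing:)

```
Fabric External Routing Profile (infra tenant):
  70.0.0.32/30   <-- covers Eth1/1.4 peering to pod1-spine1 (70.0.0.34/30)
  70.0.0.36/30   <-- covers Eth2/1.4 peering to pod2-spine1 (70.0.0.38/30)
```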
05-25-2017 02:55 AM
05-25-2017 08:22 AM
Could you send the output of the follow commands from the pod2-spine?
show module
show lldp neighbor
show ip arp
05-25-2017 06:04 PM
There you go...
Pod2-Spine1# show module
Mod Ports Module-Type Model Status
--- ----- ----------------------------------- ------------------ ----------
1 32 32p 40/100G Ethernet Module N9K-X9732C-EX ok
22 0 Fabric Module N9K-C9504-FM-E ok
23 0 Fabric Module N9K-C9504-FM-E ok
24 0 Fabric Module N9K-C9504-FM-E ok
26 0 Fabric Module N9K-C9504-FM-E ok
27 0 Supervisor Module N9K-SUP-A active
29 0 System Controller N9K-SC-A active
30 0 System Controller N9K-SC-A standby
Mod Sw Hw
--- -------------- ------
1 12.2(1o) 1.4
22 12.2(1o) 1.0
23 12.2(1o) 1.0
24 12.2(1o) 1.0
26 12.2(1o) 1.0
27 12.2(1o) 1.0
29 12.2(1o) 1.6
30 12.2(1o) 1.6
Mod MAC-Address(es) Serial-Num
--- -------------------------------------- ----------
1 00-f6-63-9d-d2-d8 to 00-f6-63-9d-d3-5d FOC211017R4
22 00-00-00-00-00-00 to 00-00-00-00-00-00 FOC210501ZM
23 00-00-00-00-00-00 to 00-00-00-00-00-00 FOC21070SBV
24 00-00-00-00-00-00 to 00-00-00-00-00-00 FOC21070S9D
26 00-00-00-00-00-00 to 00-00-00-00-00-00 FOC21030C93
27 e0-0e-da-36-bd-16 to e0-0e-da-36-bd-28 SAL2037UYTF
29 00-00-00-00-00-00 to 00-00-00-00-00-00 SAL2039V9FQ
30 00-00-00-00-00-00 to 00-00-00-00-00-00 SAL2039V9NC
Mod Online Diag Status
--- ------------------
1 pass
22 pass
23 pass
24 pass
26 pass
27 pass
29 pass
30 pass
Pod2-Spine1# show lldp neighbors
Capability codes:
(R) Router, (B) Bridge, (T) Telephone, (C) DOCSIS Cable Device
(W) WLAN Access Point, (P) Repeater, (S) Station, (O) Other
Device ID Local Intf Hold-time Capability Port ID
N5K-IPN Eth1/32 120 B Eth2/1
Total entries displayed: 1
Pod2-Spine1# show ip arp
Flags: * - Adjacencies learnt on non-active FHRP router
+ - Adjacencies synced via CFSoE
# - Adjacencies Throttled for Glean
D - Static Adjacencies attached to down interface
IP ARP Table for all contexts
Total number of entries: 1
Address Age MAC Address Interface
70.0.0.38 00:11:33 8c60.4fc1.8e01 eth1/32.32
05-26-2017 08:42 AM
Could you try pinging from 70.0.0.38 to 70.0.0.37?
Could you also try the following?
- Create a loopback on N5K-IPN
- Add the loopback to the multipod VRF
- Assign the new loopback 50.0.128.1/32 (duplicate the APIC1 TEP)
- Ping 70.0.0.37 sourcing from the new loopback address
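(A minimal NX-OS sketch of those steps; the loopback number below is arbitrary, pick any unused one:)

```
N5K-IPN(config)# interface loopback30
N5K-IPN(config-if)# vrf member fabric-mpod
N5K-IPN(config-if)# ip address 50.0.128.1/32
N5K-IPN(config-if)# end
N5K-IPN# ping 70.0.0.37 source 50.0.128.1 vrf fabric-mpod
```

Remember to remove this loopback after the test, since it duplicates APIC1's TEP address.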
05-26-2017 09:29 AM
N5K-IPN# ping 70.0.0.37 source 70.0.0.38 vrf fabric-mpod
PING 70.0.0.37 (70.0.0.37) from 70.0.0.38: 56 data bytes
Request 0 timed out
Request 1 timed out
Request 2 timed out
Request 3 timed out
Request 4 timed out
N5K-IPN# ping 70.0.0.37 source 50.0.128.1 vrf fabric-mpod
PING 70.0.0.37 (70.0.0.37) from 50.0.128.1: 56 data bytes
Request 0 timed out
Request 1 timed out
Request 2 timed out
Request 3 timed out
Request 4 timed out
05-26-2017 11:45 AM
Clement,
Could you also connect a leaf into the spine?
This might not resolve the issue, but it is a requirement for pod spine bringup.
If the issue persists after connecting a leaf and forming LLDP neighbors, then I would recommend opening a TAC service request to further troubleshoot the forwarding behavior within the spine.
Once you come to resolution, it would be great to update this thread with the solution to benefit the support community.
Jason
05-31-2017 07:32 AM
Hi Jason
Great news. After I added a leaf behind Pod2-Spine1, the bootstrap process completed successfully, and the TEP IPs for the new spine and leaf have now been assigned. Just one side question: the newly brought-up switches keep flipping between 'active' and 'inactive' states; what are the possible causes? Moreover, even after the TEP IPs were assigned to the Pod2 leaf/spine, Pod2 is still not shown in the navigation pane under Fabric | Inventory of the APIC GUI (refer to the attached screen capture).
APIC# acidiag fnvread
ID Pod ID Name Serial Number IP Address Role State LastUpdMsgId
--------------------------------------------------------------------------------------------------------------
....
....
401 2 Pod2-Leaf1 FDO210314UT 60.0.180.94/32 leaf active 0x4000001ca4e89
....
903 2 Pod2-Spine1 FOX2036G83U 60.0.180.95/32 spine active 0x4000001ca4e8a
APIC# acidiag fnvread
ID Pod ID Name Serial Number IP Address Role State LastUpdMsgId
--------------------------------------------------------------------------------------------------------------
...
...
401 2 Pod2-Leaf1 FDO210314UT 60.0.180.94/32 leaf inactive 0x5000001c56181
...
903 2 Pod2-Spine1 FOX2036G83U 60.0.180.95/32 spine inactive 0x5000001c56182
APIC# acidiag fnvread
ID Pod ID Name Serial Number IP Address Role State LastUpdMsgId
--------------------------------------------------------------------------------------------------------------
...
...
401 2 Pod2-Leaf1 FDO210314UT 60.0.180.94/32 leaf active 0x5000001c56181
...
903 2 Pod2-Spine1 FOX2036G83U 60.0.180.95/32 spine active 0x5000001c56182
Regards
Clement
05-31-2017 08:44 AM
Could you try sending a continuous ping from APIC 1 to the spine TEP? Let it run for at least a minute
ping 60.0.180.95 -c 100
Do the pings go through 100% or do you notice any intermittent drops?
If there are drops occurring, then could you upload the full topology of the multi-pod network? Include spines and everything in between.
If no drops occur, then check the date on all devices by executing the date command in CLI. If the times are off by more than a few hours, then that could be an issue.
Let's also verify the version on the APIC, pod2-spine, and pod2-leaf using the show version command on each device. It's ideal to have them all on the same image.
Jason
05-31-2017 07:32 PM
Hi
The root cause has been identified. It was residual configuration on the IPN N5K from the troubleshooting step above, which created a loopback interface with the same IP address (50.0.128.1) as APIC1's TEP. Once that interface was removed, the node state stopped flipping and Pod2 finally appeared in the APIC GUI.
In summary, when we attempt to bring up a new spine in Pod2 without an attached leaf, ACI does not allow the spine to transmit IP traffic (ICMP, TFTP, etc.) upstream; only certain traffic such as LLDP and DHCP is allowed.
Thank you very much for your help!!