
Troubleshoot DHCP Issues

Description of the Issue

You do not get an IP address.

Topology for the troubleshooting flow

Refer to the following diagram for the troubleshooting flow:

[Figure: dhcp_troubleshooting_workflow.png]

Possible causes

This section describes SD-Access-specific troubleshooting. All the debug and show commands from traditional DHCP troubleshooting still apply here.

  1. DHCP Option 82 check
  2. DHCP Server Reachability issues
  3. DHCP Message Exchange not happening

DHCP Option 82

The architecture of SD-Access introduces a few design changes to how DHCP operates in a fabric:

  • An anycast IP address is used on the VLAN interface that relays the DHCP messages. In a traditional network, each access switch (edge node) has a unique IP address on the L3 interface acting as the relay agent.
  • DHCP server location. The DHCP server is typically located behind, or reachable through, the border routers as shown in the diagram.

Why is Option 82 necessary?

The SD-Access fabric uses the anycast IP address as the Gateway IP Address (GIADDR) in DHCP messages. This address is the same on all the edge switches in the fabric for a given VLAN, so the GIADDR alone cannot identify the edge switch that relayed a request. The Option 82 field is required to identify the source Routing Locator (RLOC) of the edge switch where the host is connected.

Here's a sample config from one of the edge switches:

 

interface Vlan1021
description Configured from apic-em
mac-address 0000.0c9f.f45c
vrf forwarding DEFAULT_VN
ip address 172.70.10.1 255.255.255.0 <<Anycast IP address on all edge switches>>
ip helper-address 192.168.1.101
no ip redirects
ip pim sparse-mode
ip route-cache same-interface
ip igmp version 3
no lisp mobility liveness test
lisp mobility 172_70_10_0-DEFAULT_VN
end
DHCP Option 82 is enabled on the edge node with the following command, which is pushed by Cisco DNA Center (DNAC):

 

ip dhcp relay information option

The options can be verified from either Sniffer capture or debug DHCP messages.

What to look for in DHCP Option 82

Here's an example of Option 82 from a DHCP Discover packet:

[Figure: dhcp_option82_example.png]

There are two sub-options in the Option 82 message, which help identify the end client requesting an IP address:

  • Agent Circuit ID. This identifies the VLAN, Interface Module, and Port.
  • Agent Remote ID. This identifies LISP Instance ID, IPv4/IPv6, and Host MAC Address.

Decoding sub-option 1 from the above screen capture

Agent Circuit ID: 000403fd0117

 

00 - Sub Option 1

04 - Length of option

03fd - VLAN 1021

01 - Module 1

17 - Port 23 (0x17)
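
The byte-by-byte breakdown above can be reproduced with a short script. This is a minimal sketch that assumes the 6-byte layout shown in this sample (one type byte, one length byte, a 2-byte VLAN, then module and port); the function name `decode_circuit_id` and the field names are illustrative and do not come from any Cisco tool.

```python
def decode_circuit_id(hex_string):
    """Split a 6-byte Agent Circuit ID into the fields listed above."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 6:
        raise ValueError("expected 6 bytes, got %d" % len(raw))
    return {
        "type": raw[0],                           # first byte (00 in the sample)
        "length": raw[1],                         # length of the fields that follow
        "vlan": int.from_bytes(raw[2:4], "big"),  # 2-byte VLAN ID
        "module": raw[4],                         # interface module
        "port": raw[5],                           # port number
    }


print(decode_circuit_id("000403fd0117"))
# {'type': 0, 'length': 4, 'vlan': 1021, 'module': 1, 'port': 23}
```

Comparing the decoded VLAN, module, and port with the interface where the client is attached (Gi1/0/23 in VLAN 1021 in this example) is a quick way to confirm that the capture belongs to the host being investigated.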

Decoding sub-option 2 from the above capture

Agent Remote ID: 030800100201c0a86445

03 - Sub-option for LISP

08 - Length of option

001002 - LISP instance ID 4098

01 - IPv4 Locator

c0a86445 - 192.168.100.69 [Source RLOC ID]
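
The same decoding can be scripted for the Agent Remote ID. The sketch below assumes the 10-byte IPv4 layout shown in this sample (type, length, 3-byte LISP instance ID, 1-byte locator family, 4-byte RLOC) and handles only the IPv4 case, since the article does not show the IPv6 layout. The function name `decode_remote_id` is illustrative.

```python
import ipaddress


def decode_remote_id(hex_string):
    """Split an IPv4 Agent Remote ID into the fields listed above."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 10:
        raise ValueError("expected 10 bytes, got %d" % len(raw))
    if raw[5] != 1:
        # Only the IPv4 locator (01) from the sample is handled here.
        raise ValueError("unsupported locator family: %d" % raw[5])
    return {
        "type": raw[0],                                  # 03 in the sample
        "length": raw[1],                                # 08 in the sample
        "instance_id": int.from_bytes(raw[2:5], "big"),  # LISP instance ID
        "locator_family": raw[5],                        # 01 = IPv4
        "rloc": str(ipaddress.IPv4Address(raw[6:10])),   # source RLOC
    }


print(decode_remote_id("030800100201c0a86445"))
# {'type': 3, 'length': 8, 'instance_id': 4098, 'locator_family': 1, 'rloc': '192.168.100.69'}
```

The decoded RLOC should match the loopback of the edge switch where the client is connected, and the instance ID should match the virtual network of the host pool.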

DHCP Server Reachability Issue

The following points about the SD-Access fabric configuration are needed to understand the communication between the DHCP client and server.

  • The DHCP server is usually located outside the fabric. The server must have a route to the GIADDR so that it can send the DHCP replies (Offer and ACK) back toward the client.
  • When DHCP messages from the server arrive on the border node, they must be punted to the CPU, which processes Option 82 and identifies the RLOC (edge switch) to which each message must be forwarded. This is achieved by the following configuration, which Cisco DNA Center pushes to the border node.

Border configuration

interface Loopback1021
description Loopback Border
vrf forwarding DEFAULT_VN
ip address 172.70.10.1 255.255.255.255
router bgp 65000
address-family ipv4 vrf DEFAULT_VN
network 172.70.10.1 mask 255.255.255.255
neighbor 172.16.10.54 remote-as 65004
exit-address-family

When the edge receives the DHCP Discover packet from the client and the LISP map-cache has no entry for the server, the edge sends a map-request toward the map server.

The following snippet of the LISP control-plane debugs, taken on the edge, is shown for reference:

010224: *Oct 12 10:30:22.890: DHCP_SNOOPING: process new DHCP packet, message type: DHCPDISCOVER, input interface: Gi1/0/23, MAC da: ffff.ffff.ffff, MAC sa: 0672.5a4c.0000, IP da: 255.255.255.255, IP sa: 0.0.0.0, DHCP ciaddr: 0.0.0.0, DHCP yiaddr: 0.0.0.0, DHCP siaddr: 0.0.0.0, DHCP giaddr: 0.0.0.0, DHCP chaddr: 0672.5a4c.0000, efp_id: -2072051712, vlan_id: 1021      

010231: *Oct 12 10:30:22.891: [XTR] LISP: Processing data signal for EID prefix IID 4098 192.168.1.101/32

010232: *Oct 12 10:30:22.891: [XTR] LISP-0: Remote EID IID 4098 prefix 192.168.1.101/32, Change state to incomplete (sources: <signal>, state: unknown, rlocs: 0).

010233: *Oct 12 10:30:22.891: [XTR] LISP-0: Remote EID IID 4098 prefix 192.168.1.101/32, [incomplete] Scheduling map requests delay 00:00:00 min_elapsed 00:00:01 (sources: <signal>, state: incomplete, rlocs: 0).

010236: *Oct 12 10:30:23.020: [XTR] LISP: Send map request for EID prefix IID 4098 192.168.1.101/32

010238: *Oct 12 10:30:23.020:       LISP-0: EID-AF IPv4, Sending map-request from 192.168.1.101 to 192.168.1.101 for EID 192.168.1.101/32, ITR-RLOCs 1, nonce 0xD5532B99-0xC6AC6FD0 (encap src 192.168.100.69, dst 192.168.101.8), FromPITR.

010242: *Oct 12 10:30:23.021: [XTR] LISP-0: Map Request IID 4098 prefix 192.168.1.101/32 remote EID prefix[LL], Received reply with rtt 1ms.
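
When the debug output is long, it can help to filter it for the map-request and map-reply events only. The following is a minimal sketch that matches the two message formats seen in the sample above; the regular expressions are derived from these sample lines only, so other software releases may word the messages differently. The function name `map_request_status` is illustrative.

```python
import re

# Patterns taken from the sample debug lines above (an assumption for other releases).
SENT = re.compile(r"Send map request for EID prefix IID (\d+) (\S+)")
REPLY = re.compile(r"Map Request IID (\d+) prefix (\S+) .*Received reply with rtt (\d+)ms")


def map_request_status(lines):
    """Return {(instance_id, eid_prefix): status} for map requests seen in the log."""
    status = {}
    for line in lines:
        sent = SENT.search(line)
        if sent:
            status[(int(sent.group(1)), sent.group(2))] = "sent, no reply seen"
            continue
        reply = REPLY.search(line)
        if reply:
            key = (int(reply.group(1)), reply.group(2))
            status[key] = "reply received (rtt %s ms)" % reply.group(3)
    return status


log = [
    "010236: *Oct 12 10:30:23.020: [XTR] LISP: Send map request for EID prefix IID 4098 192.168.1.101/32",
    "010242: *Oct 12 10:30:23.021: [XTR] LISP-0: Map Request IID 4098 prefix 192.168.1.101/32 remote EID prefix[LL], Received reply with rtt 1ms.",
]
print(map_request_status(log))
# {(4098, '192.168.1.101/32'): 'reply received (rtt 1 ms)'}
```

An entry that stays at "sent, no reply seen" points to the map server or the path toward it, rather than to DHCP itself.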

DHCP Server Reachability Checks

Verify the DHCP server has a map-cache entry on the edge node with the correct RLOC

BGL-FE-12#sh ip lisp map-cache 192.168.1.101 instance-id 4098
LISP IPv4 Mapping Cache for EID-table vrf DEFAULT_VN (IID 4098), 5 entries

192.168.1.0/24, uptime: 00:24:27, expires: 23:35:32, via map-reply, complete
Sources: map-reply
State: complete, last modified: 00:24:27, map-source: 192.168.100.65
Idle, Packets out: 5(2037 bytes) (~ 00:23:25 ago)
Locator Uptime State Pri/Wgt Encap-IID
192.168.100.65 00:24:27 up 10/10 -
Last up-down state change: 00:24:27, state change count: 1
Last route reachability change: 1w1d, state change count: 1
Last priority / weight change: never/never
RLOC-probing loc-status algorithm:
Last RLOC-probe sent: 00:24:27 (rtt 1ms)

Workaround: Make sure the proxy ITR is configured on the edge, with a 0/0 map-cache entry, so that the DHCP request is sent in the overlay. Refer to the LISP troubleshooting page to debug why the map-cache entry is not present.
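
Note that the map-cache entry in the output above is a /24 prefix, not a host entry for the server. A quick way to confirm which entries cover the server address is sketched below with Python's standard `ipaddress` module. The prefix list is hypothetical apart from 192.168.1.0/24 (from the output above) and 0.0.0.0/0 (from the workaround); the function name `covering_entries` is illustrative.

```python
import ipaddress


def covering_entries(prefixes, host):
    """Return the prefixes that contain host, most specific first."""
    host_ip = ipaddress.ip_address(host)
    networks = [ipaddress.ip_network(prefix) for prefix in prefixes]
    matches = [network for network in networks if host_ip in network]
    return sorted(matches, key=lambda network: network.prefixlen, reverse=True)


# Hypothetical map-cache prefixes for instance ID 4098.
map_cache = ["0.0.0.0/0", "172.70.10.0/24", "192.168.1.0/24"]
for entry in covering_entries(map_cache, "192.168.1.101"):
    print(entry)
# 192.168.1.0/24
# 0.0.0.0/0
```

If only the 0.0.0.0/0 entry matches, no specific mapping has been learned for the server yet, and the map-cache entry for the server prefix should be investigated as described in the workaround.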

Check the reachability of the RLOC to reach the DHCP server from the underlay network

BGL-FE-12#sh ip route 192.168.100.65
Routing entry for 192.168.100.65/32
Known via "isis", distance 115, metric 20, type level-1
Redistributing via isis
Last update from 192.168.100.117 on FortyGigabitEthernet1/1/1, 1w1d ago
Routing Descriptor Blocks:
* 192.168.100.117, from 192.168.100.65, 1w1d ago, via FortyGigabitEthernet1/1/1
Route metric is 20, traffic share count is 1

BGL-FE-12#ping 192.168.100.65 
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.100.65, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms

Workaround: Check the underlay routing protocol (for example, IS-IS) to understand why the RLOC is not reachable.

Verify Server Reachability from the Border sourcing from anycast IP address

This step helps identify routing issues outside the fabric for reachability of the server.

9500-border-7#ping vrf DEFAULT_VN 192.168.1.101 source 172.70.10.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.101, timeout is 2 seconds:
Packet sent with a source address of 172.70.10.1 
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms

Workaround:

Check if the Anycast IP address is advertised to the Fusion Router/Shared services.

Fusion-Router#sh ip bgp vpnv4 vrf DEFAULT_VN 172.70.10.1/32
BGP routing table entry for 1:4098:172.70.10.1/32, version 3
Paths: (1 available, best #1, table DEFAULT_VN)
Advertised to update-groups:
116 
Refresh Epoch 1
65000
172.16.10.53 (via vrf DEFAULT_VN) from 172.16.10.53 (192.168.100.65)
Origin IGP, metric 0, localpref 100, valid, external, best
Extended Community: RT:1:4098
rx pathid: 0, tx pathid: 0x0

Verify that the server has proper routing configured to reach the host pool EID space.

PS C:\Users\Administrator> ROUTE PRINT 172.70.10*
========================================================================
Interface List
 14...00 0c 29 f4 2b bf ......Intel (R) 82574L Gigabit Network Connection #2
 12...00 0c 29 f4 2b b5 ......Intel (R) 82574L Gigabit Network Connection
  1...........................Software Loopback Interface 1
 13...00 00 00 00 00 00 00 e0 Microsoft ISATAP Adapter
 15...00 00 00 00 00 00 00 e0 Microsoft ISATAP Adapter #2
=======================================================================

IPv4 Route Table
=======================================================================
Active Routes:
Network Destination              Netmask              Gateway           Interface                  Metric
      172.70.10.0          255.255.255.0             192.168.1.1        192.168.1.101              11 
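
The point of this check is that the server's return route must cover the anycast GIADDR (172.70.10.1 in this example). As a minimal sketch, one row of the Active Routes table can be tested against an address with Python's standard `ipaddress` module. The function name `route_covers` is illustrative, and the row is assumed to be whitespace-separated in the order shown in the table header.

```python
import ipaddress


def route_covers(route_row, address):
    """Return True if a 'route print' row covers the given address.

    route_row is one line from the Active Routes table:
    destination, netmask, gateway, interface, metric.
    """
    destination, netmask = route_row.split()[:2]
    network = ipaddress.ip_network(destination + "/" + netmask)
    return ipaddress.ip_address(address) in network


row = "172.70.10.0  255.255.255.0  192.168.1.1  192.168.1.101  11"
print(route_covers(row, "172.70.10.1"))  # True
print(route_covers(row, "172.70.20.1"))  # False
```

If no row covers the GIADDR and the server has no usable default route, the Offer and ACK cannot be returned to the fabric even though the Discover reached the server.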

DHCP DORA

Another important troubleshooting step is to take packet captures at various points in the fabric to understand the DHCP message flow. The Embedded Packet Capture functionality on Cisco devices provides the ability to perform packet captures. Here are the capture points and the corresponding causes to troubleshoot.

  • A: Ingress on Edge Node. The Discover packet is not received on the Edge node. Check whether the client actually sent the Discover; if it did not, troubleshoot on the client to understand why.
  • B: Egress on Edge Node. The Discover is received on the Edge but not sent out. In this case, add platform debugging to see why packets are getting dropped.
  • C: Ingress on the Border Node. Verify whether the correct Option 82 values are used when the packet is received on the Border. If the Discover is not received on the Border, check the intermediate IP network for drops or reachability.
  • D: Egress on the Border Node. The Discover is not sent out of the Border.
  • E: On the DHCP Server. The packet is not received on the Server. In this case, debugging needs to be done outside the fabric and on shared services.
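
The capture points above are meant to be checked in order, from the client toward the server, and the first point where the Discover is missing narrows down the fault. A minimal sketch of that walk is shown below. The notes are paraphrased from the list above, and the names `CAPTURE_POINTS` and `first_missing` are illustrative.

```python
# Capture points in the order the Discover travels, with notes from the list above.
CAPTURE_POINTS = [
    ("A", "Ingress on Edge Node", "Troubleshoot on the client to see why Discover was not sent."),
    ("B", "Egress on Edge Node", "Add platform debugging on the Edge to see why packets are dropped."),
    ("C", "Ingress on the Border Node", "Check the intermediate IP network for drops or reachability."),
    ("D", "Egress on the Border Node", "The Discover is not sent out of the Border."),
    ("E", "On the DHCP Server", "Debug outside the fabric and on shared services."),
]


def first_missing(seen):
    """Return the first capture point where the Discover was not seen, or None."""
    for label, location, note in CAPTURE_POINTS:
        if not seen.get(label, False):
            return label, location, note
    return None


# Example: Discover captured at A and B, but not at C.
print(first_missing({"A": True, "B": True, "C": False}))
# ('C', 'Ingress on the Border Node', 'Check the intermediate IP network for drops or reachability.')
```

A result of `None` means the Discover was seen at every point, so the next step is to repeat the same walk in the reverse direction for the Offer.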

 

Solution

Basic checks should include the following:

  1. Check the adapter settings on the host. Make sure that the "Obtain an IP address automatically" and "Obtain DNS server address automatically" options are selected.
  2. Verify whether the IP address pool (for the host "vrf") has been created on the DHCP server.
  3. Verify whether the DHCP server has run out of addresses to lease for that particular IP pool.

Pre-LISP

In a traditional network, the IP address of the interface on which the DHCP Discover message is received is used to set the relay address (GIADDR) when the relay forwards the message toward the DHCP server.

Post-LISP

The Fabric Edge/xTR pushed config:

interface Vlan1021
 description Configured from apic-em
 mac-address 0000.0c9f.f45c
 vrf forwarding DEFAULT_VN
 ip address 172.70.10.1 255.255.255.0 << The same anycast address is used across all fabric edge switches >>
 ip helper-address 192.168.1.101
 no ip redirects
 ip pim sparse-mode
 ip route-cache same-interface
 ip igmp version 3
 no lisp mobility liveness test
 lisp mobility 172_70_10_0-DEFAULT_VN

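With the LISP architecture, an anycast IP address is used: every Fabric Edge uses the same IP address on the VLAN interface for a given VLAN, so every Fabric Edge sets the same GIADDR when relaying DHCP messages.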

Serviceability recommendations for Cisco DNA Center and the Switch

  • Regular Heartbeat (IP SLA) messages from Edge for DHCP Server reachability
  • DHCP Packet trace from Host to Server