
DHCP relay broken - C9500-32c StackWise Virtual

rwesel
Level 1

We have two C9500-32C switches in StackWise Virtual mode, with about 20 SVIs acting as the router.
To serve DHCP to clients in these VLANs, we configured an ip helper-address on every SVI.
This all worked very well.
For some time now, however, the DHCP relay intermittently fails to forward client broadcasts in a certain VLAN, or forwards them with a delay.

This morning we rebooted a client 10 to 20 times and the DHCP relay worked well, but then it suddenly stopped working.
Sometimes the client's broadcasts reach the relay and it forwards the DHCPOFFER to the client (see below), but mostly nothing happens on the relay.

It seems that some table or cache fills up and prevents further packets from being processed.

How can we debug this? Better yet, how can we fix it?

 

# C9500-32C:
Cisco IOS Software [Amsterdam], Catalyst L3 Switch Software (CAT9K_IOSXE), Version 17.3.3, RELEASE SOFTWARE (fc7)


# Affected VLAN interface:

interface Vlan94
 ip address 10.73.94.1 255.255.254.0
 ip helper-address 10.73.120.15

# Affected Client Interface (on a different switch)

interface GigabitEthernet4/12
 switchport access vlan 94
 spanning-tree portfast edge

This is what we tried

## Switchport
# Reboot PC

062473: Jul 9 13:49:35: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet4/12, changed state to down
062474: Jul 9 13:49:36: %LINK-3-UPDOWN: Interface GigabitEthernet4/12, changed state to down
# PC comes up
062475: Jul 9 13:49:43: %LINK-3-UPDOWN: Interface GigabitEthernet4/12, changed state to up
062476: Jul 9 13:49:44: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet4/12, changed state to up
# PXE boot starts
062477: Jul 9 13:49:46: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet4/12, changed state to down
062478: Jul 9 13:49:47: %LINK-3-UPDOWN: Interface GigabitEthernet4/12, changed state to down
062479: Jul 9 13:49:51: %LINK-3-UPDOWN: Interface GigabitEthernet4/12, changed state to up
062480: Jul 9 13:49:52: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet4/12, changed state to up

## DHCP Relay (C9500-32C)

relay#show debug
DHCP server packet debugging is on.
DHCP server packet detail debugging is on.
UDP:
UDP packet debugging is on
DHCPC:
DHCP client activity debugging is on (detailed)
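
For anyone reproducing this, a debug state like the one above would typically be enabled with commands along these lines. This is a sketch inferred from the `show debug` text, not taken from the thread, and the exact keywords may differ between releases:

```
relay#debug ip dhcp server packet
relay#debug ip dhcp server packet detail
relay#debug ip udp
relay#debug dhcp detail
```

`undebug all` turns everything off again once the capture is done.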

# Console Output (sometimes there is this output but mostly there is nothing)

Jul 9 13:49:49: UDP: rcvd src=0.0.0.0(68), dst=255.255.255.255(67), length=355
Jul 9 13:49:49: Option 82 not present
Jul 9 13:49:49: DHCPD: tableid for 10.73.94.1 on Vlan94 is 0
Jul 9 13:49:49: DHCPD: client's VPN is .
Jul 9 13:49:49: DHCPD: No option 125
Jul 9 13:49:49: DHCPD: Option 125 not present in the msg.
Jul 9 13:49:49: Option 82 not present
Jul 9 13:49:49: Option 82 not present
Jul 9 13:49:49: DHCPD: Option 125 not present in the msg.
Jul 9 13:49:49: DHCPD: Looking up binding using address 10.73.94.1
Jul 9 13:49:49: DHCPD: setting giaddr to 10.73.94.1.
Jul 9 13:49:49: DHCPD: adding relay information option.
Jul 9 13:49:49: DHCPD: relay information option content (add/replace):
Jul 9 13:49:49: DHCPD: 520801060004005e0965
Jul 9 13:49:49: UDP: sent src=10.73.94.1(67), dst=10.73.120.15(67), length=365
Jul 9 13:49:49: DHCPD: BOOTREQUEST from 1866.da1c.c6f8 forwarded to 10.73.120.15.
Jul 9 13:49:49: UDP: rcvd src=10.73.120.190(67), dst=10.73.94.1(67), length=336
Jul 9 13:49:49: DHCPD: tableid for 10.73.120.101 on Vlan60 is 0
Jul 9 13:49:49: DHCPD: client's VPN is .
Jul 9 13:49:49: DHCPD: No option 125
Jul 9 13:49:49: DHCPD: forwarding BOOTREPLY to client 1866.da1c.c6f8.
Jul 9 13:49:49: DHCPD: validating relay information option.
Jul 9 13:49:49: DHCPD: Option82 is currently:
Jul 9 13:49:49: 01060004005e0965
Jul 9 13:49:49: DHCPD: Removing option82 information
Jul 9 13:49:49: DHCPD: relay information option removed
Jul 9 13:49:49: DHCPD: Option82 is removed
Jul 9 13:49:49: DHCPD: Option 125 not present in the msg.
Jul 9 13:49:49: DHCPD: egress Interfce Vlan94
Jul 9 13:49:49: DHCPD: broadcasting BOOTREPLY to client 1866.da1c.c6f8.
Jul 9 13:49:49: UDP: sent src=0.0.0.0(67), dst=255.255.255.255(68), length=326


## ISC DHCP server log (in this case the packets reach the DHCP server)

Jul 9 13:49:49 do dhcpd[5961]: DHCPDISCOVER from 18:66:da:1c:c6:f8 via 10.73.94.1
Jul 9 13:49:49 do dhcpd[5961]: DHCPOFFER on 10.73.94.61 to 18:66:da:1c:c6:f8 via 10.73.94.1

 

Thank you for your help.

Stefan

1 Accepted Solution

After a long time, I would like to share my findings on this problem.

With the help of Cisco TAC, we found out that the "ICMP Redirect" CPU queue was overloaded. This queue is also used by the DHCP relay feature.

You can check this by watching the "Queue Drop" counter in the output of "show platform hardware fed switch active qos queue stats internal cpu policer". It should not increase.
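
A rough sketch of how that check could be run: the first command is the one quoted above, while the `| include ICMP Redirect` filter in the second is only an illustrative shortcut and may match more than one table, depending on the release. Run the command twice, some seconds apart, and compare the drop counters for the ICMP Redirect queue.

```
relay#show platform hardware fed switch active qos queue stats internal cpu policer
relay#show platform hardware fed switch active qos queue stats internal cpu policer | include ICMP Redirect
```

If the drop counter for that queue keeps climbing between runs, the queue is still being overrun.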

 

We applied "no ip redirects" to every L3 interface and the drops disappeared. DHCP relay is working fine now.
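
As a minimal sketch, this is what the change looks like on the affected SVI from the original post. Only Vlan94 is taken from the thread; the same line would have to be repeated under every other L3 interface, and that list is site-specific.

```
interface Vlan94
 no ip redirects
```

Afterwards, `show ip interface Vlan94 | include redirects` should indicate that ICMP redirects are no longer sent on that interface.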

 

Thanks to all for the replies.


3 Replies

Leo Laohoo
Hall of Fame

Take your pick: CSCvs15759, CSCvi39202 or CSCvk16813.

Alternatively, upgrade to 17.3.4 (released last week).

rwesel
Level 1

Thanks, I checked these bugs.

I think CSCvs15759 is not relevant here.

CSCvi39202 is possible because we have DHCP snooping enabled, but only on the access switch where the clients are connected. (Our path is: access switch --- distribution switch --- router with DHCP relay --- distribution switch --- VMware --- ISC DHCP server.) The uplinks are port-channels and trusted. I will check whether toggling the trust setting (as recommended in the bug description) helps.

As for CSCvk16813, the recommended workaround is not possible for us: we need port-channels.

To fix this issue we upgraded the router's IOS XE (16.12.1 => 16.12.5b => 17.3.3). It didn't help. As a quick fix, we configured an interface on our DHCP server directly in the affected VLAN. Without DHCP relay it now works well, but this cannot be a permanent solution.

We will give Amsterdam-17.3.4 a try. Bengaluru-17.5.1 is also available for download. Does anybody have experience with these latest versions?

 

Stefan
