07-12-2021 05:35 AM
We have two C9500-32C switches in StackWise Virtual mode with about 20 SVIs, acting as our router.
To serve DHCP to clients in these VLANs, we configured an ip helper-address on every SVI.
It all worked very well.
But for some time now, the DHCP relay has intermittently failed to forward client broadcasts from a certain VLAN, or has forwarded them with a delay.
This morning we rebooted a client 10 to 20 times and DHCP relay worked every time. Then it suddenly stopped working.
Sometimes the client's broadcasts reach the relay and it forwards the DHCPOFFER back to the client (see below), but most of the time nothing happens on the relay.
It looks as if a table or cache fills up and prevents further packets from being processed.
How can we debug this? Better, how can we fix this?
# C9500-32C:
Cisco IOS Software [Amsterdam], Catalyst L3 Switch Software (CAT9K_IOSXE), Version 17.3.3, RELEASE SOFTWARE (fc7)
# Affected VLAN interface:
interface Vlan94
 ip address 10.73.94.1 255.255.254.0
 ip helper-address 10.73.120.15
# Affected Client Interface (on a different switch)
interface GigabitEthernet4/12
 switchport access vlan 94
 spanning-tree portfast edge
This is what we tried:
## Switchport
# Reboot PC
062473: Jul 9 13:49:35: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet4/12, changed state to down
062474: Jul 9 13:49:36: %LINK-3-UPDOWN: Interface GigabitEthernet4/12, changed state to down
# PC comes up
062475: Jul 9 13:49:43: %LINK-3-UPDOWN: Interface GigabitEthernet4/12, changed state to up
062476: Jul 9 13:49:44: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet4/12, changed state to up
# PXE boot starts
062477: Jul 9 13:49:46: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet4/12, changed state to down
062478: Jul 9 13:49:47: %LINK-3-UPDOWN: Interface GigabitEthernet4/12, changed state to down
062479: Jul 9 13:49:51: %LINK-3-UPDOWN: Interface GigabitEthernet4/12, changed state to up
062480: Jul 9 13:49:52: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet4/12, changed state to up
## DHCP Relay (C9500-32C)
relay#show debug
DHCP server packet debugging is on.
DHCP server packet detail debugging is on.
UDP:
UDP packet debugging is on
DHCPC:
DHCP client activity debugging is on (detailed)
# Console Output (sometimes there is this output but mostly there is nothing)
Jul 9 13:49:49: UDP: rcvd src=0.0.0.0(68), dst=255.255.255.255(67), length=355
Jul 9 13:49:49: Option 82 not present
Jul 9 13:49:49: DHCPD: tableid for 10.73.94.1 on Vlan94 is 0
Jul 9 13:49:49: DHCPD: client's VPN is .
Jul 9 13:49:49: DHCPD: No option 125
Jul 9 13:49:49: DHCPD: Option 125 not present in the msg.
Jul 9 13:49:49: Option 82 not present
Jul 9 13:49:49: Option 82 not present
Jul 9 13:49:49: DHCPD: Option 125 not present in the msg.
Jul 9 13:49:49: DHCPD: Looking up binding using address 10.73.94.1
Jul 9 13:49:49: DHCPD: setting giaddr to 10.73.94.1.
Jul 9 13:49:49: DHCPD: adding relay information option.
Jul 9 13:49:49: DHCPD: relay information option content (add/replace):
Jul 9 13:49:49: DHCPD: 520801060004005e0965
Jul 9 13:49:49: UDP: sent src=10.73.94.1(67), dst=10.73.120.15(67), length=365
Jul 9 13:49:49: DHCPD: BOOTREQUEST from 1866.da1c.c6f8 forwarded to 10.73.120.15.
Jul 9 13:49:49: UDP: rcvd src=10.73.120.190(67), dst=10.73.94.1(67), length=336
Jul 9 13:49:49: DHCPD: tableid for 10.73.120.101 on Vlan60 is 0
Jul 9 13:49:49: DHCPD: client's VPN is .
Jul 9 13:49:49: DHCPD: No option 125
Jul 9 13:49:49: DHCPD: forwarding BOOTREPLY to client 1866.da1c.c6f8.
Jul 9 13:49:49: DHCPD: validating relay information option.
Jul 9 13:49:49: DHCPD: Option82 is currently:
Jul 9 13:49:49: 01060004005e0965
Jul 9 13:49:49: DHCPD: Removing option82 information
Jul 9 13:49:49: DHCPD: relay information option removed
Jul 9 13:49:49: DHCPD: Option82 is removed
Jul 9 13:49:49: DHCPD: Option 125 not present in the msg.
Jul 9 13:49:49: DHCPD: egress Interfce Vlan94
Jul 9 13:49:49: DHCPD: broadcasting BOOTREPLY to client 1866.da1c.c6f8.
Jul 9 13:49:49: UDP: sent src=0.0.0.0(67), dst=255.255.255.255(68), length=326
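For readers who want to interpret the relay information option shown in the debug output above (520801060004005e0965), here is a minimal Python sketch. The outer structure (option code 82, a length byte, then sub-option code/length/value triples) follows RFC 3046. The inner layout of the circuit ID (type 0, length 4, two bytes of VLAN, one byte of module, one byte of port) is an assumption based on Cisco's default circuit-ID format, and the function names are made up for this illustration.

```python
def parse_option82(hex_string):
    """Split a raw DHCP option 82 (code, length, payload) into sub-options."""
    data = bytes.fromhex(hex_string)
    if len(data) < 2 or data[0] != 82:
        raise ValueError("not a relay agent information option (code 82)")
    length = data[1]
    payload = data[2:2 + length]
    if len(payload) != length:
        raise ValueError("option is truncated")

    subopts = {}
    i = 0
    while i + 2 <= len(payload):
        code, sub_len = payload[i], payload[i + 1]
        value = payload[i + 2:i + 2 + sub_len]
        if len(value) != sub_len:
            raise ValueError("sub-option is truncated")
        subopts[code] = value
        i += 2 + sub_len
    return subopts


def decode_cisco_circuit_id(value):
    """Assumed Cisco default layout: type 0, length 4, VLAN, module, port."""
    if len(value) != 6 or value[0] != 0 or value[1] != 4:
        return None  # some other circuit-ID format; leave it undecoded
    return {
        "vlan": int.from_bytes(value[2:4], "big"),
        "module": value[4],
        "port": value[5],
    }


subopts = parse_option82("520801060004005e0965")
circuit = decode_cisco_circuit_id(subopts[1])  # sub-option 1 = circuit ID
print(circuit)
```

Under that assumed layout the VLAN field decodes to 94, which matches the SVI the request arrived on. Only sub-option 1 (circuit ID) is present in this sample; there is no remote ID (sub-option 2).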
## ISC DHCP server log (in this case the packets reach the DHCP server)
Jul 9 13:49:49 do dhcpd[5961]: DHCPDISCOVER from 18:66:da:1c:c6:f8 via 10.73.94.1
Jul 9 13:49:49 do dhcpd[5961]: DHCPOFFER on 10.73.94.61 to 18:66:da:1c:c6:f8 via 10.73.94.1
Thank you for your help.
Stefan
11-08-2021 05:50 AM - edited 11-08-2021 07:01 AM
After a long time, I would like to share my findings on this problem.
With the help of Cisco TAC we found out that the "ICMP Redirect" CPU queue was overloaded. This queue is also used by the DHCP relay feature.
You can check this by watching the "Queue Drop" counter in the output of "show platform hardware fed switch active qos queue stats internal cpu policer". It should not increase.
We applied "no ip redirects" to every L3 interface and the drops disappeared. DHCP relay is working fine now.
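With around 20 SVIs, typing the change by hand is tedious, so here is a small Python sketch that generates the configuration lines for a list of interfaces. The function name is hypothetical, and the two interface names are simply the ones that appear in the debug output earlier in this thread; substitute your own list of L3 interfaces and review the result before pasting it into configuration mode.

```python
def no_redirects_config(interfaces):
    """Build IOS config lines that disable ICMP redirects on each interface."""
    lines = []
    for name in interfaces:
        lines.append(f"interface {name}")
        lines.append(" no ip redirects")
    return "\n".join(lines)


# Example interface names taken from the logs in this thread.
output = no_redirects_config(["Vlan94", "Vlan60"])
print(output)
```

After applying the change, the "Queue Drop" counter mentioned above is the thing to watch: it should stop increasing.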
Thanks to all for the replies.
07-12-2021 05:47 AM
Take your pick: CSCvs15759, CSCvi39202 or CSCvk16813.
Alternatively, upgrade to 17.3.4 (released last week).
07-13-2021 04:26 AM
Thanks, I checked these bugs.
CSCvs15759: I think this one is not relevant here.
CSCvi39202: this one is possible because we have DHCP snooping enabled, but only on the access switch where the clients are connected. (We have access switch --- distribution switch --- router with DHCP relay --- distribution switch --- VMware --- ISC DHCP server.) The uplinks are port-channels and trusted. I will check whether toggling the trust setting (as recommended in the bug description) helps.
CSCvk16813: the recommended workaround is not possible for us because we need port-channels.
To fix this issue we had already upgraded the router's IOS (16.12.1 => 16.12.5b => 17.3.3). It didn't help. As a quick workaround we configured an interface on our DHCP server in the affected VLAN. Without DHCP relay it now works well, but this cannot be permanent.
We will give Amsterdam-17.3.4 a try. Bengaluru-17.5.1 is also available for download. Does anybody have experience with these latest versions?
Stefan
05-15-2024 12:56 PM
I believe we are having the same issue on a C9500 stack. We are on 17.9.6.