Solved: DHCP not working on default VLAN1 but works on other VLANS - Page 2

mlord · ‎07-28-2023

Hello,

I've spent about 2 weeks reading threads on this subject but I've yet to find what solves my issue. We recently developed an issue where DHCP on our VLAN1 stopped working, well, not entirely. Sometimes it takes a long time for an IP to be issued, or none is issued at all. And sometimes it issues an IP that's already in use even though most of the IP addresses are available. The configurations we're using have been in use for years; we've had this problem maybe twice in the past, but a core stack reboot would typically clear it up. We deleted exclusions (bad idea) as I assumed having reservations was sort of the same thing; they've been re-added, so no more conflicts.

My laptop is connected to a small switch, which is then connected to our Admin Core stack of 3 9300-48Ts (16.8.1a). An ipconfig /renew does nothing but return an error that it couldn't find our DHCP server. I'm trying to get an IP from our scope (192.168.0.1-254/24) on default VLAN1 (192.168.0.150), served by our DHCP server VM (10.1.90.3). The DHCP server VM is on a VMware machine; this VMware machine has two NICs in use, and each is a trunk carrying all VLANs. Our Sonicwall Firewall is 192.168.0.1, our switch stack is 192.168.0.150 (if it even should be) and our DHCP server VM is 10.1.90.3.

I'm sure both switch stacks don't have matching configurations; these were set up before my time, so between reverse engineering how they were set up and not being fully educated on Cisco switches and managing them, this has been an uphill climb. I appreciate any and all questions and insights. Thank you.

Peter Paluch · ‎07-30-2023

Hello,

This should not be needed in fact - according to https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/ipapp/command/iap-cr-book/iap-i1.html#wp1776761080 , BOOTP/DHCP ports are enabled by default, among others. In addition, we know that the DHCP sometimes works in VLAN 1 and always works in other VLANs; if the forwarding of BOOTP/DHCP was disabled, DHCP would never work in any VLAN.

Best regards,
Peter

MHM Cisco World · ‎08-01-2023

Sorry @Peter Paluch,
can you confirm that adding "ip forward-protocol nd"
does not affect the default UDP ports? I don't have a device to try it on, so please confirm.
Thanks

Peter Paluch · ‎08-01-2023

Hi MHM,

Configuring or removing the "ip forward-protocol nd" only affects the forwarding of the ancient Sun Network Disk protocol (it is IP protocol number 77, completely unrelated to UDP-based protocols). It won't affect the forwarding of any UDP-based broadcasting service.
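To illustrate the distinction, here is a minimal Python sketch. It is not taken from any Cisco tool; the port table is an assumption based on the commonly cited defaults for helper-address forwarding and should be checked against the command reference for your release. The function name is hypothetical.

```python
from typing import Optional

IP_PROTO_UDP = 17     # IANA protocol number for UDP
IP_PROTO_SUN_ND = 77  # IANA protocol number for Sun ND (the "nd" keyword)

# Assumption: commonly cited default UDP ports relayed once a helper
# address is configured. Verify against the docs for your IOS release.
DEFAULT_HELPER_UDP_PORTS = {
    37: "Time",
    49: "TACACS",
    53: "DNS",
    67: "BOOTP/DHCP server",
    68: "BOOTP/DHCP client",
    69: "TFTP",
    137: "NetBIOS name service",
    138: "NetBIOS datagram service",
}


def covered_by_default_udp_list(ip_protocol: int,
                                dst_port: Optional[int] = None) -> bool:
    """Return True only for UDP traffic whose port is in the default list.

    Sun ND is IP protocol 77, not UDP, so it can never match this list;
    the "nd" keyword is a separate setting from the UDP port list.
    """
    if ip_protocol != IP_PROTO_UDP:
        return False
    return dst_port in DEFAULT_HELPER_UDP_PORTS


if __name__ == "__main__":
    print("DHCP (UDP/67):", covered_by_default_udp_list(IP_PROTO_UDP, 67))
    print("Sun ND (proto 77):", covered_by_default_udp_list(IP_PROTO_SUN_ND))
```

The point of the sketch is only that the two settings live in different namespaces: one is keyed by IP protocol number, the other by UDP port.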

Best regards,
Peter

MHM Cisco World · ‎08-01-2023

Just wanted to confirm; I don't have nd in my lab.
Thanks
MHM

Peter Paluch · ‎07-30-2023

Hello everyone,

Please allow me to join.

@mlord , according to your diagram, there are two ports from the core stack to the ESXi server where the DHCP server is running. The configuration on the core stack shows that those two ports - Gi2/0/17 and Gi3/0/17 - are individual trunks, not an EtherChannel of any sort.

That is okay but then my question is: How is the ESXi and the DHCP server using those two links? Is it configured in any sort of NIC teaming, or does it simply have two independent NICs... How does it make use of those two uplinks?

Best regards,
Peter

mlord · ‎07-31-2023

Hello Peter @Peter Paluch ,

Thank you for your reply. Inspecting the properties of the vSwitch it does look to have NIC Teaming set up. Our DHCP Server is labeled "HubDC1".

EDIT: I was looking at the running-configs for each core stack, as we have two different buildings on the same property. These running configurations don't match 100%. I noticed the below configuration. Should the "ip default-gateway" be 192.168.0.150? The IP of our Sonicwall Firewall is 192.168.0.1. I see the ip default-gateway on the News stack shows 192.168.0.150. Also, the IP Routes are different between stacks. Our Admin stack is the primary stack and is where the Sonicwall Firewall is physically networked. I'll add a screenshot of our Firewall too; I did a packet monitor and I see UDP traffic for ports 67/68 in Packet Monitor and I'm not sure if that's a good thing.

Admin stack (both stacks communicate via port-channel TenGig1/1/1 and TenGig1/1/2)

ip default-gateway 192.168.0.1
ip forward-protocol nd
ip http server
ip http authentication local
ip http secure-server
ip ftp username mlord
ip ftp password 7
ip route 0.0.0.0 0.0.0.0 192.168.0.1
ip route 10.10.51.0 255.255.255.0 10.10.52.1
ip route 10.10.53.0 255.255.255.0 10.10.52.1
ip route 10.154.0.0 255.255.0.0 63.246.204.157
ip route 192.168.80.0 255.255.255.0 10.10.21.10

News stack

ip default-gateway 192.168.0.150
ip forward-protocol nd
ip http server
ip http authentication local
ip http secure-server
ip route 0.0.0.0 0.0.0.0 192.168.0.150

MHM Cisco World · ‎07-31-2023

Come on, we've got it.
The FW is dropping UDP 67/68, which I mentioned before. Check it, open the ports, and then check DHCP again.
You also need to check why the traffic goes to the FW and back.
First, check whether "ip routing" is configured on the SW or not.

Peter Paluch · ‎07-31-2023

Hi @mlord ,

So there is some sort of NIC teaming enabled... To be honest, these types of configs that try to use multiple links on the server side but do not require a configuration on the switch side always make me cringe because there are so many things that can go wrong. The switch assumes a certain level of symmetry, for example - if it talks to a MAC address through port X, it expects that MAC address to respond back through that same port. If it doesn't, you'll have a MAC flap issue. Just to illustrate my point.
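The symmetry assumption can be made concrete with a small Python sketch. It is only an illustration: the function name, the event format, and the VM MAC address are hypothetical, and a real switch reports flaps through its own logging rather than anything like this.

```python
from collections import defaultdict


def find_mac_flaps(learn_events):
    """Find MAC addresses that were learned on more than one port.

    learn_events: iterable of (vlan, mac, port) tuples in the order the
    switch learned them (hypothetical format for this sketch).
    Returns {(vlan, mac): [ports in the order they were seen]} for every
    MAC that moved between ports at least once.
    """
    last_port = {}
    history = defaultdict(list)
    for vlan, mac, port in learn_events:
        key = (vlan, mac)
        if last_port.get(key) != port:
            # The MAC appeared on a new port: record the move.
            history[key].append(port)
            last_port[key] = port
    return {key: ports for key, ports in history.items() if len(ports) > 1}


if __name__ == "__main__":
    # Hypothetical VM MAC that alternates between the two ESXi uplinks.
    events = [
        (90, "0050.56aa.bbcc", "Gi2/0/17"),
        (90, "0050.56aa.bbcc", "Gi3/0/17"),
        (90, "0050.56aa.bbcc", "Gi2/0/17"),
        (1, "00aa.6eaf.10c7", "Po1"),
        (1, "00aa.6eaf.10c7", "Po1"),
    ]
    for (vlan, mac), ports in find_mac_flaps(events).items():
        print(f"VLAN {vlan} {mac} moved: {' -> '.join(ports)}")
```

A MAC that stays on one port, like the second one in the sample data, is not reported; only addresses that bounce between ports are.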

I'd like to ask you for an experiment here: Do you think you can temporarily shut down one of those two links from your Admin stack toward the ESXi and see if it has any impact on the DHCP functionality for clients in VLAN1? I am trying to narrow down a possible traffic blackholing issue due to the switch not being prepared to properly handle the way ESXi uses those two NICs. It's best to do this in off-business hours so that if there's any impact to production traffic, it is minimized.

Regarding the "ip default-gateway" - first of all, if the switch is configured with "ip routing", the "ip default-gateway" command is ignored completely. Its functionality is superseded by the routing table, and eventually, the default route there.

Now, should your default route point to 192.168.0.150? On the admin stack, certainly not because such a default route would point to the stack itself. The stack won't even allow you to configure it. On other switches - well, that kind-of depends. Clearly, the admin stack is connected to multiple networks, but the firewall also connects to another set of networks. Can you even have a universal default route here if each of these devices offers a connectivity to a different set of destination networks, and so you need to use each of them? This calls more for a routing protocol rather than a default route; the default route should in this case point toward the device connecting to the internet (if any).
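As a rough illustration of why a route whose next hop is the device itself makes no sense, the check can be sketched in a few lines of Python. The function name and the tuple format are hypothetical; the addresses are the ones quoted in this thread.

```python
import ipaddress


def self_pointing_routes(own_addresses, static_routes):
    """Return the static routes whose next hop is one of the device's own IPs.

    own_addresses: iterable of IP address strings configured on the device.
    static_routes: iterable of (prefix, mask, next_hop) string tuples, in
    the same order as the arguments of an "ip route" line.
    """
    own = {ipaddress.ip_address(addr) for addr in own_addresses}
    return [
        (prefix, mask, next_hop)
        for prefix, mask, next_hop in static_routes
        if ipaddress.ip_address(next_hop) in own
    ]


if __name__ == "__main__":
    admin_stack_ips = ["192.168.0.150", "10.1.254.1"]
    routes = [
        ("0.0.0.0", "0.0.0.0", "192.168.0.1"),    # toward the firewall
        ("0.0.0.0", "0.0.0.0", "192.168.0.150"),  # toward the stack itself
    ]
    for route in self_pointing_routes(admin_stack_ips, routes):
        print("next hop is a local address:", " ".join(route))
```

Only the second route is flagged: a packet following it would be handed back to the same device instead of moving closer to its destination.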

I hope I haven't made this even more unclear - please feel welcome to ask further!

Best regards,
Peter

mlord · ‎08-01-2023

Good Morning Peter @Peter Paluch,

Again, thank you for the reply. I have quite a few windows of opportunity to bring one of those trunk ports down, and what you say makes sense. I'll do that today. I don't suspect there will be an issue; the NIC teaming was likely set up only for redundancy. It's not necessarily needed considering the ESXi machine is 5 feet from the core stack.

I changed the ip default-gateway of the Admin stack yesterday to 192.168.0.150; though I must admit I didn't come across in any of my reading that it would be superseded by a default route, so thank you for that information. The ip route of 192.168.0.1 (our Firewall) makes sense in that case. I also did "service dhcp" just to make sure, as I don't see anything in the running-config confirming it was in use; but other VLANs work fine, so it must be.

I'm also no longer able to SSH into the News stack from this side over the port-channel. I was accessing the Admin stack with 10.1.254.1 and the News stack with 10.1.254.2, and I could only access the News stack while on our Wi-Fi, which is VLAN 40. So confused by that one too.

Peter Paluch · ‎08-01-2023

Hey @mlord : )

Good morning there!

> I have quite a few windows of opportunity to bring one of those trunk ports down, and what you say makes sense. I'll do that today.

Perfect.

> I must admit I didn't come across in any of my reading that it would be superseded by a default route so thank you for that information

Oh, that's perhaps because nowadays, most of our devices come with IP routing support and that one is usually enabled by default. Check this document: https://www.cisco.com/c/en/us/support/docs/ip/routing-information-protocol-rip/16448-default.html

> I changed the ip default-gateway of the Admin stack yesterday to 192.168.0.150

Maybe I'm misunderstanding but I in fact advised against even using the "ip default-gateway", and I also advised against configuring either the "ip default-gateway" or the "ip route 0.0.0.0 0.0.0.0" on the Admin stack to 192.168.0.150 because that IP address is the Admin stack itself. Such a route doesn't make sense - it would bring the packets back to the stack instead of advancing them further to their destination.

> I also did "service dhcp" just to make sure as I don't see anything in the running-config confirming it was in use

It is on by default. You can try running "show running-config all" to get the full output including the default settings.

> I'm also no longer able to SSH into the News stack from this side over the port-channel. I was accessing Admin stack with 10.1.254.1 and News stack with 10.1.254.2

I'd approach this as a routing issue - hence, try to verify first whether you can ping the News stack from different sources, and if not, where does the traceroute stop. But perhaps we should focus on one thing at a time.

Best regards,
Peter

MHM Cisco World · ‎08-01-2023

Indeed, it is a routing issue.

Case 1

No ip routing

Here the ip helper would never work; but as you mentioned before, it works for the other VLANs and fails only for VLAN 1.

Case 2

Ip routing

This is where we need to stop and look. Your topology is not clear, but let me summarize what I think is happening (lol.. change modes)

You have two SWs, each with routing enabled and each with SVIs for the VLANs.

The SW that receives the DHCP request has no SVI in the DHCP server's VLAN, so it needs a default route (check this in "show ip route") to forward the traffic to an L3 device that can reach the DHCP server; this L3 device is the FW.

Is my opinion certainly correct? No. What is needed to check it further is a traceroute. BUT if you ran the traceroute (which I think you did before) from a PC connected to the SW that has the SVI of the DHCP server's VLAN, then that test is wrong; it needs to be done from a PC connected to the right SW.

Cheers...

MHM

mlord · ‎08-01-2023

@Peter Paluch ,

I'm in the process of preparing to disconnect one of the trunk ports to the ESXi, so that will still be done. I left Wireshark open on both my PC and the DHCP server to see if I could follow MAC addresses. I see Offers and ACKs leaving the DHCP server to:

90 00aa.6eaf.10e9 DYNAMIC Po1

So VLAN 90 on the etherchannel interface?

In Wireshark on my laptop I see Offers from source 192.168.0.150 as 1 00aa.6eaf.10c7 DYNAMIC Po1, so VLAN1 on port etherchannel. The etherchannel connects two different buildings here so I'm not sure why it would use that port?

Wireshark on my laptop shows: (screenshot not preserved)

Peter Paluch · ‎08-01-2023

Hi @mlord ,

Remember that there is a DHCP relay between your DHCP client and server. So following the path of the frames is a good idea but remember that they will change on every routed hop.
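What the relay changes, and what it leaves alone, can be sketched in Python. This is a simplified model under stated assumptions, not a packet-accurate implementation: the class and function names are hypothetical, 192.168.0.150 is assumed to be the relaying VLAN 1 interface, and the source address chosen by a real relay can depend on the platform and configuration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DhcpPacket:
    src_ip: str
    dst_ip: str
    xid: int                  # transaction ID chosen by the client
    giaddr: str = "0.0.0.0"   # relay agent IP address field
    hops: int = 0


def relay_to_server(pkt: DhcpPacket, relay_ip: str, server_ip: str) -> DhcpPacket:
    """Model a relay turning a client broadcast into a unicast to the server.

    The IP header is rewritten (and, on the wire, the Ethernet header too),
    giaddr is filled in if it was empty, and hops is incremented.
    The transaction ID is left untouched.
    """
    giaddr = relay_ip if pkt.giaddr == "0.0.0.0" else pkt.giaddr
    return replace(pkt, src_ip=relay_ip, dst_ip=server_ip,
                   giaddr=giaddr, hops=pkt.hops + 1)


if __name__ == "__main__":
    discover = DhcpPacket(src_ip="0.0.0.0", dst_ip="255.255.255.255", xid=0x3D1D)
    relayed = relay_to_server(discover, "192.168.0.150", "10.1.90.3")
    print(discover)
    print(relayed)
```

Because the transaction ID survives the relay while the addresses do not, it is usually the most reliable field for matching a packet seen on the client side with the same packet on the server side.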

> So VLAN 90 on the etherchannel interface?

Yes, seems so; the OUI of 00aa6e is Cisco.

> In Wireshark on my laptop I see Offers from source 192.168.0.150 as 1 00aa.6eaf.10c7 DYNAMIC Po1

Slow down; be precise in what is the source MAC and destination MAC you see in Wireshark first. It's hard to navigate when you only mention some MAC addresses without clear reference in what exact field you see them.

Best regards,
Peter

mlord · ‎08-01-2023

My apologies @Peter Paluch.

I removed vmnic5 from the NIC team; only one vmnic is available now to that DHCP server (as well as to others, which is fine). That vmnic5 was also Gi2/0/17 on the core stack, so I unplugged that and it now shows as Down. I restarted the DHCP Server services for good measure, did an ipconfig /release and /renew, and my NIC is still waiting in an "Unidentified Network" state.

I don't understand enough of the Cisco platform to ask the appropriate questions but I'm trying my best. I have Wireshark sorting traffic as "bootp". If my issue is indeed routing then I'm trying to figure out where the DHCP packets are going and why they take so long to get to their destination. The Wireshark image below is on my laptop attempting to obtain an IP from VLAN1. As I think about it more, the traffic I see here from Source 192.168.0.150 may in fact be clients on the other side of the port-channel trying to reach the DHCP server, so this is probably a non-issue given that.

Peter Paluch · ‎08-01-2023

Hey @mlord ,

No apologies needed from your side. I apologize if I reacted too harshly.

My reference to "the routing issue" was more about you not being able to talk to the News stack, not about the DHCP. Let's just forget that one for now.

I can see from the screenshot on your laptop that you send a series of DHCP Discovers that were not replied to. Did you at the same time run Wireshark on the DHCP server VM, too? If so, can you share them?

Ideally, we want to see two PCAPs done at the same time: On the client when it tries to acquire an IP address, and on the server that should offer that address.
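Once both captures exist, one way to compare them is to match DHCP messages by transaction ID. The following Python sketch assumes the messages have already been exported from each capture into simple dictionaries; that format, the function name, and the sample values are all hypothetical.

```python
def unanswered_discovers(client_msgs, server_msgs):
    """Classify each unanswered client Discover by where the exchange stopped.

    Each message is a dict with an 'xid' (transaction ID) and a 'type'
    ('Discover', 'Offer', 'Request' or 'ACK'); hypothetical format.
    Returns {xid: explanation} for Discovers that got no Offer at the client.
    """
    seen_by_server = {m["xid"] for m in server_msgs if m["type"] == "Discover"}
    offered_by_server = {m["xid"] for m in server_msgs if m["type"] == "Offer"}
    offers_at_client = {m["xid"] for m in client_msgs if m["type"] == "Offer"}

    report = {}
    for msg in client_msgs:
        if msg["type"] != "Discover":
            continue
        xid = msg["xid"]
        if xid in offers_at_client:
            continue  # this Discover was answered
        if xid not in seen_by_server:
            report[xid] = "Discover never reached the server"
        elif xid not in offered_by_server:
            report[xid] = "server saw the Discover but sent no Offer"
        else:
            report[xid] = "Offer left the server but never reached the client"
    return report


if __name__ == "__main__":
    client = [{"xid": 1, "type": "Discover"}, {"xid": 2, "type": "Discover"}]
    server = [{"xid": 2, "type": "Discover"}, {"xid": 2, "type": "Offer"}]
    for xid, reason in unanswered_discovers(client, server).items():
        print(f"xid {xid}: {reason}")
```

The three outcomes point at different parts of the path: the client-to-server direction, the server itself, or the return direction.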

And completely ideally, please attach those PCAPs as files here if it is okay by your security policy. I'd like to inspect the details of the packets there.

Thank you!

Best regards,
Peter