We are having nothing but trouble with PXE/WDS imaging through a newly installed 3850. Before we installed this switch, we were running our office off a 2921 with a 24-port switch module installed. PXE worked just fine with this arrangement (our PXE server is in another building on a separate subnet, so PXE boot installs were being routed without any issue).
When we installed the 3850, we copied over the applicable configurations from the 2921 and 24P Sw module and added them to the 3850. Everything worked great (data vlan, voice vlan, management) except for PXE booting. The PXE clients would grab an IP, the IP of the PXE server and the name of the file that they were supposed to download and then......timeout. We would move our uplink back to the old 2921 arrangement and PXE boots would zoom right through.
Here's a list of things that we tried:
Tried several different IOS versions. Currently on 16.9.
Ensured ip helper-address was config'd for DHCP and PXE servers on the layer 3 vlan int (although only DHCP helper was config on the 2921).
Changed MTU sizes from default down to 1400
Changed TFTP blocksize from default to a variety of options (also on the switch the server is connected to)
Created an ACL specifically to allow all TFTP traffic on the Vlan.
Wiped 3850 and reconfigured multiple times. Took off all unnecessary configs except what was needed for connectivity to the network.
Tried a different 3850.
Verified DHCP server configuration. Options 66 and 67 were config'd correctly, but we took them off anyway and tried. No dice. So we reconfigured them with PXE info (nothing changed, just deleted and readded the same info on 66/67).
Looked at the PXE server itself and found nothing of note except that it would show that a TFTP session failed whenever a PXE client would request the file for download. But you could plug the 2921 back in and change nothing else, and the PXE file download would complete to the client.
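For anyone repeating the MTU and blocksize experiments above, the arithmetic that ties the two together can be sketched as follows. This is only an illustration and assumes IPv4 with no IP options (20-byte header), an 8-byte UDP header and the 4-byte TFTP DATA header; the function names and the 1456 value in the usage lines are made up for the example and are not taken from the poster's setup.

```python
# Rough arithmetic only: largest TFTP block size that fits in one
# unfragmented IPv4 packet for a given MTU.
# Assumes IPv4 without options; names here are illustrative.

IPV4_HEADER = 20   # bytes, no IP options
UDP_HEADER = 8     # bytes
TFTP_HEADER = 4    # bytes: 2-byte opcode + 2-byte block number


def max_tftp_blksize(mtu):
    """Return the largest TFTP payload that avoids IP fragmentation."""
    return mtu - IPV4_HEADER - UDP_HEADER - TFTP_HEADER


def fits_in_one_packet(blksize, mtu):
    """True if a DATA packet with this block size fits within the MTU."""
    return blksize + TFTP_HEADER + UDP_HEADER + IPV4_HEADER <= mtu


if __name__ == "__main__":
    for mtu in (1500, 1400):
        print(mtu, "->", max_tftp_blksize(mtu))
    # Example block size only (hypothetical value for illustration)
    print(fits_in_one_packet(1456, 1400))
```

If the negotiated block size is larger than what the smallest MTU on the path allows, every full DATA packet has to be fragmented, so a device that handles fragments differently could break TFTP while ping and DHCP keep working. Whether that is what happens on the 3850 here is not established by anything in this thread.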
We worked on this issue for a couple of weeks, as time would allow, and could find absolutely nothing wrong with anything in our configs. So we finally gave up and tried a 3560-CX to see if it would allow us to PXE image. It did. Same exact configs as we used on the 3850. Minds=blown.
For now we are using the 3560 as our Layer 3 device and trunking the 3850 to it. We use the 5 or 6 ports on the 3560 for imaging and the 3850 just for phone/data connections.
Does anyone have any ideas what is going on here? Is there anything about the 3850 that would cause the PXE process to fail? It seems to allow the DHCP traffic through just fine, but when the client requests the file from the PXE server, the TFTP traffic seems to fail. A debug session showed nothing and we are not allowed to use a packet sniffer on this network.
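Since a sniffer is off the table, one way to see where the TFTP exchange dies is to send a read request by hand from a PC on the client VLAN and print whatever comes back. The sketch below is a minimal probe, assuming running a script is permitted on this network; the server address and file name are placeholders, and the helper names are invented for this example. It builds a standard TFTP RRQ (optionally with the blksize option) and reports the first reply only; it does not download the file.

```python
import socket
import struct

# TFTP opcodes
OP_RRQ, OP_DATA, OP_ERROR, OP_OACK = 1, 3, 5, 6


def build_rrq(filename, blksize=None):
    """Build a TFTP read request in octet mode, optionally asking for a block size."""
    pkt = struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\0octet\0"
    if blksize is not None:
        pkt += b"blksize\0" + str(blksize).encode("ascii") + b"\0"
    return pkt


def describe_reply(data):
    """Turn the first reply packet into a short human-readable string."""
    if len(data) < 2:
        return "short packet"
    opcode = struct.unpack("!H", data[:2])[0]
    if opcode == OP_OACK:
        fields = data[2:].split(b"\0")
        pairs = zip(fields[0::2], fields[1::2])
        return "OACK " + ", ".join(
            f"{k.decode('ascii', 'replace')}={v.decode('ascii', 'replace')}"
            for k, v in pairs
        )
    if len(data) < 4:
        return "short packet"
    arg = struct.unpack("!H", data[2:4])[0]
    if opcode == OP_DATA:
        return f"DATA block {arg}, {len(data) - 4} bytes"
    if opcode == OP_ERROR:
        msg = data[4:].split(b"\0", 1)[0].decode("ascii", "replace")
        return f"ERROR {arg}: {msg}"
    return f"unexpected opcode {opcode}"


def probe(server, filename, blksize=None, timeout=5.0):
    """Send one RRQ to UDP port 69 and report the first reply, or a timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename, blksize), (server, 69))
        try:
            data, addr = sock.recvfrom(65535)
        except socket.timeout:
            return "timeout (no reply reached this host)"
        # The server answers from a new ephemeral port, not from port 69.
        return f"reply from {addr[0]}:{addr[1]}: {describe_reply(data)}"


if __name__ == "__main__":
    # Placeholder address and file name; substitute real values.
    print(probe("192.0.2.10", "test.bin"))
    print(probe("192.0.2.10", "test.bin", blksize=1368))
```

Running it with and without the blksize option, through the 3850 and then through the 3560, would at least show whether the first reply ever reaches the client. Note that the reply comes from an ephemeral server port rather than port 69, so anything on the path that only permits UDP 69 would produce exactly this kind of timeout.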
Thank you for any and all help.
PS I can't post a sh run output here unfortunately. This part of the network is simple though: 3850 using OSPF to reach a 4507 directly connected to PXE server (virtual).
Keep in mind the 2921 is a router with a switch module; the 3850 is a switch with Layer 3 capabilities.
What image/license are you running on the 3850 (ipbase? ipservices?)
100% tracking that the devices are different, but does that mean that the 3850 is not capable of supporting PXE/WDS imaging? I just can't see how that can be but maybe it is. All other services are working great.
Thank you for your reply.
Can you share some configuration info? Without it we are only guessing.
I also guess the 3560-CX did not have IOS version 16.x.x?
And do you use dot1x or downloadable ACLs? Behaviour can be different between platforms,
especially after version 15.x, where some features can be automatically enabled when some configuration is added to a port.
For example, dot1x authentication automatically enables IP device tracking.
I logged a TAC case that revealed behavior can be different by hardware platform:
----- From 15.2 onwards, source IP to be "any" is mandated across all classic platforms (ex 2960, 3750 etc)
----- but nova platforms (ex 3650, 3850 etc) would not check for source IP. Hence, we would not see this behavior in 3650
Hence some platforms ignored the conflicting ACL, while others rejected the ACL completely.
And on higher versions IP device tracking is enabled by default, because other functions use it in the background.
read this document
Unfortunately I can't post the configs as this is on a classified network. And I know that hampers any help that can be provided.
The 3560 is running c3560cx-universalk9-mz.152-4.E2, license ipservices.
Not running dot1x or downloadable ACLs.
I think you're right about opening a TAC case...unfortunately I don't have the rights to do it, but I feel it is the only way this is going to be resolved. Thanks for your replies, truly.
If you are not allowed to post configuration details then it becomes very difficult for us to understand the issue. As a start, can you provide some clarification about the topology you describe as "3850 using OSPF to reach a 4507 directly connected to PXE server (virtual)"?
Can we assume that the vlans configured on 3850 are exactly the same as on 2921? And assume that routing logic is exactly the same? Where is the DHCP server that provides addresses etc to PCs? Is there a helper address for the DHCP server and a separate helper address for PXE server? If you put a PC on the vlan where the clients are having problems with PXE and configure it with appropriate IP address, mask, and gateway, is the PC able to ping the PXE server?
Yes, unfortunately the network is classified and I can't post it here. I thought about transcribing some of the 'sh run' output, but the possibility that I would change something or leave something out that might be key to the solution is pretty high, and it doesn't seem worth the effort.
I was hoping that someone would reply with a simple solution (just turn on 'this' or configure 'that') that was specific to the 3850s but it doesn't seem to be the case. It seems as though the best course is to open a TAC case, but I don't have the necessary rights to do that for this organization, yet.
To your questions: yes, the vlans are configured the same on the 3850 as the 2921, routing/routes are the same, and the DHCP server sits on the same subnet as the PXE server. We have a helper address for the DHCP server and played around with adding a second helper for PXE, then took them all off, then added them back, took off options, added options, changed options etc. PCs can ping the PXE all day long on the same vlan. Everything works...except for the TFTP session from PXE to the client. Multiple folks have wiped, configured and tested these configs through the 3850, so I'm fairly confident that it isn't one specific thing that I'm missing or doing incorrectly. If it is, it's all of us doing it!
Thank you Rick
Thanks for the additional information. I am a bit puzzled at this statement "PCs can ping the PXE all day long on the same vlan". It was my understanding from the original post that PXE was in a different subnet, which sort of implies in a different vlan. Can you clarify?
I would have thought that if you have a helper-address for DHCP you would also need one for PXE. But if the PXE server is showing that TFTP sessions are attempted, then it sure seems like requests from the clients are being received. Can you verify that traffic initiated from the PXE server to the clients is successful?
Does DNS resolution work for clients connected to the 3850?