We are having nothing but trouble with PXE/WDS imaging through a newly installed 3850. Before we installed this switch, we were running our office off a 2921 with a 24-port switch module installed. PXE worked just fine with this arrangement (our PXE server is in another building on a separate subnet, so PXE boot installs were being routed without any problem).
When we installed the 3850, we copied over the applicable configurations from the 2921 and the 24-port switch module and added them to the 3850. Everything worked great (data vlan, voice vlan, management) except for PXE booting. The PXE clients would grab an IP, the IP of the PXE server and the name of the file that they were supposed to download, and then... timeout. When we moved our uplink back to the old 2921 arrangement, PXE boots would zoom right through.
Here's a list of things that we tried:
- Tried several different IOS versions. Currently on 16.9.
- Ensured ip helper-address was config'd for DHCP and PXE servers on the layer 3 vlan int (although only DHCP helper was config'd on the 2921).
- Changed MTU sizes from default down to 1400.
- Changed TFTP blocksize from default to a variety of options (also on the switch the server is connected to).
- Created an ACL specifically to allow all TFTP traffic on the Vlan.
- Wiped 3850 and reconfigured multiple times. Took off all unnecessary configs except what was needed for connectivity to the network.
- Tried a different 3850.
- Verified DHCP server configuration. Options 66 and 67 were config'd correctly, but we took them off anyway and tried. No dice. So we reconfigured them with PXE info (nothing changed, just deleted and readded the same info on 66/67).
- Looked at the PXE server itself and found nothing of note except that it would show that a TFTP session failed whenever a PXE client would request the file for download. But you could plug the 2921 back in and change nothing else, and the PXE file download would complete to the client.
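On the MTU and blocksize items in the list above: one thing that can be checked on paper is whether a given TFTP block size still fits in a single packet at a given MTU. A rough Python sketch, assuming plain IPv4 with no IP options and no extra encapsulation on the path (both are assumptions, nothing in this thread confirms them). The function names are made up for illustration.

```python
# Fixed header sizes: IPv4 without options, UDP, and the TFTP DATA header.
IPV4_HEADER = 20
UDP_HEADER = 8
TFTP_DATA_HEADER = 4  # 2-byte opcode + 2-byte block number


def max_unfragmented_blksize(mtu: int) -> int:
    """Largest TFTP blksize whose DATA packet still fits in one IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER - TFTP_DATA_HEADER


def will_fragment(blksize: int, mtu: int) -> bool:
    """True if a full DATA block of this size exceeds the given MTU."""
    return blksize > max_unfragmented_blksize(mtu)


# Compare the default Ethernet MTU with the lowered value from the list.
for mtu in (1500, 1400):
    print(mtu, max_unfragmented_blksize(mtu))
```

With the MTU lowered to 1400, any negotiated block size above 1368 would need IP fragmentation on that hop. Whether the 3850 treats those fragments differently from the 2921 is not something this arithmetic can answer.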
We worked on this issue for a couple of weeks, as time would allow, and could find absolutely nothing wrong with anything in our configs. So we finally gave up and tried a 3560-CX to see if it would allow us to PXE image. It did. Same exact configs as we used on the 3850. Minds=blown.
For now we are using the 3560 as our Layer 3 device and trunking the 3850 to it. We use the 5 or 6 ports on the 3560 for imaging and the 3850 just for phone/data connections.
Does anyone have any ideas what is going on here? Is there anything about the 3850 that would cause the PXE process to fail? It seems to allow the DHCP traffic through just fine, but when the client requests the file from the PXE server, the TFTP traffic seems to fail. A debug session showed nothing and we are not allowed to use a packet sniffer on this network.
Thank you for any and all help.
PS: Unfortunately I can't post a sh run output here. This part of the network is simple though: 3850 using OSPF to reach a 4507 directly connected to PXE server (virtual).
Keep in mind that the 2921 is a router with a switch module, while the 3850 is a switch with layer-3 capabilities.
what image/license are you running on the 3850 (ipbase?, ipservices?)
100% tracking that the devices are different, but does that mean that the 3850 is not capable of supporting PXE/WDS imaging? I just can't see how that can be but maybe it is. All other services are working great.
Thank you for your reply.
can you share some configuration info? it's only guessing now.
I also guess the 3560CX did not have ios version 16.x.x?
And do you use dot1x or downloadable ACLs? Behaviour can differ between platforms,
especially after version 15.x, where some features can be enabled automatically when certain configuration is added to a port.
For example, dot1x authentication automatically enables IP device tracking.
I logged a TAC case that revealed behavior can be different by hardware platform
----- From 15.2 onwards, source IP to be "any" is mandated across all classic platforms (ex 2960, 3750 etc)
----- but nova platforms (ex 3650, 3850 etc) would not check for source IP. Hence, we would not see this behavior in 3650
Hence some platforms ignored the conflicting ACL, while others rejected the ACL completely.
And on higher versions IP device tracking is enabled by default, because other functions use it in the background.
read this document
Unfortunately I can't post the configs as this is on a classified network. And I know that hampers any help that can be provided.
The 3560 is running c3560cx-universalk9-mz.152-4.E2, license ipservices.
Not running dot1x or downloadable ACLs.
I think you're right about opening a TAC case...unfortunately I don't have the rights to do it, but I feel it is the only way this is going to be resolved. Thanks for your replies, truly.
If you are not allowed to post configuration details then it becomes very difficult for us to understand the issue. As a start, can you provide some clarification about the topology you describe as "3850 using OSPF to reach a 4507 directly connected to PXE server (virtual)"?
Can we assume that the vlans configured on 3850 are exactly the same as on 2921? And assume that routing logic is exactly the same? Where is the DHCP server that provides addresses etc to PCs? Is there a helper address for the DHCP server and a separate helper address for PXE server? If you put a PC on the vlan where the clients are having problems with PXE and configure it with appropriate IP address, mask, and gateway, is the PC able to ping the PXE server?
Yes, unfortunately the network is classified and I can't post it here. I thought about transcribing some of the 'sh run' output, but the possibility that I would change something or leave something out that might be key to the solution is pretty high, and doesn't seem worth the effort.
I was hoping that someone would reply with a simple solution (just turn on 'this' or configure 'that') that was specific to the 3850s but it doesn't seem to be the case. It seems as though the best course is to open a TAC case, but I don't have the necessary rights to do that for this organization, yet.
To your questions: yes, the vlans are configured the same on the 3850 as the 2921, routing/routes are the same, and the DHCP server sits on the same subnet as the PXE server. We have a helper address for the DHCP server and played around with adding a second helper for PXE, then took them all off, then added them back, took off options, added options, changed options etc. PCs can ping the PXE all day long on the same vlan. Everything works...except for the TFTP session from PXE to the client. Multiple folks have wiped, configured and tested these configs through the 3850, so I'm fairly confident that it isn't one specific thing that I'm missing or doing incorrectly. If it is, it's all of us doing it!
Thank you Rick
Thanks for the additional information. I am a bit puzzled at this statement "PCs can ping the PXE all day long on the same vlan". It was my understanding from the original post that PXE was in a different subnet, which sort of implies in a different vlan. Can you clarify?
I would have thought that if you have a helper-address for DHCP that you would also need one for PXE. But if the PXE server is showing that tftp sessions are attempted then it sure seems like requests from the clients are being received. Can you verify that traffic initiated from the PXE server to the clients is successful?
Does DNS resolution work for clients connected to the 3850?
Sorry, I conflated that statement a bit.
PXE (and all other servers) are on a /26 subnet in our main datacenter. The PXE clients are on a /27 in a different building. What I meant by "PCs can ping the PXE all day" was that I can have several PCs on the /27 pinging away at the PXE server, then reboot one of them and attempt to get it to boot/load from the PXE server. It will retrieve the PXE server IP just fine from the DHCP options, but never download a file from the PXE server. Meanwhile, the other PCs on the /27 can still hit the PXE server.
The PXE server appears to see the tftp request, according to the logs, but it fails. No explanation given.
At this point, we have decided to stick with the 3560/3850 router/switch solution. It doesn't seem worth the effort to try and figure out what is going on with the 3850. The 2921 and 3560 have no problem supporting PXE boots/downloads across subnets...if we ever figure out the 3850 fiasco, I'll post it here.
Thank you again
Thanks for the clarification. So PCs in their separate subnet do have IP connectivity to the PXE server. That does eliminate one potential source of the problem. I do have a couple of questions:
- you have clarified that traffic initiated to PXE is successful. Can you also verify that traffic initiated from PXE to PC is also successful?
- do you have separate helper address commands for the DHCP and PXE servers?
- is DNS resolution successful for the PCs?
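The server-to-client question matters for TFTP because the server normally answers from a new ephemeral UDP port rather than port 69, so a successful ping does not prove that path. A rough way to check it without a sniffer, sketched in Python with made-up helper names and an arbitrary port (50069 is only a placeholder): run the listener on a PC in the client subnet and send a datagram from the server side.

```python
import socket


def open_listener(port: int, host: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket on the client PC; port 0 lets the OS pick one."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def wait_for_probe(sock: socket.socket, timeout: float = 10.0):
    """Return (payload, sender_ip, sender_port), or None on timeout."""
    sock.settimeout(timeout)
    try:
        data, (addr, port) = sock.recvfrom(65535)
    except socket.timeout:
        return None
    return data, addr, port


def send_probe(host: str, port: int, payload: bytes = b"probe") -> int:
    """Send one UDP datagram from an OS-chosen source port; returns bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

On the client PC, something like `print(wait_for_probe(open_listener(50069), 60))`; on the server side, `send_probe("<client IP>", 50069, b"x" * 1400)`. Repeating the send with small and large payloads would show whether only the larger datagrams go missing. A host firewall on the PC may need to allow the chosen port.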
We have tried separate ip helper-address commands for the DHCP and PXE servers, no change.
DNS resolution worked fine, all clients were able to connect to the internet and use services with no issues.
With this being a classified network, I am unable to run any kind of sniffer to verify connections. I can only go off what the individual computers/servers are telling me.
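Since a sniffer is off the table, one way to get a bit more detail out of a client PC is to send the TFTP read request by hand and look at what comes back. Below is a minimal Python sketch, not a full TFTP client: it sends a single RRQ (RFC 1350, optionally with the RFC 2348 blksize option) and reports the first reply. The helper names are invented for this example, and whether running a script like this is permitted on your network is for you to judge.

```python
import socket
import struct
from typing import Optional


def build_rrq(filename: str, blksize: Optional[int] = None) -> bytes:
    """Build a TFTP read request (opcode 1) in octet mode."""
    packet = struct.pack("!H", 1) + filename.encode("ascii") + b"\0octet\0"
    if blksize is not None:
        # RFC 2348 option: name and value as NUL-terminated ASCII strings.
        packet += b"blksize\0" + str(blksize).encode("ascii") + b"\0"
    return packet


def describe_reply(packet: bytes) -> str:
    """Summarise the first packet the server sends back."""
    if len(packet) < 4:
        return "reply too short to be TFTP"
    opcode, value = struct.unpack("!HH", packet[:4])
    if opcode == 3:
        return f"DATA block {value}, {len(packet) - 4} bytes"
    if opcode == 5:
        message = packet[4:].split(b"\0", 1)[0].decode("ascii", "replace")
        return f"ERROR {value}: {message}"
    if opcode == 6:
        options = packet[2:].rstrip(b"\0").split(b"\0")
        return "OACK " + ", ".join(o.decode("ascii", "replace") for o in options)
    return f"unexpected opcode {opcode}"


def probe(server: str, filename: str, blksize: Optional[int] = None,
          timeout: float = 5.0) -> str:
    """Send one RRQ to UDP port 69 and describe the first reply, if any."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_rrq(filename, blksize), (server, 69))
            reply, (addr, port) = sock.recvfrom(65535)
        except socket.timeout:
            return "no reply within timeout"
        except OSError as exc:
            return f"socket error: {exc}"
        return f"{describe_reply(reply)} from {addr}:{port}"
```

Usage would be along the lines of `print(probe("<PXE server IP>", "<boot file from option 67>", blksize=1456))`, with both values being placeholders. A DATA or OACK reply means the first server-to-client packet made it back through the 3850. A timeout at a large blksize but not at a small one would point toward fragmentation. The script never ACKs, so the server will retransmit a few times and then give up.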
I was finally able to open a TAC case with Cisco. Here is what they suggested, but so far it hasn't worked.
ip forward-protocol udp 67
ip forward-protocol udp 69
ip forward-protocol udp 4011
Even with these commands the PXE imaging fails. All other services by other clients remain up.
I'll keep this posted as more info comes, and if we find the solution.
>>> With this being a classified network, I am unable to run any kind of sniffer to verify connections <<<
Is that entirely true?
Sniffing should be allowed, but you need to filter on this specific problem and disregard data from other hosts that are not involved here.
So if it is possible to have a setup with the 3850 and only a single PXE client, there should be some way to capture it?
Apart from that, maybe you need to investigate ACLs in the reverse direction that may not allow the return packet.
Thanks for the additional information. Glad you were able to open a case with Cisco TAC. Hopefully you can share some information with them that you could not post in this public community that might shed some light on the issue. The suggestion of forward protocol for udp 4011 is interesting. I believe that I have seen PXE work successfully without specifying that forward protocol, but it is certainly a logical suggestion.
Thanks for confirming that DNS resolution does work for the clients and that you have tried helper address for both the DHCP server and the PXE server. You have verified that traffic initiated from clients to server is successful (ping in particular). Have you tested whether traffic initiated from server to client is also successful?
I wonder if enabling directed broadcast on the client vlan interface would have any effect?