Peculiar Timing Issue With PXE In Nexus Environment
I have seen some peculiar behavior doing PXE builds. I wanted to pick the brains of some experienced network engineers, as my research on the internet turns up a ton of contradictory opinions and philosophies.
Here is the setup:
1. HP SL4540 servers with Broadcom copper/RJ-45 10GbE cards with PXE enabled.
2. Connected to Nexus 2232 10G copper/RJ-45 edge switches going back to a Nexus 5596 aggregation layer.
3. IP helper services are enabled on the 2232 so that DHCP and PXE requests are forwarded to a specific IP address.
4. Both the switch port and the NIC port are set to auto-negotiate.
With the solution fully configured, the servers either fail to find the DHCP/PXE server, or make initial contact but never complete the handshake process. The peculiar thing is that we will sometimes see this late at night, break for sleep, and wake up in the morning to find the servers fine and ready to load an OS. There seems to be some type of timing phenomenon going on in the background, and given enough time the servers eventually hit the right timing window and move forward.
I’ve read dozens of articles now and have found some experts pointing to potential causes and others refuting the same ones. Here are the preliminary theories I found.
1. With the NIC and switch port set to auto-negotiate, the negotiation process takes longer than the PXE request cycle, which seems to be pretty short (10 to 15 seconds). I have read tons of conflicting info. Some of it (Citrix, Altiris, VMware) says that setting the port to full duplex will circumvent the negotiation process if it is indeed taking longer than the PXE request cycle. Other docs say Gigabit Ethernet requires auto-negotiation and won't work without it. What are your thoughts on this? Is auto-negotiation absolutely required for 1G/10G, and if not, can setting full duplex potentially help the NIC sync with the switch quicker? We have tight control of both server and switch, so we can ensure both are set to whatever we need.
2. I found a bunch of articles where PXE fails when certain services are active on the switch. The articles showed that these services can cause a long negotiation process to get link, or even allow link but hold packets for a bit while the new connection is brought in. The ones they mentioned in particular were:
   a. Spanning Tree Protocol
   b. EtherChannel
   c. Port Aggregation Protocol
   d. Disabled PortFast service
   What are your thoughts on this? Are there other services that could have a similar effect?
3. Outside these theories, are there any other reasons why this would happen? The fact that it eventually “fixes itself” bugs me, so I want to find a definitive answer.
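To make theory 2 concrete, here is a rough back-of-the-envelope sketch in Python of the timing I have in mind. It assumes classic Spanning Tree behavior, where a port without PortFast sits through a listening phase and a learning phase, each lasting one forward delay, and it assumes the commonly cited default forward delay of 15 seconds. The 10 to 15 second PXE window is only my own estimate from above, and the function names are made up for illustration. Rapid spanning tree modes may behave differently, so treat this as an illustration of the theory rather than a statement about what the Nexus is actually doing.

```python
# Rough timing sketch: does a port's spanning-tree delay outlast the PXE window?
# All numbers are assumptions, not measurements from our environment.

PXE_WINDOW_SECONDS = (10, 15)   # my estimate of the PXE request cycle (min, max)
FORWARD_DELAY_SECONDS = 15      # assumed classic STP default forward delay


def stp_blocking_time(forward_delay=FORWARD_DELAY_SECONDS, portfast=False):
    """Seconds a newly linked port withholds traffic under classic STP.

    Without PortFast the port passes through listening and learning,
    each lasting one forward delay. With PortFast it forwards immediately.
    """
    return 0 if portfast else 2 * forward_delay


def pxe_likely_times_out(port_delay, pxe_window=PXE_WINDOW_SECONDS):
    """True if the port delay exceeds even the longest assumed PXE window."""
    return port_delay > pxe_window[1]


if __name__ == "__main__":
    for portfast in (False, True):
        delay = stp_blocking_time(portfast=portfast)
        verdict = "likely times out" if pxe_likely_times_out(delay) else "should fit"
        print(f"portfast={portfast}: port delay {delay}s, PXE {verdict}")
```

If these assumptions hold, a 30 second forwarding delay would outlast the whole PXE request cycle, which would match the symptom of servers getting link but never finding the DHCP/PXE server.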
Also, from the switch side, what troubleshooting steps can we take to prove or disprove these theories? We can pull the switch logs; are there any other things to look for or monitor? What would we expect to see in the switch log if one of these theories were correct? Would loading Wireshark and examining packet dumps be of any use here? Any insight would be greatly appreciated. Thanks, Cisco community. :-)