02-27-2018 05:09 AM - edited 03-01-2019 01:26 PM
Hello!
I have a really weird problem in a UCS solution. It has 2 Fabric Interconnect 6269 in End Host Mode connected to 2 Nexus, with IOMs to the chassis and servers.
The problem is that some virtual servers cannot be reached from the outside unless the server pings its default gateway. It happens randomly on some servers. They can still be reached over SSH, and they have connectivity to other servers on the same VLAN. I thought it was a layer 3 problem with the Nexus, so I ran a ping test from it against the affected servers, and the Nexus cannot ping them. I compared the MAC table of the Nexus with the servers' MACs, but that was not conclusive. I believe the Nexus might be losing the MACs of the servers, or the servers are not properly sending ARP requests, but I'm not sure how to prove this or what else it could be.
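One way to make the MAC table comparison less error-prone is to script it. The following is only a minimal sketch: it assumes the Nexus MAC table entries and the server MACs have already been collected by hand into plain Python structures. The function names and the sample addresses are hypothetical, not taken from this environment. It normalizes both notations (dotted, as the switch prints them, and colon-separated, as most servers print them) before comparing.

```python
import re


def normalize_mac(mac):
    """Return a MAC as 12 lowercase hex digits, whatever separators it used."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def missing_from_switch(server_macs, switch_macs):
    """Names of servers whose MAC does not appear in the switch MAC table."""
    learned = {normalize_mac(m) for m in switch_macs}
    return sorted(
        name for name, mac in server_macs.items()
        if normalize_mac(mac) not in learned
    )


# Hypothetical sample data, for illustration only.
servers = {"vm-a": "00:50:56:AA:BB:01", "vm-b": "00:50:56:aa:bb:02"}
nexus_table = ["0050.56aa.bb01"]
print(missing_from_switch(servers, nexus_table))  # ['vm-b']
```

Running this while a server is unreachable, and again right after it pings its gateway, would show whether the switch only learns the MAC once the server sends traffic.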
02-27-2018 08:09 AM - edited 02-27-2018 08:24 AM
Greetings.
If you can ping and communicate from within the same VLAN at the same time, then your issues are not likely within the UCSM.
I would suggest going to your default gateway and doing a traceroute to the IP in question, and making sure there aren't some odd routes specifically for the problem IPs.
I have had some odd cases in the past where a similar symptom was caused by duplicate MACs.
Does the same issue occur after you have vMotioned the guest VM to another host?
I would try resetting the MAC address (delete the NIC and re-add it at the guest VM config level), reapply the OS IP address, and see if the problem goes away after changing the MAC address.
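The duplicate MAC check could be sketched roughly as follows. This assumes you have exported a list of (VM name, MAC) pairs from your inventory by whatever means you prefer. The function name and the sample data are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def find_duplicate_macs(inventory):
    """Map each MAC used by more than one VM to the sorted VM names."""
    owners = defaultdict(set)
    for vm_name, mac in inventory:
        # Treat upper/lower case and dash/colon separators as the same MAC.
        owners[mac.strip().lower().replace("-", ":")].add(vm_name)
    return {mac: sorted(vms) for mac, vms in owners.items() if len(vms) > 1}


# Hypothetical inventory export, for illustration only.
inventory = [
    ("vm-a", "00:50:56:aa:bb:01"),
    ("vm-b", "00:50:56:AA:BB:01"),
    ("vm-c", "00:50:56:aa:bb:03"),
]
print(find_duplicate_macs(inventory))
# {'00:50:56:aa:bb:01': ['vm-a', 'vm-b']}
```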
How many guest VMs do you have like this?
Do you have a TAC case open for this?
Thanks,
Kirk...
03-07-2018 05:52 AM - edited 03-07-2018 05:56 AM
Thank you for your response,
There are quite a few servers with this behavior. I cannot really say how many, because our client has fixed many of these servers by pinging the gateway, so until they fail again I cannot tell which servers have the flaw.
I checked for duplicate MACs as you said, and I really thought that was it, but then we configured one of the servers with a static MAC just to be sure it would not change, and it still had the problem; we also could not find that server.
Traceroute will not do much, because the traffic is between servers connected to a Fabric Interconnect in End Host Mode and a Nexus, so the only layer 3 device in that path is the Nexus.
The flaw does happen after vMotion, yes, but that was not conclusive as it did not happen every time. We recently found out that the problem always happens after the servers are cloned. We are checking whether the problem is with the vmnic, but so far it has not been conclusive. Any thoughts on this?
Sadly, there is a problem with the client's Cisco account, so right now we are unable to open a TAC case.
Thanks,
Jesus B.