
UCS Mini - intermittent ping replies when FI B rebooted

Brian Hayes
Level 1

Here is what I have. I have a UCS Mini at firmware version 3.0.2C and we are going to 3.1.2C. Before calling TAC to open a pre-check ahead of the upgrade, I created my backups and planned to reboot each fabric interconnect separately to clear out the files in /var/tmp that grow too large; a reboot brings them back down to normal. This UCS Mini is at one of our remote sites and has 6 B200 blades in it and no other servers connected to it.

I have pings running to all the blades (running ESXi 6.0 U3), some guest OSes, and the FI IPs. When I reboot FI B (subordinate), 4 of the blades start dropping pings intermittently. All the guest OSes on those 4 blades keep pinging normally. When FI B comes back up, the blades start pinging normally again. I rebooted FI B a second time and exactly the same thing happened.

I verified that my NIC templates and service profiles are configured correctly. I talked to the VMware team, and everything on the VMware side looks configured correctly as well. I spoke to TAC, and the tech could not find the issue; he does not want to change the cluster lead and reboot FI A until we find out why the pings drop.

Does anyone have any suggestions on what to look at next, or an idea of what could make just 4 of the 6 blades ping intermittently when FI B is rebooted?


Evan Mickel
Cisco Employee

1) PM me your case number.

2) Pick one of the servers that encounters this issue and note the MAC address of the interface that you're pinging on the server side.

3) Run show mac address-table | grep <mac address> in NX-OS on each FI and on the upstream switches. Do this during the incident as well; it will show exactly which interface on the FI and on the upstream switch is servicing this traffic.

4) Utilize the show interface command to determine whether or not interface errors are occurring.

5) Where are you pinging from within the network?  Is this from your local machine?  Within the subnet?  Some additional information on what lies between the src and dst would be helpful.
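If you save the output of step 3 from each FI and upstream switch to text, a small script can pull out the port for the MAC in question so the before and during captures are easy to compare. This is only a rough sketch: the sample table below is made up for illustration, and the helper names (to_cisco_mac, find_mac_ports) and the assumption that the VLAN sits right before the MAC and the port is the last column are mine, not something guaranteed for every NX-OS release.

```python
import re


def to_cisco_mac(mac: str) -> str:
    """Convert any common MAC notation to Cisco dotted form (xxxx.xxxx.xxxx)."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def find_mac_ports(cli_output: str, mac: str) -> list[tuple[str, str]]:
    """Return (vlan, port) pairs for every table row that mentions the MAC.

    Assumption: the VLAN is the token just before the MAC and the port is
    the last token on the row. Adjust if your output is laid out differently.
    """
    target = to_cisco_mac(mac)
    hits = []
    for line in cli_output.splitlines():
        tokens = line.split()
        lowered = [t.lower() for t in tokens]
        if target in lowered:
            idx = lowered.index(target)
            vlan = tokens[idx - 1] if idx > 0 else "?"
            hits.append((vlan, tokens[-1]))
    return hits


# Hypothetical capture, for illustration only (not real output from this case).
SAMPLE_FI_A = """\
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
* 10       0025.b500.0a0f    dynamic   0          F    F  Veth700
* 20       0025.b500.0b1e    dynamic   10         F    F  Eth1/1
"""

if __name__ == "__main__":
    print(find_mac_ports(SAMPLE_FI_A, "00:25:B5:00:0A:0F"))
```

Running the same function against the capture from FI-A, FI-B, and the upstream switches, once before and once during the reboot, shows whether the MAC moved to a different port or disappeared from a table.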

 

This will get you started on understanding exactly where the traffic is traversing. During the reboot, of course, FI-A will be passing the traffic in question. If there is an issue on the upstream switch, for example (CRCs, input/output errors, control plane policing policies), you could see intermittent connectivity for certain protocols such as ICMP.

 

There are many moving parts to an issue like this, but the steps above will get us moving in the right direction.


Thanks!

Thanks very much for the PM. I took a look through the case notes, and it seems to be moving forward. The best thing to do would be to trace the traffic and the MAC addresses while the issue is actively taking place, as initially noted. Feel free to reach back out to the thread with any updates. If you are not receiving the support you need through your ticket, you have the option to reach TAC frontline and request that it be requeued at any time.
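To line up the MAC table captures with the moments the pings actually drop, it can help to turn a timestamped ping log into a list of loss windows. The following is a minimal sketch under my own assumptions: the input is a list of (timestamp, ok) pairs that you build from whatever ping tool you use, and the function name loss_windows and the sample times are hypothetical.

```python
from datetime import datetime, timedelta


def loss_windows(samples):
    """Group consecutive failed pings into (start, end, count) windows.

    samples: iterable of (timestamp, ok) pairs, already sorted by time.
    """
    windows = []
    current = None  # [start, end, count] of the run of failures in progress
    for ts, ok in samples:
        if ok:
            if current is not None:
                windows.append(tuple(current))
                current = None
        elif current is None:
            current = [ts, ts, 1]
        else:
            current[1] = ts
            current[2] += 1
    if current is not None:
        windows.append(tuple(current))
    return windows


# Made-up sample: one ping per second, True means a reply was received.
T0 = datetime(2000, 1, 1, 12, 0, 0)
RESULTS = [True, True, False, False, True, False, True]
SAMPLES = [(T0 + timedelta(seconds=i), ok) for i, ok in enumerate(RESULTS)]

if __name__ == "__main__":
    for start, end, count in loss_windows(SAMPLES):
        print(f"{start:%H:%M:%S} to {end:%H:%M:%S}: {count} lost")
```

Each window gives a time range to check against the FI reboot timeline and against the interface counters on the upstream switches.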


Thanks!


Wes Austin
Cisco Employee

I would suggest having TAC on the line if you are able to reproduce the problem, and allowing them to debug it live. This will be the most beneficial way to understand the behavior. Otherwise, follow Evan's suggestions.

Brian Hayes
Level 1

I am finally able to get back to this issue. I have looked at the MAC tables on both FIs in the UCS domain with the issue and in a few of our other UCS domains. I expected the MAC tables on the two FIs to match each other, but that is not the case in any of our UCS domains. On the ones we have looked at so far, the management VLAN MACs are on the subordinate FI and not on the primary. Is that normal?

The MAC tables will not match. Each endpoint (vNIC) pins to uplinks based on where its VLANs are being forwarded, and passes traffic through either FI-A or FI-B. Because MAC learning is source based, each FI only populates entries for addresses it has actually received traffic from.
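As a toy illustration of why the two tables differ, here is a minimal model of source-based learning. It is a simplification and not how the FI is implemented; the class name LearningSwitch, the MAC addresses, and the port names are all made up for the example.

```python
class LearningSwitch:
    """Toy switch that learns MAC addresses only from arriving frames."""

    def __init__(self, name):
        self.name = name
        self.mac_table = {}  # source MAC -> port it was last seen on

    def receive(self, src_mac, in_port):
        # Learning is driven by the source address of a received frame.
        # A switch that never sees a frame from a MAC never learns it.
        self.mac_table[src_mac] = in_port


fi_a = LearningSwitch("FI-A")
fi_b = LearningSwitch("FI-B")

# Hypothetical: one vNIC sends through fabric A, another through fabric B.
fi_a.receive("0025.b500.0a01", "Veth700")
fi_b.receive("0025.b500.0b01", "Veth701")

if __name__ == "__main__":
    for fi in (fi_a, fi_b):
        print(fi.name, fi.mac_table)
```

Each switch ends up with only the address that sent traffic through it, so a MAC that is active on fabric B will show up on FI-B and not on FI-A.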

 

Primary and Subordinate are terms that are only relevant to UCSM and the management engines that run the platform in the background. They have no effect on the data plane or on MAC learning.


Thanks!
