C-240 cannot communicate with the B200 Blade servers which are connecting to the same Fabric Intercon...

zhang_johnson · ‎08-01-2015

Hi everybody,

While running a Fabric Interconnect failover test recently, we ran into a very strange problem.

We have one UCS 5108 Mini chassis with four B200M3 Blade servers and two 6324 Fabric Interconnects. A C-240 M3 is also connected to this 5108 chassis through two VIC 1228 cards and two Cisco Direct-Attach Breakout Cables, and it is managed by the 5108's UCS Manager.

The C-240 has Windows 2012 R2 installed, and the B200M3s have VMware ESXi 5.5 installed. The C-240 and the B200M3s' management vmkernels are on the same VLAN.

When the uplinks on both sides are up and running, everything works fine. But when we disable the uplinks on one side, the C-240 can no longer communicate with the B200M3s' management vmkernels; they cannot ping each other. However, the C-240 and all of the B200M3s' mgmt vmkernels can still communicate with hosts outside the UCS system, and even with VMs running on the B200M3s.

We tried disabling each side's uplinks separately and got the same result. Communication between the C-240 and the B200M3s resumes as soon as we re-enable the disabled uplinks.

Any suggestions are appreciated.

Johnson

zhang_johnson · ‎08-01-2015

Update:

We also installed Windows 2012 R2 on one of the B200M3 servers. When one side's uplinks are disabled, the Windows system on the C-240 can communicate with the Windows system on the B200M3 server, and the ESXi mgmt vmkernels can communicate with each other, but the Windows systems can't communicate with the ESXi hosts.

All of them are in the same VLAN.

ODUrasler · ‎08-12-2015

Curious to see if you have solved your issue.

I'm having a similar issue where I'm using the same service profile for all my blades. However, I have two M4 blades that are unable to communicate with the other blades (M3; all are ESXi). The service profile uses two NICs, one for Fabric A and one for Fabric B. If I remove one of the uplinks from ESX (for example, Fabric A), I get communication. I have ruled out Fabric A as the issue, since I was able to get communication on Fabric B.

I did recently upgrade the FW for the infrastructure to 2.2(5b), as well as for the two new blades. The M3s are still on 2.2(1c), so I'm thinking I may have to upgrade the FW on those servers to see if that fixes my problem.

zhang_johnson · ‎09-13-2015

My problem was resolved by upgrading the enic driver on the ESXi hosts from 2.1.2.59 to 2.1.2.69 to match the VIC firmware version 4.0(3a).

However, I encountered another issue: ESXi hosts cannot communicate with each other across fabrics. For example, if ESXi host1's vmk0 is active on Fabric A and ESXi host2's vmk0 is active on Fabric B, they cannot talk to each other even though they are in the same chassis, but they can talk to other servers outside this chassis.

Rebooting the hosts may restore communication, but the problem has appeared three times, and I am not sure when it will come back or how to reproduce it.

Andreas Linde · ‎09-23-2015

Hi, in those cases where you are running ESXi, can you check whether the vmk Management PG has picked up a MAC address from one of the physical NICs? If it has, then you might see random traffic issues like the ones you describe.

If the vmk has a Cisco MAC (00:25:B5...) instead of a VMware MAC (00:50:56...) then have a look at these articles:

https://tools.cisco.com/bugsearch/bug/CSCuf65032/?referring_site=bugquickviewredir

https://tools.cisco.com/bugsearch/bug/CSCuv00089/?referring_site=bugquickviewredir

(Scroll down to the "Further Problem Description" part).

Fix it by removing and adding the Management PG back:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1031111
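
The MAC prefix check described above could be scripted roughly like this. It is only a sketch: the two prefixes are the ones quoted in this post, and the function names are made up for illustration. If your UCS MAC pools use a different prefix, adjust the constant.

```python
# Prefixes as quoted in the post above (assumption: your UCS MAC pool
# still uses the 00:25:B5 prefix; change it if your pool differs).
CISCO_UCS_PREFIX = "00:25:b5"
VMWARE_PREFIX = "00:50:56"


def normalize_mac(mac: str) -> str:
    """Return the MAC as lowercase, colon-separated pairs (aa:bb:cc:dd:ee:ff)."""
    hex_digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(hex_digits) != 12:
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))


def classify_vmk_mac(mac: str) -> str:
    """Classify a vmk MAC as 'cisco', 'vmware', or 'other' by its prefix."""
    normalized = normalize_mac(mac)
    if normalized.startswith(CISCO_UCS_PREFIX):
        return "cisco"    # vmk likely inherited the MAC of a physical NIC
    if normalized.startswith(VMWARE_PREFIX):
        return "vmware"   # expected case: VMware-generated MAC
    return "other"


if __name__ == "__main__":
    # Hypothetical sample addresses, for illustration only.
    for sample in ("00:25:B5:00:00:1A", "00:50:56:6f:12:34"):
        print(sample, "->", classify_vmk_mac(sample))
```

You would feed it the vmk0 MAC as shown on the host, for example in the output of `esxcfg-vmknic -l`. A result of "cisco" suggests the vmk has taken over a physical NIC's MAC and the Management PG is worth recreating.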

Regards