05-16-2012 04:59 AM
Hi.
I have a pretty basic design with FCoE running on a Nexus 5548 with NPV (true FC to a Brocade fabric) and Nexus 4000.
Everything works fine until an ESX host tries to reach a datastore shared by another ESX host. As soon as the second ESX host comes in, here is what happens:
N5K-01# 2012 May 15 15:39:48 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc107, vsan 620 is down (Gracefully shutdown)
2012 May 15 15:39:48 N5K-01 %PORT-5-IF_DOWN_NONE: %$VSAN 620%$ Interface vfc107 is down (None)
2012 May 15 15:39:48 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc107, vsan 620 is down (waiting for flogi)
N5K-01#
N5K-01#
N5K-01# 2012 May 15 15:40:40 N5K-01 %NPV-3-ACL_UPDATE_FAILED: Device FLOGI entry update failed for pwwn 21:00:00:c0:dd:1b:82:c5 on server interface vfc123 : (null)
2012 May 15 15:40:40 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc123, vsan 620 is down (Gracefully shutdown)
2012 May 15 15:40:40 N5K-01 %PORT-5-IF_DOWN_NONE: %$VSAN 620%$ Interface vfc123 is down (None)
2012 May 15 15:40:41 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc110, vsan 620 is down (Gracefully shutdown)
2012 May 15 15:40:41 N5K-01 %PORT-5-IF_DOWN_NONE: %$VSAN 620%$ Interface vfc110 is down (None)
2012 May 15 15:40:41 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc123, vsan 620 is down (waiting for flogi)
2012 May 15 15:40:41 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc110, vsan 620 is down (waiting for flogi)
2012 May 15 15:40:48 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc106, vsan 620 is down (Gracefully shutdown)
2012 May 15 15:40:48 N5K-01 %PORT-5-IF_DOWN_NONE: %$VSAN 620%$ Interface vfc106 is down (None)
2012 May 15 15:40:48 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc108, vsan 620 is down (Gracefully shutdown)
2012 May 15 15:40:48 N5K-01 %PORT-5-IF_DOWN_NONE: %$VSAN 620%$ Interface vfc108 is down (None)
2012 May 15 15:40:48 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc106, vsan 620 is down (waiting for flogi)
2012 May 15 15:40:49 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc108, vsan 620 is down (waiting for flogi)
The vfc of the incoming ESX goes down (graceful shutdown?) and then, about one minute later, all the other vfc interfaces go down as well.
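To make the timing easier to see, here is a minimal Python sketch that pulls the first IF_TRUNK_DOWN event per vfc out of the syslog text and prints how many seconds after the first failure each interface dropped. The sample lines are copied from the log above; the function names and the regular expression are my own assumptions, not anything provided by NX-OS, and the month parsing assumes an English locale.

```python
import re
from datetime import datetime

# A subset of the syslog lines pasted above (first "down" event per vfc).
LOG = """\
2012 May 15 15:39:48 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc107, vsan 620 is down (Gracefully shutdown)
2012 May 15 15:40:40 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc123, vsan 620 is down (Gracefully shutdown)
2012 May 15 15:40:41 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc110, vsan 620 is down (Gracefully shutdown)
2012 May 15 15:40:48 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc106, vsan 620 is down (Gracefully shutdown)
2012 May 15 15:40:48 N5K-01 %PORT-5-IF_TRUNK_DOWN: %$VSAN 620%$ Interface vfc108, vsan 620 is down (Gracefully shutdown)
"""

# Hypothetical pattern written for the message format seen in this thread.
LINE_RE = re.compile(
    r"(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) .*?"
    r"%PORT-5-IF_TRUNK_DOWN: .*?Interface (?P<ifc>vfc\d+)\s*, "
    r"vsan \d+ is down \((?P<reason>[^)]*)\)"
)


def first_down_times(text):
    """Return {interface: datetime of its first IF_TRUNK_DOWN event}."""
    first = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        # Collapse repeated spaces (single-digit days) before parsing.
        stamp = " ".join(match.group("ts").split())
        when = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
        first.setdefault(match.group("ifc"), when)
    return first


def delays_from_first(text):
    """Return {interface: seconds after the earliest down event}, in time order."""
    times = first_down_times(text)
    if not times:
        return {}
    start = min(times.values())
    ordered = sorted(times.items(), key=lambda item: item[1])
    return {ifc: int((when - start).total_seconds()) for ifc, when in ordered}


if __name__ == "__main__":
    for ifc, delay in delays_from_first(LOG).items():
        print(f"{ifc}: +{delay}s")
```

On the sample above, vfc107 is the first to drop and the other interfaces follow 52 to 60 seconds later, which matches the "one minute later" observation.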
05-16-2012 05:54 AM
Hi Surya
What NX-OS version are you running on the Nexus 4000? Can you also paste the output of "show interface flowcontrol" from the Nexus 4k?
05-16-2012 06:00 AM
Hi.
All switches run the latest version available on CCO:
4.1(2)E1(1h) for the Nexus 4000 and 5.1.3 N2 for the 5548.
In fact, this is the same installation as the one described here:
https://supportforums.cisco.com/thread/2121164
but this is a different issue.
I'll retrieve the output of the command.
05-17-2012 03:18 AM
Here it is. One specific interface (Eth1/7) receives a lot of pause frames from the associated server, but why do all the vfc go down?
NX-BCH1-M7# sh interface flowcontrol
--------------------------------------------------------------------------------
Port       Send FlowControl  Receive FlowControl  RxPause    TxPause
           admin    oper     admin    oper
--------------------------------------------------------------------------------
Eth1/1     off      off      off      off         0          8710974
Eth1/2     off      off      off      off         0          8710990
Eth1/3     off      off      off      off         0          8710978
Eth1/4     off      off      off      off         0          8711002
Eth1/5     off      off      off      off         0          8711015
Eth1/6     off      off      off      off         0          8711026
Eth1/7     off      off      off      off         1753895    8710893
Eth1/8     off      off      off      off         0          8710966
Eth1/9     off      off      off      off         0          8711056
Eth1/10    off      off      off      off         0          8711028
Eth1/11    off      off      off      off         0          8711059
Eth1/12    off      off      off      off         0          8711061
Eth1/13    off      off      off      off         0          8711062
Eth1/14    off      off      off      off         0          8711047
Eth1/15    off      off      off      off         0          8710658
Eth1/16    off      off      off      off         0          8710632
Eth1/17    off      off      off      off         0          0
Eth1/18    off      off      off      off         0          0
Eth1/19    off      off      off      off         0          0
Eth1/20    off      off      off      off         0          0
Po2        off      off      off      off         0          1742130
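For anyone who wants to spot this kind of port quickly across several switches, here is a small Python sketch that scans pasted "show interface flowcontrol" output and lists the ports with a non-zero RxPause counter. It assumes the column layout shown above (port, four admin/oper states, RxPause, TxPause); the function names and the threshold parameter are hypothetical and only for illustration.

```python
# A few rows taken from the output above, whitespace-separated.
SAMPLE = """\
Port Send FlowControl Receive FlowControl RxPause TxPause
admin oper admin oper
Eth1/6 off off off off 0 8711026
Eth1/7 off off off off 1753895 8710893
Eth1/8 off off off off 0 8710966
Eth1/17 off off off off 0 0
Po2 off off off off 0 1742130
"""


def parse_flowcontrol(text):
    """Return a list of (port, rx_pause, tx_pause) tuples from the CLI text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 7 fields and end with two integer counters;
        # header and separator lines fail this check and are skipped.
        if len(fields) != 7:
            continue
        if not (fields[5].isdigit() and fields[6].isdigit()):
            continue
        rows.append((fields[0], int(fields[5]), int(fields[6])))
    return rows


def ports_receiving_pause(text, threshold=0):
    """Return (port, rx_pause) for ports whose RxPause exceeds the threshold."""
    return [
        (port, rx)
        for port, rx, _tx in parse_flowcontrol(text)
        if rx > threshold
    ]


if __name__ == "__main__":
    for port, rx in ports_receiving_pause(SAMPLE):
        print(f"{port} has received {rx} pause frames")
```

With the rows above, only Eth1/7 is reported. The counters are cumulative, so comparing two captures taken a few minutes apart would show whether the server is still sending pause frames.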
05-17-2012 04:22 AM
Hello Surya
We have seen instances where, if a rogue server sends too many pause frames, FIP snooping on the Nexus 4000 stops working. The upstream N5K then no longer sees FIP keepalives and brings the vFCs down. You might want to investigate why the server blade/CNA on Eth1/7 is doing this.
Thanks
-Prashanth
05-17-2012 04:31 AM
OK, maybe a faulty NIC or a bad driver?
Surya
05-17-2012 05:03 AM
Quite possible