UCS Connectivity issues.

embowers1
Level 1

We have 2 Nexus 5548s, and the UCS 6100 is homed to both with port channels. We need to replace one 5548. When we unplug the UCS connection from that 5548 we lose the UCS entirely, even though it is still connected to the other 5548. Has anyone seen this before, or does anyone know enough about UCS to suggest a possible reason? One 5548 has 2 links in a port-channel configuration to Fabric-A, and the other 5548 is connected the same way to Fabric-B. I am assuming there is something in the UCS that is not happening. The UCS admin assures me the failover tests he performs from the UCS side work.

12 Replies

Mohammed Majid Hussain
Cisco Employee

Hi,

I have a suspicion about the network control policy (LAN tab).

By default the action on uplink fail is set to link-down. Can you set it to warning and check?
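If it helps, the same change can be made from the UCSM CLI. This is only a sketch: it assumes the policy applied to your vNICs is the one named "default" in the root org, so substitute your actual org and policy names.

UCS-A# scope org /
UCS-A /org # scope nw-ctrl-policy default
UCS-A /org/nw-ctrl-policy # set uplink-fail-action warning
UCS-A /org/nw-ctrl-policy* # commit-buffer

Note that "warning" only suppresses the link-down signal to the vNICs; it does not by itself reroute traffic.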

Thanks,

Majid

I think something is not set up properly, either on the N5K or the UCS FI;

see, e.g.,

http://www.cisco.com/c/en/us/products/collateral/switches/nexus-7000-series-switches/white_paper_c11-623265.html

Figure 9

If one fabric goes down, be it the N5K or the FI, the other fabric should still be available, assuming that your teaming (Ethernet) or multipathing (Fibre Channel) is properly configured.

Sadly, I do not yet have the information on what they are doing on the UCS side. I do know the network control policy is set to warning, though. I also know the 5K config is solid; that part is simple.

interface port-channel15
  description UCS_6100
  switchport mode trunk
  switchport trunk allowed vlan xxx-xxx
  spanning-tree port type edge trunk
  speed 10000

interface Ethernet1/15
  description UCS_6100
  switchport mode trunk
  switchport trunk allowed vlan xxx-xxx
  spanning-tree port type edge trunk
  channel-group 15 mode active

interface Ethernet1/16
  description UCS_6100
  switchport mode trunk
  switchport trunk allowed vlan xxx-xxx
  spanning-tree port type edge trunk
  channel-group 15 mode active
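For reference, on each 5548 the bundle state of those members can be confirmed with the standard NX-OS show commands:

show port-channel summary
show lacp port-channel interface port-channel 15

Both member ports should show as bundled (P) in the summary before and after any failover test.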

Which leads me right back to the UCS.

Which version of UCSM are we running? 

Is VLAN port count optimization enabled?

The symptoms you describe lead me to believe that this could be due to defect CSCuf35678:

https://tools.cisco.com/bugsearch/bug/CSCuf35678/?reffering_site=dumpcr

 

Thanks,

Majid

We are running 2.2.1b.

Do you really have 2 vPCs between each fabric interconnect and the 2 N5Ks, and are they both up and running?

They are not vPCs; they are set up as orphan ports.

Why? It is best practice to use vPC between the UCS FIs and the N5Ks. I am almost sure that your initial problem is not related to UCS, but to this unusual port-channel connectivity.

This connectivity was in place before my time. I was told it was designed and installed by Cisco and was, at the time, a Cisco "best practice" for the architecture then in place, not for what the core architecture has since turned into or how it currently stands. The 5K swap is part of that conversion; hence the need to pull the old 5548P and replace it with a new 5548UP.

I will agree that the UCS connectivity does not meet current best practices for Nexus designs. However, let's not let that distract us.

From a Nexus standpoint, losing a switch should not stop the UCS from seeing that fabric drop and re-pinning the vNICs to the other fabric. The fabric would then send a gratuitous ARP (GARP) to the switch that is still up and rehome those addresses to that switch.

Don't confuse this with losing the vPC peer link, which would indeed cause the same type of issue: the vPC secondary switch would shut down its vPCs as well as all the L3 SVIs, but the physical interfaces would remain up, so the UCS would not see a down status and would keep trying to use the fabric toward that Nexus. Essentially you would create a black hole for traffic leaving that fabric to that Nexus, where it would have no L3 route to the next hop and would just drop the traffic.

"From a Nexus standpoint, losing a switch should not stop the UCS from seeing that fabric drop and re-pinning the vNICs to the other fabric. The fabric would then send a gratuitous ARP (GARP) to the switch that is still up and rehome those addresses to that switch."

Usually this is not the default behaviour, unless you use the hardware failover flag! If all uplinks to the switch fail, UCSM will signal link down to all vNICs/vHBAs on that fabric (default setting of Network Control Policy -> Action on Uplink Fail -> link-down).
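For completeness, the hardware failover flag is set per vNIC. As a sketch from the UCSM CLI, assuming a hypothetical service profile "SP-ESX1" with a vNIC "eth0" (your names will differ, and vNICs created from templates are changed on the template instead):

UCS-A# scope org /
UCS-A /org # scope service-profile SP-ESX1
UCS-A /org/service-profile # scope vnic eth0
UCS-A /org/service-profile/vnic # set fabric a-b
UCS-A /org/service-profile/vnic* # commit-buffer

"a-b" pins the vNIC to Fabric A with failover to Fabric B. This is generally used for bare-metal OSes without NIC teaming; hypervisors doing their own teaming usually leave it disabled.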

You are correct, and we do have it set to "Warning", not the default action of "Link Down". We have a call with Cisco today to look into the issue. I am working on plans to convert this whole thing to a vPC design, which will put us in line with Cisco best practice for Nexus, instead of the current connectivity that dates from when the 6509s were there and you couldn't do vPC; port channels were the only option at that time.
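As a rough sketch of that target vPC design (hypothetical domain ID, interface and port-channel numbers; mirror the config on the second 5548), each 5548 would carry one member of a vPC toward each fabric interconnect, plus the peer link:

feature vpc
feature lacp

vpc domain 10
  peer-keepalive destination <peer-mgmt-ip>

interface port-channel1
  description vPC peer-link
  switchport mode trunk
  spanning-tree port type network
  vpc peer-link

interface port-channel15
  description UCS_6100 Fabric-A
  switchport mode trunk
  switchport trunk allowed vlan xxx-xxx
  spanning-tree port type edge trunk
  vpc 15

interface Ethernet1/15
  description UCS_6100 Fabric-A
  switchport mode trunk
  switchport trunk allowed vlan xxx-xxx
  channel-group 15 mode active

On the FI side nothing special is needed: each fabric interconnect (in end-host mode) keeps one ordinary LACP port channel, now with one member link to each 5548, so losing either 5548 only drops half the bandwidth instead of the whole fabric.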

OK, understood !

I also hope / assume your UCS domain is in Ethernet EHM (end host mode).

It is clear that with the "warning" setting on uplink fail, you are sending traffic into a black hole, unless your fabric interconnects are in Ethernet switching mode, which today (it was different a few years back) is not best practice; there isn't a use case for Ethernet switching mode.
