Solved: Re: UCS HA and VN-link questions

moamen.elhefnawy · 01-21-2011

Hello Dears,

I'm new to UCS and I have some questions concerning HA, VN-link in hardware and the dynamic vNIC policies.

As I understand it, we use VN-link to allow VMs on ESX to use a direct vNIC from the UCS, and that vNIC is mapped to a vEth on the fabric interconnect.

I believe that I understand the concepts, but my problem is how to configure the full setup and when to use it:

1- When exactly do we need to use VN-link, and is it used only if I install ESX?

2- As I understand it, when we use VN-link we need to configure dynamic vNIC allocation policies. Are there any documents that explain the configuration steps for doing that, and how the dynamic vNICs will be allocated to the VMs?

3- In the case of dynamic VN-link, does that mean we don't need to configure static vNICs? And if we do need to configure them, how will they be used, and what exactly will the configuration be?

4- In the Create vNIC Template part of the UCSM configuration guide, there is an option Target, which has one of two values, either Adapter or VM. Is this related to VN-link, and which option should be used in each case?

5- How will HA work when I have VN-link with ESX? Is it true that we don't need to enable fabric failover in this case, and why? And how will it all work together (if I have 2 vCONs, and on both of them I have created 2 vNICs and 2 vHBAs, and as I understand it each one of them is connected to both IOMs)?

6- There is a note in the UCS configuration guide that we may not be able to use the option "bring vNIC down when the uplink is down", because in this case the traffic on the vHBA may be affected. But if we don't use it, the OS will not detect that the vNIC is down, so it will not trigger teaming failover. So how will it work?

I know I have a lot of questions, but I hope that you will be able to help me.

Thanks in advance for your support.

Moamen


Manish Tandon · 01-21-2011

Moamen

> 1- When exactly do we need to use VN-link, and is it used only if I install ESX?

VN-link is an umbrella term that refers to a logical link between a VM NIC and a switch port on a Cisco switch.

It comes in 2 flavors

a) VN-link in software (Nexus 1000v)

b) VN-link in hardware (which is what I believe your questions below pertain to).

A technical primer on VN-link that discusses the two modes is available at:

http://www.cisco.com/en/US/partner/solutions/collateral/ns340/ns517/ns224/ns892/ns894/white_paper_c11-525307_ps9902_Products_White_Paper.html

Currently VN-link is only available for ESX, though the intent is to have it across other hypervisors like Xen, KVM, Hyper-V, etc.

We showcased it last year with Red Hat KVM:

http://www.redhat.com/about/news/prarchive/2010/cisco.html

> 2- As I understand it, when we use VN-link we need to configure dynamic vNIC allocation policies. Are there any documents that explain the configuration steps for doing that, and how the dynamic vNICs will be allocated to the VMs?

This cookbook covers the configuration steps:

http://www.cisco.com/en/US/products/ps10281/products_configuration_example09186a0080b52d0d.shtml

> 3- In the case of dynamic VN-link, does that mean we don't need to configure static vNICs? And if we do need to configure them, how will they be used, and what exactly will the configuration be?

Static vNICs are still required (2 ethernet vNICs).

The doc above explains how they are used. They are not given to the VMs but to the vDS.
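A minimal Python sketch of that split, for illustration only: it is not the UCSM API, and `VNic`, `ServiceProfile`, `attach_vm_nic` and `build_profile` are hypothetical names. It models a service profile whose two static Ethernet vNICs are owned by the vDS, while a pool of dynamic vNICs is handed out one per VM NIC.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VNic:
    name: str
    dynamic: bool
    owner: str = ""  # "vDS" for the static uplinks, a VM NIC id for dynamic vNICs


@dataclass
class ServiceProfile:
    static_vnics: List[VNic]
    dynamic_pool: List[VNic] = field(default_factory=list)

    def attach_vm_nic(self, vm_nic: str) -> VNic:
        """Give the next free dynamic vNIC to exactly one VM NIC."""
        for vnic in self.dynamic_pool:
            if not vnic.owner:
                vnic.owner = vm_nic
                return vnic
        raise RuntimeError("dynamic vNIC pool exhausted")


def build_profile(dynamic_count: int) -> ServiceProfile:
    # The two static Ethernet vNICs are uplinks for the vDS, not for any VM.
    statics = [VNic("eth0", False, "vDS"), VNic("eth1", False, "vDS")]
    # Stand-ins for the vNICs a dynamic vNIC connection policy would create.
    pool = [VNic(f"dyn{i}", True) for i in range(dynamic_count)]
    return ServiceProfile(statics, pool)
```

For example, `build_profile(2).attach_vm_nic("vm1-nic0")` returns the vNIC named `dyn0`; once the pool is used up, further VM NICs cannot be attached.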

> 4- In the Create vNIC Template part of the UCSM configuration guide, there is an option Target, which has one of two values, either Adapter or VM. Is this related to VN-link, and which option should be used in each case?

One of the benefits of VN-link in hardware is that it brings feature parity to virtualized and non-virtualized hosts.

When a VM is using a dynamic vNIC (dictated by the port profile), its switchport is on the FI.

Similarly when you load Windows on bare metal with Palo, the switchport again is on the FI.

So the troubleshooting methodology for the two will be the same.

The vNIC template gives you a single place to create a switchport definition, and you can use it for an adapter (bare metal) or a VM (port profile) if you choose.
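To show the idea of one switchport definition feeding both targets, here is a small Python sketch. It is an illustration only, not how UCSM stores templates: `VNicTemplate`, `render`, and the `vlans`/`qos_policy` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class VNicTemplate:
    name: str
    vlans: Tuple[int, ...]
    qos_policy: str
    targets: FrozenSet[str]  # any of {"adapter", "vm"}


def render(template: VNicTemplate) -> Dict[str, dict]:
    """Expand one switchport definition into the object each target would use."""
    switchport = {"vlans": list(template.vlans), "qos": template.qos_policy}
    out: Dict[str, dict] = {}
    if "adapter" in template.targets:
        # Static vNIC on a server adapter (for example bare metal on Palo).
        out["adapter"] = {"kind": "vnic", "name": template.name, **switchport}
    if "vm" in template.targets:
        # Port profile that a VM's dynamic vNIC attaches to (VN-link in hardware).
        out["vm"] = {"kind": "port-profile", "name": template.name, **switchport}
    return out
```

Because both outputs come from the same definition, the bare-metal vNIC and the VM port profile end up with identical VLAN and QoS settings, which is what makes the troubleshooting the same for both.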

> 5- How will HA work when I have VN-link with ESX? Is it true that we don't need to enable fabric failover in this case, and why? And how will it all work together (if I have 2 vCONs, and on both of them I have created 2 vNICs and 2 vHBAs, and as I understand it each one of them is connected to both IOMs)?

The dynamic vNICs are Fabric Failover enabled when they are created. You do not have a choice there, i.e., you cannot disable fabric failover for dynamic vNICs.

When you associate a dynamic vNIC policy with an SP, you will see that the vNICs created are FF enabled (A-B or B-A in the Fabric ID column).
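The Fabric ID notation can be read as "primary, then failover fabric". The Python sketch below, with a hypothetical `active_fabric` helper, shows which fabric would carry a vNIC's traffic for a given fabric state. It is stateless and simply prefers the primary whenever it is up; the real failback behavior and timing are not modelled.

```python
from typing import Dict, Optional


def active_fabric(fabric_id: str, fabric_up: Dict[str, bool]) -> Optional[str]:
    """Return the fabric ("A" or "B") that carries a vNIC's traffic.

    fabric_id uses the Fabric ID column notation: "A" or "B" for a vNIC on a
    single fabric, "A-B" or "B-A" for primary-secondary with fabric failover.
    Returns None when none of the vNIC's allowed fabrics is up.
    """
    for fabric in fabric_id.split("-"):  # primary first, then the failover fabric
        if fabric_up[fabric]:
            return fabric
    return None
```

With `fabric_id="A-B"` and fabric A down, the sketch returns `"B"`; a plain `"A"` vNIC returns `None` in the same situation, which is why non-FF vNICs need host-side teaming instead.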

> 6- There is a note in the UCS configuration guide that we may not be able to use the option "bring vNIC down when the uplink is down", because in this case the traffic on the vHBA may be affected. But if we don't use it, the OS will not detect that the vNIC is down, so it will not trigger teaming failover. So how will it work?

Ethernet vNICs and vHBAs are different PCI devices and are handled differently.

When the vNIC is brought down (because of uplink failure), the vHBA is *not* affected.
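The trade-off behind question 6 can be sketched as a small truth table in Python. The helper name and the `shared_port` flag are hypothetical; using `"link-down"` as the label for the default action and `"warning"` for the alternative is an assumption based on the configuration guide excerpt quoted later in this thread. `shared_port=True` stands for a converged adapter such as the M72KR-Q/E, and `False` for an adapter where the vNIC and vHBA are independent PCI devices, as described above.

```python
from typing import Dict


def after_ethernet_uplink_failure(shared_port: bool, action: str) -> Dict[str, bool]:
    """What the host sees once the Ethernet uplinks behind its vNIC fail.

    shared_port: True if Ethernet and FC share the adapter port in a way that
    a link-down also interrupts FC (the converged-adapter case in the guide).
    action: "link-down" (assumed label for the default) or "warning".
    """
    if action not in ("link-down", "warning"):
        raise ValueError(f"unknown action: {action}")
    link_down = action == "link-down"
    return {
        "vnic_link_up": not link_down,
        "vhba_up": not (shared_port and link_down),
        # A teaming driver that relies only on link state notices the failure
        # only when the link is actually brought down.
        "teaming_sees_failure": link_down,
    }
```

With independent devices, the default action takes the vNIC down, so teaming fails over while the vHBA keeps running. With a shared port, `"warning"` protects FC but leaves the teaming driver blind, which is the case the guide's note warns about.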

Thanks

--Manish

moamen.elhefnawy · 01-23-2011

Hello Manish,

Thanks a lot for your detailed reply, I really appreciate it. There are some minor points that are still not clear to me, sorry for that, but we are still waiting for the lab kit, so I don't have the option now to test what I need to know.

- Regarding Fabric Interconnect failover and whether it should be used: I found in some documents that UCSM 1.3 will not sync the MAC addresses learned from the VMs, so failover will not work in an ESX environment. I'm not sure if that is correct, and whether it is addressed in the next release.

- And regarding my last question about whether bringing the vNIC down when the uplink is down will affect the vHBA, please find the excerpt from the configuration guide below and advise what it means.

“The default behavior of the Action on Uplink Fail property is optimal for most Cisco UCS that support link failover at the adapter level or only carry Ethernet traffic. However, for those converged network adapters that support both Ethernet and Fibre Channel traffic, such as the Cisco UCS CNA M72KR-Q and the Cisco UCS CNA M72KR-E, the default behavior can affect and interrupt Fibre Channel traffic as well. Therefore, if the server includes one of those converged network adapters and the adapter is expected to handle both Ethernet and Fibre Channel traffic, we recommend that you configure the Action on Uplink Fail property with a value of warning. Please note that this configuration may result in an Ethernet teaming driver not being able to detect a link failure when the border port goes down.”

Thanks a lot for your help and time.

Moamen

Manish Tandon · 01-23-2011

Moamen

> Regarding Fabric Interconnect failover and whether it should be used: I found in some documents that UCSM 1.3 will not sync the MAC addresses learned from the VMs, so failover will not work in an ESX environment. I'm not sure if that is correct, and whether it is addressed in the next release.


The issue you are mentioning is for soft switches (vSwitch, DVS, Nexus 1000v) in UCS.

VN-link in hardware is different, as it is a pass-through rather than doing any local switching.

So the issue does not exist with it. The dynamic vNICs created via the policy, which are used in VN-link in hardware, have always been Fabric Failover enabled, and it works.
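A rough Python sketch of why pass-through sidesteps the MAC sync question. It assumes, as the claim in this thread says, that MACs learned behind a soft switch are not synced to the peer FI (the `learned_synced` flag); the helper names are hypothetical. A pass-through vNIC only ever sources its own configured MAC, so nothing is left unknown after failover.

```python
from typing import List, Set, TypedDict


class VNicMacs(TypedDict):
    configured_mac: str     # MAC assigned to the vNIC itself
    learned_macs: Set[str]  # VM MACs learned behind it (empty for pass-through)


def macs_known_to_peer(vnics: List[VNicMacs], learned_synced: bool) -> Set[str]:
    """MACs the surviving FI already knows when a vNIC fails over to it."""
    known: Set[str] = set()
    for vnic in vnics:
        known.add(vnic["configured_mac"])
        if learned_synced:
            known |= vnic["learned_macs"]
    return known


def stranded_macs(vnics: List[VNicMacs], learned_synced: bool) -> Set[str]:
    """MACs in use but unknown to the peer FI right after failover."""
    in_use: Set[str] = set()
    for vnic in vnics:
        in_use.add(vnic["configured_mac"])
        in_use |= vnic["learned_macs"]
    return in_use - macs_known_to_peer(vnics, learned_synced)
```

For a pass-through vNIC, `stranded_macs` is empty regardless of `learned_synced`; for a soft-switch uplink it is exactly the set of VM MACs when learned addresses are not synced.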

> “The default behavior of the Action on Uplink Fail property is optimal for most Cisco UCS that support link failover at the adapter level or only carry Ethernet traffic. However, for those converged network adapters that support both Ethernet and Fibre Channel traffic, such as the Cisco UCS CNA M72KR-Q and the Cisco UCS CNA M72KR-E, the default behavior can affect and interrupt Fibre Channel traffic


The quoted documentation is for the M72KRs.

Fabric failover is only supported on the M71KRs (also known as Menlo) and the M81KR (also known as the VIC or Palo).

VN-link in hardware is only supported on the M81KR (Palo).

For the M72KRs, you should not enable fabric failover; you need to use a teaming driver on the host (along with beaconing if you want failover on uplink failures) to achieve failover, and that is what the note is talking about.
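Why beaconing matters here can be sketched in a few lines of Python. The helper and its input format are hypothetical: each team member carries the link state the host sees and a `beacons_ok` flag saying whether beacons from the other members arrive on it. Real ESX beacon probing is generally recommended with at least three uplinks so it can tell which one failed; this sketch just takes the per-uplink result as input.

```python
from typing import Dict, List


def usable_uplinks(uplinks: Dict[str, Dict[str, bool]], beaconing: bool) -> List[str]:
    """Pick the team members a host teaming driver would keep using.

    Without beaconing only link state matters, so an upstream failure that
    leaves the host-facing link up goes unnoticed.
    """
    usable = []
    for name, state in uplinks.items():
        if not state["link_up"]:
            continue
        if beaconing and not state["beacons_ok"]:
            continue
        usable.append(name)
    return usable
```

With `vmnic0` still linked up but cut off upstream, link-state detection keeps using both uplinks, while beaconing drops `vmnic0` and leaves only the healthy one.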

For the M71KRs and M81KRs, when the Ethernet uplinks fail, only Ethernet vNICs are failed over. vHBAs are not affected, as mentioned earlier.

Look at the following thread for how fabric failover works for the M72KR's

https://supportforums.cisco.com/thread/2060305?tstart=30

Thanks

--Manish

moamen.elhefnawy · 01-24-2011

Hello Manish,

Thanks a lot for your support, it is really appreciated.

Moamen

chandra20 · 04-16-2011

Hi Manish,

I was just going through this thread to find a solution to the issue I am facing on my new UCS setup.

I am using the Palo card (M81KR). When I powered down Fabric Interconnect A to test failover, I lost connectivity to all my ESX hosts for more than 5 minutes and got disk errors on all my VMs. I had to reboot all my UC VMs to fix the disk errors. Could you please help with what we need to do to fix this issue?

Thanks in advance.

CGP

Manish Tandon · 04-17-2011

Chandra

We will have to break this down to narrow down the issue.

Fabric Failover *only* applies to Ethernet traffic.

For FC, SAN multipathing applies, and it has nothing to do with the fabric failover capability of the M81KR.

Since you mentioned disk errors, where is your VMFS located?

If it's on the SAN, we need to troubleshoot it from the SAN multipathing perspective, i.e., open Manage Paths under vCenter and check the paths and the target/LUN etc. on both paths.
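The check Manish describes can be written down as a small Python helper, assuming you transcribe the paths by hand as `(lun, fabric, state)` tuples. The helper name, the tuple format and the lowercase state labels are hypothetical; the fabric is whichever vHBA (A or B side) the path uses.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def luns_without_dual_fabric_paths(paths: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Return LUNs lacking a usable path through both fabric A and fabric B.

    state is "active", "standby" or "dead"; a standby path still counts,
    because it can take over if the active path fails.
    """
    fabrics = defaultdict(set)
    luns: Set[str] = set()
    for lun, fabric, state in paths:
        luns.add(lun)
        if state != "dead":
            fabrics[lun].add(fabric)
    return sorted(lun for lun in luns if fabrics[lun] != {"A", "B"})
```

Any LUN this returns would lose its storage when one fabric goes away, which matches the symptom of disk errors during an FI power-off test.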

If it's on NFS/iSCSI (i.e., Ethernet), then we are looking at the soft switch and vNICs.

For an ESX soft switch environment (Nexus 1000v / DVS / vSwitch), the recommendation is to *not* have Fabric Failover enabled on the vNICs, i.e., have one vNIC on each side (in pairs) without FF to provide load sharing and redundancy.

What does your config look like from a vNIC perspective, if it's the Ethernet path?
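A quick way to compare a vNIC layout with that recommendation is sketched below in Python. The `(name, fabric_id)` input and the helper name are hypothetical; the fabric IDs follow the A, B, A-B, B-A notation discussed earlier in the thread.

```python
from typing import Dict, List, Tuple


def check_soft_switch_vnics(vnics: List[Tuple[str, str]]) -> List[str]:
    """List deviations from "no FF, vNICs in pairs, one on each fabric".

    vnics: (name, fabric_id) pairs, where fabric_id is "A" or "B",
    or "A-B"/"B-A" when fabric failover is enabled.
    """
    problems: List[str] = []
    per_fabric: Dict[str, int] = {"A": 0, "B": 0}
    for name, fabric_id in vnics:
        if "-" in fabric_id:
            problems.append(f"{name}: fabric failover enabled ({fabric_id})")
        else:
            per_fabric[fabric_id] += 1
    if per_fabric["A"] != per_fabric["B"]:
        problems.append(f"unbalanced: {per_fabric['A']} on A, {per_fabric['B']} on B")
    return problems
```

An empty list means each soft-switch uplink has a non-FF partner on the other fabric, so redundancy comes from the host's teaming rather than from the FI.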

Thanks

--Manish

chandra20 · 04-18-2011

Hi Manish, thanks for the reply.

Our disks are on the SAN, and there is no vCenter, as we built this environment for Cisco UC applications.

After a full day of work on Sunday, I found an issue with the upstream topology or configuration.

We have two upstream 3750 switches in one stack, connected to FI-A and FI-B (cross-connected).

When I make the connections straight, HA works absolutely fine.

What would be the recommended topology for the upstream connection to stacked switches?

Cheers,

Chandra

Manish Tandon · 04-18-2011

Chandra

I don't believe I understand the issue then.

If Ethernet connectivity was the issue you ran into in the HA test, you shouldn't have had to reload the ESX hosts to clear the disk errors.

As the storage is SAN-based, the connectivity to the 3750s is only for IP/Ethernet.

We do recommend port channeling upstream where possible. So if you have stacked switches upstream, port channel (LACP) from the FIs to them.

It is not a requirement, as in End Host Mode (EHM) the links are always used active/active, but port channeling means the fewest moving parts when things change (links flap, etc.), and it also minimizes the number of hops for east-west blade traffic if the upstream is vPC.

I'm unsure why criss-crossing to different modules within the stack didn't work for you.
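The "fewest moving parts" point can be illustrated with a Python sketch of uplink pinning in End Host Mode. This is a simplified model under stated assumptions, not FI code: `vnics_to_repin` is hypothetical, each vNIC is pinned either to a single uplink port or to a port channel, and a member failure inside a port channel is absorbed without repinning unless it was the last member.

```python
from typing import Dict, List, Set


def vnics_to_repin(pinning: Dict[str, str], failed_port: str,
                   port_channels: Dict[str, Set[str]]) -> List[str]:
    """List the vNICs that would need repinning when one uplink port fails.

    pinning maps each vNIC to its uplink: a physical port name, or the name
    of a port channel listed in port_channels (name -> member ports).
    """
    repin = []
    for vnic, uplink in pinning.items():
        if uplink in port_channels:
            # Remaining members keep carrying traffic; repin only if none are left.
            if not port_channels[uplink] - {failed_port}:
                repin.append(vnic)
        elif uplink == failed_port:
            repin.append(vnic)
    return sorted(repin)
```

With individual uplinks, every vNIC pinned to the failed port has to move; with a two-member port channel, losing one member moves nothing.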

Maybe you can contact me offline and I can look at your config etc.

Thanks

--Manish

chandra20 · 04-18-2011

Thanks Manish,

Not sure what's happening. When I did the first test by pulling the power on FI-A, I lost network connectivity for about 5 minutes. Then all my VMs showed the error "EXT3-FS.........Journal has aborted", and we had to reboot all the VMs to fix the error.

The next day, I just disabled the criss-cross cabling and tested with straight connections. It worked fine. Again today, I enabled the criss-cross ports and tested. It works fine without any issue. I have not been able to get the RCA for the first test failure. :-(

Rgrds,

Chandra