
Ask the Expert: Cisco UCS Troubleshooting Boot from SAN with FC and iSCSI

ciscomoderator
Community Manager

  

Welcome to this Cisco Support Community Ask the Expert conversation. This is an opportunity to learn and ask questions about Cisco UCS Troubleshooting Boot from SAN with FC and iSCSI with Vishal Mehta and Manuel Velasco.

The current industry trend is to use SAN (FC/FCoE/iSCSI) for booting operating systems instead of using local storage.

Boot from SAN offers many benefits, including:

 

  • Servers without local storage can run cooler and use the extra space for other components.
  • Redeploying servers after a hardware failure is easier when they boot from SAN.
  • SAN storage allows the administrator to use storage more efficiently.
  • Boot from SAN offers reliability because the user can access the boot disk through multiple paths, which protects the disk from being a single point of failure.

 

Cisco UCS takes away much of the complexity with its service profiles and associated boot policies to make boot from SAN deployment an easy task.

 

Vishal Mehta is a customer support engineer for Cisco’s Data Center Server Virtualization TAC team based in San Jose, California. He has been working in the TAC for the past three years with a primary focus on data center technologies such as Cisco Nexus 5000, Cisco UCS, Cisco Nexus 1000v, and virtualization. He has presented at Cisco Live in Orlando 2013 and will present at Cisco Live Milan 2014 (BRKCOM-3003, BRKDCT-3444, and LABDCT-2333). He holds a master’s degree from Rutgers University in electrical and computer engineering and has CCIE certification (number 37139) in routing and switching and service provider.

Manuel Velasco is a customer support engineer for Cisco’s Data Center Server Virtualization TAC team based in San Jose, California. He has been working in the TAC for the past three years with a primary focus on data center technologies such as Cisco UCS, Cisco Nexus 1000v, and virtualization. Manuel holds a master’s degree in electrical engineering from California Polytechnic State University (Cal Poly) and VMware VCP and CCNA certifications.

Remember to use the rating system to let Vishal and Manuel know if you have received an adequate response. 

Because of the volume expected during this event, our experts might not be able to answer every question. Remember that you can continue the conversation in the Data Center community, under subcommunity Unified Computing, shortly after the event. This event lasts through April 25, 2014. Visit this forum often to view responses to your questions and the questions of other Cisco Support Community members.


Vishal Mehta
Level 1

The tasks to configure Boot-From-SAN (using FC/FCoE/iSCSI) are summarized below:

1.      UCS Manager Tasks

A.     Create a Service Profile Template with x number of vHBAs or iSCSI vNICs.

B.     Create a Boot Policy that includes SAN Boot as the first device and link it to the Template

C.     Create x number of Service Profiles from the Template

D.     Use Server Pools, or associate servers to the profiles

E.      Let all servers attempt to boot and sit at the “Non-System Disk” style message that UCS servers return

2.      Switch Tasks

A.     Zone the server WWPN to a zone that includes the storage array controller’s WWPN.

B.     Zone the second fabric switch as well. Note: For some operating systems (Windows for sure), you need to zone just a single path during OS installation so consider this step optional.

3.      Array Tasks

A.     On the array, create a LUN and allow the server WWPNs for FC or the initiator IQN for iSCSI  to have access to the LUN.

B.     Present the LUN to the host using a desired LUN number (typically zero, but this step is optional and not available on all array models)

egordon310
Level 1

Hi Vishal/Manuel,

What is the methodology used to troubleshoot Boot-from-SAN issues?  Really appreciate your help and expertise.  

Evan

Hello Evan

Thank you for asking this question. Most of the TAC cases that we have seen for Boot-from-SAN failures are due to misconfiguration.

So our methodology is to verify the configuration and troubleshoot from the server, to the storage switches, to the storage array.

Before diving into troubleshooting, make sure you have a clear understanding of the topology. This is vital in any troubleshooting scenario. Know what devices you have, how they are connected, how many paths are connected, whether Switch or NPV mode is in use, and so on.

Always try to troubleshoot one path at a time, and verify that the setup is in compliance with the SW/HW interop matrix tested by Cisco.

 

Step 1: Check at server

a. Make sure the firmware version is uniform across all components of UCS.

b. Verify that the VSAN is created and the FC uplinks are configured correctly. VSANs/FCoE VLANs should be unique per fabric.

c. Verify the vHBA configuration at the service profile level. The vHBA on each fabric should have a unique VSAN number.

Note down the WWPN of your vHBA. This will be needed in Step 2 for zoning on the SAN switch and in Step 3 for LUN masking on the storage array.

d. Verify that the Boot Policy of the service profile is configured to boot from SAN. The Boot Order and its parameters, such as LUN ID and WWN, are extremely important.

e. Finally, at the UCS CLI, verify the FLOGI of the vHBAs (in NPV mode, the command from nxos is: show npv flogi-table).
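
The WWPN noted in step c has to match exactly when it is entered again for zoning and LUN masking, and different tools display it in different notations. As a small illustration, here is a Python sketch that normalizes a WWPN to the lowercase colon-separated form used in the switch outputs in this thread. The function name normalize_wwpn is made up for this example; it is not part of any Cisco tool.

```python
import re


def normalize_wwpn(raw: str) -> str:
    """Return a WWPN as lowercase, colon-separated hex octets.

    Accepts common notations (colons, dashes, dots, spaces, or no
    separators at all) and rejects anything that is not 16 hex digits.
    """
    # Strip only separator characters; anything else must be a hex digit.
    compact = re.sub(r"[\s:.\-]", "", raw)
    if not re.fullmatch(r"[0-9a-fA-F]{16}", compact):
        raise ValueError(f"not a valid 8-byte WWPN: {raw!r}")
    compact = compact.lower()
    return ":".join(compact[i:i + 2] for i in range(0, 16, 2))


if __name__ == "__main__":
    # Initiator WWPN taken from the zone output later in this thread.
    print(normalize_wwpn("20000025B5E1B2A1"))
```

Comparing normalized values on both sides avoids a mismatch that is only a difference in notation or letter case.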

 

Step 2: Check at Storage Switch

a. Verify the mode (by default UCS is in FC end-host mode, so the storage switch has to have NPIV enabled, unless UCS is in FC Switch mode).

b. Verify that the switch port connecting to UCS is UP as an F-Port and is configured for the correct VSAN.

c. Check that both the initiator (server) and the target (storage) are logged into the fabric switch (command for MDS/N5k: show flogi database vsan X).

d. Once you have confirmed that the initiator and target devices are logged into the fabric, query the name server to see whether they have registered themselves correctly (command: show fcns database vsan X).

e. The most important configuration to check on the storage switch is the zoning.

Zoning is basically access control between initiators and targets. The most common design is to configure one zone per initiator and target pair.

Zoning requires you to configure a zone, put that zone into your current zone set, and then ACTIVATE the zone set. (To check the result, use: show zoneset active vsan X)
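
To make the three zoning actions concrete (configure a zone, add it to the zone set, activate the zone set), here is a minimal Python sketch that prints NX-OS style configuration lines for a one-zone-per-initiator-and-target design. The zone names, the zone set name ZS_FABRIC_A and the function itself are hypothetical, and the WWPNs are the ones from the output later in this thread. Treat the generated lines as a draft to review against your switch's configuration guide, not as something to paste blindly.

```python
from itertools import product


def zoning_commands(vsan, zoneset, initiators, targets):
    """Build config lines: one zone per initiator/target pair.

    initiators and targets map a short alias (used only to build the
    zone name) to a WWPN string.
    """
    lines = []
    zone_names = []
    for (init_alias, init_wwpn), (tgt_alias, tgt_wwpn) in product(
        initiators.items(), targets.items()
    ):
        zone = f"{init_alias}_{tgt_alias}"
        zone_names.append(zone)
        # 1. Configure the zone with exactly one initiator and one target.
        lines += [
            f"zone name {zone} vsan {vsan}",
            f"  member pwwn {init_wwpn}",
            f"  member pwwn {tgt_wwpn}",
        ]
    # 2. Put every zone into the zone set.
    lines.append(f"zoneset name {zoneset} vsan {vsan}")
    lines += [f"  member {zone}" for zone in zone_names]
    # 3. Activate the zone set so the zones take effect.
    lines.append(f"zoneset activate name {zoneset} vsan {vsan}")
    return lines


if __name__ == "__main__":
    for line in zoning_commands(
        vsan=200,
        zoneset="ZS_FABRIC_A",  # hypothetical zone set name
        initiators={"esx1_vhba_a": "20:00:00:25:b5:e1:b2:a1"},
        targets={"vnx_spa": "50:06:01:64:3e:e0:58:9e"},
    ):
        print(line)
```

Forgetting the final activate step is easy to do: the zone then exists in the configuration but never shows up in the active zone set.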

 

Step 3: Check at Storage Array

When the storage array logs into the SAN fabric, it queries the name server to see which devices it can communicate with.

LUN masking is a crucial step on the storage array: it gives a particular host (server) access to a specific LUN.

Assuming that both the storage and the initiator have FLOGI’d into the fabric and the zoning is correct (as per Steps 1 and 2),

the following needs to be verified at the storage array level:

a. Are the WWPNs of the initiators (the vHBAs of the hosts) visible on the storage array?

b. If so, is LUN masking applied?

c. What LUN number is presented to the host? This is the number that appears as the LUN ID in the 'Boot Order' from Step 1.
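
The cross-check in a through c can be written down as a simple comparison between what the boot policy expects and what the array presents. The Python sketch below assumes you have collected both sides by hand; the BootTarget and MaskingEntry records and the check_boot_target function are hypothetical names for this example, not a UCS or array API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootTarget:
    """What the UCS boot policy points at (hypothetical record)."""
    target_wwpn: str
    lun_id: int


@dataclass(frozen=True)
class MaskingEntry:
    """One LUN masking entry as read off the array (hypothetical record)."""
    initiator_wwpn: str
    target_wwpn: str
    host_lun_id: int


def check_boot_target(initiator_wwpn, boot, masking):
    """Return a list of human-readable problems; empty means consistent."""
    initiator = initiator_wwpn.lower()
    # a. Is the initiator known to the array at all?
    seen = [m for m in masking if m.initiator_wwpn.lower() == initiator]
    if not seen:
        return [f"initiator {initiator} is not visible on the array"]
    # b. Is a LUN masked to it through the target port used for boot?
    on_port = [m for m in seen
               if m.target_wwpn.lower() == boot.target_wwpn.lower()]
    if not on_port:
        return [f"no LUN masked to {initiator} via target {boot.target_wwpn}"]
    # c. Does the presented LUN number match the boot policy's LUN ID?
    if all(m.host_lun_id != boot.lun_id for m in on_port):
        presented = sorted(m.host_lun_id for m in on_port)
        return [f"boot policy expects LUN {boot.lun_id}, "
                f"array presents {presented}"]
    return []


if __name__ == "__main__":
    boot = BootTarget("50:06:01:64:3e:e0:58:9e", lun_id=0)
    masking = [MaskingEntry("20:00:00:25:b5:e1:b2:a1",
                            "50:06:01:64:3e:e0:58:9e", host_lun_id=1)]
    for problem in check_boot_target("20:00:00:25:b5:e1:b2:a1", boot, masking):
        print(problem)
```

The checks run in the same order as the questions, so the first reported problem tells you which question failed.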

 

The document below has details and troubleshooting outputs:

http://www.cisco.com/c/en/us/support/docs/servers-unified-computing/ucs-b-series-blade-servers/115764-ucs-san-tshoot-00.html

Hope this answers your question.

Thanks,

Vishal 

Thanks for that very detailed answer, Vishal.  Adding on to my question, could you also provide some common SAN Boot Failure Scenarios?

 

Thank you,

Evan

Hi Evan,

 

The SAN boot failures we have seen are mostly related to misconfiguration.

If the correct order of configuration is followed, then these failures can be avoided.

However, below are the common failure causes we have seen in TAC cases:

 

For FC:

1. Incorrect Target WWNs specified in the Service Profile

2. Zoning mis-configured on SAN switches

3. LUN Masking incorrect on Storage Arrays

4. Boot order in boot policy in Service Profile set incorrectly

5. VSAN/FCoE-VLAN misconfiguration

6. FC uplinks not associated with the correct VSAN

7. FC ports not in the correct mode (F, NP, N, E modes)

8. Not using the correct OS drivers.

 

For iSCSI:

1. Incorrect target IQN and/or IP

2. On the storage side, LUN assigned to the incorrect initiator IQN and/or IP

3. LUN masking issues

4. Not setting the iSCSI vNIC VLAN as native

5. Configuring the wrong VLAN

6. Not allowing the correct VLAN on the upstream switches

7. Not using the correct OS drivers
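
For the first iSCSI item, a quick format check can catch typos in the target IQN or IP before they end up in the boot policy. The Python sketch below is only a sanity check: it assumes the common "iqn.yyyy-mm.reversed-domain[:identifier]" naming format in lowercase, and it cannot tell whether a well-formed value is the right one for your array. The function name and the sample IQN are made up for illustration.

```python
import ipaddress
import re

# iqn.<year>-<month>.<reversed domain>[:<optional identifier>], lowercase.
IQN_RE = re.compile(r"iqn\.\d{4}-(0[1-9]|1[0-2])\.[a-z0-9][a-z0-9.\-]*(:\S+)?")


def check_iscsi_target(iqn, ip):
    """Return a list of format problems for a target IQN and IP address."""
    problems = []
    if not IQN_RE.fullmatch(iqn):
        problems.append(f"target IQN looks malformed: {iqn!r}")
    try:
        # Raises ValueError for anything that is not a valid IPv4/IPv6 address.
        ipaddress.ip_address(ip)
    except ValueError:
        problems.append(f"target IP is not a valid address: {ip!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical values; substitute the ones from your boot policy.
    for problem in check_iscsi_target("iqn.2014-04.com.example:array1.spa",
                                      "192.168.10.300"):
        print(problem)
```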

 

Let me know if we need to add anything further.

Thanks

I have Cisco UCS Manager 2.1(3a) and an EMC VNX 5300 that is direct-attached to Fabric Interconnects A and B through FCoE. I configured local zoning, and I have two servers.

Server 1 has a local disk, boots from it, and works fine.

Server 2 is configured to boot from SAN. The problem is that every initiator sees the target twice in the FC zones.

I use VMware 5.0.

UNIV-FI-A-A(nxos)# show zones
zoneset name ucs-UNIV-FI-A-vsan-200-zoneset vsan 200
  zone name ucs_UNIV-FI-A_A_4_E1-B2-SP_E1-B2_vHBA-A vsan 200
    pwwn 20:00:00:25:b5:e1:b2:a1
    pwwn 50:06:01:64:3e:e0:58:9e

  zone name ucs_UNIV-FI-A_A_5_E1-B2-SP_E1-B2_vHBA-A vsan 200
    pwwn 20:00:00:25:b5:e1:b2:a1
    pwwn 50:06:01:6c:3e:e0:58:9e

  zone name ucs_UNIV-FI-A_A_8_E1-B2-SP_E1-B2_vHBA-A vsan 200
    pwwn 20:00:00:25:b5:e1:b2:a1
    pwwn 50:06:01:64:3e:e0:58:9e

  zone name ucs_UNIV-FI-A_A_7_E1-B2-SP_E1-B2_vHBA-A vsan 200

    pwwn 20:00:00:25:b5:e1:b2:a1
    pwwn 50:06:01:6c:3e:e0:58:9e
UNIV-FI-A-A(nxos)#

-------------------------------------------------------------------

UNIV-FI-A-B(nxos)# show zone
zone name ucs_UNIV-FI-A_B_4_E1-B2-SP_E1-B2_vHBA-B vsan 201
  pwwn 20:00:00:25:b5:e1:b2:b1
  pwwn 50:06:01:65:3e:e0:58:9e

zone name ucs_UNIV-FI-A_B_3_E1-B2-SP_E1-B2_vHBA-B vsan 201
  pwwn 20:00:00:25:b5:e1:b2:b1
  pwwn 50:06:01:6d:3e:e0:58:9e

no zone name ucs_UNIV-FI-A_B_8_E1-B2-SP_E1-B2_vHBA-B vsan 201
  pwwn 20:00:00:25:b5:e1:b2:b1
  pwwn 50:06:01:65:3e:e0:58:9e

zone name ucs_UNIV-FI-A_B_7_E1-B2-SP_E1-B2_vHBA-B vsan 201
  pwwn 20:00:00:25:b5:e1:b2:b1
  pwwn 50:06:01:6d:3e:e0:58:9e

 

UNIV-FI-A-B(nxos)#

 

Hi abdulhadi.ettwejiri

 

The reason you see two sets of zones for the same targets and initiator is that when you enable zoning on UCS and assign a boot-from-SAN boot order policy, the system automatically creates a set of zones for your vHBAs with the targets associated with that policy. In other words, if you know you want your servers to boot from SAN, the only thing you need to do is assign a boot-from-SAN boot order policy to the service profile, and the required zones will be created.

 

If you want to test this on your servers, add an additional test target wwpn to your boot from SAN policy and you will see that a zone for this new target is created.
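
If you want to confirm which zones overlap, you can group them by their members. Here is a minimal Python sketch that does this for the fabric A output posted above (copied with the whitespace tidied). It assumes the "zone name ... vsan N" / "pwwn ..." layout shown in that output; the function names are made up for this example.

```python
import re
from collections import defaultdict

ZONE_RE = re.compile(r"^\s*zone name (\S+) vsan (\d+)")
PWWN_RE = re.compile(r"pwwn ((?:[0-9a-f]{2}:){7}[0-9a-f]{2})", re.IGNORECASE)

# Fabric A zones as posted in this thread.
SAMPLE = """\
zoneset name ucs-UNIV-FI-A-vsan-200-zoneset vsan 200
  zone name ucs_UNIV-FI-A_A_4_E1-B2-SP_E1-B2_vHBA-A vsan 200
    pwwn 20:00:00:25:b5:e1:b2:a1
    pwwn 50:06:01:64:3e:e0:58:9e
  zone name ucs_UNIV-FI-A_A_5_E1-B2-SP_E1-B2_vHBA-A vsan 200
    pwwn 20:00:00:25:b5:e1:b2:a1
    pwwn 50:06:01:6c:3e:e0:58:9e
  zone name ucs_UNIV-FI-A_A_8_E1-B2-SP_E1-B2_vHBA-A vsan 200
    pwwn 20:00:00:25:b5:e1:b2:a1
    pwwn 50:06:01:64:3e:e0:58:9e
  zone name ucs_UNIV-FI-A_A_7_E1-B2-SP_E1-B2_vHBA-A vsan 200
    pwwn 20:00:00:25:b5:e1:b2:a1
    pwwn 50:06:01:6c:3e:e0:58:9e
"""


def parse_zones(text):
    """Map each zone name to the set of pWWNs listed under it."""
    zones = {}
    current = None
    for line in text.splitlines():
        zone = ZONE_RE.match(line)
        if zone:
            current = zone.group(1)
            zones[current] = set()
        elif current is not None:
            member = PWWN_RE.search(line)
            if member:
                zones[current].add(member.group(1).lower())
    return zones


def duplicate_zones(zones):
    """Return groups of zone names that have identical member sets."""
    by_members = defaultdict(list)
    for name, members in zones.items():
        by_members[frozenset(members)].append(name)
    return [sorted(names) for names in by_members.values() if len(names) > 1]


if __name__ == "__main__":
    for group in duplicate_zones(parse_zones(SAMPLE)):
        print("same members:", ", ".join(group))
```

For the output above, this pairs zone 4 with zone 8 and zone 5 with zone 7, which matches the two sets of zones described in the question.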

 

Let me know if this makes sense or if you have any questions.
