We upgraded the firmware on our UCS environment from 2.1.1e to 2.1.2c on Thursday. The update appeared to go well, but almost immediately afterward we began seeing storage performance issues on 5 of our 9 blades (ESXi 5.1U1 servers). The issue seems to be isolated to the fabric A HBA on the 5 affected hosts. The fabric B HBA seems OK.
I'm hoping for some tips on how to troubleshoot this from the UCS side. I've been able to verify that everything is unchanged and appears fine everywhere in the chain from the blades through to the storage array, but I just don't have solid tools for troubleshooting an issue like this.
We've got 2x 6248UP talking to a 5548UP, which talks to a VNX5500 via FC. Our 9 blades (B200M3) are spread across 3 chassis (3 per chassis). Blades with problems are in all three chassis, so the issue isn't isolated to a specific chassis. I have a VSAN for each fabric, connected to an 8Gb FC port on each SP (i.e. VSAN 200 talks to port 0 on SPA and SPB, and VSAN 201 talks to port 1 on SPA and SPB). Each FI uses a 2-port port channel for FCoE traffic to the 5548.
I appear to have only two symptoms that are visible to me:
1) ScsiDeviceIO failures in ESXi logs. On affected systems, these are happening several times/second. It appears that IO eventually goes through, but performance is degraded.
2) PowerPath reports path failures in the ESXi logs and in the error counters shown by rpowermt.
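To put numbers on symptom 1, a minimal sketch like the one below could tally ScsiDeviceIO failures per device and per second from a copy of vmkernel.log. The log line shape in the comment is an assumption based on the usual ESXi 5.x format, not something taken from the affected hosts, and the function name, device name, and sample lines are hypothetical. The regex would likely need adjusting against real log lines.

```python
import re
from collections import Counter

# Assumed vmkernel.log line shape (illustrative, not from the affected hosts):
# 2013-08-01T12:34:56.789Z cpu12:8204)ScsiDeviceIO: 2331: Cmd(0x...) 0x2a,
#   CmdSN 0x... from world 8215 to dev "naa.xxxx" failed H:0x8 D:0x0 P:0x0 ...
# H, D and P are the host, device and plugin status codes.
LINE_RE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\S*\s+.*?ScsiDeviceIO:.*?'
    r'to dev "(?P<dev>[^"]+)" failed H:(?P<host>0x[0-9a-fA-F]+)'
)


def tally_scsi_failures(lines):
    """Count ScsiDeviceIO failures per (device, host status) and per second."""
    per_device = Counter()
    per_second = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a ScsiDeviceIO failure line in the assumed format
        per_device[(match.group("dev"), match.group("host"))] += 1
        per_second[match.group("ts")] += 1
    return per_device, per_second


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, read them from a vmkernel.log copy.
    sample = [
        '2013-08-01T12:34:56.789Z cpu12:8204)ScsiDeviceIO: 2331: Cmd(0x4124007c1a40) '
        '0x2a, CmdSN 0x8000001e from world 8215 to dev "naa.EXAMPLE0001" failed '
        'H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
        '2013-08-01T12:34:57.101Z cpu3:8199)NMP: unrelated message',
    ]
    per_device, per_second = tally_scsi_failures(sample)
    for (dev, host_status), count in per_device.most_common():
        print(dev, host_status, count)
```

Comparing the per-device counts and host status codes between an affected and an unaffected blade, or before and after putting the fabric A HBA into standby, would give a support case something more concrete than "several times/second".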
I'm able to put the fabric A HBA into standby mode using PowerPath to force all IO to fabric B, and the issues appear to clear (no ScsiDeviceIO errors, no path failures), so we're still functional.
There are no errors in UCS Manager, and nothing is visible on the storage array or the 5548 switch.
I will likely be opening a support case for this issue, but ahead of or alongside that, can anyone provide some feedback on how to troubleshoot conditions where FCoE HBAs or connections appear to be having issues but aren't completely down? I'd like to strengthen my knowledge in this area, as it's my weakest in managing our environment, and I don't like being in a position where I'm unable to help myself.