UCS IOM Link Loss effect on FC in Port-Channel mode

krdaymark
Level 1

Hi,

Recently we discovered that when UCS server ports are in port-channel mode, losing a member link disrupts FC traffic in a VMware environment. The VMware host never sees a path go down, and the guests go into an I/O timeout wait. Once the guest timeout period expires, a SCSI reset re-establishes connectivity to storage. We have replicated this at multiple customers and have been told that this is expected behavior. This concerns us because of how it affects design decisions.

Any input is appreciated.

Thank you.

Accepted Solution

Walter Dey
VIP Alumni

It's true that you have to wait for the SCSI timeout, and then the OS should do a FLOGI again. I assume, of course, that you have FC multipathing set up and working, so there is no disruption to traffic.

Without a port-channel, the vHBAs on the failing fabric will go down, and FC multipathing is the only solution.


Hi Walter,

Thanks for the confirmation.  We have spent a lot of time troubleshooting this and were surprised to find out that it's expected behavior.  Do you see customers moving away from port-channels on the IOM Server ports due to this?

Kevin

Hi again Walter,

Just re-read your response. We are testing with Linux guests whose disk timeout is set to 180 seconds by vmtools. When we bring a port down, the host does not see a loss of path to the LUN, and the guests all go into a timeout state with I/O completely halted. Once they hit 180 seconds, the guest sends a SCSI reset, which causes the host to recognize that the path is down and move to an active path. The guests then resume I/O. If we lower the timeout in the guest, the I/O halt is shortened accordingly.
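For anyone checking what timeout their own guests are running with, here is a minimal sketch in Python. It assumes a Linux guest that exposes the per-disk SCSI command timeout (in seconds) at `/sys/block/<dev>/device/timeout`, which is the value described above as being set to 180 by vmtools. The function name `scsi_timeouts` is hypothetical, and the script only reads the values; it does not change them.

```python
import glob
import os


def scsi_timeouts(sys_block="/sys/block"):
    """Return {device: timeout_seconds} for block devices with a SCSI timeout.

    Reads <sys_block>/<dev>/device/timeout. Devices that do not expose
    that file (for example loop or NVMe devices) are simply skipped.
    """
    timeouts = {}
    pattern = os.path.join(sys_block, "*", "device", "timeout")
    for path in sorted(glob.glob(pattern)):
        # path looks like <sys_block>/sda/device/timeout -> device name "sda"
        dev = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            timeouts[dev] = int(f.read().strip())
    return timeouts


if __name__ == "__main__":
    # On a guest configured as described above, disks would report 180 s,
    # which is roughly how long I/O stays halted before the SCSI reset.
    for dev, secs in scsi_timeouts().items():
        print(f"{dev}: {secs} s")
```

Running it on each guest before and after a vmtools update would show whether a manually lowered timeout has been reverted.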

Decreasing the timeout on all guests is not an option, because any vmtools update would revert it. So our thought is that we have to switch back to non-port-channel mode for all customers, unless they are comfortable with a potential 3-minute pause in production systems.

Do you have any thoughts on this?

Thanks!  It's great to find someone familiar with the issue.

Kevin
