We started upgrading our UCS domains from 2.1(1a) to 2.1(3c) to apply the patched fnic driver that was supposed to fix the issue. We are now seeing fnic aborts that look very similar to CSCuh61202, although the impact appears to be different. In one of our UCS domains, we didn't see any issues for 2+ weeks after upgrading. Then we got reports of high disk latency on VMs and started digging. In another UCS domain, the aborts started right after the UCS upgrade was completed, but we have not seen any signs of disk latency yet. We have four other UCS domains that were upgraded from 2.1(1a) to 2.1(3c), and we do not see the issue in any of them.
I wanted to let you know the latest findings from our UCS engineering team regarding the above-mentioned defects (CSCuh61202 & CSCuq40256).
CSCuq40256 was discovered to be a mis-programming of the register (on the IO Module) that controls how long the adapter should pause for when a pause request is sent from the IO Module to the adapter. Until a fixed release is available, resetting the DCE of the impacted blade has been confirmed by engineering to reprogram the register correctly [updated 9/26/2014].
Both defects have the same symptoms, but their respective root causes are unique.
Please monitor the status of CSCuq40256 for updates as they become available.
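For a rough sense of what a pause-duration register like the one described for CSCuq40256 controls, here is a minimal sketch. It assumes the value follows the standard Ethernet PAUSE/PFC convention, where the pause time is a 16-bit count of quanta and one quantum is 512 bit times. Nothing in this thread says how the IO Module register is actually encoded, so the function name, the 10 Gb/s default, and the sample values are illustrative assumptions only.

```python
# Sketch only: assumes standard Ethernet PAUSE/PFC semantics, not the
# actual IO Module register layout (which is not described in this thread).

PAUSE_QUANTUM_BITS = 512   # one pause quantum = 512 bit times
MAX_QUANTA = 0xFFFF        # pause time is a 16-bit field in the standard frame


def pause_duration_us(quanta: int, link_gbps: float = 10.0) -> float:
    """Return how long a link stays paused, in microseconds.

    quanta    -- requested pause time in quanta (0..65535)
    link_gbps -- link speed in Gb/s (hypothetical default of 10)
    """
    if not 0 <= quanta <= MAX_QUANTA:
        raise ValueError("quanta must fit in 16 bits")
    if link_gbps <= 0:
        raise ValueError("link speed must be positive")
    bits = quanta * PAUSE_QUANTUM_BITS
    bits_per_us = link_gbps * 1e3  # 1 Gb/s = 1000 bits per microsecond
    return bits / bits_per_us


if __name__ == "__main__":
    for q in (1, 0x00FF, MAX_QUANTA):
        print(f"{q:>5} quanta -> {pause_duration_us(q):.4f} us at 10 Gb/s")
```

Under these assumptions, the longest single pause at 10 Gb/s is a little over 3.3 ms, so a value programmed too high would hold the adapter off for far longer than intended on each pause request, which would be consistent with the latency and abort symptoms described above.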