Solved: Gracefully shut down servers and major faults

PerseusDK · 09-16-2013

Hello everyone...

I was wondering what the rationale is behind the fact that when you gracefully shut down a bare-metal host running on UCS B-Series, major faults are raised because the vNICs and vHBAs are down (which is expected...).

See the attached file; for some reason I can't insert the image in the post.

Jeffrey Foster · 09-16-2013

Hello,

Thank you for the question. The rationale is this: when a Service Profile is associated, a Virtual Interface (VIF) is established from each interface (vNIC or vHBA) on the server through the fabric to the Fabric Interconnect. When the server is rebooted or shut down, the adapter goes offline, the link drops, and UCS Manager raises a fault. Shortly thereafter, as the VIF is re-established, the fault is cleared.

UCS Manager monitors all of the hardware, including Fabric Interconnects, IO Modules, chassis, and servers, up to but not including the operating system. UCSM has no view into what is happening within the operating system when faults are raised, so you would need a higher-level Network Management System to correlate hardware events with operating system events.

If these actions are being performed during a maintenance window, you can use Fault Suppression, a new feature added in UCSM 2.1.1a. It suppresses these transient faults for a specified chassis, blade, or rack-mount server in UCS Manager, assuming you do not want to receive these fault instances during a planned reboot. Faults are raised only if they are still occurring after the Fault Suppression window has passed.

Jeff
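The suppression rule described in the answer above (faults raised during the window are held back and surfaced only if still active when the window ends) can be modeled with a small Python sketch. This is a hypothetical, simplified model for illustration only: the `Fault` class, the `reported_faults` function, the interface labels, and the timestamps are all invented here and are not UCS Manager's object model, API, or actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fault:
    """Hypothetical, simplified fault record (not the UCSM object model)."""
    source: str                         # illustrative label for the vNIC/vHBA
    severity: str
    raised_at: float                    # seconds since an arbitrary reference time
    cleared_at: Optional[float] = None  # None means the fault is still active

    def active_at(self, t: float) -> bool:
        # A fault is active from the moment it is raised until it is cleared.
        return self.raised_at <= t and (self.cleared_at is None or t < self.cleared_at)


def reported_faults(faults: List[Fault], window_start: float, window_end: float) -> List[Fault]:
    """Return the faults an operator would see under a suppression window.

    Faults raised outside [window_start, window_end) are reported as usual.
    Faults raised inside the window are held back and reported only if they
    are still active when the window ends.
    """
    reported = []
    for f in faults:
        raised_in_window = window_start <= f.raised_at < window_end
        if not raised_in_window or f.active_at(window_end):
            reported.append(f)
    return reported


# Hypothetical planned reboot of blade 3 inside a 300-second maintenance window.
faults = [
    # VIF dropped during shutdown and re-established after boot: suppressed.
    Fault("chassis-1/blade-3 vNIC eth0", "major", raised_at=10, cleared_at=95),
    # Still down when the window closes: reported.
    Fault("chassis-1/blade-3 vHBA fc0", "major", raised_at=12),
    # Raised after the window, on another blade: reported as usual.
    Fault("chassis-2/blade-1 vNIC eth0", "major", raised_at=500),
]

for f in reported_faults(faults, window_start=0, window_end=300):
    print(f.severity, f.source)
```

Running it prints only the vHBA fault that never cleared and the fault on the second blade; the transient vNIC fault that cleared inside the window is never surfaced. Treating the window as half-open, [start, end), is an arbitrary choice made for this sketch.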


kg6itcraig · 09-16-2013

This is a profoundly annoying feature. These are the same errors that would occur on a blade if an FI were down. I believe the rationale is simply "it is down." Still, I would rather not see this as a major fault when the blade is powered off.

Craig

My UCS Blog http://realworlducs.com
