Fan Policy Override

markrosenecker
Level 1

I have a C220 M4S server running firmware 4.02a. I just installed a PCIe card that is not on the Cisco-approved list, and now my fans are running on a "Balanced" policy instead of "Low Power" (i.e., quiet). The card cannot be replaced (it's needed to connect to an external device). I am OK with the risk of the card overheating...

 

 

...is there any way to OVERRIDE the override?  From what I've read, CIMC is detecting an unknown PCIe card, and therefore increases the cooling policy.  All other thermals within the server are well below even the warning threshold.

 

What I would like to do is tell it to ignore the unknown PCIe card and instead go back to adjusting the fans based on the other sensors in the server.  Is this possible?  


mojafri
Cisco Employee

Hi @markrosenecker,

 

Your understanding is exactly right. Because CIMC doesn't recognize that card, it spins the fans at a higher RPM. If the actual fan policy differs from the desired fan policy, the cause is usually a device in the server (e.g., a GPU, an additional PCIe card, or a third-party device).

 

Fan speed depends on the hardware configuration of the server. The minimum enforceable fan control policy is a function of that configuration; for example, the minimum enforceable policy for servers with GPUs installed is High Power or Maximum Power. CIMC can override the configured fan policy with a higher one when the installed hardware calls for it.

 

"...is there any way to OVERRIDE the override?"

No, this can't be done manually.

 

Rate if you find it helpful! 

 

Regards,

MJ

Mojafri, we ordered a new BE7H-M5-K9 that shipped with 2 Intel Ethernet Server Adapter I350-T4 PCI cards. The fans seem to be running at maximum speed, and Configured Fan Policy -> Configuration Status states "FAN POLICY OVERRIDE - Card(s) "Ethernet Server Adapter I350-T4" present". Are you saying the fans will run at high speed constantly and there is nothing we can do?

 

What are the actual fan RPMs?

Connect to CIMC via SSH, then run:

#connect debug-shell

#sensors

Should give an output similar to:

MOD1_FAN1_SPEED | 10504.000 | RPM | OK | 808.000 | na | na | na | na | na |
MOD1_FAN2_SPEED | 10192.000 | RPM | OK | 784.000 | na | na | na | na | na |
MOD2_FAN1_SPEED | 10504.000 | RPM | OK | 808.000 | na | na | na | na | na |
MOD2_FAN2_SPEED | 10192.000 | RPM | OK | 784.000 | na | na | na | na | na |
MOD3_FAN1_SPEED | 10100.000 | RPM | OK | 808.000 | na | na | na | na | na |
MOD3_FAN2_SPEED | 10682.000 | RPM | OK | 784.000 | na | na | na | na | na |
MOD4_FAN1_SPEED | 10504.000 | RPM | OK | 808.000 | na | na | na | na | na |
MOD4_FAN2_SPEED | 10682.000 | RPM | OK | 784.000 | na | na | na | na | na |
MOD5_FAN1_SPEED | 10504.000 | RPM | OK | 808.000 | na | na | na | na | na |
MOD5_FAN2_SPEED | 10682.000 | RPM | OK | 784.000 | na | na | na | na | na |
MOD6_FAN1_SPEED | 10504.000 | RPM | OK | 808.000 | na | na | na | na | na |
MOD6_FAN2_SPEED | 10192.000 | RPM | OK | 784.000 | na | na | na | na | na |

Also note the value for TEMP_SENS_FRONT:

TEMP_SENS_FRONT | 27.000 | degrees C | OK | na | na | na | na | 45.000 | 50.000 |
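If you want to pull the numbers out of that output rather than read them by eye, here is a minimal Python sketch. It assumes the pipe-delimited layout shown in the sample rows above (name, value, unit, status in the first four columns); the function name `parse_sensor_line` is just illustrative and not part of any Cisco tool.

```python
def parse_sensor_line(line):
    """Split one pipe-delimited sensor row into its first four fields.

    Assumes the column order seen in the sample output:
    name | value | unit | status | ...
    """
    fields = [f.strip() for f in line.strip().strip("|").split("|")]
    name, value, unit, status = fields[:4]
    return {"name": name, "value": float(value), "unit": unit, "status": status}


# Sample rows copied from the output shown in this thread.
sample = """\
MOD1_FAN1_SPEED | 10504.000 | RPM | OK | 808.000 | na | na | na | na | na |
MOD1_FAN2_SPEED | 10192.000 | RPM | OK | 784.000 | na | na | na | na | na |
TEMP_SENS_FRONT | 27.000 | degrees C | OK | na | na | na | na | 45.000 | 50.000 |
"""

readings = [parse_sensor_line(row) for row in sample.splitlines() if row.strip()]

# Collect fan speeds and the front (ambient) temperature reading.
fan_rpms = [r["value"] for r in readings if r["unit"] == "RPM"]
front_temp_c = next(r["value"] for r in readings if r["name"] == "TEMP_SENS_FRONT")

print("Highest fan RPM:", max(fan_rpms))
print("Front temperature (C):", front_temp_c)
```

You would paste your own saved `sensors` output in place of `sample`.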

Fan policy of 'balanced' will normally not suffice when multiple PCI-E cards are present, and you mention your config includes 2 of them.

I don't believe those particular Intel cards provide out-of-band thermal information to the CIMC, so the CIMC generally has to play it 'safe' and keep the fans in the upper ranges.

I believe the M5 fans can hit 20,000 RPM, so I am curious what yours are actually running at.

Please see:

https://www.cisco.com/c/en/us/support/docs/servers-unified-computing/ucs-c-series-rack-servers/214478-ucs-c-series-m5-server-components-relati.pdf

Kirk...

 

Kirk, here is the output you asked for. I am not sure what normal is, but I can say that the fans are extremely loud, probably 4 times louder than the other servers in the rack (various older models). After a reboot of the CIMC and/or host, the sound is normal for a few minutes, but then ramps up to near-deafening levels!

MOD1_FAN1_SPEED | 12120.000 | RPM | OK | 808.000 | na | na | na | na | na |
MOD1_FAN2_SPEED | 12348.000 | RPM | OK | 784.000 | na | na | na | na | na |
MOD2_FAN1_SPEED | 11514.000 | RPM | OK | 808.000 | na | na | na | na | na |
MOD2_FAN2_SPEED | 12348.000 | RPM | OK | 784.000 | na | na | na | na | na |
MOD3_FAN1_SPEED | 12120.000 | RPM | OK | 808.000 | na | na | na | na | na |
MOD3_FAN2_SPEED | 11760.000 | RPM | OK | 784.000 | na | na | na | na | na |
MOD4_FAN1_SPEED | 12120.000 | RPM | OK | 808.000 | na | na | na | na | na |
MOD4_FAN2_SPEED | 11760.000 | RPM | OK | 784.000 | na | na | na | na | na |
MOD5_FAN1_SPEED | 11514.000 | RPM | OK | 808.000 | na | na | na | na | na |
MOD5_FAN2_SPEED | 11760.000 | RPM | OK | 784.000 | na | na | na | na | na |
MOD6_FAN1_SPEED | 11514.000 | RPM | OK | 808.000 | na | na | na | na | na |
MOD6_FAN2_SPEED | 11760.000 | RPM | OK | 784.000 | na | na | na | na | na |

So based on the RPMs, those are running at about a 55-60% duty cycle, which is probably normal for the M5s with a couple of unmanaged PCI-E cards.
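For anyone wanting to reproduce that estimate, here is a rough Python sketch of the arithmetic. It assumes the roughly 20,000 RPM ceiling mentioned earlier in this thread (an "I believe" figure, not a published spec) and a simple linear mapping from RPM to duty cycle, which real fan controllers may not follow exactly.

```python
# Assumption: top fan speed as estimated earlier in the thread, not a spec value.
ASSUMED_MAX_RPM = 20_000


def approx_duty_cycle(rpm, max_rpm=ASSUMED_MAX_RPM):
    """Estimate duty cycle in percent, assuming RPM scales linearly with duty."""
    return 100.0 * rpm / max_rpm


# The distinct fan speeds reported in the output above.
reported_rpms = [12120, 12348, 11514, 11760]

for rpm in reported_rpms:
    print(f"{rpm} RPM is roughly {approx_duty_cycle(rpm):.1f}% of the assumed maximum")
```

Under those assumptions the readings land in the high-50s to low-60s percent range, in the same ballpark as the estimate above.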

The cooling requirements for the M5 are definitely different from those of the M4s.

I'm not sure if there is a published expected dB level for the different duty cycle ranges...

There is a published bug, https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvj21242/?rfs=iqvred, which notes that the default power/cooling policy was set to Low for M4s but Balanced for M5s. I think the M4s and M5s have around the same noise level when running at the low power profile, but the M5 hardware options have higher potential cooling needs as well as fans capable of higher RPM.

 

Kirk...

This is a little surprising to me; the server was shipped from Cisco with the network cards pre-installed. Is there any way to make the PCI cards "managed"? The noise level makes working in the server room for more than 5 minutes unbearable, plus we have to yell to hear each other. The customer instantly complained, and I imagine other complaints will come in as well.

Managed vs. unmanaged is a feature of the particular component. Those particular Intel cards do not have a built-in out-of-band mechanism (such as I2C) to provide thermal information to the CIMC, as most RAID cards and the Cisco VIC cards do.

It's not a setting that was overlooked.

 

Kirk...

Thanks for that info; that part makes sense. What doesn't make sense is that when you order a BE7H-M5-K9, there is no option to choose different cards. The Intel I350s are automatically added to the configuration.

Since that's a Call Manager appliance server, the available options are locked down, unlike a standard C240M5, for which you could order and install a wide variety of NICs/adapters.

Appliances typically have a much narrower focus and have passed a lot of validation checks against that specific set of hardware.

Kirk...

An update on this. The room temperature was above 79° F, which according to a TAC case was too high for the server's liking. Sure enough, when the customer lowered the temperature by a couple of degrees, the fan speed also lowered somewhat. It seems the new servers are more sensitive to the room temperatures.

Yeah, 79° F is quite high for data center equipment. Most data centers run in the 60s, or the very low 70s at most.

If the fans didn't compensate for the overly high ambient temps, you would be putting more stress on your internal components.

As previously mentioned, you can get a sense of the ambient temps remotely from the sensor output for TEMP_SENS_FRONT.
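Since TEMP_SENS_FRONT reports in degrees C and the room temperature here was quoted in degrees F, a quick conversion helps when comparing the two. This is only a small Python sketch; the 27.0 value is taken from the sample sensor output earlier in the thread, not from the server being discussed.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


def celsius_to_fahrenheit(deg_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return deg_c * 9 / 5 + 32


room_temp_f = 79      # room temperature reported in this thread
sample_front_c = 27.0  # TEMP_SENS_FRONT value from the sample output above

print(f"{room_temp_f} F is about {fahrenheit_to_celsius(room_temp_f):.1f} C")
print(f"{sample_front_c} C is about {celsius_to_fahrenheit(sample_front_c):.1f} F")
```

So a 79° F room works out to roughly 26° C, which you can compare directly against your own TEMP_SENS_FRONT reading.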

Agreed that the M5s have higher cooling requirements than the M4s, and the fan duty cycle will be raised more quickly when the ambient temps are high like that.

 

Kirk...

 

Matt Y
Level 1

Same issue... temperatures are not even factored in.  Replacing my Cisco w/HP.
