01-29-2014 06:34 PM - edited 03-01-2019 11:29 AM
I've recently purchased a Cisco UCS C220 M3 chassis bundle and at the moment am in a testing phase.
As soon as it was unboxed, I upgraded it to the latest HUU bundle, 1.5(4)3, and applied all the updates. It wasn't far behind anyway, but I wanted to start testing with the latest builds. It's so much easier to upgrade now than later :-)
For the first few days the box was quiet: the fans ran at a low, quite tolerable speed. Even after installing XenServer and a guest VM, it continued to run quietly for some time.
However, in the past 24 hours the fan speeds have gone through the roof. The box is literally screaming now. What's odd is that there is almost no load on the system and the environment is cool. Multiple other devices in the rack report normal temperatures of around 25-30 deg C, the chassis itself is cool to the touch, and even the CIMC reports temperatures almost all in the 30s (the warmest, PCH_TEMP_SENS, reads 47 C against an 80 C warning threshold).
ucs-cimc /sensor # show temperature
Name                      Sensor Status  Reading    Units      Min. Warning Max. Warning Min. Failure Max. Failure
------------------------- -------------- ---------- ---------- ------------ ------------ ------------ ------------
P1_TEMP_SENS              Normal         34.0       C          N/A          74.0         N/A          79.0
P2_TEMP_SENS              Normal         34.0       C          N/A          74.0         N/A          79.0
RISER1_INLET_TMP          Normal         35.0       C          N/A          60.0         N/A          70.0
RISER2_INLET_TMP          Normal         33.0       C          N/A          60.0         N/A          70.0
RISER1_OUTLETTMP          Normal         38.0       C          N/A          60.0         N/A          70.0
RISER2_OUTLETTMP          Normal         33.0       C          N/A          60.0         N/A          70.0
FP_TEMP_SENSOR            Normal         30.0       C          N/A          60.0         N/A          70.0
DDR3_P1_A1_TEMP           Normal         33.0       C          N/A          65.0         N/A          85.0
DDR3_P2_E1_TEMP           Normal         32.0       C          N/A          65.0         N/A          85.0
PSU1_TEMP                 Normal         28.0       C          N/A          60.0         N/A          65.0
PSU2_TEMP                 Normal         30.0       C          N/A          60.0         N/A          65.0
PCH_TEMP_SENS             Normal         47.0       C          N/A          80.0         N/A          85.0
ucs-cimc /sensor #
ucs-cimc /sensor # show fan
Name                 Sensor Status        Reading    Units      Min. Warning    Max. Warning    Min. Failure    Max. Failure
-------------------- -------------------- ---------- ---------- --------------- --------------- --------------- ---------------
FAN1_TACH1           Normal               10272      RPM        1712            N/A             1284            N/A
FAN1_TACH2           Normal               9844       RPM        1712            N/A             1284            N/A
FAN2_TACH1           Normal               10272      RPM        1712            N/A             1284            N/A
FAN2_TACH2           Normal               9844       RPM        1712            N/A             1284            N/A
FAN3_TACH1           Normal               9844       RPM        1712            N/A             1284            N/A
FAN3_TACH2           Normal               9844       RPM        1712            N/A             1284            N/A
FAN4_TACH1           Normal               10272      RPM        1712            N/A             1284            N/A
FAN4_TACH2           Normal               9844       RPM        1712            N/A             1284            N/A
FAN5_TACH1           Normal               10272      RPM        1712            N/A             1284            N/A
FAN5_TACH2           Normal               9844       RPM        1712            N/A             1284            N/A
ucs-cimc /sensor #
ucs-cimc /chassis # show fan-policy
Fan Policy
---------------
low-power
ucs-cimc /chassis #
[NB: the fan policy was set to 'Balanced' but is now set to 'low-power'; changing it made no difference.]
I've seen a few other postings with similar problems going back some time, and almost all of them referred to upgrading to the latest firmware, which has all the fixes for this sort of behaviour. But in my case I've already done that, and a full cold reboot hasn't helped either.
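The claim that no visible sensor is anywhere near a limit can be checked mechanically. The sketch below is a hypothetical Python helper (not a Cisco or CIMC tool) that takes the data rows pasted above, assuming each row has exactly eight whitespace-separated fields and that unset thresholds read N/A. It reports how far each temperature sits below its Max. Warning threshold and whether any fan has dropped below its Min. Warning threshold.

```python
# Hypothetical helper, not a CIMC feature. Parses data rows pasted from
# `show temperature` and `show fan`, ASSUMING each row has exactly eight
# whitespace-separated fields and that unset thresholds read "N/A".

TEMPERATURE = """\
P1_TEMP_SENS Normal 34.0 C N/A 74.0 N/A 79.0
P2_TEMP_SENS Normal 34.0 C N/A 74.0 N/A 79.0
RISER1_INLET_TMP Normal 35.0 C N/A 60.0 N/A 70.0
RISER2_INLET_TMP Normal 33.0 C N/A 60.0 N/A 70.0
RISER1_OUTLETTMP Normal 38.0 C N/A 60.0 N/A 70.0
RISER2_OUTLETTMP Normal 33.0 C N/A 60.0 N/A 70.0
FP_TEMP_SENSOR Normal 30.0 C N/A 60.0 N/A 70.0
DDR3_P1_A1_TEMP Normal 33.0 C N/A 65.0 N/A 85.0
DDR3_P2_E1_TEMP Normal 32.0 C N/A 65.0 N/A 85.0
PSU1_TEMP Normal 28.0 C N/A 60.0 N/A 65.0
PSU2_TEMP Normal 30.0 C N/A 60.0 N/A 65.0
PCH_TEMP_SENS Normal 47.0 C N/A 80.0 N/A 85.0
"""

FANS = """\
FAN1_TACH1 Normal 10272 RPM 1712 N/A 1284 N/A
FAN1_TACH2 Normal 9844 RPM 1712 N/A 1284 N/A
FAN2_TACH1 Normal 10272 RPM 1712 N/A 1284 N/A
FAN2_TACH2 Normal 9844 RPM 1712 N/A 1284 N/A
FAN3_TACH1 Normal 9844 RPM 1712 N/A 1284 N/A
FAN3_TACH2 Normal 9844 RPM 1712 N/A 1284 N/A
FAN4_TACH1 Normal 10272 RPM 1712 N/A 1284 N/A
FAN4_TACH2 Normal 9844 RPM 1712 N/A 1284 N/A
FAN5_TACH1 Normal 10272 RPM 1712 N/A 1284 N/A
FAN5_TACH2 Normal 9844 RPM 1712 N/A 1284 N/A
"""


def to_number(field):
    """Convert a reading or threshold; "N/A" (no threshold set) becomes None."""
    return None if field == "N/A" else float(field)


def parse_rows(text):
    """Return {name: (reading, min_warning, max_warning)} for each data row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip prompts, headers, dashed separators and blank lines.
        if len(fields) != 8 or fields[0].startswith("-"):
            continue
        name, _status, reading, _units, min_w, max_w, _min_f, _max_f = fields
        rows[name] = (float(reading), to_number(min_w), to_number(max_w))
    return rows


def temperature_headroom(rows):
    """Degrees between each reading and its Max. Warning threshold."""
    return {n: max_w - r for n, (r, _min_w, max_w) in rows.items() if max_w is not None}


def fans_below_min_warning(rows):
    """Fans whose speed has dropped below their Min. Warning threshold."""
    return [n for n, (r, min_w, _max_w) in rows.items() if min_w is not None and r < min_w]


if __name__ == "__main__":
    temps = temperature_headroom(parse_rows(TEMPERATURE))
    tightest = min(temps, key=temps.get)
    print(f"closest to warning: {tightest}, {temps[tightest]:.1f} C below")
    fans = parse_rows(FANS)
    print("fans below Min. Warning:", fans_below_min_warning(fans))
    print("any fan upper limit set:", any(mx is not None for _, _, mx in fans.values()))
```

On this data the tightest margin is RISER1_OUTLETTMP, 22 C below its warning level, and no fan has a Max. Warning value at all. That is consistent with every fan reading Normal even at around 10000 RPM: in this output the fan thresholds only guard against fans running too slowly.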
Has anyone got any other ideas on what could be wrong or what the cause could be? It seems like a bug of some sort but...
Thanks,
Reuben
01-30-2014 03:35 AM
By pure chance I may have gotten almost to the bottom of this. It seems the issue may be related to airflow or temperature around the very front of the chassis, near the front panel (where the KVM connector is). With the rack door open, the server calmed right down within a couple of minutes and the fan speeds dropped to a third of what they were. It has stayed steady that way for the past 2 hours.
There must be something in the front panel of these units monitoring something that doesn't show up in the CIMC, because the temperature table above still shows roughly the same values. Maybe another temperature sensor in the front panel?
This:
...suggests likewise that there's something in the front header near the KVM port that is involved in fan speed regulation.
Having the (glass) door open isn't a good long-term solution, but it at least gives me a pretty good idea of how to work around the problem. A mesh door may be a better long-term fix.
The tech specs state that a gap of 76 mm is required. I haven't measured yet, but it must be reasonably close (there certainly is a gap).
But I'd be curious to know what exactly in the chassis is causing this behaviour :-)
01-30-2014 07:38 AM
Reuben,
We definitely have sensors all over the C220 that regulate the temperature/fan speed.
The fan speed policy will be "ignored" if the server requires a higher fan speed to cool down. This is expected behavior: it is better to ignore the configured policy than to let the server get hot. The server does not have to be boiling on the surface for the fan speed to increase; as long as any of the sensors in the server detects a high temperature, the fan speed will keep increasing until that sensor's temperature drops and its alarm clears.
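A toy model can make the "policy as a floor" behaviour concrete. The sketch below is an illustration built on assumptions, not Cisco's fan control algorithm: the policy floors, the per-sensor control target and the proportional gain are all invented for the example. It shows how, if a controller reacts to an internal target rather than to the warning value, one sensor can raise fan speed while still far below its warning threshold.

```python
# Hypothetical model of the behaviour described above; NOT Cisco's firmware.
# The configured policy only sets a minimum fan duty, and any single sensor
# running above its (assumed) target can push the fans higher.
from dataclasses import dataclass

# Illustrative floors as a fraction of full speed; real values are unknown.
POLICY_FLOOR = {"low-power": 0.20, "balanced": 0.35}


@dataclass
class Sensor:
    name: str
    reading: float  # current temperature, deg C
    target: float   # assumed control target, deg C (not the warning threshold)


def sensor_demand(sensor: Sensor, gain: float = 0.05) -> float:
    """Fan duty (0..1) one sensor asks for, proportional to how far it is over target."""
    excess = sensor.reading - sensor.target
    return max(0.0, min(1.0, excess * gain))


def fan_duty(policy: str, sensors: list[Sensor]) -> float:
    """The policy is a floor; the most demanding sensor overrides it."""
    demand = max((sensor_demand(s) for s in sensors), default=0.0)
    return max(POLICY_FLOOR[policy], demand)


if __name__ == "__main__":
    quiet = [Sensor("sensor_a", 30.0, 40.0)]
    print(fan_duty("low-power", quiet))
    print(fan_duty("low-power", quiet + [Sensor("sensor_b", 52.0, 40.0)]))
```

Under these assumptions, a single sensor 10 C below its target leaves the duty at the 0.20 low-power floor, while adding a second sensor 12 C over its target lifts the duty to about 0.6 regardless of the configured policy.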
In regards to the rack itself, have you seen our R-Series?
http://www.cisco.com/en/US/products/ps11518/index.html
http://www.cisco.com/en/US/docs/unified_computing/ucs/hw/rack_power/installation/guide/power.html
Rate ALL helpful answers.
-Kenny
01-30-2014 05:55 PM
Hi Kenny
The question, though, is this: none of the visible sensors actually appeared to be reporting a high temperature, so what exactly was triggering the fans to run at high speed? Short of dismantling the hardware, I'm interested to know which input was likely causing this, because as far as I can tell from the outputs above, the server wasn't actually hot. There are no devices directly above or below the chassis, and I thought there was plenty of airflow, but...
Are there undocumented sensors in the chassis, or more sensor inputs that just don't show up in the CIMC?
Obviously, the more I understand about how this works, the easier it will be for me (and others) to avoid this in future.
Thanks,
Reuben
01-30-2014 06:04 PM
Reuben,
There is definitely more than one sensor in the server, but they all report to CIMC.
When you close the door, does the fan speed really go up again? It would be interesting to confirm this is not just a coincidence.
-Kenny
01-30-2014 06:13 PM
Yes, it's entirely reproducible. Close the front door and within 60 seconds the fans all start cranking up. It's a glass door, so it properly seals in the noise and the air around it :-)
The same phenomenon doesn't occur with the rear door, though, only the front door.
I understand there's more than one sensor; the output from the show commands above shows quite a few. But as I keep saying, none of them seem to indicate that anything is amiss. So if it wasn't one of those sensors showing a high temperature, what exactly was signalling the system to turn the fans up?
01-31-2014 04:59 AM
Reuben,
I understand your point and your concern. To give you a better idea, fan speed can actually go up to 17000 RPM, so your server's fans are still in the mid range. To me it looks like the heat is probably not being dissipated effectively when the door is closed. That is why I mentioned our R-Series racks: they are tested with UCS, and their door has a mesh that ensures the heat is properly moved out.
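The "mid range" remark is simple arithmetic. This sketch takes the ten tach readings from the earlier show fan output and expresses them as a percentage of the 17000 RPM figure quoted above, which is treated here as an assumed maximum rather than a published specification.

```python
# Fan readings (RPM) from the `show fan` output earlier in the thread.
READINGS = [10272, 9844, 10272, 9844, 9844, 9844, 10272, 9844, 10272, 9844]

# Upper speed quoted in the reply above; used as an assumption, not a spec.
ASSUMED_MAX_RPM = 17000


def percent_of_max(rpm: float, max_rpm: float = ASSUMED_MAX_RPM) -> float:
    """Express a tach reading as a percentage of the assumed maximum speed."""
    return 100.0 * rpm / max_rpm


if __name__ == "__main__":
    mean_rpm = sum(READINGS) / len(READINGS)
    print(f"mean {mean_rpm:.0f} RPM = {percent_of_max(mean_rpm):.0f}% of max")
    print(f"peak {max(READINGS)} RPM = {percent_of_max(max(READINGS)):.0f}% of max")
```

It prints a mean of 10015 RPM (about 59% of 17000) and a peak of 10272 RPM (about 60%).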
Rate ALL helpful answers.
-Kenny
01-31-2014 05:36 AM
OK, thanks for your help and explanations, Kenny. I'll go with the wire mesh door option for now and remember this for future installations.
01-31-2014 05:41 AM
Great Reuben, it was nice talking to you. Have a cool day!
-Kenny
07-03-2015 02:50 AM
Hi all, I have a customer with this problem too, ever since I installed the server 8 months ago. The fans make an unusually loud noise, as if the server were always booting: for about 15 seconds the noise increases and then slows down, then increases and slows down again, in a constant loop. I upgraded to the latest software version, 2.0(6) (ucs-c220-hoo-2.0.6d.iso), and the problem persists. My question is whether this is a normal, known problem and whether it can lead to more serious damage to the server. The customer doesn't like this noise. The room temperature is fine, and the temperature sensors and fans all report OK.
Best Regards
07-03-2015 01:34 PM
Open a TAC case
-Kenny
03-11-2018 12:52 AM
We have the same problem: the CPU fan speed is very high and the server is very loud.
Sensor Name | Sensor Status | Speed (RPM) | Warning Threshold Min | Warning Threshold Max | Critical Threshold Min | Critical Threshold Max
FAN1_TACH1 | Normal | 16000 | 1600 | N/A | 1200 | N/A
FAN1_TACH2 | Normal | 18400 | 1600 | N/A | 1200 | N/A
FAN2_TACH1 | Normal | 16000 | 1600 | N/A | 1200 | N/A
FAN2_TACH2 | Normal | 18400 | 1600 | N/A | 1200 | N/A
FAN3_TACH1 | Normal | 14100 | 1600 | N/A | 1200 | N/A
FAN3_TACH2 | Normal | 17100 | 1600 | N/A | 1200 | N/A
FAN4_TACH1 | Normal | 14100 | 1600 | N/A | 1200 | N/A
FAN4_TACH2 | Normal | 16000 | 1600 | N/A | 1200 | N/A
FAN5_TACH1 | Normal | 14100 | 1600 | N/A | 1200 | N/A
FAN5_TACH2 | Normal | 16000 | 1600 | N/A | 1200 | N/A
FAN6_TACH1 | Normal | 14100 | 1600 | N/A | 1200 | N/A
FAN6_TACH2 | Normal | 16000 | 1600 | N/A | 1200 | N/A
As you can see, the fan speed is 18000+ and sometimes 20000+ ... and we have the latest upgrade, 3.0(3f).
Please, we really need some help.
03-11-2018 05:38 AM - edited 03-11-2018 05:45 AM
Greetings.
Aside from a CIMC memory leak that predates the code version you are running, there are some common factors that are usually involved with the fans staying at high speeds:
This is something that would normally require reviewing a tech support from the server, and opening a TAC case.
Thanks
Kirk...
03-11-2018 05:41 AM
Thanks for the quick reply ... I will check those thermal profiles ...