09-28-2018 11:41 AM
We have a couple of blades where the
09-28-2018 12:17 PM
Greetings.
Unfortunately, you are correct, in that the UCSM does not currently have a way to disable individual alert types (like you can with ACI), aside from the complete blade fault suppression.
The B200 M3/M4/M5 design does allow the 2nd CPU to run a bit warmer because of the airflow path (air passes over the 1st CPU before reaching the 2nd CPU).
Usually when we have customers bumping into the UNC range, which is meant to trigger the chassis fans to rev up, both the CPU load is high, and the ambient temps tend to be in the lower to mid 70's F.
Please log into the UCSM CLI via putty/ssh:
#connect cimc x/y (chassis#/blade#)
cimc#sensors
Look at the P1 and P2 _TEMP_SENS values, and see what is specified for the upper critical and upper non-recoverable values (to get an idea of what those ranges are).
In 4.01 code, it appears the upper non-critical value is no longer defined, so I'm wondering if you would no longer get those types of UNC alerts... What UCSM and blade firmware level are you on?
What are the current TEMP_SENS_FRONT values?
Thanks,
Kirk...
09-28-2018 01:33 PM - edited 09-28-2018 01:39 PM
Sensor Name | Reading | Unit | ... | LNR | LC | LNC | UNC | UC | UNR |
TEMP_SENS_FRONT | 24.000 | degrees C | OK | na | na | na | na | 75.000 | 85.000 |
TEMP_SENS_REAR | 47.000 | degrees C | OK | na | na | na | na | 75.000 | 85.000 |
GPU1_TEMP_SENS | na | degrees C | na | na | na | na | na | 162.000 | 170.000 |
P1_TEMP_SENS | 54.000 | degrees C | OK | na | na | na | na | 88.000 | 93.000 |
P2_TEMP_SENS | 84.000 | degrees C | UNC | na | na | na | na | 88.000 | 93.000 |
CIMC version (from "[ sensors ]# version") is ver: 3.1(26g). UCS package is 3.2(3g).
LNC and UNC are all "na" - so why would we get spam about UNC?
I don't really care about the noncritical limits being removed, the UCS will still spam us about the UC and UNR, right? So unless both values can be removed or bumped up, there's still no way to suppress unless we suppress the whole blade?
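For anyone who wants to check these readings without eyeballing the table, here is a minimal sketch in Python that parses the pipe-delimited sensor rows pasted above and reports how far each reading sits below its UC threshold. The column order (name, reading, unit, status, LNR, LC, LNC, UNC, UC, UNR) is an assumption taken from the header row in this post, and the helper names `parse_sensors` and `headroom_to_uc` are hypothetical, not part of any Cisco tool.

```python
# Sensor rows copied from the output posted above.
SAMPLE = """\
TEMP_SENS_FRONT | 24.000 | degrees C | OK | na | na | na | na | 75.000 | 85.000 |
TEMP_SENS_REAR | 47.000 | degrees C | OK | na | na | na | na | 75.000 | 85.000 |
GPU1_TEMP_SENS | na | degrees C | na | na | na | na | na | 162.000 | 170.000 |
P1_TEMP_SENS | 54.000 | degrees C | OK | na | na | na | na | 88.000 | 93.000 |
P2_TEMP_SENS | 84.000 | degrees C | UNC | na | na | na | na | 88.000 | 93.000 |
"""

# Assumed column order, based on the header row in the post.
FIELDS = ("name", "reading", "unit", "status",
          "lnr", "lc", "lnc", "unc", "uc", "unr")
NUMERIC = ("reading", "lnr", "lc", "lnc", "unc", "uc", "unr")


def to_float(text):
    """Return a float, or None when the cell is 'na' or otherwise non-numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_sensors(text):
    """Parse pipe-delimited sensor rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().rstrip("|").split("|")]
        if len(cells) != len(FIELDS):
            continue  # skip blank or malformed lines
        row = dict(zip(FIELDS, cells))
        for key in NUMERIC:
            row[key] = to_float(row[key])
        rows.append(row)
    return rows


def headroom_to_uc(row):
    """Degrees C between the current reading and the UC threshold, if both exist."""
    if row["reading"] is None or row["uc"] is None:
        return None
    return row["uc"] - row["reading"]


rows = parse_sensors(SAMPLE)
by_name = {row["name"]: row for row in rows}

for row in rows:
    margin = headroom_to_uc(row)
    margin_text = "n/a" if margin is None else f"{margin:.1f} C below UC"
    print(f'{row["name"]:<16} status={row["status"]:<3} {margin_text}')

# Difference between the two CPU sensors on this blade.
cpu_delta = by_name["P2_TEMP_SENS"]["reading"] - by_name["P1_TEMP_SENS"]["reading"]
print(f"P2 - P1 delta: {cpu_delta:.1f} C")
```

With the values above, P2 shows a UNC status even though its UNC column parses as undefined, and it sits only a few degrees below UC, which is the part that matters for whether the more serious alerts will follow.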
09-28-2018 01:59 PM
Now that I see a 30 degree difference between the CPUs I have a new guess about what is happening: Are the front and rear heat sinks different part numbers? I have seen this before in other hardware with a similar layout where the waste heat from CPU1 has to cool CPU2.
On other blades in Position 8 I see only a 23 or 23 deg difference, and CPU1 is running at 50 instead of 54 deg
09-30-2018 08:07 AM
What's your front temp sensor showing on your blades closest to floor (i.e. high # slots like 7,8)?
Yes, there are different heatsink part #s for the front and back heatsinks on B200M4/M5 servers.
Can you confirm some other ambient temp readings in that same environment?
The 75 degree F reading on the front temp sensor seems a bit on the warm side for a data center.
Thanks,
Kirk...
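Since the sensor output in this thread is in degrees C while the ambient figures are quoted in degrees F, a small conversion sketch may help line the two up. The function name `c_to_f` is hypothetical; the input values are the TEMP_SENS_FRONT and TEMP_SENS_REAR readings and the UC threshold posted earlier.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


# Values taken from the sensor table posted earlier in the thread.
readings_c = {
    "TEMP_SENS_FRONT reading": 24.0,
    "TEMP_SENS_REAR reading": 47.0,
    "TEMP_SENS_FRONT UC threshold": 75.0,
}

for label, value_c in readings_c.items():
    print(f"{label}: {value_c:.1f} C = {c_to_f(value_c):.1f} F")
```

The 24 C front reading works out to roughly 75 F, which matches the ambient figure discussed above; the 75.000 in the UC column is a Celsius threshold, not the same number.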