03-22-2021 10:01 PM
Hello,
I have an old Cisco UCS C220 M3 server which was stable for a couple of weeks and has since become increasingly unstable, to the point that it no longer survives 10 minutes booted into an OS. There is no pattern to diagnose, just a very sudden power-off that happens at no particular point, even during POST, and a suspicious lack of errors. The status shows as fully healthy with everything listed, and absolutely nothing is logged except for one "normal" message being spammed in the system event log. This is pretty concerning because CIMC should log something on shutdown no matter what.
"normal" message being spammed: "MAIN_POWER_PRS: Presence sensor, Device Removed / Device Absent was asserted"
Here is the idea of the spam quantity:
Inventory:
BIOS and CIMC are updated to the latest 3.x, and the other devices are also updated to their respective versions.
BTW, I'm attempting to run such an old system because I picked up this server plus a parts server for 200 USD.
Thanks in advance.
03-23-2021 05:10 AM
Normally, a TAC case would involve looking at the support bundle, and at other logs in addition to the SEL, for evidence of power issues, thermal shutdown, or an OS-triggered shutdown/crash.
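If you want to do a rough version of that triage yourself, here is a minimal sketch in Python. It assumes you have the SEL entries as plain text lines, one per entry. The keyword buckets and the names `CATEGORIES`, `classify`, and `summarize` are made up for illustration and are not Cisco's classification; the sample line is the message quoted in the first post.

```python
from collections import Counter

# Hypothetical keyword buckets (illustrative guesses, not an official mapping).
CATEGORIES = {
    "power": ("power", "psu", "voltage"),
    "thermal": ("temp", "thermal", "fan"),
    "os": ("os stop", "watchdog", "critical stop"),
}


def classify(line: str) -> str:
    """Return the first bucket whose keyword appears in the line, else 'other'."""
    text = line.lower()
    for category, keywords in CATEGORIES.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def summarize(lines):
    """Count non-empty log lines per bucket."""
    return Counter(classify(line) for line in lines if line.strip())


if __name__ == "__main__":
    sample = [
        "MAIN_POWER_PRS: Presence sensor, Device Removed / Device Absent was asserted",
        "MAIN_POWER_PRS: Presence sensor, Device Removed / Device Absent was asserted",
    ]
    for category, count in summarize(sample).items():
        print(f"{category}: {count}")
```

A tally like this only shows which kind of message dominates the log. It cannot tell you whether the repeated entry is actually related to the shutdowns.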
You might want to try reseating components: DIMMs, CPUs, PCI-E cards, PSUs, etc.
Some CIMC versions (I don't have an M3 to check) can record the startup/reboot sequence as well as a 'crash recording', which may capture console output or errors as the server goes down. On M5s, this is under CIMC's 'Compute/troubleshooting' area.
Kirk...
03-23-2021 11:35 PM
I have looked at the console logs, and no errors are logged on the OS side. I have watched it crash/power off multiple times on the KVM, and it just flashes to the No Signal screen.
I've only reseated the power supply so far. I will also reseat the RAM and test each CPU individually.
03-25-2021 06:28 PM
Reseating and testing all the RAM and CPUs did not help, and a third power supply showed the same problem. That leaves the motherboard as the most likely cause, which is the only part I don't have a working spare of. Also, the problem is getting worse quickly: most of the time it no longer gets past the POST screen.
09-14-2022 05:43 PM
Did you ever solve your problem? I'm facing a similar issue with a C220-M5, where it randomly shuts off and there are no significant messages in the logs.
09-15-2022 01:41 PM
If you're experiencing this issue on a C220-M5 server, I recommend opening a TAC case, as the M5s are still within support.
09-14-2022 07:22 PM
Nope
The server's parts are now split across some others I picked up. I never managed to source a replacement motherboard to test.