07-18-2019 12:23 AM
Hi guys,
My UCS C460, installed with ESXi 6.0.0 Build 13635687 (VMware-ESXi-6.0.0-9313334-Custom-Cisco-6.0.3.5.iso), is running some Linux and Windows servers. After half a year of stable running, I began to get PSODs with GP Exception 13. At first it happened every 2 weeks. Then it happened more and more frequently, and now the server needs to be restarted every 2 days.
Each time I get a GP Exception 13, but the message that follows is different. I have replaced the motherboard and the memory cards and nothing changed, so I believe it should not be a hardware issue.
I collected some error messages, but I cannot locate the root cause.
My UCS C460 M4's BIOS version is C460M4.2.0.12b.0.062120160920, and I have tried the latest one before.
CPU microcode patch revision is 0x0b00001d.
ESXi 6.0.0 Build 3620759 (VMware-ESXi-6.0.0-9313334-Custom-Cisco-6.0.3.5.iso)
Hardware:
2 X Intel(R) Xeon(R) CPU E7-8867 v4 @ 2.40GHz Type 0, Family 6, Model 79, Stepping 1
4 X 64 GB DDR4
3 X Intel Ethernet Server Adapter I350-T4
1 X Intel X540 10 Gbps Network Controller
1 X Intel(R) I350 1 Gbps Network Controller
1 X RAID controller
I have also connected some USB network adapters to the UCS.
Could anybody give me some help? This problem has been bothering me for months, and any help would be appreciated. Thanks
07-18-2019 09:47 AM - edited 07-18-2019 09:49 AM
It could be an issue with CPU 2; try replacing it. Did you notice any CATERR entries in the logs for the CPU?
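If the CIMC system event log is exported to a plain-text file, a quick scan for CATERR entries could look like the sketch below. It is only an assumption-based helper: it matches the string "CATERR" case-insensitively, the helper name `find_caterr_events` is hypothetical, and the real wording of SEL entries varies with platform and firmware.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: SEL entries that report a catastrophic error contain the text
# "CATERR"; the exact wording depends on platform and firmware.
CATERR_PATTERN = re.compile(r"caterr", re.IGNORECASE)


def find_caterr_events(log_lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each log line that mentions CATERR."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if CATERR_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Passing an open file handle of the export (for example inside `with open("sel_export.txt") as f:`, where the file name is hypothetical) returns an empty list when nothing matches; any hits are worth correlating with the PSOD timestamps.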
07-18-2019 08:55 PM
07-18-2019 09:42 PM
It would be better if you could collect the server's CIMC tech-support bundle. However, looking at the earlier PSOD screenshots, it looks like an issue with CPU 2.
07-18-2019 09:48 PM
Unfortunately, this UCS is an internal order, so we don't have a tech-support service contract. Which part of the PSOD points to a CPU 2 issue? I am not sure whether I should RMA again to replace the CPU.
07-18-2019 10:44 PM - edited 07-18-2019 11:49 PM
All the PSOD screenshots point to PCPU 64, which should be on CPU 2 given this server's configuration. However, the issue could be external to CPU 2, for example the DIMM modules managed by CPU 2 misbehaving. As you don't have the tech-support bundle, the following can be done:
Swap CPU 1 and CPU 2 and check in the PSOD events whether the error follows CPU 2. If it follows CPU 2, you can replace CPU 2.
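The PCPU-to-socket reasoning and the swap test can be sketched in Python. This is a minimal sketch under assumptions the thread does not confirm: each Xeon E7-8867 v4 has 18 cores, Hyper-Threading is enabled (2 sockets x 18 cores x 2 threads = 72 PCPUs, so PCPU 64 can only exist with HT on), and ESXi numbers PCPUs contiguously, one package after another. The helper names `pcpu_to_socket` and `interpret_swap` are hypothetical; on a live host, the Package Id reported by `esxcli hardware cpu list` should confirm the real mapping.

```python
# Assumptions (not stated in the thread): 18 cores per E7-8867 v4,
# Hyper-Threading enabled, and PCPUs numbered contiguously per package.
CORES_PER_SOCKET = 18
THREADS_PER_CORE = 2
LOGICAL_PER_SOCKET = CORES_PER_SOCKET * THREADS_PER_CORE  # 36 PCPUs per socket


def pcpu_to_socket(pcpu: int, sockets: int = 2) -> int:
    """Map an ESXi PCPU index to a 1-based CPU socket number."""
    total = LOGICAL_PER_SOCKET * sockets
    if not 0 <= pcpu < total:
        raise ValueError(f"PCPU {pcpu} is outside 0..{total - 1}")
    return pcpu // LOGICAL_PER_SOCKET + 1


def interpret_swap(pcpu_before: int, pcpu_after: int) -> str:
    """Compare the PSOD PCPU before and after physically swapping CPU 1 and CPU 2."""
    socket_before = pcpu_to_socket(pcpu_before)
    socket_after = pcpu_to_socket(pcpu_after)
    if socket_after != socket_before:
        # The suspect chip now sits in the other socket and the errors moved with it.
        return "follows the CPU: replace the processor"
    # The errors stayed on the same socket even though the chips were exchanged.
    return "stays with the socket: check its DIMMs and the board"


print(pcpu_to_socket(64))  # 2
```

The design choice is to reason about sockets rather than raw PCPU numbers: after the swap, the same physical chip can crash on a different PCPU, and only a change of socket tells you whether the fault travelled with the processor.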
07-19-2019 12:02 AM
Thank you very much for your detailed info. I will try your solution.
07-28-2019 05:31 PM
07-28-2019 06:19 PM
Greetings.
I would run the UCS diagnostic ISO through its tests and see if anything for the DIMMs or CPU is flagged. See https://software.cisco.com/download/home/286265859/type/286123307/release/6.0(2a)
Also, GP Exception 13 is not necessarily a hardware problem (although it can be).
According to VMware, Exception 13 (GPF) occurs under one of these circumstances:
If the diagnostic tests turn up clean, you may want to have VMware evaluate the dump files from the VMware support bundle for that host.
Kirk...
07-29-2019 07:32 AM
As Kirk mentioned, run a diagnostic test on the server and choose the comprehensive test method.
Please note that comprehensive tests can run for several hours or days. These tests run exhaustive burn-in tests on your server, such as stress tests.