07-18-2019 12:23 AM
Hi guys,
My UCS C460 installed with ESXi 6.0.0 Build 13635687 (VMware-ESXi-6.0.0-9313334-Custom-Cisco-6.0.3.5.iso) is running some Linux and Windows servers. After half a year of stable running, I began to get PSODs with GP Exception 13. At the very start, it happened every 2 weeks. Then it happened more and more frequently, and now the server needs a restart every 2 days.
Each time I get a GP Exception 13, but the message that follows is different. I have replaced the motherboard and memory card, and nothing changed, so I believe it should not be a hardware issue.
I collected some error messages, but I cannot locate the root cause.
My UCS C460M4's BIOS version is C460M4.2.0.12b.0.062120160920, and I have tried the latest one before.
CPU Microcode Patch Revision is 0x0b00001d
ESXi 6.0.0 Build 3620759(VMware-ESXi-6.0.0-9313334-Custom-Cisco-6.0.3.5.iso)
Hardware:
2 X Intel(R) Xeon(R) CPU E7-8867 v4 @ 2.40GHz Type 0, Family 6, Model 79, Stepping 1
4 X 64G DDR4
3 X Intel Ethernet Server Adapter I350-T4
1 X Intel X540 10 Gbps Network Controller
1 X Intel(R) I350 1 Gbps Network Controller
1 X Raid controller
I have also connected some USB network adapters to the UCS.
Could anybody give me some help? This problem has been bothering me for months, and any help would be appreciated. Thanks.
07-18-2019 09:47 AM - edited 07-18-2019 09:49 AM
It could be an issue with CPU 2. Try replacing it. Did you notice any CATERR entries in the logs for the CPU?
07-18-2019 08:55 PM
07-18-2019 09:42 PM
It would be better if you could collect the server's CIMC tech-support bundle. However, looking at the earlier PSOD screenshots, it looks like an issue with CPU 2.
07-18-2019 09:48 PM
Unfortunately, this UCS was an internal order, so we don't have a tech-support service contract. Which part of the PSOD hints at a CPU 2 issue? I am not sure whether I should RMA again to replace the CPU.
07-18-2019 10:44 PM - edited 07-18-2019 11:49 PM
All the PSOD screenshots point to PCPU 64, which should be on CPU 2 given the server configuration. However, the issue could be external to CPU 2, for example misbehaving DIMM modules managed by CPU 2. As you don't have tech-support available, the following can be done:
Swap CPU 1 and CPU 2 and check whether the error in the PSOD events follows CPU 2. If it follows CPU 2, you can replace CPU 2.
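The PCPU-to-socket reasoning and the swap test above can be sketched roughly as follows. This is only an illustration under stated assumptions: the E7-8867 v4 has 18 cores per socket, Hyper-Threading is assumed to be enabled, and ESXi is assumed to number PCPUs contiguously socket by socket. The function names `pcpu_to_socket` and `interpret_swap` are hypothetical helpers, not part of any VMware or Cisco tool.

```python
# Assumptions (not confirmed in the thread): Hyper-Threading is enabled and
# ESXi numbers PCPUs contiguously, one socket after the other.
CORES_PER_SOCKET = 18   # Intel Xeon E7-8867 v4
THREADS_PER_CORE = 2    # assumes Hyper-Threading is on
SOCKETS = 2             # from the hardware list in the original post


def pcpu_to_socket(pcpu: int) -> int:
    """Return the 1-based socket number (CPU 1, CPU 2, ...) for a PCPU index."""
    per_socket = CORES_PER_SOCKET * THREADS_PER_CORE
    if not 0 <= pcpu < per_socket * SOCKETS:
        raise ValueError(f"PCPU {pcpu} is out of range for this configuration")
    return pcpu // per_socket + 1


def interpret_swap(pcpu_before: int, pcpu_after: int) -> str:
    """Interpret a PSOD seen before and after physically swapping the two CPUs."""
    before = pcpu_to_socket(pcpu_before)
    after = pcpu_to_socket(pcpu_after)
    if before != after:
        # The fault moved to the other socket together with the processor.
        return "fault follows the processor: replace that CPU"
    # The fault stayed on the same socket even though the processor changed.
    return "fault stays with the socket: suspect its DIMMs or the slot"


if __name__ == "__main__":
    print(pcpu_to_socket(64))
    print(interpret_swap(64, 10))
```

Under these assumptions PCPUs 0 to 35 belong to CPU 1 and PCPUs 36 to 71 to CPU 2, which is why PCPU 64 points at CPU 2. If Hyper-Threading is disabled or the numbering differs on this host, the constants need adjusting.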
07-19-2019 12:02 AM
Thank you very much for the detailed info. I will try your solution.
07-28-2019 05:31 PM
07-28-2019 06:19 PM
Greetings.
I would run the UCS diagnostic ISO through its tests and see if anything is flagged for the DIMMs or CPUs. See https://software.cisco.com/download/home/286265859/type/286123307/release/6.0(2a)
Also, GP Exception 13 is not necessarily a hardware problem (although it can be).
According to VMware, Exception 13 (GPF) occurs under one of these circumstances:
If the diagnostic tests turn up clean, then you may want to have VMware evaluate the dump files from the VMware support bundle from that host.
Kirk...
07-29-2019 07:32 AM
As Kirk mentioned, run a diagnostic test on the server and choose the comprehensive test method.
Please note that comprehensive tests can run for several hours or days. They run exhaustive burn-in tests on your server, such as stress tests.