
UCS C460 with ESXi 6.0.0 got PSOD crash GP Exception 13 error

tianyizh
Cisco Employee

Hi guys,

My UCS C460, installed with ESXi 6.0.0 Build 13635687 (VMware-ESXi-6.0.0-9313334-Custom-Cisco-6.0.3.5.iso), is running some Linux and Windows servers. After half a year of stable running, I began to get PSODs with GP Exception 13. At the very start, it happened every 2 weeks. Then it happened more and more frequently, and now the server needs a restart every 2 days.

Each time I get a GP Exception 13, but the message that follows is different. I have replaced the motherboard and the memory cards, and nothing changed, so I believe it should not be a hardware issue.

I collected some error messages, but I cannot locate the root cause.

My UCS C460 M4's BIOS version is C460M4.2.0.12b.0.062120160920, and I have tried the latest one before.

CPU Microcode Patch Revision is 0x0b00001d

ESXi 6.0.0 Build 3620759(VMware-ESXi-6.0.0-9313334-Custom-Cisco-6.0.3.5.iso)

Hardware:

2 X Intel(R) Xeon(R) CPU E7-8867 v4 @ 2.40GHz Type 0, Family 6, Model 79, Stepping 1

4 X 64G DDR4

3 X Intel Ethernet Server Adapter I350-T4

1 X Intel X540 10 Gbps Network Controller

1 X Intel(R) I350 1 Gbps Network Controller

1 X RAID controller

I also connected some USB network adapters to the UCS.

Could anybody give me some help? This problem has been bothering me for months, and any help would be appreciated. Thanks.


sankbhat
Cisco Employee

It could be an issue with CPU 2. Try replacing it. Did you notice any CATERR in the logs for the CPU?

I got some logs, but I am not sure which part is useful. Would you mind giving me some help? Thanks a lot.

It would be better if you could collect the server CIMC tech-support. However, looking at the earlier PSOD screenshots, it looks like an issue with CPU 2.

Unfortunately, this UCS is an internal order, and we don't have a tech-support service contract. Which part of the PSOD hints at the CPU 2 issue? I am not sure whether I should RMA again to replace the CPU.

All the PSOD screenshots point to PCPU 64, which should be CPU 2 as per the given server configuration. However, the issue could be external to CPU 2, for example misbehaving DIMM modules managed by CPU 2. As you don't have the tech-support with you, the following can be done:
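For anyone wondering how PCPU 64 maps to CPU 2, here is a minimal Python sketch. It rests on assumptions that are not stated in this thread: that the E7-8867 v4 has 18 cores per socket, that hyperthreading is enabled (2 threads per core), and that ESXi numbers PCPUs contiguously socket by socket. The function name `pcpu_to_socket` is hypothetical and only for illustration.

```python
def pcpu_to_socket(pcpu: int,
                   sockets: int = 2,
                   cores_per_socket: int = 18,   # assumption: E7-8867 v4 core count
                   threads_per_core: int = 2) -> int:  # assumption: hyperthreading on
    """Return the 1-based socket number that owns a PCPU index.

    Assumes PCPUs are numbered contiguously, socket by socket.
    """
    threads_per_socket = cores_per_socket * threads_per_core
    total = sockets * threads_per_socket
    if not 0 <= pcpu < total:
        raise ValueError(f"PCPU {pcpu} is outside the range 0..{total - 1}")
    return pcpu // threads_per_socket + 1


if __name__ == "__main__":
    # Under these assumptions, PCPUs 0-35 sit on CPU 1 and PCPUs 36-71 on CPU 2.
    print(pcpu_to_socket(64))
```

If hyperthreading is disabled or the numbering differs on your host, the mapping changes, so treat the result as a hint and confirm it against the host's own CPU topology.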

Swap CPU 1 and CPU 2 and check whether the error in the PSOD events follows the processor that was originally CPU 2. If it follows that processor, you can replace it.
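The reasoning behind the swap test can be sketched roughly as follows. This is only an illustration: `interpret_swap` is a hypothetical helper, and it assumes you have already noted by hand which socket each PSOD pointed to before and after the swap.

```python
from collections import Counter


def interpret_swap(sockets_before, sockets_after):
    """Compare the socket most often blamed in PSODs before and after a CPU swap.

    Both arguments are lists of 1-based socket numbers, one entry per PSOD.
    """
    if not sockets_before or not sockets_after:
        raise ValueError("need at least one PSOD before and after the swap")
    before = Counter(sockets_before).most_common(1)[0][0]
    after = Counter(sockets_after).most_common(1)[0][0]
    if before == after:
        # Same socket is blamed even though it now holds the other processor.
        return "stays with socket"
    # The blamed socket moved together with the physical processor.
    return "follows CPU"


if __name__ == "__main__":
    print(interpret_swap([2, 2, 2], [2]))  # stays with socket
    print(interpret_swap([2, 2, 2], [1]))  # follows CPU
```

A "stays with socket" result points away from the processor itself and toward something tied to that socket (DIMMs, board) or toward software. A "follows CPU" result supports replacing the processor.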

Thank you very much for your detailed info. I will try your solution.

Four days after swapping the CPUs, I got the same PSOD screen, so it seems the error did not follow the CPU. Any advice on what I can do next? Thanks again.

Greetings.

I would run the UCS diagnostic ISO through its tests and see if anything for DIMMs or CPU is flagged. See https://software.cisco.com/download/home/286265859/type/286123307/release/6.0(2a)

Also, GP Exception 13 is not necessarily a hardware problem (although it can be).

According to VMware, Exception 13 (GPF) occurs under one of these circumstances:

  • The page being requested does not belong to the program requesting it (and is not mapped in program memory)
  • The program does not have rights to perform a read or write operation on the page

If the diagnostic tests turn up clean, then you may want to have VMware evaluate the dump files from the VMware support bundle from that host.

 

Kirk...

As Kirk mentioned, run a diagnostic test on the server and choose the comprehensive test method.

 

Please note, Comprehensive tests can run for several hours or days. These tests run exhaustive burn-in tests on your server, such as stress tests. 
