11-24-2017 09:51 AM - edited 03-01-2019 01:22 PM
On Tuesday we got a pink screen of death on our ESXi host. We tried consoling into the host and it rebooted with the pink screen below. Could someone tell me how to figure out what happened, or how to read this? Thanks
Model: UCSC-C240-M45SX
ESXi 5.5.0
BIOS version: C240M4.3.0.3a.0.0321172111
11-27-2017 04:32 AM
Hi Jasmine,
PSODs (Purple Screens of Death) can be tricky to troubleshoot. Following these steps can help you get to the correct answer :). At the end of this post I will upload a screenshot showing how to read a PSOD.
1) Proper Log Collection
Gather a screenshot (which you did). Based on the screenshot, the following VMware KB article applies; it gives us an idea of what a PF (page fault) Exception 14 means. We will need more information, though.
https://kb.vmware.com/s/article/1020181
* We will need to gather an ESXi log bundle (below is how to do this):
An ESXi log bundle is a .tgz file generated by the automated log collection process.
See VMware KB 653 for the log collection process.
No vCenter logs! VC logs are irrelevant to a crash. All they report is “Host was there and now it’s gone!”
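Once you have the .tgz bundle, a quick check that it contains crash-related files can save a round trip with support. The following is only a minimal sketch: the function name `find_crash_files` and the keywords `vmkernel` and `zdump` are my assumptions for illustration, and the actual file names inside a bundle vary by ESXi version.

```python
import tarfile
from typing import Iterable, List


def find_crash_files(bundle_path: str,
                     keywords: Iterable[str] = ("vmkernel", "zdump")) -> List[str]:
    """Return the names of regular files in the bundle whose path contains
    any of the given keywords (case-insensitive).

    The default keywords are assumptions, not an official list of
    crash-related files.
    """
    wanted = [k.lower() for k in keywords]
    # "r:*" lets tarfile detect the compression (.tgz is gzip) on its own.
    with tarfile.open(bundle_path, "r:*") as bundle:
        return sorted(
            member.name
            for member in bundle.getmembers()
            if member.isfile() and any(k in member.name.lower() for k in wanted)
        )
```

For example, `find_crash_files("esx-bundle.tgz")` (a hypothetical file name) returns a sorted list of matching paths; an empty list suggests the bundle may be incomplete or that the keywords need adjusting.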
* Once you have gathered the log bundles and want to open a case with Cisco or VMware to find the root cause, have answers to these questions ready when opening the case (it will get you a speedier resolution):
How widely spread is the issue? One host? Two? All hosts in the cluster? Only the new hosts?
When did the issue start? Just now, last week, or since install?
How often does this issue occur? Every day? Every week? Just this one time?
Any changes to the host or the environment recently?
Have you already run hardware diagnostics? If so, what was the result?
Was there any specific action that led to the crash or was it just sitting there?
Key for the screenshot above:
1. ESXi version and build
2. Exception and/or failure message
3. PTEs (only shown with an exception 14 type crash)
4. CPU register info
5. PCPU & world generating the crash
6. Uptime
7. Address of frame in memory
8. Address of code in memory
9. The backtrace
10. Dump to disk is configured
11. Status of DiskDump
12. Dump to file is not configured
13. Availability of local debugging
Breakdown of the Purple Screen of Death error message:
Example: the line at the top of our sample stack
0x4123c111db10:[0x4180262f8abb]LibAIODrainMergeQueue@vmkernel#nover+0x153 stack: 0x123c111db60
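To make the fields in that line easier to see, here is a minimal sketch that splits a backtrace line of this shape into its parts. The field names (`frame_addr`, `code_addr`, and so on) are my own labels, mapped to items 7 and 8 in the key above, and the regular expression is written only against the sample line; other ESXi builds may format backtrace lines differently.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    frame_addr: int   # address of the frame in memory (key item 7)
    code_addr: int    # address of the code in memory (key item 8)
    symbol: str       # function name
    module: str       # module the function belongs to
    version: str      # version tag after '#'
    offset: int       # offset after the symbol name
    stack: int        # value printed after "stack:"


# Pattern derived from the single sample line; treat it as an assumption.
_FRAME_RE = re.compile(
    r"^\s*(0x[0-9a-fA-F]+):\[(0x[0-9a-fA-F]+)\]"
    r"([^@\s]+)@([^#\s]+)#([^+\s]+)\+(0x[0-9a-fA-F]+)"
    r"\s+stack:\s*(0x[0-9a-fA-F]+)"
)


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one backtrace line; return None if it does not match."""
    m = _FRAME_RE.match(line)
    if m is None:
        return None
    frame, code, symbol, module, version, offset, stack = m.groups()
    return Frame(int(frame, 16), int(code, 16), symbol, module, version,
                 int(offset, 16), int(stack, 16))


SAMPLE = ("0x4123c111db10:[0x4180262f8abb]LibAIODrainMergeQueue"
          "@vmkernel#nover+0x153 stack: 0x123c111db60")
```

Calling `parse_frame(SAMPLE)` gives the symbol `LibAIODrainMergeQueue` in module `vmkernel` with offset `0x153`. Under the usual symbol+offset convention, the code address sits `0x153` bytes past the start of that function. The function name and module are what you would search for in VMware KB articles.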
At the end of the day, Cisco will use this information to look for hardware faults (memory, CPU, motherboard). You also want to make sure the drivers (FNIC / ENIC) on the operating system are always up to date. You can find the supported driver versions by going to the link below and navigating to your server model and OS.
https://ucshcltool.cloudapps.cisco.com/public/#
Finding the root cause of a PSOD will require log bundles. If you do open a Cisco TAC case, let me know and I will be happy to assist you.
If this post helped you, please mark it as correct so other members can reference the information given here.
11-25-2017 04:46 AM
Questions
- Is this a standalone or UCS-managed server?
- Which UCS version?
- Which enic/fnic driver version on ESXi?
- Is it the only server crashing, or do you have others as well?
- Does the crash happen after some time, or immediately after reboot?
- Did you do a recent firmware upgrade?