12-29-2015 08:20 AM - edited 03-01-2019 12:31 PM
Hi Team
Can you please tell me how to read the UCS tech support logs to find hardware issues?
12-29-2015 06:54 PM
Greetings.
Your question is probably something that is hard to tackle in this format.
As hardware and firmware change so rapidly, so do the various diagnostic capabilities (and how, when, and where they write to logs). I think it would be pretty much impossible to maintain an up-to-date, comprehensive guide to tech support file contents and how to interpret them.
That said, there are some common ones for both Blade/UCSM and standalone C-Series servers, such as the SEL logs, which generally record major errors such as DIMM failures, HD failures, etc.
There are guides for general troubleshooting such as http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/ts/guide/UCSTroubleshooting/UCSTroubleshooting_chapter_0111.html
There are also courses that map to the DCUCT 642-035 exam http://www.cisco.com/c/en/us/training-events/training-certifications/exams/current-list/dcuct.html that go into some detail on looking at logs, among other things.
Are there particular hardware issues you are trying to find? Do you have some kind of polling app that scans files and that you are trying to set up alerts for?
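If a simple scan for such entries is what you are after, here is a minimal sketch in Python. It assumes the SEL log has already been exported as plain text with one event per line. The keyword list, the function name, and the sample lines are made up for illustration and are not actual UCS output.

```python
import re

# Hypothetical keywords; adjust to the events you care about.
KEYWORDS = ("DIMM", "ECC", "Drive", "Power Supply")


def find_hardware_events(lines, keywords=KEYWORDS):
    """Return (line_number, line) pairs that mention any keyword, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


# Made-up sample lines, not real SEL output.
sample = [
    "2015-12-29 08:01:12 | Memory | DIMM A1 uncorrectable ECC error\n",
    "2015-12-29 08:02:40 | System | Boot completed\n",
    "2015-12-29 08:05:03 | Storage | drive 2 predictive failure\n",
]

for number, line in find_hardware_events(sample):
    print(number, line)
```

A plain keyword match like this only narrows down where to look; it does not replace reading the surrounding entries.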
Thanks,
Kirk
12-30-2015 12:24 AM
Hi
UCS is agnostic to the OS; however, this document
http://www.cisco.com/c/en/us/support/docs/servers-unified-computing/ucs-manager/116349-technote-product-00.html
provides information on how to extract OS drivers, which should match those found in
http://www.cisco.com/web/techdoc/ucs/interoperability/matrix/matrix.html
Walter.
12-29-2015 11:19 PM
Hi Kirk
Thanks for providing complete details about tech support logs.
Thanks in Advance.
01-04-2016 05:00 AM
Hi Kirk and Walter
Wishing you a happy new year 2016.
Thanks again for the wonderful information and the troubleshooting steps.
I will go through the reference documents and follow the information.
12-30-2015 04:34 AM
Greetings.
We frequently see tickets/requests where the OS may have an old driver that needs to be updated, or some service/process simply crashed/froze in the OS itself, with no actual hardware issues.
For VMware it is very helpful to have an external syslog server configured and a dump file location defined, as you will frequently get more diagnostic info from the OS. See http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1000328.
As Walter mentioned, you want to have the correct drivers that match the firmware.
The primary drivers you are normally concerned with for our blade servers are the local RAID controller (LSI), enic, and fnic.
To see your current driver version for each of those, run the following from a PuTTY/SSH session to the ESXi management IP:
#vmkload_mod -s fnic
#vmkload_mod -s enic
#vmkload_mod -s megaraid_sas
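If you save the output of those commands for several hosts and want to compare versions, a small Python sketch like the one below can pull out the version line. It assumes the output contains a line starting with "Version:"; the function name and the sample text are illustrative only, and the real output differs by ESXi release and driver.

```python
import re


def module_version(vmkload_output):
    """Return the text after 'Version:' on the first such line, or None."""
    for line in vmkload_output.splitlines():
        match = re.match(r"\s*Version:\s*(.+)", line)
        if match:
            return match.group(1).strip()
    return None


# Illustrative text only; not captured from a real host.
sample = """vmkload_mod module information
 Version: 0.0.0.0-example
 License: GPL
"""

print(module_version(sample))
```

Whatever version is reported is what you compare against the interoperability matrix.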
You may want to open a ticket with VMware to have someone see if there is anything in the logs pointing to software or hardware issues. You may want to open a TAC ticket to confirm you don't have hardware issues.
Thanks,
Kirk
12-30-2015 06:28 AM
Hello,
Like Kirk said, we do not have a cookbook that shows where to look for specific issues (keeping such a reference up to date would be hard), but this post is very helpful if you want to get more familiar with the logs/files you will find in the show tech:
https://supportforums.cisco.com/document/66296/how-read-ucs-b-series-tech-support-files-ucsm-detail
HTH,
-Kenny