
Ask the Expert: Switch and IOS Architecture and Unexpected Reboots on all Cisco Catalyst Switches

ciscomoderator
Community Manager
 

This session provides an opportunity to learn and ask questions about the Cisco Catalyst Switch IOS architecture and how to troubleshoot unexpected reboots and other errors on these switches.

 

Ask questions from Monday, October 5 to Friday, October 16, 2015

Featured Experts

Ivan Shirshin is a customer support engineer in High-Touch Technical Services (HTTS). He is an expert on Routing, LAN Switching, and Data Center products. His areas of expertise include Cisco Catalyst 2000, 3000, 4000, and 6500 Series Switches, Cisco Nexus 7000, ISRs, as well as Cisco ASR 1000, 7600, and 10000 routers and IOS XR platforms. He has over 7 years of industry experience working with large Enterprise and Service Provider networks. Shirshin holds CCNA, CCNP, and CCDP certifications and a CCIE (#43481) in Routing and Switching, as well as XR specialist certifications.

 

Naveen Venkateshaiah is a customer support engineer in High-Touch Technical Services (HTTS). He is an expert on Routing, LAN Switching, and Data Center products. His areas of expertise include Cisco Catalyst 3000, 4000, and 6500 Series Switches and Cisco Nexus 7000. He has over 7 years of industry experience working with large Enterprise and Service Provider networks. Venkateshaiah holds CCNA, CCNP, CCDP-ARCH, AWLANFE, and LCSAWLAN certifications. He is currently working toward a CCIE in Routing and Switching.

 

Find other Ask the Expert events at https://supportforums.cisco.com/expert-corner/events.

** Ratings Encourage Participation! **
Please be sure to rate the Answers to Questions

 


 

43 Replies

Jessica Deaken
Level 1

Hello,

We have a WS-X6704-10GE module with a WS-F6700-DFC3B. How can we identify whether a DFC-equipped module has reset on its own?

 

Thank you for your prompt response.

 

Jessica

 

Hi Jessica,

Thanks for raising this question.

If a Distributed Forwarding Card (DFC)-equipped module has rebooted on its own, without a manual reload by a user, check the bootflash of the DFC card to see whether it crashed. If a crash information file is available, you can find the root cause of the crash.

Issue the dir dfc#module#-bootflash: command in order to verify whether there is a crash information file and when it was written.

If the DFC reset matches the crashinfo timestamp, issue the more dfc#module#-bootflash:filename command in order to view the file.

You can also issue the copy dfc#module#-bootflash:filename tftp command in order to transfer the file to a TFTP server.

cat6kSwitch#dir dfc#6-bootflash:
Directory of dfc#6-bootflash:/
-#- ED ----type---- --crc--- -seek-- nlen -length- -----date/time------ name
1   ..   crashinfo 2B245A6A   C24D0   25   261332 Sep 22 2014 21:35:25 crashinfo_20140922-204842
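
As an illustrative sketch using the file name from the listing above, you could then view and export the file (the copy command prompts for the TFTP server address, which is not shown here):

cat6kSwitch#more dfc#6-bootflash:crashinfo_20140922-204842
cat6kSwitch#copy dfc#6-bootflash:crashinfo_20140922-204842 tftp: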

After you have the crashinfo file available, collect the output of the show logging and show tech-support commands, and contact Cisco TAC to investigate the cause of this crash further.

Let me know if you have any further questions.

Regards,
Naveen Venkateshaiah.

Hi Naveen,

 

One of our 6500 switches with a Sup32 has reloaded automatically twice, and crashinfo files were generated. In the show version command output, the last reset reason was power-on. We tried to run the crashinfo files through the Output Interpreter, but there were no relevant results. Do we have to raise a TAC case just to decode the files, or, as a partner, are we privileged to use any tools to decode them?

Thanks,

Tamil.

Hi Tamil,

Thanks for raising this question.

The Cisco Catalyst 6000/6500 Series Switches can unexpectedly reload due to an unknown cause. The output of the show version command displays an error message similar to this:
System returned to ROM by unknown reload cause - suspect
boot_data[BOOT_COUNT] 0x0, BOOT_COUNT 0, BOOTDATA 19 (SP by power-on)

This issue is documented in Cisco bug ID CSCef80423 (registered customers only). Upgrade the switch to the latest Cisco IOS Software release unaffected by the bug in order to resolve this issue.

  • Normally I would recommend that you check the crashinfo from both the SP and the RP, which are stored in sup-bootflash: and bootflash: respectively, and look for the matching timestamp.
  • The crashinfo also tells you whether the crash originated on the SP or the RP, based on which one crashed first.

example:

From SP:
======

Mar 25 10:46:19.074 GMT: %C6K_PLATFORM-SP-2-PEER_RESET: SP is being reset by the RP    << Here the Switch Processor is reset by the Route Processor.

Hence we have to look at the RP crashinfo.

From RP:
======
Mar 25 10:46:11.166 GMT: %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_NPP_PARITY_ERROR
Mar 25 10:46:11.166 GMT: %SYSTEM_CONTROLLER-3-FATAL: An unrecoverable error has been detected. The system is being reset.
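
As a hedged illustration of where to look on a Catalyst 6500, the crashinfo files can be listed like this (the file name below is hypothetical, not from this case):

Router#dir sup-bootflash:    << SP crashinfo location
Router#dir bootflash:        << RP crashinfo location
Router#more bootflash:crashinfo_20150325-104611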

  • There are no tools available with CCO access to decode the crashinfo, so I would request that you raise an SR with Cisco TAC and share both crashinfo files along with the show tech-support output from the supervisor that reloaded.

Let me know if you have any further questions.

Regards,
Naveen Venkateshaiah.

 

Hi,

Can you please explain the IOS XE image naming convention and how to identify the current version running on my switch? Is there any difference when we run show commands on IOS versus IOS XE?

 

Regards

Dhiresh

Hi,

  • The naming convention has changed because there are several components that need to be called out in the name. Let me break down the image name below as an example:

image name: cat4500e-universalk9.SPA.03.01.00.SG.15-01.SG

 cat4500e: Platform designator.
 universal: Feature set designator.
 k9: Crypto designator, present if crypto code is included in the IOSd package.
 SPA: Indicates the image is digitally signed.
 03.01.00.SG: IOS XE release version number.
 15-01.SG: IOSd package version number; this allows you to correlate the IOSd version with another platform running classic IOS.

Kernel Version:
=========
cat4500e#show version running
 Package: Base, version: 03.00.00, status: active
 File: cat4500e-basek9.SPA.03.00.00.pkg, on: Slot3
 From Bundle: cat4500e-universalk9.03.01.00.SG

Infrastructure Version
==============
Package: Infra, version: 03.00.00, status: active
File: cat4500e-infra.SPA.03.00.00.pkg, on: Slot3
From Bundle: cat4500e-universalk9.03.01.00.SG

IOSd Version
========
 Package: IOS, version: 150-1.SG, status: active
 File: cat4500e-universalk9.SPA.15-01.SG.pkg, on: Slot3
 From Bundle: cat4500e-universalk9.03.01.00.SG
 

  • As IOSd is fully integrated into IOS XE, the feature-level CLI is nearly identical. However, a few differences do exist in system management because of the underlying Linux OS.

Below are a few example commands:

Classic IOS: show proc
IOS-XE equivalents:
  • show proc cpu detailed process iosd - identical to classic "show proc"; without the "detailed" keyword, only Linux processes are shown
  • show proc cpu detailed - shows a detailed description of all iosd and non-iosd processes across all CPU cores

Classic IOS: show proc memory
IOS-XE equivalents:
  • show proc memory - shows memory utilization of the entire system
  • show proc memory detailed process iosd - shows memory usage by the iosd process

Classic IOS: show proc cpu history
IOS-XE equivalents:
  • show proc cpu history summary - shows overall CPU utilization of the platform, similar to single-core classic IOS
  • show proc cpu history - shows CPU utilization by core
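
For instance, to focus on the iosd process on an IOS-XE Catalyst 4500, a minimal usage sketch would be (the hostname is a placeholder):

Switch#show processes cpu detailed process iosd
Switch#show processes memory detailed process iosd
Switch#show processes cpu history summary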
 

 

Regards,
Naveen Venkateshaiah.

Ivan Petrov
Level 1

Hello Ivan and Naveen,

We are getting this error message: “%CFIB-7-CFIB_EXCEPTION: FIB TCAM exception, Some entries will be software switched”

We could not find the meaning of this error in the documentation. Can you let me know what it means and whether there is anything I should do to stop receiving it?

Thank you,

Ivan

Hey Ivan, I've seen this before.

Here's a link that explains it and what to do; go to the section "FIB TCAM Exception":

https://supportforums.cisco.com/document/59926/troubleshooting-high-cpu-6500-sup720

The error message indicates that the number of installed route entries is about to reach the hardware FIB capacity or the maximum routes limit set for the specified protocol. If the limit is reached, some prefixes are dropped from hardware and switched in software, as the message says.

 

There is a workaround available. You need to reload the router in order to exit the exception mode.

Then enter the "mls cef maximum-routes" command in global configuration mode in order to increase the maximum number of routes for the protocol.

Use the "show mls cef maximum-routes" command to check the configured maximum routes, and the "show mls cef summary" command, which shows a summary of the CEF table information, to check the current usage.
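
A minimal sketch of that sequence, assuming IPv4 routes are the ones overflowing (the value 192, in thousands of routes, is purely illustrative; size it to your Supervisor's FIB TCAM capacity, and note that a changed limit itself only takes effect after a reload):

Router#reload
! after the router boots and exits exception mode:
Router#configure terminal
Router(config)#mls cef maximum-routes ip 192
Router(config)#end
Router#show mls cef maximum-routes
Router#show mls cef summary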

 

Kind Regards,
Ivan

Raja_D
Level 1

Hi Naveen & Ivan, 

We have an issue on a Cisco RSP8 (R7000) 7513MX chassis: the router has two MPLS links that both went down at the same time and did not come back up until we physically rebooted the router, suspecting that it might have hung. Below is the error message that I noticed when I executed the "show ver" command. Does this error message correspond to a hardware or an IOS issue? Kindly clarify.

System returned to ROM by processor memory parity error at PC 0x406A0D1C, address 0x0 at 02:42:04 

FYI, this error remained even after the router was rebooted. Both MPLS links were restored, but the error message remained on the router. Can this error message lead to issues such as making the router hang again later? If so, kindly advise a possible solution to overcome the issue.

Thanks in advance.. 

Hi Raja,

 

The message indicates that there was a memory parity error in the processor DRAM. This is a hardware-related problem.

Note that the message reports the reason for the last restart, and it will not be cleared until the next restart. It does not mean the problem is still occurring at this time.


There are two kinds of parity errors:
1. Soft parity errors
These errors occur when an energy level within the chip (for example, a one or a zero) changes unexpectedly. When referenced by the CPU, such errors cause the system either to crash (if the error is in an area that is not recoverable) or to recover other subsystems (for example, a CyBus complex restarts if the error was in the packet memory (MEMD)). In the case of a soft parity error, there is no need to swap the board or any of the components. See the Related Information section for additional information about soft parity errors.

 

2. Hard parity errors
These errors occur when there is a chip or board failure that corrupts data. In this case, you need to reseat or replace the affected component, which usually involves a memory chip swap or a board swap. A hard parity error exists when multiple parity errors occur at the same address. There are more complicated cases that are harder to identify; in general, if you see more than one parity error in a particular memory region in a relatively short period, you can consider it a hard parity error.

 

Studies have shown that soft parity errors are 10 to 100 times more frequent than hard parity errors. Therefore, Cisco highly recommends that you wait for a second parity error before you replace anything. This greatly reduces the impact on your network.

 

If you see this error once, I recommend monitoring for 2-3 days. If the issue reoccurs, you need to replace the DRAM on this card.

 

Kind Regards,
Ivan

Hi Ivan, 

Thanks for the info...


As of now the issue has not reoccurred at this branch after the physical reboot, but we will go with your advice of replacing the processor DRAM on the router.

In case the issue repeats and we replace the DRAM, should the "system parity error" message vanish from the output of "show ver"?

What commands can be issued on this Cisco RSP8 (R7000) 7513MX chassis when this kind of "system parity error" is generated on the router?

Also, please let me know what post-checks, and with which commands, should be done on this router once the new processor DRAM is in place.

Please advise.

Hi Raja,

 

If this problem happened once and does not reoccur, then it is highly probable that it was a soft parity error, and there is no need for action. It is very rare to see a soft parity error twice.

You can read more about these kinds of issues and the difference between soft and hard parity errors here:

http://www.cisco.com/c/en/us/support/docs/routers/7200-series-routers/6345-crashes-pmpe.html#softvshard

Parity errors are reported in "show ver"; in a crashinfo file, if the memory corruption led to a crash (the file is usually saved in flash or bootflash); or in the SYSLOG, which you can check on your syslog server or with the "show log" command.
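
For example, a quick way to check all three places would be (a minimal sketch, assuming your IOS release supports output filtering; file names differ per device):

Router#show version | include System returned
Router#dir bootflash:
Router#show logging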

The system runs parity checks automatically at bootup and during operation, so you do not need to do anything manually to test the new memory.

 

Kind Regards,
Ivan

Thanks a lot, Ivan, for the information.
