mtp337
Beginner

Issues with TAC support procedures for router hangs.... =(

Folks,

I have a problem with the direction TAC is asking me to take:

They essentially want me to change the config register and set up the router to respond to the console during a crash. The issue I have is that we have only one router, and it's in production all the time. Sitting and waiting for it to impact our production network isn't a great solution.

We are running a relatively new version of code:

c2900-universalk9-mz.SPA.152-4.M1.bin

Is there any other way to proceed? Would anyone be able to speak to moving to different software instead of being a guinea pig? I suppose we could have bad hardware, but I doubt it. Should I open a ticket to have Cisco help us find more stable hardware?

Exact steps recommended below:

Procedure to be performed when a device stops responding

The procedure has 3 parts. The first is done while the router is working properly (it is not network impacting): enter the config-register and scheduler allocate commands, then write the configuration (steps 1-3).

The second part involves sending the device to ROMMON and has to be done in a maintenance window (steps 4-6), since it is network impacting. The third part is done when the router is hanging and has not been reloaded.

Note: If the console is unresponsive during the event and the device is an ISR router, add this command so that the router is forced to crash and the resulting crash information can be sent to us for analysis.

Router>enable

Router#config t

Router(config)#scheduler isr-watchdog

Router(config)#exit

Router#write

Here is a detailed step-by-step list of instructions to troubleshoot router hangs:

1st Part

1. Set the configuration register to 0x2002 using the "config-register 0x2002" command issued from global configuration mode of the device.

Router>en

Router#config t

Router(config)#config-register 0x2002

2. Configure the device to allow console access during high CPU utilization by issuing the "scheduler allocate 30000 1000" command from configuration mode of the device.

Router(config)#scheduler allocate 30000 1000

3. Write the configuration to memory using the "write mem" command from enable mode of the device.

Router(config)#exit

Router#write
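For anyone wondering what step 1 actually changes: the usual default register on these routers is 0x2102, and 0x2002 differs from it only in bit 8 (0x0100), the "Break disabled" bit. Clearing it lets the console Break key drop a running router into ROMMON, which is what the hang procedure relies on. Below is a minimal Python sketch that decodes a few register bits. The bit masks follow Cisco's documented configuration-register layout as I understand it; the function and constant names are my own and purely illustrative.

```python
# Illustrative decoder for a few Cisco configuration-register bits.
# Names are hypothetical; masks follow the documented register layout.
BOOT_FIELD_MASK = 0x000F           # bits 0-3: boot field
IGNORE_NVRAM = 0x0040              # bit 6: ignore startup configuration
BREAK_DISABLED = 0x0100            # bit 8: Break ignored during normal operation
BOOT_ROM_ON_NETBOOT_FAIL = 0x2000  # bit 13: fall back to ROM if network boot fails


def describe_confreg(value: int) -> dict:
    """Return a small summary of the register bits relevant to this procedure."""
    return {
        "boot_field": value & BOOT_FIELD_MASK,
        "ignore_startup_config": bool(value & IGNORE_NVRAM),
        "break_enabled": not (value & BREAK_DISABLED),
        "rom_fallback_on_netboot_failure": bool(value & BOOT_ROM_ON_NETBOOT_FAIL),
    }


if __name__ == "__main__":
    for reg in (0x2102, 0x2002):
        print(hex(reg), describe_confreg(reg))
```

The only field that differs between the two values is break_enabled, so the change itself does not alter how the router boots. The side effect is that a Break signal on the console (including a spurious one from a terminal server or a cable being plugged in) can now halt a running router.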

2nd Part

4. Reload the router using the "reload" command from enable mode of the device.

Router#reload

5. Test the router's ability to drop to ROMMON by using the break sequence.   http://www.cisco.com/en/US/products/hw/routers/ps133/products_tech_note09186a0080174a34.shtml

Then test the use of the "stack 50" or "k 50" command from the ROMMON prompt to gather diagnostic data. Then test the use of the "cont" command from the ROMMON prompt to return to IOS. This will be service impacting, so please do this during a service window.

rommon2>stack 50 (if the device doesn't take "stack 50" use "k 50")

rommon3>cont

6. Reload the router again using the "reload" command from enable mode of the device.

Router#reload

Steps 5 and 6 of the second part can be omitted if you have no time for testing because the router in question is in production, but they are definitely a very accurate way to confirm that the configuration changes were applied correctly and that they will work at the time of a real hang.

3rd Part

AT TIME OF HANG:

1. Connect to console port of the device and start logging output to a plain text file.

2. Confirm that the device is in a hang state by pressing Enter several times.  No response means that the device is in a hang state.

3. Use the break sequence to drop the router to ROMMON (in Windows/HyperTerminal, it is usually Ctrl-Break).

4. Enter the command "stack 50" or "k 50" from the ROMMON prompt in order to display diagnostic information about the hang state.

rommon3>stack 50  (if the device doesn't take "stack 50" use "k 50")

5. Enter the command "cont" from the ROMMON prompt in order to go back to the IOS and the hang state.

rommon3>cont

6. Repeat steps 2-5 about 10 times.  This is important in order to make sure we get an accurate reading of where the CPU is hanging.

7. Stop capture and send in the resulting plain text file containing the log from the entire procedure.
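Since step 6 asks for about 10 samples, it may be worth checking the capture before sending it in. The following is a rough Python sketch that counts how many times "stack 50" or "k 50" was entered at a ROMMON prompt in the saved log. It assumes the terminal capture echoes the typed commands with the prompt, as in the examples above; the function names and the file name in the usage note are hypothetical.

```python
import re

# Matches a ROMMON prompt followed by either stack-trace command,
# e.g. "rommon3>stack 50" or "rommon 3 > k 50".
# Assumption: the capture contains the echoed prompt and command on one line.
STACK_CMD = re.compile(
    r"^rommon\s*\d*\s*>\s*(?:stack|k)\s+50\b",
    re.IGNORECASE | re.MULTILINE,
)


def count_stack_samples(log_text: str) -> int:
    """Count the stack-trace commands found in a console capture."""
    return len(STACK_CMD.findall(log_text))


def enough_samples(log_text: str, minimum: int = 10) -> bool:
    """True if the capture holds at least `minimum` stack samples."""
    return count_stack_samples(log_text) >= minimum
```

Usage would be something like reading the capture with open("hang-capture.txt").read() and passing the text to enough_samples(). This only counts the commands typed; it does not check that each one produced usable stack output.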

More information on troubleshooting router hangs:

http://www.cisco.com/en/US/products/hw/routers/ps359/products_tech_note09186a0080106fd7.shtml
