Solved: Re: Bug Toolkit is unable to display CSCsg43532 bug details.

Sergio Gaytan · ‎07-22-2010

Hi,

Bug Toolkit says: "It contains proprietary information that cannot be disclosed at this time." Why?

-Sergio

asolleti · ‎07-23-2010

Hi Sergio

To better assist you with this case, it would be great if you could answer the questions below:

1) Are you looking for a software version that has a fix for "CSCsg43532"?

2) Can you please let me know how you became aware of "CSCsg43532" before you tried to view it in Bug Toolkit?

Thanks

Arun

Product Manager

Cisco Bug Toolkit

Phillip Remaker · ‎07-23-2010

The problem is that the bug was originally found in an internal testbed on a non-shipping branch, so it belongs to an internal group ("labtrunk") which is not externally visible.

However, the bug that the testing team uncovered also seems to be present in shipping branches (as evidenced by "sys" bugs CSCtb55433 and CSCsl02596, which declare themselves duplicates of CSCsg43532).

Nowhere in the process of getting marked as a duplicate was the bug reclassified as shipping/production ("sys"), which would make it visible in Bug Toolkit (this is a process hole). Similarly, Bug Toolkit does not recognize that this "labtrunk" bug is duplicated by "sys" bugs (logic hole). If the Bug Toolkit team can take action here, they can either reclassify this bug as "sys" or change Bug Toolkit to recognize that this bug has a valid release note and is duplicated by a "sys" bug and should therefore be visible. The latter is where the Bug Toolkit team has the most control.
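To make the suggested logic change concrete, here is a minimal Python sketch of the visibility rule described above. It is not how Bug Toolkit is actually implemented: the `Bug` record, its field names, and the `is_visible` function are all hypothetical. Only the bug IDs and the "sys"/"labtrunk" group names come from this thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption: "sys" is the only externally visible group mentioned in this thread.
EXTERNAL_GROUPS = {"sys"}


@dataclass
class Bug:
    """Hypothetical bug record; the real schema is not shown in the thread."""
    bug_id: str
    group: str                          # e.g. "sys" or "labtrunk"
    has_release_note: bool = False
    duplicate_of: Optional[str] = None  # ID of the bug this one duplicates


def is_visible(bug: Bug, all_bugs: Dict[str, Bug]) -> bool:
    """Proposed rule: show a bug if it is in an external group, or if it has
    a release note and at least one external-group bug is its duplicate."""
    if bug.group in EXTERNAL_GROUPS:
        return True
    if not bug.has_release_note:
        return False
    return any(
        other.duplicate_of == bug.bug_id and other.group in EXTERNAL_GROUPS
        for other in all_bugs.values()
    )


bugs = {
    b.bug_id: b
    for b in [
        Bug("CSCsg43532", "labtrunk", has_release_note=True),
        Bug("CSCtb55433", "sys", duplicate_of="CSCsg43532"),
        Bug("CSCsl02596", "sys", duplicate_of="CSCsg43532"),
    ]
}
```

With this rule, `is_visible(bugs["CSCsg43532"], bugs)` is true even though the bug itself stays in "labtrunk", so nobody has to remember to reclassify it by hand.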

Summary: you have uncovered a combination process/logic hole in the system. Thanks.

And, in case you care, the (current) release note for CSCsg43532 is pasted below. The bottom line is that the bug is largely cosmetic and should have no operational impact; you can ignore the message.

Symptom:

The user sees a TIMERNEG message related to VTP bulk sync on the stand by supervisor console as the stand by supervisor is booting up.  (Please note that the reason for the supervisor rebooting is not related to this symptom; the only requirement is that the stand by supervisor is coming up in SSO mode.)

No traffic should be impacted by this symptom.

Conditions:

The chassis must be set to SSO mode (thus triggering a VTP bulk sync upon stand by supervisor boot up) for this TIMERNEG message to show up.

The active supervisor must have been up for some time for this effect to show up, and even then, it would appear to be "sporadic" (see below).

If the TIMERNEG message does show up on the stand by console, rebooting just the stand by at that time will very likely show the message again.

Workaround:

There is no traffic or service impact, so there is no need to work around this TIMERNEG message.

Further Problem Description:

Explanation of the word "sporadic" above:  Analysis shows the length of time the active supervisor has been up determines whether this TIMERNEG message shows up.  Internally, the stand by supervisor is hitting an arithmetic overflow condition such that every other 27 day period (1st, 3rd, 5th, ...), this message will show up upon boot, except that every other 270 day period (2nd, 4th, 6th, ...), the messages will not show up at all upon boot.

Sergio Gaytan · ‎07-23-2010

Hi,

An unexpected reset of the active supervisor caused a switchover between the active and standby supervisor engines 32 / MSFC 2A on a Cisco WS-C6509-E running Cisco IOS Software, s3223_rp Software (s3223_rp-IPSERVICESK9_WAN-M), Version 12.2(33)SXH4.

I used the Output Interpreter to analyze the "show logging" command output, and it pointed to a possible CSCsl02596 or CSCsg43532 bug. I tried to find more details in Cisco Bug Toolkit but had no success. I decided to upgrade from SXH4 to SXH6 because I found a traceback (%ALIGN-1-FATAL) in the crash_info files, but I would still like to know more details about the CSCsg43532 bug.

I have attached the crash_info files.

Regards,

-Sergio

Phillip Remaker · ‎07-23-2010

The unexpected reset is your root problem, but the messages that Output Interpreter is catching are cosmetic artifacts of the bug, seen during the boot process. So you are chasing the wrong thing. :-) That is, if you applied this patch, the nuisance messages would go away but the problem might still remain.

Unfortunately, this is the wrong forum for chasing a 6500 issue; this group is about the data handling of the Bug Toolkit. You should either re-post your request in a LAN switching area or open a TAC case at http://www.cisco.com/tac

Sergio Gaytan · ‎07-23-2010

Hi

Thanks for your explanation. I only wanted to know more details about the bug, and I will re-post in a LAN switching area.

Regards,

-Sergio

Phillip Remaker · ‎07-23-2010

OK, I think I solved it:

With the output limited to the relevant information, Output Interpreter (OI) identifies CSCec51750 as the problem.

The release note is:

Symptoms: A router that is configured for HTTP secure-server may reload unexpectedly because of an internal memory corruption.

Conditions: IOS HTTP Secure server enabled

Workaround: Disable HTTPS with "no ip http secure-server"

Also fixed in 12.2(33)SXH3

How did I do that?

The part of the crashinfo that catches my attention is the set of messages right before the restart:

Feb 3 14:28:54.120 Mexico: %SYS-2-NOPROCESS: No such process 218959117
-Process= "HTTP CORE", ipl= 0, pid= 177
-Traceback= 4105702C 4105731C 40A64E64 40A655A0 407E097C 407D2184 407D24E4 410129B4 410129A0
%ALIGN-1-FATAL: Illegal access to a low address 14:28:54 Mexico Wed Feb 3 2010
addr=0x66C, pc=0x40A64EF8, ra=0x40A655A0, sp=0x50FBDFA8

%ALIGN-1-FATAL: Illegal access to a low address 14:28:54 Mexico Wed Feb 3 2010
addr=0x66C, pc=0x40A64EF8, ra=0x40A655A0, sp=0x50FBDFA8

14:28:54 Mexico Wed Feb 3 2010: TLB (store) exception, CPU signal 10, PC = 0x40A64EF8

So, that's the cause of the crash: it looks like something is accessing the box over HTTP ("HTTP CORE") in a weird way that crashes it.

Limiting the OI input to:

-----------

Feb 3 14:28:54.120 Mexico: %SYS-2-NOPROCESS: No such process 218959117
-Process= "HTTP CORE", ipl= 0, pid= 177
-Traceback= 4105702C 4105731C 40A64E64 40A655A0 407E097C 407D2184 407D24E4 410129B4 410129A0
%ALIGN-1-FATAL: Illegal access to a low address 14:28:54 Mexico Wed Feb 3 2010
addr=0x66C, pc=0x40A64EF8, ra=0x40A655A0, sp=0x50FBDFA8

%ALIGN-1-FATAL: Illegal access to a low address 14:28:54 Mexico Wed Feb 3 2010
addr=0x66C, pc=0x40A64EF8, ra=0x40A655A0, sp=0x50FBDFA8

14:28:54 Mexico Wed Feb 3 2010: TLB (store) exception, CPU signal 10, PC = 0x40A64EF8

--------------

OI focuses on the relevant messages and correctly (I think) finds the bug.

Sometimes it is useful to manually trim the input to OI. If you give it the show version output and just the messages that seem to cause the problem, it does a much finer job of zeroing in on the issue.
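If you would rather not trim by hand, a small script can do a first pass. The following is only a rough sketch: the pattern list is an assumption built from the messages quoted in this thread, the function name `trim_for_oi` is made up, and the %LINK-3-UPDOWN line in the sample is an invented noise line added for illustration. It does not talk to OI in any way; it just filters text that you would then paste in yourself.

```python
import re
from typing import List

# Assumption: these patterns cover only the crash-related lines quoted in
# this thread; extend the list for other platforms or message types.
CRASH_PATTERNS = re.compile(
    r"%SYS-2-NOPROCESS"
    r"|%ALIGN-1-FATAL"
    r"|TLB \(store\) exception"
    r"|^-Traceback="
    r"|^-Process="
    r"|^addr=0x"
)


def trim_for_oi(log_text: str) -> List[str]:
    """Keep only the lines that match one of the crash-related patterns."""
    return [line for line in log_text.splitlines() if CRASH_PATTERNS.search(line)]


# Sample input: the first line is an invented noise line, the rest are
# copied from the crashinfo excerpt above.
SAMPLE_LOG = """\
Feb 3 14:20:01.000 Mexico: %LINK-3-UPDOWN: Interface GigabitEthernet1/1, changed state to up
Feb 3 14:28:54.120 Mexico: %SYS-2-NOPROCESS: No such process 218959117
-Process= "HTTP CORE", ipl= 0, pid= 177
-Traceback= 4105702C 4105731C 40A64E64 40A655A0 407E097C 407D2184 407D24E4 410129B4 410129A0
%ALIGN-1-FATAL: Illegal access to a low address 14:28:54 Mexico Wed Feb 3 2010
addr=0x66C, pc=0x40A64EF8, ra=0x40A655A0, sp=0x50FBDFA8

14:28:54 Mexico Wed Feb 3 2010: TLB (store) exception, CPU signal 10, PC = 0x40A64EF8
"""

if __name__ == "__main__":
    print("\n".join(trim_for_oi(SAMPLE_LOG)))
```

Treat the result as a starting point and still eyeball it before pasting: a filter like this can easily drop a line that turns out to matter.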