VSS standby Supervisor reloaded without warning

scottwilliamson
Level 2

Hi All,

The standby Sup in our VSS deployment reloaded itself. It looks a lot like a software bug, as the crash dump info refers to a memory access issue:

%ALIGN-1-FATAL: Illegal access to a low address 18:12:57 GMT Sat Dec 25 2010
  addr=0x5B, pc=0x408D58C0, ra=0x416EC42C, sp=0x42476F70

%ALIGN-1-FATAL: Illegal access to a low address 18:12:57 GMT Sat Dec 25 2010
  addr=0x5B, pc=0x408D58C0, ra=0x416EC42C, sp=0x42476F70

We're running 12.2(33)SXI, so no doubt we're overdue for an upgrade, but I'd appreciate it if someone could confirm that such a bug would have this effect on our system.

There is also another issue: the reload seems to have affected all of the GRE tunnels used throughout the organisation. When the reload was complete the tunnels did not operate, and we had to do various things to get them working again. Flushing the routing table worked for some; others we had to deconfigure and reconfigure, in a few cases two or three times, before they started passing traffic. Does anyone have any experience of, or an explanation for, this?

I've attached the crashinfo. If any other info is required please let me know.

Many Thanks for your help,

Scott

6 Replies

Frederic Vanderbecq
Cisco Employee

Hello,

Looking at the crashinfo file, we can see the following:

Dec 25 18:12:57: %SYSTEM_CONTROLLER-SW2_SPSTBY-3-ERROR: Error condition detected: TM_DATA_PARITY_ERROR
Dec 25 18:12:57: %SYSTEM_CONTROLLER-SW2_SPSTBY-3-FATAL: An unrecoverable error has been detected. The system is being reset.
just before the crash.

It seems the SP experienced a parity error and crashed. For such problems, it is usually recommended to monitor the SP, as this could be a one-time event.
If it crashes again with the same cause, have it replaced.
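
For anyone triaging a similar crashinfo file, the check described above (looking for an error logged just before the crash) can be sketched in a few lines of Python. This is only an illustration: the regular expression is derived from the log lines quoted in this thread, and the function names `find_events` and `parity_events` are hypothetical helpers, not part of any Cisco tool. Other IOS releases may format these messages differently.

```python
import re

# Pattern inferred from the messages quoted in this thread (assumption):
#   %FACILITY[-SUBFACILITY]-SEVERITY-MNEMONIC: text
SYSLOG_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_-]+?)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)

# Sample lines copied from the posts above.
SAMPLE = """\
Dec 25 18:12:57: %SYSTEM_CONTROLLER-SW2_SPSTBY-3-ERROR: Error condition detected: TM_DATA_PARITY_ERROR
Dec 25 18:12:57: %SYSTEM_CONTROLLER-SW2_SPSTBY-3-FATAL: An unrecoverable error has been detected. The system is being reset.
%ALIGN-1-FATAL: Illegal access to a low address 18:12:57 GMT Sat Dec 25 2010
  addr=0x5B, pc=0x408D58C0, ra=0x416EC42C, sp=0x42476F70
"""


def find_events(crashinfo_text, max_severity=3):
    """Return syslog-style events at or below max_severity (lower is more severe)."""
    events = []
    for line in crashinfo_text.splitlines():
        match = SYSLOG_RE.search(line)
        if match and int(match.group("severity")) <= max_severity:
            events.append(
                {
                    "facility": match.group("facility"),
                    "severity": int(match.group("severity")),
                    "mnemonic": match.group("mnemonic"),
                    "text": match.group("text"),
                }
            )
    return events


def parity_events(events):
    """Return only the events whose text mentions a parity error."""
    return [event for event in events if "PARITY" in event["text"].upper()]


if __name__ == "__main__":
    for event in parity_events(find_events(SAMPLE)):
        print(event["facility"], event["mnemonic"], event["text"])
```

Running the same scan against crashinfo files from any later reloads would show whether the parity error recurs, which is the condition for replacing the hardware.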

Reza Sharifi
Hall of Fame

Hi Scott,

Although I cannot find a specific bug for this image, it looks like there is a memory leak that causes IOS to crash and reboot.

As you noted, you are definitely overdue for an upgrade. The first version of SXI I tested with VSS was about 2 years ago, and we had a lot of issues with memory leaks and software crashes. SXI releases have come a long way with bug fixes since then. I have tested SXI4a with VSS and it seems to be pretty stable. You may want to run it in the lab for a few weeks and see if you observe any issues before going to production.

Good Luck

Reza

scottwilliamson
Level 2

Thanks folks,

Any suggestions regarding the GRE tunnels?

Regards,

Scott

Scott,

Answering why the GRE tunnels did not come up is not going to be easy with the available information, especially since you have now cleared the problem.

Were there any syslog messages at the time of the problem?
When you say the tunnels did not operate, do you mean you could not even ping the tunnel remote end from the 6500? Can you give more details about that problem?


Fred

Hi Fred,

Thanks for replying.

I'm afraid there are no further syslogs or similar info from the time of the incident and I appreciate it is hard to determine what the problem could be. I was hoping that perhaps someone on the forum may have had a similar issue and may have had a theory or answer.

Anyway, to answer your question: the tunnels were all in the up/up state, but we could not ping the far ends within our School buildings (the Schools' network being separate from the rest of the organisation's). Fortunately we were able to telnet to the remote routers involved, and we had to try various combinations of removing and reconfiguring the tunnel configs, or at least part of them (in some cases changing the encapsulation to IPIP and then back to GRE), to get the tunnels back up. Initially, clearing the routing table on the 6509s was enough to get a handful of them working. After that we had to kick the configs around as above to get the remainder passing traffic, which took some time due to the trial-and-error nature of each fix.

Not only did the tunnels terminating on the 6509s have problems, but the tunnels used by our Schools' colleagues that pass through the 6509s to other sites in our network had similar issues. That makes it look as though the 6509 stopped handling GRE traffic after the reload for some reason, but I don't think that's the answer.

Many Thanks,

Scott

Scott,

For such a problem, I guess the only way to really get to the bottom of it would be to look at the problem while it is present. Should you see that issue again, contact the TAC and ask us to look at the 6500 live. Maybe the hardware programming for the GRE tunnels got messed up by the crash.

Also, to be honest, 12.2(33)SXI is quite old, and it is quite possible that a later software release would fix this issue.

Fred
