
Do we hit bug ID CSCed52841? Is it fixed in IOS 12.2(33)SXI6?

Martin Ermel
VIP Alumni

A customer observes SNMP timeout problems on a Cat6500 running IOS 12.2(33)SXI. As a result, roughly every two days all interfaces are marked DOWN, and a couple of minutes later all interfaces are up again. In fact there is no interruption; the SNMP requests are simply timing out.

The customer does not have these problems with IOS 12.2(18)SXF14.

In the bug details, "Known affected versions" lists the following, among others:

[...]

12.2(33)SXI 

12.2(33)SXI1 

12.2(33)SXI2 

12.2(33)SXI2a 

12.2(33)SXI3

12.2(33)SXI3a 

12.2(33)SXI3z 

12.2(33)SXI4 

12.2(33)SXI4a 

12.2(999)SXI  

[...]

12.2(18)SXF 

12.2(18)SXF1 

12.2(18)SXF2 

12.2(18)SXF3 

12.2(18)SXF4 

12.2(18)SXF5 

12.2(18)SXF6 

12.2(18)SXF7 

12.2(18)SXF8 

12.2(18)SXF9 

12.2(18)SXF10 

12.2(18)SXF10a 

12.2(18)SXF11 

12.2(18)SXF12 

12.2(18)SXF12a 

12.2(18)SXF13 

12.2(18)SXF13a 

12.2(18)SXF13b 

12.2(18)SXF14 

12.2(18)SXF15 

12.2(18)SXF15a 

12.2(18)SXF16 

12.2(18)SXF17 

12.2(18)SXF17a  

[...]

Now I am confused: both IOS versions (12.2(33)SXI and 12.2(18)SXF14) are listed as affected, but the customer has problems with only one of them.

Is the customer hitting this bug, or is it another one?

He upgraded the two Cat65xx switches on which he observed the problem to IOS 12.2(33)SXI6, and the problem is gone. Is this just a coincidence, or is CSCed52841 really fixed in 12.2(33)SXI6?

This version is not listed as affected, but on the other hand "Fixed-In" lists only these three:

12.1(22.3)E1

12.2(17d)SXB5

12.2(18)SXD
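The two lists can be compared mechanically. The sketch below is a hypothetical Python helper (the names `parse`, `check`, `AFFECTED` and `FIXED_IN` are made up for illustration) that splits a release string such as 12.2(33)SXI6 into a base, a branch and a rebuild part, assuming the `major.minor(throttle)BRANCHrebuild` naming seen in the lists above. It only contains the entries quoted from the bug details, so the "[...]" parts are missing. It shows that 12.2(33)SXI6 is not in the quoted affected list, and that none of the three Fixed-In releases belongs to the SXI branch.

```python
import re
from typing import NamedTuple


class Release(NamedTuple):
    base: str     # e.g. "12.2(33)"
    branch: str   # e.g. "SXI"
    rebuild: str  # e.g. "6", "4a", or "" for the first release of a branch


# Assumed naming pattern: major.minor(throttle) + upper-case branch letters + rebuild.
# Matches "12.2(33)SXI6", "12.2(17d)SXB5", "12.1(22.3)E1" and "12.2(18)SXD".
_RELEASE_RE = re.compile(r"^(\d+\.\d+\([^)]+\))([A-Z]+)(\w*)$")


def parse(release: str) -> Release:
    match = _RELEASE_RE.match(release.strip())
    if match is None:
        raise ValueError(f"unrecognised IOS release string: {release!r}")
    return Release(*match.groups())


# Only the entries quoted from the bug details; the "[...]" parts are not included.
AFFECTED = {
    "12.2(33)SXI", "12.2(33)SXI1", "12.2(33)SXI2", "12.2(33)SXI2a",
    "12.2(33)SXI3", "12.2(33)SXI3a", "12.2(33)SXI3z", "12.2(33)SXI4",
    "12.2(33)SXI4a", "12.2(999)SXI",
    "12.2(18)SXF", "12.2(18)SXF1", "12.2(18)SXF2", "12.2(18)SXF3",
    "12.2(18)SXF4", "12.2(18)SXF5", "12.2(18)SXF6", "12.2(18)SXF7",
    "12.2(18)SXF8", "12.2(18)SXF9", "12.2(18)SXF10", "12.2(18)SXF10a",
    "12.2(18)SXF11", "12.2(18)SXF12", "12.2(18)SXF12a", "12.2(18)SXF13",
    "12.2(18)SXF13a", "12.2(18)SXF13b", "12.2(18)SXF14", "12.2(18)SXF15",
    "12.2(18)SXF15a", "12.2(18)SXF16", "12.2(18)SXF17", "12.2(18)SXF17a",
}
FIXED_IN = ["12.1(22.3)E1", "12.2(17d)SXB5", "12.2(18)SXD"]


def check(release: str) -> str:
    """Report what the quoted lists say about one release (not whether it is really fixed)."""
    release = release.strip()
    branch = parse(release).branch
    if release in AFFECTED:
        return "listed as affected"
    if any(parse(fixed).branch == branch for fixed in FIXED_IN):
        return f"fix listed on branch {branch}"
    return f"not listed; no fix listed on branch {branch}"


for rel in ("12.2(18)SXF14", "12.2(33)SXI", "12.2(33)SXI6"):
    print(rel, "->", check(rel))
```

The result only reflects what the excerpted lists say. A "not listed" answer is not evidence of a fix, because the bug notes can lag behind and the "[...]" entries are not modeled here.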

Before upgrading around 50 core/distribution switches, the customer wants to be sure about the IOS version.

Tracing the issue is not easy because the failure occurs only from time to time.


Michel Hegeraat
Level 7

Hi Martin,

I would certainly take the result of the two upgraded switches to heart.

In my experience, updates to the bug ID notes are sometimes a month or two late, so the results of tests on the network are more relevant to me than what a bug ID says.

Why were these switches upgraded to 12.2(33)SXI6 and not to the latest IOS in the train? I have my doubts about Cisco updating bug ID notes, but they are pretty good at regression testing :-).

Cheers,

Michel

Thanks, Michel, for your response. Because core switches are affected, we have to be sure that an IOS update will fix the issue and that all features in use are supported and "bug-free", but you know these kinds of stories...

And of course it would be great to finally know the reason for this issue, rather than just "avoiding" it by using another IOS release without being sure that it will not reappear under certain circumstances.

Currently I cannot say how they decided on the 12.2(33)SXI6 release.

Of course Martin,

I always try to explain to my customers that there is no such thing as certainty, just increased probability. Indeed, they don't want to hear that. Most of them, however, simply can't afford to wait for certainty.

Your experience will tell you how comparable the upgraded switches are to the backbone switches. If the switches are used in an entirely different way, then we obviously don't even know whether they were suffering from the same defect.

And even if your TAC engineer is 100% sure what went wrong and that it is certainly fixed in version X.Y.Z, you still need a fallback plan to a previous (not so good, but still mostly) working state.

Good luck,

Michel

Joe Clarke
Cisco Employee

I think the "fix" is a coincidence. The bug you reference is quite old, and no specific fix was ever made to the SXI branch. There have, however, been numerous fixes between SXI and SXI6 that could account for timing fixes. Seeing a stack trace of the SNMP ENGINE and IP SNMP processes would help identify potential candidates.

Am I right in assuming that the stack trace has to be captured while the issue is occurring? If so, do you have a suggestion for doing this automatically? According to the customer, it would hardly be possible to keep a session open to the switch that is currently affected.

If there is a distribution switch that is affected more often, I thought about using EEM to capture a stack trace every minute or so and append the output to a file on flash... but I am not sure whether this is a realistic idea.

Yes, the stack traces will need to be obtained at the time the problem is occurring. Since there is not necessarily an EEM-visible trigger that could kick off your policy, the timer ED (event detector) is a way to go, provided you have enough disk space and you can reproduce this fairly easily. You could add another EEM policy that runs in the "off time" to email the file on flash and then delete it, to prevent flash from filling up.
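Whether the timer-based capture fits on flash is simple arithmetic. The hypothetical Python helpers below (names made up for illustration) estimate the worst-case size of the appended file before the cleanup policy empties it, assuming one snapshot per timer interval and the same size for every snapshot. The 8 KiB per snapshot is a placeholder, not a measured value; substitute the real size of one captured output.

```python
def snapshots_between_cleanups(interval_s: int, cleanup_period_s: int) -> int:
    """Number of timer firings that can accumulate before the cleanup policy runs."""
    # Ceiling division, so a partial interval is counted conservatively.
    return -(-cleanup_period_s // interval_s)


def worst_case_flash_bytes(snapshot_bytes: int, interval_s: int, cleanup_period_s: int) -> int:
    """Peak size of the appended file, assuming every snapshot has the same size."""
    return snapshots_between_cleanups(interval_s, cleanup_period_s) * snapshot_bytes


MINUTE, DAY = 60, 24 * 60 * 60
SNAPSHOT_BYTES = 8 * 1024  # hypothetical size of one stack-trace capture

# Capture once a minute; mail and delete the file once a day.
print(worst_case_flash_bytes(SNAPSHOT_BYTES, MINUTE, DAY))      # 11796480
# Keep the file for two days (the customer's rough failure interval).
print(worst_case_flash_bytes(SNAPSHOT_BYTES, MINUTE, 2 * DAY))  # 23592960
```

The peak grows linearly with the cleanup period, so running the mail-and-delete policy regularly keeps flash usage bounded even with a one-minute capture interval.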
