%SNMP-3-INPUT_QFULL_ERR, ssh session dies, 3750 stack crashes on reload

russbeutel · 04-30-2014

Hi List,

We are running a lot of 3750 stacks, and in the last few weeks we have been facing a strange phenomenon that has now affected its fifth switch stack in a row. All affected stacks had two things in common: an uptime of almost two years and IOS 12.2(44)SE.

It starts with the switch logging "%SNMP-3-INPUT_QFULL_ERR" to our syslog server for no apparent reason (the switch gets the same SNMP requests as every other switch on our network). If we SSH to the affected switch and run "show interfaces status", it lists a couple of interfaces of the first switch, then the SSH session dies. The same happens with "show etherchannel summary".
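Since the syslog message is described as the first visible symptom, a syslog server could be scanned for it. The following is only a rough sketch under assumptions: the records are taken to be already split into (host, message) pairs, and the function name and the host name in the usage note are hypothetical.

```python
from collections import Counter

# Message mnemonic quoted in this thread.
MNEMONIC = "%SNMP-3-INPUT_QFULL_ERR"


def hosts_reporting_qfull(records):
    """Count occurrences of the mnemonic per host.

    `records` is an iterable of (host, message) pairs. Splitting raw
    syslog lines into such pairs depends on the local syslog format
    and is deliberately left out of this sketch.
    """
    counts = Counter()
    for host, message in records:
        if MNEMONIC in message:
            counts[host] += 1
    return counts
```

For example, feeding it two matching records for a hypothetical host "stack-07" and one unrelated record for another host yields a count of 2 for "stack-07" only.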

If we reconnect to the switch, "show users" still lists the broken connection, but "clear line vty ..." does not reset it.

If we run "show tech-support | redirect tftp:...", the SSH session from which we run it also dies. The file on the TFTP server ends at the interface where "show interfaces status" breaks off.

At this stage the stack still seems to forward traffic, but if we reload it in a maintenance window things get worse: the switch that was the stack master crashes instantly and does not recover. Forwarding stops and the management interface does not come back. Even the console is unusable. The only remedy is to unplug the mains cables. After that the switch comes back as though nothing had happened.

Has anyone heard of such a situation? Is there a way to predict that a switch will show this behaviour in the future? Is it safe to do a firmware upgrade on a switch that will run into this soon? Is there a way to prevent this remotely (without manually unplugging the stack)?

Thanks in advance,

Sebastian.

Leo Laohoo · 04-30-2014

Upgrade your IOS to 12.2(55)SE9.

russbeutel · 05-05-2014

Dear Leo Laohoo,

Thanks for the reply. My problem, though, is that I suspect reloading a switch is dangerous while it is in that unstable condition. Simply put: I cannot update almost 800 devices spread evenly over a huge area if I don't know how many of them will survive the reload, particularly as the update requires so much flash space that the running image needs to be deleted. Therefore my main questions are:

Is there a way to predict whether a switch is at risk of running into this, and is there another way to remedy it besides unplugging the mains cables?
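No reliable predictor is given in this thread. The only common traits reported are IOS 12.2(44)SE and an uptime of almost two years, so a rough triage could flag stacks that share both. The sketch below is an illustration under assumptions: the uptime text is assumed to look like the usual "show version" uptime string, a year is approximated as 365 days, and the 600-day threshold, the inventory layout, and the function names are all hypothetical.

```python
import re

# Approximate day counts; smaller units (hours, minutes) are ignored.
UNIT_DAYS = {"year": 365, "week": 7, "day": 1}


def uptime_days(text):
    """Convert text such as '1 year, 48 weeks, 3 days, 2 hours' to days."""
    total = 0
    for count, unit in re.findall(r"(\d+)\s+(year|week|day)s?", text):
        total += int(count) * UNIT_DAYS[unit]
    return total


def flag_candidates(inventory, version="12.2(44)SE", min_days=600):
    """Return names of stacks that match both traits reported in the thread.

    `inventory` is an iterable of (name, ios_version, uptime_text) tuples.
    `min_days` is an assumed stand-in for 'almost two years'.
    """
    return [
        name
        for name, ios_version, uptime_text in inventory
        if ios_version == version and uptime_days(uptime_text) >= min_days
    ]
```

A flagged stack is only a candidate for closer inspection; matching these two traits does not prove that it will fail.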

Leo Laohoo · 05-05-2014

"I suspect the reload of a switch to be dangerous if it is in that unstable condition."

I disagree. If reloading the switch stack causes your stack to crash then this is reason enough for you to strongly consider upgrading the IOS of your stack.

"I cannot update almost 800 devices evenly spread over a huge area if I don't know how many of them will survive the reload."

ROFL! I performed an IOS upgrade on, and reloaded, >500 switches a few weeks ago. I didn't have a single failure. And the upgrade was NOT done by automation. Everything was done MANUALLY.

"In particular as the update requires so much flash space, that the running image needs to be deleted."

This is true if your 3750 only has 16 MB of flash. If your switch has 32 MB of flash then it can hold up to two full IOS images plus one additional BIN file.

Maybe I should try to make it slightly clearer: The IOS you are running, 12.2(44)SE, is not a stable version. That is especially true of the FIRST version of a train. Cisco has published numerous IOS bugs as well as security vulnerabilities.

Test the IOS I've recommended on a few units. See if the problem goes away. If you see improvements, you can then do a staggered deployment.

Rex Chia · 01-08-2016

Hi Sebastian,

I am facing exactly the same issue here in a Cisco 3750 stack environment...

- SSH session hangs on “show etherchannel summary”.
- SSH session hangs on “show interface status”.
- The log is full of “%SNMP-3-INPUT_QFULL_ERR: Packet dropped due to input queue full”.

I would appreciate it if you could share how you resolved this problem.

Thank you very much!

Leo Laohoo · 01-08-2016

Rex,

This thread is > 2 years old. Please create a new thread and post the IOS of the switch (stack) used.

russbeutel · 11-04-2016

Hi Rex Chia,

Sorry for the late answer. We chose the painful way. We scheduled a lot of maintenance windows to upgrade small groups of stacks in geographically close areas. That way we avoided losing more stacks at once than we could recover on the same day, or more switches than we had in our spare parts storage.

In the end we suffered fewer losses than expected, but more than enough to break our necks had they all happened on the same day.
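As an illustration only, the batching described above could be planned along these lines. This is a sketch, not the procedure actually used: the function name, the (name, area) inventory layout, and the `max_per_window` limit are hypothetical. The idea is to size each window for the worst case, so `max_per_window` would be the smaller of the number of stacks that can be recovered in one day and the number of spare switches on hand.

```python
from collections import defaultdict


def plan_windows(stacks, max_per_window):
    """Split stacks into maintenance windows, one geographic area at a time.

    `stacks` is an iterable of (stack_name, area) pairs. No window mixes
    areas, and no window holds more than `max_per_window` stacks, so even
    if every stack in a window fails, the loss stays within the limit.
    """
    if max_per_window < 1:
        raise ValueError("max_per_window must be at least 1")

    by_area = defaultdict(list)
    for name, area in stacks:
        by_area[area].append(name)

    windows = []
    for area in sorted(by_area):
        names = by_area[area]
        # Chunk each area's stacks into groups of at most max_per_window.
        for start in range(0, len(names), max_per_window):
            windows.append((area, names[start:start + max_per_window]))
    return windows
```

With three stacks in one area, one stack in another, and a limit of two, this yields three windows: two for the first area and one for the second.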

Best,

Sebastian.