Solved: Re: 4510+R ssh error after upgrade xe 3.11.07 to xe 3.11.09

Hubert Kupper · ‎10-24-2023

Hi,

After an upgrade from xe 3.11.07 to 3.11.09 we cannot open an SSH session to our 4510+R switch. Before the upgrade everything worked fine. The error message in the log is:

%SSH-3-BAD_PACK_LEN: Bad packet length

We zeroized the RSA key and generated a new one, but the error still occurs.

Any idea?

Regards, Hubert

John.panzer · ‎10-31-2023

Good find. I have passed this on to TAC so they can investigate further.

It sounds like those additional algorithms may not have been intended for that platform and were left in from code for a different platform. I’ve let them know that this is the apparent cause.

It makes sense that it could be trying to use those other algorithms that may not be implemented elsewhere in software.

Leo Laohoo · ‎11-06-2023

As of today, 07 November 2023, the Crypto Image file is still available for download and Release Notes have not been updated.

The best way to dissuade anyone from downloading this is to give it a "one star" rating (and leave a supporting remark).

Leo Laohoo · ‎11-07-2023

@RVTim, @Hubert Kupper, @John.panzer, @mistertom, etc.

Please try my EEM:

event manager applet KEX_ALGO
 event syslog pattern "SSH-3-BAD_PACK_LEN"
 action 1.0 cli command "enable"
 action 2.0 cli command "conf t"
 action 3.0 cli command "ip ssh server algorithm kex diffie-hellman-group-exchange-sha1 diffie-hellman-group14-sha1"
 action 4.0 cli command "end"

And reboot the switch.

Hope this helps.

RVTim · ‎11-07-2023

@Leo Laohoo

I like the idea, but I tried it tonight and it didn't work for me. I see what you are trying to do though. I think I know why it didn't work.

You looked for "SYS-5-RESTART".

When I rebooted, I had this in the logs:

Nov 7 22:00:16 my-sw 2719: Nov 7 22:00:15: %RF-5-RF_RELOAD: Shelf reload. Reason: Reload Shelf CLI
Nov 7 22:00:16 my-sw 2720: Nov 7 22:00:16: %SYS-5-RELOAD: Reload requested by ME on vty0 (10.6.25.3). Reload Reason: Reload Shelf CLI.

But those entries were logged just as it was about to reload, not when it came back up.

Of course, the remote syslog server didn't catch anything from the reload itself until the switch had booted back up and routing had re-converged.

After it booted up, I did have these messages that made it to the syslog:

Nov 7 22:10:53 my-sw 342: Nov 7 22:10:52: %SSH-3-BAD_PACK_LEN: Bad packet length -288614957
Nov 7 22:10:56 my-sw 343: Nov 7 22:10:55: %SSH-3-BAD_PACK_LEN: Bad packet length -1695192534
Nov 7 22:11:08 my-sw 344: Nov 7 22:11:07: %SYS-5-CONFIG_I: Configured from console by OTHERME on console
Nov 7 22:11:15 my-sw 345: Nov 7 22:11:14: %SEC_LOGIN-5-LOGIN_SUCCESS: Login Success [user: ME] [Source: 10.6.250.3] [localport: 22] at 22:11:14 CST Tue Nov 7 2023
Nov 7 22:11:24 my-sw 346: Nov 7 22:11:23: %C4K_REDUNDANCY-5-CONFIGSYNC: The private-config has been successfully synchronized to the standby supervisor
Nov 7 22:11:24 my-sw 347: Nov 7 22:11:23: %C4K_REDUNDANCY-5-CONFIGSYNC: The startup-config has been successfully synchronized to the standby supervisor
Nov 7 22:12:50 my-sw 348: Nov 7 22:12:50: %OSPF-5-ADJCHG: Process 1, Nbr 172.17.3.4 on Port-channel12 from LOADING to FULL, Loading Done

So you can see it failed with a bad packet length a couple of times when I tried to log in. Then, from the console, I ran the "ip ssh server algorithm kex ...." command to set those old kex's back, and after that I was able to log in.

By the way, in my config is still:

event manager applet KEX_ALGO
event syslog pattern "SYS-5-RESTART"
action 1.0 cli command "enable"
action 2.0 cli command "conf t"
action 3.0 cli command "ip ssh server algorithm kex diffie-hellman-group-exchange-sha1 diffie-hellman-group14-sha1"
action 4.0 cli command "end"
!

What I'm thinking is, maybe a better thing to watch the syslog for would be:

SSH-3-BAD_PACK_LEN

Then, if you DO get that error, if I'm understanding the event manager thing well enough, it should automatically change the algorithm(s). Right?
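For anyone weighing the trigger patterns, here is a rough Python sketch (not part of the applet, and not how EEM is implemented) that runs each candidate pattern against the post-boot log lines quoted above. It assumes the syslog pattern behaves like a regular-expression search over each message; `BOOT_LOGS` and `first_match` are made-up names for illustration. It only covers what reached the remote syslog server, so it says nothing about messages the switch may have logged locally before routing converged.

```python
import re

# Post-boot syslog lines copied from this post (remote syslog view only).
BOOT_LOGS = [
    "Nov 7 22:10:52: %SSH-3-BAD_PACK_LEN: Bad packet length -288614957",
    "Nov 7 22:10:55: %SSH-3-BAD_PACK_LEN: Bad packet length -1695192534",
    "Nov 7 22:11:07: %SYS-5-CONFIG_I: Configured from console by OTHERME on console",
]


def first_match(pattern, lines):
    """Return the first line the pattern matches, or None (hypothetical helper)."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            return line
    return None


# Try each trigger pattern that has been discussed in the thread.
for candidate in ("SYS-5-RESTART", "SYS-5-RELOAD", "SSH-3-BAD_PACK_LEN"):
    hit = first_match(candidate, BOOT_LOGS)
    print(candidate, "->", "would trigger" if hit else "no match")
```

Of the three candidates, only SSH-3-BAD_PACK_LEN appears in those lines, which fits the idea of triggering on the failure itself.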

I've never used event manager applets before, so that's new to me. When you do "enable", I'm assuming that it doesn't make you authenticate like it would if you consoled in? If that's true, then I think your solution would be great.

What do you think?

PS: My two 4500-X switches are in a VSS configuration. My 4510R+E upgrade to the same release went just fine, but that upgrade was a little different in that I used the ISSU process, which keeps the switch running and just fails control back and forth between the redundant supervisors. So it apparently doesn't matter?

Wait, one more theory on why it didn't matter in my case:

My original config on that particular switch only had one of the 2:

ip ssh server algorithm kex diffie-hellman-group14-sha1

So, here's my theory. When I upgraded the other switch, I had both of the 2 KEX algs. Those were the default config, so, when you did a "show run" they didn't show up in the config. So, it assumed the default when it upgraded. When it upgraded to 03.11.09E, the default is now all 5 algorithms. So then you're locked out because it won't accept the other 2 any longer for whatever reason. But since on my 4510 I had a NON-default config, of only the one algorithm, it stayed in the config that way. And when I rebooted, rather than upgrading to the new 5 algorithms, it kept to just the 1 algorithm. It's a theory anyway. It's either that, or when you just do a redundancy force-switchover type issu upgrade, it doesn't actually modify the running config and you're ok.
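The theory above can be written down as a toy model. This is only a sketch of the reasoning in this post, not confirmed IOS-XE behaviour: it assumes a kex line equal to the release default is not saved (it is hidden from "show run"), and that a saved line always overrides the new release default. The thread never names the three algorithms added in 03.11.09E, so the code uses placeholder strings for them; `saved_line` and `kex_after_upgrade` are hypothetical helper names.

```python
OLD_KEX = ["diffie-hellman-group-exchange-sha1", "diffie-hellman-group14-sha1"]

# Release defaults as described in this thread. The three additions in
# 03.11.09E are never named here, so placeholders stand in for them.
DEFAULT_KEX = {
    "03.11.08E": OLD_KEX,
    "03.11.09E": OLD_KEX + ["<added-kex-1>", "<added-kex-2>", "<added-kex-3>"],
}


def saved_line(configured_kex, release):
    """Assumed rule: a kex line equal to the release default is not saved at all."""
    if list(configured_kex) == DEFAULT_KEX[release]:
        return None
    return list(configured_kex)


def kex_after_upgrade(configured_kex, old_release, new_release):
    """Assumed rule: a saved line survives the upgrade, otherwise the new default applies."""
    saved = saved_line(configured_kex, old_release)
    return saved if saved is not None else list(DEFAULT_KEX[new_release])


# Switch running the two old defaults: nothing is saved, so it picks up all five.
print(kex_after_upgrade(OLD_KEX, "03.11.08E", "03.11.09E"))

# Switch with a single non-default algorithm: the saved line carries over unchanged.
print(kex_after_upgrade(["diffie-hellman-group14-sha1"], "03.11.08E", "03.11.09E"))
```

Under these assumptions the first switch ends up on the new five-algorithm default while the second keeps its single algorithm, which matches what I saw on the two platforms.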

Well, thanks for your post and I hope someone gets to try this with the "SSH-3-BAD_PACK_LEN" in the syslog event line and that it works for them. That would be pretty cool. What would be cooler? Probably if Cisco would add stuff to the release notes warning people a little, or maybe come out with 03.11.09aE and NOT add those 3 algorithms as default.

Leo Laohoo · ‎11-07-2023

I've updated the EEM to "SYS-5-RELOAD". See if that makes any difference.

RVTim · ‎11-08-2023

As I mentioned above, I believe the SYS-5-RELOAD message comes when the switch is going down, not when it's coming up. So while I haven't tried that, I don't think it would work, because a command added before the reload doesn't survive the software upgrade once the switch boots back up. You'd need to find a syslog message that it gives when it is booting up. That's why I suggested triggering when the bad packet length message shows up. That way it doesn't trigger unless there is actually a bad packet length issue.

Leo Laohoo · ‎11-08-2023

Give it a try and replace the 2nd line with:

event syslog pattern "SSH-3-BAD_PACK_LEN"

And then reboot the box.

RVTim · ‎11-15-2023

I can confirm that Leo's applet, with SSH-3-BAD_PACK_LEN in it, works perfectly. Leo, you're fantastic!

Here's the full applet:

event manager applet KEX_ALGO
event syslog pattern "SSH-3-BAD_PACK_LEN"
action 1.0 cli command "enable"
action 2.0 cli command "conf t"
action 3.0 cli command "ip ssh server algorithm kex diffie-hellman-group-exchange-sha1 diffie-hellman-group14-sha1"
action 4.0 cli command "end"
!

I reverted back to 03.11.08E, in which the default KEX's are only the 2 older ones:

diffie-hellman-group-exchange-sha1 diffie-hellman-group14-sha1

I verified that they no longer show in the config when you do "sh run | i ssh" because they are the default in 03.11.08E.

I did a write mem just to ensure that the startup config was saved in 03.11.08E.

I set the boot variable to 03.11.09E and rebooted the VSS Switch.

When it came up I watched our syslog server to see if it was fully booted and watched for "SSH-3-BAD_PACK_LEN".

I attempted a couple of very quick ssh logins and I did see both get caught in the syslog.

Then in the syslog, I saw the log entry that the switch had just been configured, and once I saw that, my 3rd attempt connected without issue. And, since the applet puts in those 2 KEX algo's, which are no longer the default, they showed up in the config and everything was fine.
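To check the same sequence from a syslog archive rather than by watching it live, something like the following Python sketch could be used. It is illustrative only: the `%FACILITY-SEVERITY-MNEMONIC:` regex is an assumption about the message format based on the lines in this thread, the sample lines are the ones posted earlier (where the configuration change was made by hand from the console), and `tags` and `recovered_after_failure` are hypothetical helper names.

```python
import re

# Matches the %FACILITY-SEVERITY-MNEMONIC: tag seen in the log lines in this thread.
TAG = re.compile(r"%([A-Z0-9_]+-\d-[A-Z0-9_]+):")

# Order of events described above: failed login, config change, successful login.
EXPECTED = ["SSH-3-BAD_PACK_LEN", "SYS-5-CONFIG_I", "SEC_LOGIN-5-LOGIN_SUCCESS"]


def tags(lines):
    """Extract the message tag from each line that has one."""
    found = []
    for line in lines:
        match = TAG.search(line)
        if match:
            found.append(match.group(1))
    return found


def recovered_after_failure(lines):
    """True if the EXPECTED tags occur in that order (other messages may sit in between)."""
    remaining = iter(tags(lines))
    # Each any() consumes the shared iterator, so this is an ordered subsequence check.
    return all(any(tag == wanted for tag in remaining) for wanted in EXPECTED)


SAMPLE = [
    "Nov 7 22:10:52: %SSH-3-BAD_PACK_LEN: Bad packet length -288614957",
    "Nov 7 22:10:55: %SSH-3-BAD_PACK_LEN: Bad packet length -1695192534",
    "Nov 7 22:11:07: %SYS-5-CONFIG_I: Configured from console by OTHERME on console",
    "Nov 7 22:11:14: %SEC_LOGIN-5-LOGIN_SUCCESS: Login Success [user: ME] "
    "[Source: 10.6.250.3] [localport: 22] at 22:11:14 CST Tue Nov 7 2023",
]

print(recovered_after_failure(SAMPLE))
```

The check only looks at the order of the message tags, so it cannot tell whether the configuration change came from the applet or from someone typing on the console.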

This is a great workaround, and probably the only way that you will be able to keep yourself from being locked out if you're working on a remote switch like I am, unless you have a console server on the console port to be able to access it that way.

Thanks again Leo, and for anyone considering upgrading, make sure you prepare and throw this applet in before you upgrade!

Leo Laohoo · ‎11-15-2023

Thanks for the update and confirmation.

I have re-posted the EEM here to save people the time.

event manager applet KEX_ALGO
 event syslog pattern "SSH-3-BAD_PACK_LEN"
 action 1.0 cli command "enable"
 action 2.0 cli command "conf t"
 action 3.0 cli command "ip ssh server algorithm kex diffie-hellman-group-exchange-sha1 diffie-hellman-group14-sha1"
 action 4.0 cli command "end"

Update (27 November 2023): CSCwi02895

IMPORTANT: The above EEM is a Workaround and should not be treated as a permanent "fix".

Musonick · ‎02-14-2024

Has anyone else experienced iosd dying after changing the algorithms back?

It seems like I've somehow crashed the SSH server.

I know this post is "old" but I wanted to share it with you all.

sh ver

Cisco IOS Software, IOS-XE Software, Catalyst 4500 L3 Switch Software (cat4500e-UNIVERSALK9-M), Version 03.11.09.E RELEASE SOFTWARE (fc4)

After trying to ssh to the switch:

Exception to IOS Thread:
Frame pointer 5E63EE60, PC = 2838975C

IOSD-EXT-SIGNAL: Segmentation fault(11), Process = SSH Process
-Traceback= 1#47eeb9f03ad0a3b51c57df853665a3b0 :1FE63000+852675C :1FE63000+852675C :1FE63000+5F89B88 :1FE63000+5F8D950 :1FE63000+5F8E354 :1FE63000+5F7AD18

Fastpath Thread backtrace:
-Traceback= 1#47eeb9f03ad0a3b51c57df853665a3b0 c:1C029000+DF8F4 c:1C029000+DF8D4 iosd_unix:1C40E000+1B3B0 pthread:1A6FE000+6450

CMI Thread backtrace:
-Traceback= 1#47eeb9f03ad0a3b51c57df853665a3b0 c:1C029000+E8108 c:1C029000+E80EC xos:1A738000+38838 xos:1A738000+38CB4 xos:1A738000+30DB0 tdlcmicompatlib:1C65B000+1D95C tdlcmicompatlib:1C65B000+206F8 iosd_unix:1C40E000+1D2DC pthread:1A6FE000+6450

Monitor Thread backtrace:
-Traceback= 1#47eeb9f03ad0a3b51c57df853665a3b0 c:1C029000+AE8F4 c:1C029000+AE8E0 c:1C029000+E03A8 iosd_unix:1C40E000+1C94C pthread:1A6FE000+6450

Auxiliary Thread backtrace:
-Traceback= 1#47eeb9f03ad0a3b51c57df853665a3b0 pthread:1A6FE000+BB8C pthread:1A6FE000+BB6C c:1C029000+F6804 iosd_unix:1C40E000+2A15C pthread:1A6FE000+6450

Buffered messages: (last 8192 bytes only)
_SESSION: SSH2 Session request from X.Y.W.Z (tty = 0) using crypto cipher 'aes128-ctr', hmac 'hmac-sha2-256' Succeeded
Feb 13 16:13:55.932: %SSH-3-BAD_PACK_LEN: Bad packet length 129965652
Feb 13 16:13:55.932: %SSH-5-SSH2_USERAUTH: User '' authentication for SSH2 Session from X.Y.W.Z (tty = 0) using crypto cipher 'aes128-ctr', hmac 'hmac-sha2-256' Failed
Feb 13 16:13:55.932:

<Tue Feb 13 16:23:01 2024> Message from sysmgr: Reason Code:[2] Reset Reason:Service [iosd] pid:[5188] terminated abnormally [11].
Details:
--------
Service: IOSd service
Description: IOS daemon
Executable: /tmp/sw/mount/cat4500e-universalk9.SPA.152-7.E9.pkg//usr/binos/bin/iosd

Started at Mon Feb 12 23:46:07 2024 (611361 us)
Stopped at Tue Feb 13 16:23:01 2024 (424375 us)
Uptime: 16 hours 36 minutes 54 seconds

Start type: SRV_OPTION_RESTART_STATELESS (23)
Death reason: SYSMGR_DEATH_REASON_FAILURE_SIGNAL (2)
Last heartbeat 0.00 secs ago

PID: 5188
Exit code: signal 11

CWD: /var/sysmgr/work

PID: 5188
UUID: 512

Leo Laohoo · ‎02-15-2024

Whoa, that's a behaviour I've never seen before!

Please raise a TAC Case.

Stadt Regensburg · ‎02-19-2024

Yes, we also experienced this reboot behavior. Interestingly, this exception only occurs when connecting through Mobaxterm SSH (OpenSSH_9.0p1, OpenSSL 1.1.1q 5 Jul 2022)!

When connecting from Windows SSH (OpenSSH_for_Windows_8.1p1, LibreSSL 3.0.2), it connects normally without errors.

For a test I set up a spare C4500 to replicate this bug, and indeed I can repeat it over and over.

Leo Laohoo · ‎03-18-2024

@Hubert Kupper, @RVTim, @John.panzer, @tbarton, @mistertom, @CoreyH, @Musonick, @Stadt Regensburg

03.11.10 has appeared but the Release Notes have not (yet) been updated.

Fabian Kessler · ‎03-19-2024

I was unable to reproduce the kex issue (CSCwi02895) with IOS 03.11.10 in our lab.

Leo Laohoo · ‎03-19-2024

@Fabian Kessler wrote:
I was unable to reproduce the kex issue (CSCwi02895) with IOS 03.11.10 in our lab.

Thanks for the update, @Fabian Kessler.