09-24-2025 06:30 AM
I've got a few 9410 switches with Supervisor 1's in them, and one of them is giving me issues.
On the previous upgrade to 17.9.7, I had issues with the second supervisor (all chassis have dual sups): it failed to boot after a non-ISSU upgrade. That switch had also given me issues with ISSU upgrades before.
It was stuck in a boot loop, and was trying to perform an autoupgrade of the software on the sup before it joined up, because something wasn't matched. I turned off auto-upgrade and booted it using the image on a USB stick from rommon, and after it was booted up I did:
write mem
install remove inactive
And that must have cleaned up the switch, and when I turned auto-upgrade back on it triggered an auto-upgrade and rebooted the sup. At that point stby-bootflash: had all the package files and everything seemed fine.
Now though, I have been trying to copy a new image to that stby-bootflash:
The same source via SCP copies fine onto all the other sups, both their internal bootflash and their USB drives. But whenever I copy the image onto that stby-bootflash:, whether via SCP or a direct file copy from USB, the checksum always fails to match. It's not always the same checksum either, so my thought is that something on that bootflash is just totally corrupted, or the controller that does the file writes is flaky.
I'm considering the following to see if I can fix it:
1) Boot to rommon
2) Erase the flash
3) Boot from the USB image
4) Do whatever I have to do to make sure the file structure and packages all get on there.
I do think that I've already done an erase of the flash many months ago when I was having issues during my ISSU upgrade, so I don't have a lot of faith that this will actually fix the underlying issue of the bad writes.
Anyone else have any good tips for what you've experienced or what you'd recommend?
09-24-2025 07:43 AM
- @RVTim >...regardless if it's from SCP or a direct file copy from USB, the checksum always fails to match. It's not always the same checksum either, so my thought is that something of that bootflash is just totally corrupted, or the controller that does the file writes is flaky.
Check logs on the switch after this particular file copy is tried
M.
09-24-2025 08:01 AM
I just tried it again and nothing gets logged with a file copy. The file checksum again didn't match. Nothing got logged after doing the verify /md5 either. But thanks for giving me something to look for.
09-24-2025 08:09 AM
- @RVTim Which command are you using to verify the checksum?
M.
09-24-2025 08:14 AM - edited 09-24-2025 08:22 AM
verify /md5 cat9k_iosxe.17.12.06.SPA.bin
I get successful matches on 5 other supervisors, and 6 other USB drives, but just consistent non-matching on this one supervisor module.
verify /sha512 also fails, by the way.
09-24-2025 08:30 AM
- @RVTim 1) I presume that should be verify /md5 stby-bootflash:cat9k_iosxe.17.12.06.SPA.bin
Meaning make sure that you are verifying the intended file and not one on a 'wrong flash'
2) Check if there is sufficient free disk space on the stby-bootflash:
3) Use the command monitor logging level verbose, try the copy and check the logs again
4) Issue the command fsck stby-bootflash: to execute a file system check on the stby-bootflash:
M.
09-24-2025 09:12 AM
Ah yes, I should have been more precise in my reply. I was putting in the prefix stby-bootflash:, so yes, it is for sure on the standby supervisor. I just didn't cut and paste the command.
Running through your last post:
The only things on the flash are the existing package files and the new image .bin.
Directory of stby-bootflash:/
146018 drwx 4096 Sep 24 2025 09:51:38 -05:00 .installer
146054 -rw- 1339381361 Sep 24 2025 09:50:26 -05:00 cat9k_iosxe.17.12.06.SPA.bin
146076 -rw- 33554432 Sep 5 2025 10:58:26 -05:00 nvram_config_bkup
146044 -rw- 33554432 Sep 5 2025 10:58:25 -05:00 nvram_config
365048 drwx 4096 Sep 5 2025 10:58:19 -05:00 .dbpersist
146041 -rw- 12136 Jul 1 2025 12:11:31 -05:00 vlan.dat
146120 -rw- 16 Apr 12 2025 23:55:21 -05:00 nplusm-power
146077 -rw- 18170 Apr 12 2025 23:54:22 -05:00 rdope_out.txt
146050 -rw- 0 Apr 12 2025 23:54:22 -05:00 dope_hist
146078 -rw- 89 Apr 12 2025 23:54:21 -05:00 rdope.log
154129 drwx 4096 Apr 12 2025 23:53:26 -05:00 .prst_sync
146090 -rwx 2049 Apr 12 2025 23:53:22 -05:00 svl_ipc.tcl
146043 -rw- 137940 Apr 12 2025 23:53:22 -05:00 memleak.tcl
146047 -rw- 436 Apr 12 2025 23:52:57 -05:00 boothelper.log
243366 drwx 4096 Apr 12 2025 23:52:56 -05:00 dc_profile_dir
146048 -rw- 35 Apr 12 2025 23:52:12 -05:00 mcelog.txt
146020 -rw- 2072 Apr 12 2025 23:52:12 -05:00 bootloader_evt_handle.log
146094 -rw- 7536 Apr 12 2025 23:49:23 -05:00 packages.conf
146093 -rw- 9220 Apr 12 2025 23:49:21 -05:00 cat9k-wlc.17.09.07.SPA.pkg
146081 -rw- 44409864 Apr 12 2025 23:49:21 -05:00 cat9k-srdriver.17.09.07.SPA.pkg
146092 -rw- 18924548 Apr 12 2025 23:49:21 -05:00 cat9k-webui.17.09.07.SPA.pkg
146079 -rw- 66020356 Apr 12 2025 23:49:20 -05:00 cat9k-sipspa.17.09.07.SPA.pkg
146064 -rw- 51852296 Apr 12 2025 23:49:19 -05:00 cat9k-sipbase.17.09.07.SPA.pkg
146061 -rw- 53955680 Apr 12 2025 23:49:18 -05:00 cat9k-rpboot.17.09.07.SPA.pkg
146057 -rw- 822924292 Apr 12 2025 23:49:06 -05:00 cat9k-rpbase.17.09.07.SPA.pkg
146055 -rw- 2159624 Apr 12 2025 23:48:56 -05:00 cat9k-guestshell.17.09.07.SPA.pkg
146056 -rw- 9220 Apr 12 2025 23:48:56 -05:00 cat9k-lni.17.09.07.SPA.pkg
146053 -rw- 174785544 Apr 12 2025 23:48:55 -05:00 cat9k-espbase.17.09.07.SPA.pkg
146101 -rw- 28398604 Apr 12 2025 23:48:53 -05:00 cat9k-cc_srdriver.17.09.07.SPA.pkg
146023 drwx 4096 Apr 12 2025 22:25:43 -05:00 .rollback_timer
146058 -rw- 7536 Apr 12 2025 22:23:32 -05:00 cat9k_iosxe.17.09.07.SPA.conf
316369 drwx 4096 Apr 12 2025 22:23:25 -05:00 .images
227139 drwx 4096 Sep 30 2024 19:53:00 -05:00 .PATCH
154130 drwx 4096 Jun 18 2024 18:11:31 -05:00 .PATCH-backup
146096 -rw- 3782192 Oct 30 2023 20:14:52 -05:00 mcnewfpgabinclose.img
146095 -rw- 12 Oct 30 2023 20:14:52 -05:00 mcnewfpgabinclose.hdr
146019 drwx 4096 Aug 16 2023 19:45:38 -05:00 fp_cc_crash
146032 drwx 4096 Aug 16 2023 19:45:37 -05:00 tech_support
146113 -rwx 0 Aug 16 2023 19:43:12 -05:00 mode_event_log
365053 drwx 4096 Aug 16 2023 19:43:12 -05:00 SHARED-IOX
405601 drwx 4096 Jan 6 2022 13:16:29 -06:00 callhome
365041 drwx 4096 Jan 6 2022 13:16:12 -06:00 Tbot
332593 drwx 4096 Jan 6 2022 13:16:00 -06:00 sys_report
146068 -rw- 5242880 Jan 6 2022 13:15:59 -06:00 ssd
146024 drwx 4096 Jan 6 2022 13:15:07 -06:00 .ssh
146045 drwx 4096 Jan 18 2019 00:30:58 -06:00 onep
146021 drwx 4096 Jan 18 2019 00:30:18 -06:00 core
11250098176 bytes total (7993761792 bytes free)
9410R-2-130#
Set logging level, and ran fsck:
9410R-2-130#monitor logging level verb
9410R-2-130#fsck stby-bootflash:
fsck of stby-bootflash: complete
Verified the bin file on the primary sup:
9410R-2-130#verify /md5 bootflash:cat9k_iosxe.17.12.06.SPA.bin
(result)
verify /md5 (bootflash:cat9k_iosxe.17.12.06.SPA.bin) = 9cf21f6fadc89fea47596df3a2f5f2e7
Delete the standby .bin file:
9410R-2-130#delete stby-bootflash:cat9k_iosxe.17.12.06.SPA.bin
Copy the file from slot 5 sup to slot 6 sup:
9410R-2-130#copy bootflash:cat9k_iosxe.17.12.06.SPA.bin stby-bootflash:cat9k_iosxe.17.12.06.SPA.bin
1339381361 bytes copied in 257.704 secs (5197363 bytes/sec)
Do another fsck for good measure: (I don't normally do this)
9410R-2-130#fsck stby-bootflash:
fsck of stby-bootflash: complete
Verbose log from that operation:
2025/09/24 10:56:34.763312064 {iosrp_R0-0}{1}: [parser_cmd] [6207]: (note): id= 10.130.124.5@vty1:user=me cmd: 'fsck stby-bootflash:' SUCCESS 2025/09/24 10:56:34.408 CDT
Now verify the image on stby-bootflash:
9410R-2-130#verify /md5 stby-bootflash:cat9k_iosxe.17.12.06.SPA.bin
(result)
verify /md5 (stby-bootflash:cat9k_iosxe.17.12.06.SPA.bin) = 8b1dfdca0afd3d53a5c88ff5a8a2f899
Verbose log entry from the verify:
2025/09/24 11:03:36.532460650 {iosrp_R0-0}{1}: [parser_cmd] [6207]: (note): id= 10.130.124.5@vty1:user=me cmd: 'verify /md5 stby-bootflash:cat9k_iosxe.17.12.06.SPA.bin' SUCCESS 2025/09/24 10:58:25.53
So as far as I can tell, the system thinks it's copying the file successfully, and yet the checksums don't match.
Like I said previously, when I copy this same image to 5 other supervisors and 6 other USB drives, the checksum always comes out correct. So the only issue is this stby-bootflash: on the slot 6 supervisor. And this is the supervisor that has given me lots of issues with reboot loops on past upgrades. I'm inclined to think it's just something bad with the flash. I am thinking it just needs an RMA, but wanted to see if I was missing something first.
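For anyone repeating this comparison across several supervisors, a small script can do the digest matching on pasted output. This is only a sketch: it assumes the result line looks exactly like the two `verify /md5 (...) = ...` lines pasted above, and the function names are made up for illustration.

```python
import re

# Matches result lines such as:
#   verify /md5 (bootflash:image.bin) = 9cf21f6f...
# The line shape is taken from the output pasted in this thread.
VERIFY_LINE = re.compile(
    r"verify /(?:md5|sha512) \((?P<path>[^)]+)\) = (?P<digest>[0-9a-fA-F]+)"
)


def parse_verify_output(text):
    """Return a {path: digest} dict for every verify result line in text."""
    return {
        m.group("path"): m.group("digest").lower()
        for m in VERIFY_LINE.finditer(text)
    }


def find_mismatches(digests, reference_path):
    """List the paths whose digest differs from the reference copy."""
    reference = digests[reference_path]
    return sorted(p for p, d in digests.items() if d != reference)


# The two result lines from this post.
SAMPLE = """
verify /md5 (bootflash:cat9k_iosxe.17.12.06.SPA.bin) = 9cf21f6fadc89fea47596df3a2f5f2e7
verify /md5 (stby-bootflash:cat9k_iosxe.17.12.06.SPA.bin) = 8b1dfdca0afd3d53a5c88ff5a8a2f899
"""

if __name__ == "__main__":
    digests = parse_verify_output(SAMPLE)
    reference = "bootflash:cat9k_iosxe.17.12.06.SPA.bin"
    for path in find_mismatches(digests, reference):
        print("MISMATCH:", path, digests[path])
```

Feed it md5 output only or sha512 output only, not a mix, since both kinds of line land in the same dictionary.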
09-24-2025 03:06 PM
I am confused.
Is this firmware upgrade going to be Install or Bundle Mode?
If this is going to be Install Mode, let the automated script copy the BIN file to the standby supervisor card by itself.
Please read this: One-Hit-Wonder NSFW
09-26-2025 06:17 AM
Leo, I do let the automated script copy the bin, but I copy it there manually also in case the process fails or I am left with an unbootable system. I like to have multiple available copies of the .bin file to help recover if stuck in rommon. Regarding the One-Hit-Wonder, I've seen that before posted by you. I like it. I haven't tried it yet, however. I'll have to do that on one of the next rounds where I upgrade a switch I'm local to. The ones I'm upgrading now are many hours away from me, so I need to be prepared for contingencies if they don't boot.
09-26-2025 06:27 AM
@RVTim wrote:
but I copy it there manually also in case the process fails or I am left with an unbootable system.
Manually copying the BIN file to the secondary/standby supervisor card is not just a waste of time (because the process will overwrite it anyway) but can also introduce a level of danger, because some firmware versions are unable to overwrite the file, which causes the entire process to fail altogether. I personally would not do it, nor would I recommend it.
I have seen some really weird sh*t happening with unpacking or extracting the packages from the BIN file. Don't make it any worse.
09-26-2025 09:37 AM
Is this the same for other switches such as 9300s in a stack? So only put the bin file on the primary switch and let the install command do its thing?
Only reason I copied bin files to all members of a 9300 stack was because it would fail install if I tried without as other members of the stack didn't have the bin file (using the old platform command - request platform software package install switch all file flash:cat9k_iosxe.17.12.06.SPA.bin)
Though for a while now for 9300s I've been using the below which copies the bin to all members of a stack I believe.
request platform software package install switch all flash:<FILENAME> on-reboot new auto-copy verbose
09-26-2025 06:44 PM - edited 09-26-2025 06:47 PM
@Blair C wrote:
Is this the same for other switches such as 9300s in a stack? So only put the bin file on the primary switch and let the install command do its thing?
The command (between the 9300 and others) may be different; however, the behaviour is the same for a stack of switches and/or a VSS Switch or a VSS WLC: invoke the command and the Switch/WLC master will copy the firmware down to the slave(s) automatically. After the BIN file has been distributed to the slaves, the process of extracting the package files will start simultaneously.
Just want to give a perspective on the One-Hit-Wonder NSFW: 48 hours ago (25 September 2025), a colleague was upgrading a pair of 9500 (VSS). The VSS pair had an uncomfortable (to me) uptime of 15 months and was on 17.6.X. When he invoked the command "install add file bootflash:filename.bin activate commit" and after the distribution of the firmware to the slave was completed, he was surprised to see the "y/n" question present itself in less than 30 seconds (instead of 3 to 4 minutes). Because he was very familiar with the One-Hit-Wonder NSFW process, he had a look at the bootflash: and bootflash-2: contents. All the package files, including the 17.6.X ones, were erased from both units. If he had not followed the One-Hit-Wonder NSFW process (and had answered "y" instead of "n"), both units would have booted into ROMMON.
And lastly, CSCwq80600 reinforces the importance of checking the "packages.conf" file before rebooting the platform:
We found that the packages.conf file on slot 5 didn't get updated w/ 17.12.4 on slot 5 which is why it used the old image.
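A check like that can be scripted against a copy of packages.conf pulled off the switch. The sketch below is hedged: it assumes nothing about the file's column layout and only looks for package names shaped like the cat9k-*.17.09.07.SPA.pkg files in the directory listing earlier in the thread, then reports which versions they carry. The function names and the sample text are invented for illustration.

```python
import re

# Package names shaped like cat9k-rpbase.17.09.07.SPA.pkg, as seen in the
# directory listing in this thread. Real packages.conf lines carry more
# fields; this pattern deliberately ignores them.
PKG_NAME = re.compile(r"cat9k-[\w-]+\.(?P<version>\d+\.\d+\.\w+)\.SPA\.pkg")


def versions_referenced(conf_text):
    """Return the set of version strings found in package file names."""
    return {m.group("version") for m in PKG_NAME.finditer(conf_text)}


def is_consistent(conf_text, expected_version):
    """True only if every referenced package carries expected_version."""
    return versions_referenced(conf_text) == {expected_version}


# Invented sample: only the file names come from the thread.
SAMPLE = """
cat9k-rpboot.17.09.07.SPA.pkg
cat9k-rpbase.17.09.07.SPA.pkg
cat9k-cc_srdriver.17.09.07.SPA.pkg
"""

if __name__ == "__main__":
    print(sorted(versions_referenced(SAMPLE)))
    print(is_consistent(SAMPLE, "17.12.06"))
```

The expected version has to be written the way it appears in the file names (17.12.06, not 17.12.6). A file that references no packages at all, or a mix of old and new versions, comes back as inconsistent.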
09-24-2025 11:00 PM
- @RVTim Take a random file, verify its checksum (md5) first, copy it to the stby-bootflash:, then verify the checksum again on the stby-bootflash:
Do you then have the same problem?
M.
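One way to prepare the random-file test is to generate throwaway files of a few sizes on a workstation and record their MD5 before copying them to the switch. A minimal sketch, assuming Python is available on the workstation; the file names and sizes are arbitrary choices for illustration, not anything the switch requires.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_test_file(path, size_bytes):
    """Write size_bytes of random data to path and return its MD5."""
    remaining = size_bytes
    with open(path, "wb") as handle:
        while remaining > 0:
            block = os.urandom(min(remaining, 1024 * 1024))
            handle.write(block)
            remaining -= len(block)
    return md5_of_file(path)


if __name__ == "__main__":
    # Hypothetical names and sizes: 1 MiB, 16 MiB and 256 MiB.
    for name, size in [("t1.bin", 1 << 20), ("t16.bin", 16 << 20), ("t256.bin", 256 << 20)]:
        print(name, make_test_file(name, size))
```

After copying each file over, the printed digest can be compared with what verify /md5 reports on stby-bootflash:. Using several sizes may also show whether the failures depend on how much data is written.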
09-26-2025 09:05 PM
Regarding the random file copy test, here's where I'm at right now. A couple of days ago, ANY file I copied onto that stby-bootflash: would fail checksum compared to the proper checksum and what the source location verification showed. Then I ran through those steps above where I did the fsck probably 3 times. I think after that it was still failing. But, I decided to try it again the next day and was shocked when a file I copied there actually matched in checksum. So I copied files from various locations about 6 more times, and it passed every time. So now I'm in a confused state.
I have a new Supervisor that got shipped to me, so I could RMA this one. Keep in mind that I mentioned in my original postings that this particular switch has had repeated issues not just with this upgrade, but ones in the past as well. So this supervisor has been unreliable. I also, however, hate to send back a working supervisor.
So I need to decide: do I swap it out, test the new one, and send the old one back? Or do I keep the old one, run it through the upgrade process, say a little prayer, and if it succeeds, give the supplier back their unused supervisor? I like to treat vendors with respect, but at the same time, I want to know that my hardware is 100% perfect.