C2232TM Fex'es failing image download

erik.hammervold
Level 1

Hi,

We had a software upgrade and lost four N2K-C2232TM-10GE FEXes connected to one of our Nexus 5596UP chassis. We have redundancy for 99%, so it's not an emergency, but...

The FEXes are going into a boot loop trying to do an image download. We tried on both 5596s, with both the new and the old software. We also tried moving one of them to another 5596 chassis, but it still doesn't come online.

It's not like there's a COM port or anything, but is there another way to totally reset one of these little critters?

Or maybe involve TAC..?

06/18/2020 09:56:44.771671: Module register received
06/18/2020 09:56:44.773527: Image Version Mismatch
06/18/2020 09:56:44.774334: Registration response sent
06/18/2020 09:56:44.774592: Requesting satellite to download image
06/18/2020 09:57:25.411055: Deleting route to FEX
06/18/2020 09:57:25.423205: Module disconnected
06/18/2020 09:57:25.424337: Module Offline
06/18/2020 09:57:59.590417: Deleting route to FEX
06/18/2020 09:57:59.598387: Module disconnected
06/18/2020 09:57:59.599400: Offlining Module
06/18/2020 09:58:31.680381: Deleting route to FEX
06/18/2020 09:58:31.688521: Module disconnected
06/18/2020 09:58:31.689532: Offlining Module
06/18/2020 09:58:36.725099: Deleting route to FEX
06/18/2020 09:58:36.737995: Module disconnected
06/18/2020 09:58:36.739246: Offlining Module
06/18/2020 10:00:04.401332: Module register received
06/18/2020 10:00:04.403147: Image Version Mismatch
06/18/2020 10:00:04.403950: Registration response sent
06/18/2020 10:00:04.404210: Requesting satellite to download image
06/18/2020 10:01:50.603436: Deleting route to FEX
06/18/2020 10:01:50.614645: Module disconnected
06/18/2020 10:01:50.615878: Module Offline
06/18/2020 10:03:31.061698: Module register received
06/18/2020 10:03:31.063505: Image Version Mismatch
06/18/2020 10:03:31.064306: Registration response sent
06/18/2020 10:03:31.064567: Requesting satellite to download image
06/18/2020 10:04:10.643785: Module register received
06/18/2020 10:04:10.645571: Image Version Mismatch
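
For anyone who wants to check a log like the one above for the same pattern, here is a minimal Python sketch. It assumes the log lines have the `date time: message` shape shown above. The function names and the threshold of two repeats are made up for illustration and are not a Cisco rule.

```python
import re
from collections import Counter

# Matches lines like "06/18/2020 09:56:44.771671: Module register received"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d+): (.+)$")


def summarize_fex_log(text):
    """Count how often each event message appears in a FEX log excerpt."""
    events = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events[match.group(2)] += 1
    return events


def looks_like_download_loop(events, threshold=2):
    """Heuristic (an assumption): repeated version mismatches together with
    repeated download requests suggest the FEX never finishes the download."""
    return (
        events["Image Version Mismatch"] >= threshold
        and events["Requesting satellite to download image"] >= threshold
    )


# A few lines copied from the log in this post.
sample = """\
06/18/2020 09:56:44.771671: Module register received
06/18/2020 09:56:44.773527: Image Version Mismatch
06/18/2020 09:56:44.774334: Registration response sent
06/18/2020 09:56:44.774592: Requesting satellite to download image
06/18/2020 09:57:25.411055: Deleting route to FEX
06/18/2020 09:57:25.423205: Module disconnected
06/18/2020 09:57:25.424337: Module Offline
06/18/2020 10:00:04.401332: Module register received
06/18/2020 10:00:04.403147: Image Version Mismatch
06/18/2020 10:00:04.403950: Registration response sent
06/18/2020 10:00:04.404210: Requesting satellite to download image
"""

events = summarize_fex_log(sample)
for message, count in events.most_common():
    print(f"{count:3d}  {message}")
print("Possible image download loop:", looks_like_download_loop(events))
```

The counts only hint at a loop. They do not say why the download fails.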



Erik

erik.hammervold
Level 1
The software versions: we were moving from 7.3.5.N1.1 to 7.3.7.N1.1.


Erik

Hi,

A while ago someone had the same issue:

https://community.cisco.com/t5/data-center-switches/nexus-5548-upgrade-fex-stuck-in-blinking-green-uplinks-not/td-p/4100705

Looks like you hit a bug:

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvt58479

After an NX-OS upgrade of a Nexus 5K/6K with Nexus 2232TM or Nexus 2232TM-E FEX to NX-OS 7.3(7)N1(1), the following symptoms can be seen:

1) The Nexus 2232TM or Nexus 2232TM-E FEX fails to come online.

2) In some instances, the FEX can be online but a few FEX fabric interfaces will remain down.

3) Output errors (IntMacTx-Er)/CRC increase towards the FEX. Errors are sent from the N5K to the FEX uplinks, and some of these errors are seen on FEX HIFs.

Known Fixed Releases:

7.3(7)N1(1a)
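
To check a target release against this bug note before an upgrade, a small Python sketch could look like the following. The only facts it encodes are the two releases named above: 7.3(7)N1(1) as affected and 7.3(7)N1(1a) as fixed. The helper names and the handling of the dotted spelling (7.3.7.N1.1) are assumptions for illustration, and the check says nothing about any other release.

```python
import re

# Taken from the CSCvt58479 summary quoted above; nothing else is assumed.
AFFECTED = {"7.3(7)N1(1)"}
KNOWN_FIXED = {"7.3(7)N1(1a)"}

# Dotted spelling sometimes used informally, e.g. "7.3.7.N1.1".
DOTTED_RE = re.compile(r"(\d+)\.(\d+)\.(\w+)\.(N\d+)\.(\w+)")


def normalize(version):
    """Convert '7.3.7.N1.1' to '7.3(7)N1(1)'; leave other strings unchanged."""
    version = version.strip()
    match = DOTTED_RE.fullmatch(version)
    if match:
        major, minor, maint, train, build = match.groups()
        return f"{major}.{minor}({maint}){train}({build})"
    return version


def is_affected(version):
    """True only for the release the bug note names as affected."""
    return normalize(version) in AFFECTED


def is_known_fixed(version):
    """True only for the release the bug note lists as fixed."""
    return normalize(version) in KNOWN_FIXED


for candidate in ("7.3.5.N1.1", "7.3.7.N1.1", "7.3(7)N1(1a)"):
    print(candidate, "->", normalize(candidate),
          "| affected:", is_affected(candidate),
          "| known fixed:", is_known_fixed(candidate))
```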

Might very well be.

We'll check it out. If I'm lucky we can test that version tonight.



Erik

We tested the new software, power-cycled the FEXes etc., but it did not fix the problem.

It might be that the initial upgrade bricked the units, and that TAC needs to get involved.

So our mistake was that we left too long between our test upgrade in the lab and the production upgrade, and that we did the test on hardware without this type of FEX. Hard to cover all combinations.

But thanks though. Starting a TAC case.



Erik

Hello Erik,

There is one way to recover the FEX.

The FEX is stuck in a boot loop and will not boot up even after a downgrade or upgrade of the switch to another version.

To recover the FEX, you first need to upgrade or downgrade to a new version (one that is not affected by the bug).

Then we load the dplug image to log in to the underlying Linux system of the switch and attach to the FEX. From there we manually delete the 7.3(7)N1(1) image, and the FEX will boot the new image.

7.x.x === new version

Action plan

1. Copy the dplug image for 7.x.x into the Nexus bootflash:.

2. Load the dplug image matching the N5K image for 7.x.x using the command "load bootflash:nuova-or-dplug-mzg.7.x.x.N1.1.bin".

3. Now, rlogin to the FEX as root (rlogin -l root 127.15.1.<fexid>), replacing <fexid> with the ID of the FEX to recover.

4. Run rm -rf /mnt/sysimage/*. This command is supposed to delete the current image from the FEX's bootflash.

After a few minutes the FEX boots up.
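
If there are several FEXes to recover, the per-FEX commands from steps 2 to 4 can be written out in advance. The Python sketch below only builds the command strings from the action plan and does not connect to anything. The function name, the example FEX ID 101, and the context labels are hypothetical, and the dplug file name keeps the 7.x.x placeholder from step 2.

```python
def recovery_commands(fex_id, dplug_file):
    """Return (where to type it, command) pairs for steps 2 to 4 of the plan.

    fex_id becomes the last octet of the 127.15.1.<fexid> address, so it
    must fit in one IPv4 octet.
    """
    if not isinstance(fex_id, int) or not 0 < fex_id <= 255:
        raise ValueError(f"FEX ID must be an integer from 1 to 255, got {fex_id!r}")
    return [
        ("switch CLI", f"load bootflash:{dplug_file}"),
        ("switch Linux shell (dplug loaded)", f"rlogin -l root 127.15.1.{fex_id}"),
        # Destructive: removes the image stored on the FEX, as in step 4.
        ("FEX root shell", "rm -rf /mnt/sysimage/*"),
    ]


# Hypothetical FEX ID; the file name is the placeholder from step 2.
commands = recovery_commands(101, "nuova-or-dplug-mzg.7.x.x.N1.1.bin")
for where, command in commands:
    print(f"[{where}] {command}")
```

Step 1 (copying the dplug image to bootflash:) is left out because the transfer method depends on the environment.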

 

 

 

 

 

Good.

 

If this works, we won't have to replace the units at least.

I don't have the debug plugin, so I reckon a TAC engineer or partner needs to get involved.

We have started the procedure to open a TAC case via our service partner, but this took some time since the units are old and were acquired through an old contract/partner.



Erik

Via TAC we were able to get the FEXes back online.

The procedure was just as you described, using the dplug. Spot on.

We had to tickle the FEXes a bit to get them to cooperate: a couple of extra reboots, and in one case connecting the FEX to another uplink port, before they began to connect to the 55xx so that we could gain root access via the dplug.

 

Much better than a replacement.



Erik

Hi, I have the same issue. Thanks for your instructions, but how can I get access to the dplug image?

Hi Peyman
The Debug Plugin is only available via Cisco TAC I'm afraid. Create a TAC-case and they'll sort you out.


Erik