Error Logs on Nexus 7K Switch : %SYSMGR-2-STANDBY_BOOT_FAILED

Hi Team,

We are seeing the following error logs on a Nexus 7K switch.

Please check and let me know whether this is critical.

Logs:

----------

2012 Oct 30 22:36:07 SWITCH %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 6) is now UP
2012 Oct 30 22:36:40 SWITCH %SYSMGR-2-GSYNC_SNAPSHOT_SRVFAILED: Service "ipqosmgr" on active supervisor failed to store its snapshot (error-id 0x40480005).
2012 Oct 30 22:36:40 SWITCH %SYSMGR-2-STANDBY_BOOT_FAILED: Standby supervisor failed to boot up.
2012 Oct 30 22:36:42 SWITCH %PLATFORM-2-MOD_REMOVE: Module 6 removed (Serial number JAF1550ATBR)
2012 Oct 30 22:42:08 SWITCH %BOOTVAR-5-NEIGHBOR_UPDATE_AUTOCOPY: auto-copy supported by neighbor supervisor, starting...
2012 Oct 30 22:42:26 SWITCH %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 6) is now UP
2012 Oct 30 22:43:12 SWITCH %SYSMGR-2-GSYNC_SNAPSHOT_SRVFAILED: Service "ipqosmgr" on active supervisor failed to store its snapshot (error-id 0x40480005).
2012 Oct 30 22:43:12 SWITCH %SYSMGR-2-STANDBY_BOOT_FAILED: Standby supervisor failed to boot up.
2012 Oct 30 22:43:15 SWITCH %PLATFORM-2-MOD_REMOVE: Module 6 removed (Serial number JAF1550ATBR)
2012 Oct 30 22:47:59 SWITCH %BOOTVAR-5-NEIGHBOR_UPDATE_AUTOCOPY: auto-copy supported by neighbor supervisor, starting...

SWITCH#

Thanks.........

Regards,

Senthil

8 Replies

Arumugam Muthaiah
Cisco Employee

Hi Senthil,

This looks like it matches a known bug, CSCtt94327, but please share the output of show version so we can verify.

CSCtt94327 - Standby unable to restore - Killing ipqosmgr manually at active&standby

Symptom:
Error message on Active Sup :

Nexus7000# 2011 Oct 20 18:14:54 Nexus7000%$ VDC-1 %$ %SYSMGR-2-GSYNC_SNAPSHOT_SRVFAILED: Service "ipqosmgr" on active supervisor failed to store its snapshot (error-id 0x40480005).
2011 Oct 20 18:14:55 Nexus7000 %$ VDC-1 %$ %SYSMGR-2-STANDBY_BOOT_FAILED: Standby supervisor failed to boot up.
2011 Oct 20 18:14:58 Nexus7000%$ VDC-1 %$ %PLATFORM-2-MOD_REMOVE: Module 6 removed (Serial number JAF1516DTNP)

Error-id decode :

Nexus7000# sh system error-id 0x40480005

Error Facility: pss
Error Description: too big pss key or value size

Conditions:
Killing IPQOSMGR manually on the active and standby supervisors, one right after the other, causes the standby supervisor to reload continuously.

Workaround(s):
A reload of the active supervisor will resolve the issue.

The fixes for CSCtt94327 on the 5.2(x) and 6.0(x) trains are in 5.2(3) and 6.0(3), respectively.

Fix in

6.1(0.130)S0

5.2(2.71)S0

6.1(0.134)S0

5.2(2.75)S0

6.0(2)S1

6.2(0.40)S0
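
To check a running release against the fixed releases mentioned above, a comparison along these lines could be used. This is only a rough sketch: the function names are made up for illustration, the table holds just the two trains named in this thread, and the parser assumes plain release strings of the form 5.2(1). Interim build strings such as those in the "Fix in" list are not handled.

```python
import re

# Fixed maintenance release per train, as stated in this thread for CSCtt94327.
FIXED_IN = {(5, 2): 3, (6, 0): 3}

def parse_release(text):
    """Parse an NX-OS release such as '5.2(1)' into (major, minor, maintenance)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\((\d+)[^)]*\)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {text!r}")
    return tuple(int(part) for part in match.groups())

def has_fix(text):
    """Return True or False for the trains listed above, None for any other train."""
    major, minor, maintenance = parse_release(text)
    fixed = FIXED_IN.get((major, minor))
    if fixed is None:
        return None  # train not covered in this thread; check the bug details
    return maintenance >= fixed

print(has_fix("5.2(1)"))  # False: 5.2(1) is older than 5.2(3)
```

A result of None means the thread gives no fixed release for that train, so the bug details should be consulted instead.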

Regards,

Aru

*** Please rate if the post is useful ***


Hi Arumugam,

Thanks for your great support.

If we reload the supervisor, how long will it take? Why isn't the standby supervisor working?

Please send me the link for Cisco bug CSCtt94327.

Please also advise the exact NX-OS release to upgrade to.

Logs:

------

NODCPXX-NX7K002# sh version

Cisco Nexus Operating System (NX-OS) Software

TAC support: http://www.cisco.com/tac

Documents: http://www.cisco.com/en/US/products/ps9372/tsd_products_support_series_home.html

Copyright (c) 2002-2011, Cisco Systems, Inc. All rights reserved.

The copyrights to certain works contained in this software are

owned by other third parties and used and distributed under

license. Certain components of this software are licensed under

the GNU General Public License (GPL) version 2.0 or the GNU

Lesser General Public License (LGPL) Version 2.1. A copy of each

such license is available at

http://www.opensource.org/licenses/gpl-2.0.php and

http://www.opensource.org/licenses/lgpl-2.1.php

Software

  BIOS:      version 3.22.0

  kickstart: version 5.2(1)

  system:    version 5.2(1)

  BIOS compile time:       02/20/10

  kickstart image file is: bootflash:///n7000-s1-kickstart.5.2.1.bin

  kickstart compile time:  12/25/2020 12:00:00 [07/29/2011 04:34:35]

  system image file is:    bootflash:///n7000-s1-dk9.5.2.1.bin

  system compile time:     6/7/2011 13:00:00 [07/29/2011 06:29:25]

Hardware

  cisco Nexus7000 C7010 (10 Slot) Chassis ("Supervisor module-1X")

  Intel(R) Xeon(R) CPU         with 8260944 kB of memory.

  Processor Board ID JAF1550ATFS

  Device name: NODCPXX-NX7K002

  bootflash:    2048256 kB

  slot0:              0 kB (expansion flash)

Kernel uptime is 198 day(s), 19 hour(s), 46 minute(s), 44 second(s)

Last reset

  Reason: Unknown

  System version: 5.2(1)

  Service:

plugin

  Core Plugin, Ethernet Plugin

CMP (Module 5) ok

CMP Software

  CMP BIOS version:        02.01.05

  CMP Image version:       5.1(1) [build 5.0(0.66)]

  CMP BIOS compile time:   8/ 4/2008 19:39:40

  CMP Image compile time:  8/5/2011 13:00:00

CMP (Module 6) no response

NODCPXX-NX7K002#

Hi Senthil,

The fix is on 5.2(3) and 6.0(3)

See the bug details on the page below:

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCtt94327

Conditions:

Killing IPQOSMGR manually on the active and standby supervisors, one right after the other, causes the standby supervisor to reload continuously.

Workaround(s):

A reload of the active supervisor will resolve the issue.

Regards,

Aru

*** Please rate if the post is useful ***


Hi Arumugam,

Is an upgrade the only solution, or can we reload the current active supervisor? If we reload, how long will it take?

I need to request downtime from the customer. Please update me as soon as possible.

Hi Senthil,

A reload will fix the problem this time, and an upgrade provides the permanent solution.

The Cisco Nexus 7000 platform provides 1+1 redundant supervisor modules that can perform a stateful switchover (SSO) in critical failure situations.

The time it takes to switch over from active to standby depends on the configuration (number of VDCs, IGP configuration, etc.), but it is on the order of seconds. The switchover is hitless, so you should not see any packet drops.

A supervisor switchover can be manually initiated in a chassis with two supervisor modules present. Once the switchover is performed, the previously active supervisor reloads and comes back online as the standby supervisor.

n7000# system switchover

Note:

To ensure that an HA switchover is possible, use the show system redundancy status command or the show module command. If the command output displays the ha-standby state for the standby supervisor module, you can manually initiate a switchover.
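
If the check in the note needs to be scripted, the captured output of show system redundancy status could be parsed roughly as below. This is a sketch under assumptions: the function name is hypothetical, the healthy value of the "Supervisor state" field is assumed to read "HA standby" (hyphen or space), and the sample text is the output posted later in this thread, where the standby is not ready.

```python
def standby_is_ha_ready(output):
    """Return True if the 'Other supervisor' section reports an HA standby state."""
    in_other = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Other supervisor"):
            in_other = True
        elif stripped.startswith("This supervisor"):
            in_other = False
        elif in_other and stripped.startswith("Supervisor state:"):
            state = stripped.split(":", 1)[1].strip()
            # Assumption: a healthy standby shows 'HA standby' (or 'ha-standby').
            return state.lower().replace("-", " ") == "ha standby"
    return False  # section or field not found: treat as not ready

SAMPLE = """\
Redundancy mode
---------------
      administrative:   HA
         operational:   None

This supervisor (sup-1)
-----------------------
    Redundancy state:   Active
    Supervisor state:   Active
      Internal state:   Active with warm standby

Other supervisor (sup-2)
------------------------
    Redundancy state:   Standby
    Supervisor state:   Unknown
      Internal state:   Other
"""

print(standby_is_ha_ready(SAMPLE))  # False: the standby state is Unknown
```

Returning False when the field is missing is a deliberately cautious choice, since a switchover should not be attempted on incomplete information.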

Refer:

http://www.cisco.com/en/US/docs/switches/datacenter/sw/5_x/nx-os/high_availability/configuration/guide/ha_system.html

Regards,

Aru

*** Please rate if the post is useful ***


Hi Aru,

I can't see any ha-standby state.

Logs:

-----

NX7K002# sh system redundancy status
Redundancy mode
---------------
      administrative:   HA
         operational:   None

This supervisor (sup-1)
-----------------------
    Redundancy state:   Active
    Supervisor state:   Active
      Internal state:   Active with warm standby

Other supervisor (sup-2)
------------------------
    Redundancy state:   Standby
    Supervisor state:   Unknown
      Internal state:   Other
NX7K002#
NX7K002#
NX7K002# sh module
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    0      10 Gbps Ethernet XL Module                             powered-dn
2    32     10 Gbps Ethernet XL Module          N7K-M132XP-12L     ok
3    32     10 Gbps Ethernet XL Module          N7K-M132XP-12L     ok
5    0      Supervisor module-1X                N7K-SUP1           active *
6    0      Supervisor module-1X                                   powered-up
7    48     10/100/1000 Mbps Ethernet XL Module N7K-M148GT-11L     ok
8    48     10/100/1000 Mbps Ethernet XL Module N7K-M148GT-11L     ok

Mod  Power-Status  Reason
---  ------------  ---------------------------
1    powered-dn     failure(powered-down) since maximum number of bringups were exceeded

Mod  Sw              Hw
---  --------------  ------
2    5.2(1)          1.3
3    5.2(1)          1.3
5    5.2(1)          2.3
7    5.2(1)          1.2
8    5.2(1)          1.2


Mod  MAC-Address(es)                         Serial-Num
---  --------------------------------------  ----------
2    44-d3-ca-84-b2-e8 to 44-d3-ca-84-b3-0c  JAF1529DLEF
3    28-94-0f-f8-fa-74 to 28-94-0f-f8-fa-98  JAF1550ANPG
5    64-a0-e7-45-e2-b8 to 64-a0-e7-45-e2-c0  JAF1550ATFS
7    28-94-0f-a8-48-fc to 28-94-0f-a8-49-30  JAF1549DJTS
8    28-94-0f-54-bd-fc to 28-94-0f-54-be-30  JAF1549ANGF

Mod  Online Diag Status
---  ------------------
2    Pass
3    Pass
5    Pass
7    Pass
8    Pass

Xbar Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    0      Fabric Module 1                     N7K-C7010-FAB-1    ok
2    0      Fabric Module 1                     N7K-C7010-FAB-1    ok
3    0      Fabric Module 1                     N7K-C7010-FAB-1    ok

Xbar Sw              Hw
---  --------------  ------
1    NA              1.2
2    NA              1.2
3    NA              1.2


Xbar MAC-Address(es)                         Serial-Num
---  --------------------------------------  ----------
1    NA                                      JAF1549ANPT
2    NA                                      JAF1549ANMC
3    NA                                      JAF1549AMKD

* this terminal session
NX7K002#

Thanks.........

Regards,

Senthil


Hi,

We also plan to upgrade another Nexus 7K, located in Redmond, which is hitting the same bug (CSCtt94327).

Is it possible to use the ISSU method? One of the supervisor modules shows powered-up rather than ha-standby.

Thanks...


Logs:
====
NX7K002# sh module
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    0      10 Gbps Ethernet XL Module                             powered-dn
2    32     10 Gbps Ethernet XL Module          N7K-M132XP-12L     ok
3    32     10 Gbps Ethernet XL Module          N7K-M132XP-12L     ok
5    0      Supervisor module-1X                N7K-SUP1           active *
6    0      Supervisor module-1X                                   powered-up
7    48     10/100/1000 Mbps Ethernet XL Module N7K-M148GT-11L     ok
8    48     10/100/1000 Mbps Ethernet XL Module N7K-M148GT-11L     ok

Mod  Power-Status  Reason
---  ------------  ---------------------------
1    powered-dn     failure(powered-down) since maximum number of bringups were exceeded

Mod  Sw              Hw
---  --------------  ------
2    5.2(1)          1.3
3    5.2(1)          1.3
5    5.2(1)          2.3
7    5.2(1)          1.2
8    5.2(1)          1.2


Mod  MAC-Address(es)                         Serial-Num
---  --------------------------------------  ----------
2    44-d3-ca-84-b2-e8 to 44-d3-ca-84-b3-0c  JAF1529DLEF
3    28-94-0f-f8-fa-74 to 28-94-0f-f8-fa-98  JAF1550ANPG
5    64-a0-e7-45-e2-b8 to 64-a0-e7-45-e2-c0  JAF1550ATFS
7    28-94-0f-a8-48-fc to 28-94-0f-a8-49-30  JAF1549DJTS
8    28-94-0f-54-bd-fc to 28-94-0f-54-be-30  JAF1549ANGF

Mod  Online Diag Status
---  ------------------
2    Pass
3    Pass
5    Pass
7    Pass
8    Pass

Xbar Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    0      Fabric Module 1                     N7K-C7010-FAB-1    ok
2    0      Fabric Module 1                     N7K-C7010-FAB-1    ok
3    0      Fabric Module 1                     N7K-C7010-FAB-1    ok

Xbar Sw              Hw
---  --------------  ------
1    NA              1.2
2    NA              1.2
3    NA              1.2


Xbar MAC-Address(es)                         Serial-Num
---  --------------------------------------  ----------
1    NA                                      JAF1549ANPT
2    NA                                      JAF1549ANMC
3    NA                                      JAF1549AMKD

* this terminal session
NX7K002#


NX7K002# sh version
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Documents: http://www.cisco.com/en/US/products/ps9372/tsd_products_support_series_home.html
Copyright (c) 2002-2011, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php

Software
  BIOS:      version 3.22.0
  kickstart: version 5.2(1)
  system:    version 5.2(1)
  BIOS compile time:       02/20/10
  kickstart image file is: bootflash:///n7000-s1-kickstart.5.2.1.bin
  kickstart compile time:  12/25/2020 12:00:00 [07/29/2011 04:34:35]
  system image file is:    bootflash:///n7000-s1-dk9.5.2.1.bin
  system compile time:     6/7/2011 13:00:00 [07/29/2011 06:29:25]


Hardware
  cisco Nexus7000 C7010 (10 Slot) Chassis ("Supervisor module-1X")
  Intel(R) Xeon(R) CPU         with 8260944 kB of memory.
  Processor Board ID JAF1550ATFS

  Device name: NX7K002
  bootflash:    2048256 kB
  slot0:              0 kB (expansion flash)

Kernel uptime is 211 day(s), 4 hour(s), 34 minute(s), 41 second(s)

Last reset
  Reason: Unknown
  System version: 5.2(1)
  Service:

plugin
  Core Plugin, Ethernet Plugin


CMP (Module 5) ok
CMP Software
  CMP BIOS version:        02.01.05
  CMP Image version:       5.1(1) [build 5.0(0.66)]
  CMP BIOS compile time:   8/ 4/2008 19:39:40
  CMP Image compile time:  8/5/2011 13:00:00

NX7K002#


NX7K002# sh logging last 25
2012 Nov 14 18:12:18 NX7K002 %PLATFORM-2-MOD_REMOVE: Module 6 removed (Serial number JAF1550ATBR)
2012 Nov 14 18:17:43 NX7K002 %BOOTVAR-5-NEIGHBOR_UPDATE_AUTOCOPY: auto-copy supported by neighbor supervisor, starting...
2012 Nov 14 18:18:13 NX7K002 %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 6) is now UP
2012 Nov 14 18:18:51 NX7K002 last message repeated 1 time
2012 Nov 14 18:18:51 NX7K002 %SYSMGR-2-GSYNC_SNAPSHOT_SRVFAILED: Service "ipqosmgr" on active supervisor failed to store its snapshot (error-id 0x40480005).
2012 Nov 14 18:18:52 NX7K002 %SYSMGR-2-STANDBY_BOOT_FAILED: Standby supervisor failed to boot up.
2012 Nov 14 18:18:54 NX7K002 %PLATFORM-2-MOD_REMOVE: Module 6 removed (Serial number JAF1550ATBR)
2012 Nov 14 18:24:22 NX7K002 %BOOTVAR-5-NEIGHBOR_UPDATE_AUTOCOPY: auto-copy supported by neighbor supervisor, starting...
2012 Nov 14 18:24:52 NX7K002 %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 6) is now UP
2012 Nov 14 18:25:30 NX7K002 last message repeated 1 time
2012 Nov 14 18:25:33 NX7K002 %SYSMGR-2-GSYNC_SNAPSHOT_SRVFAILED: Service "ipqosmgr" on active supervisor failed to store its snapshot (error-id 0x40480005).
2012 Nov 14 18:25:33 NX7K002 %SYSMGR-2-STANDBY_BOOT_FAILED: Standby supervisor failed to boot up.
2012 Nov 14 18:25:37 NX7K002 %PLATFORM-2-MOD_REMOVE: Module 6 removed (Serial number JAF1550ATBR)
2012 Nov 14 18:30:50 NX7K002 %BOOTVAR-5-NEIGHBOR_UPDATE_AUTOCOPY: auto-copy supported by neighbor supervisor, starting...
2012 Nov 14 18:31:19 NX7K002 %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 6) is now UP
2012 Nov 14 18:31:57 NX7K002 last message repeated 1 time
2012 Nov 14 18:32:02 NX7K002 %SYSMGR-2-GSYNC_SNAPSHOT_SRVFAILED: Service "ipqosmgr" on active supervisor failed to store its snapshot (error-id 0x40480005).
2012 Nov 14 18:32:02 NX7K002 %SYSMGR-2-STANDBY_BOOT_FAILED: Standby supervisor failed to boot up.
2012 Nov 14 18:32:05 NX7K002 %PLATFORM-2-MOD_REMOVE: Module 6 removed (Serial number JAF1550ATBR)
2012 Nov 14 18:36:43 NX7K002 %BOOTVAR-5-NEIGHBOR_UPDATE_AUTOCOPY: auto-copy supported by neighbor supervisor, starting...
2012 Nov 14 18:37:13 NX7K002 %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 6) is now UP
2012 Nov 14 18:37:51 NX7K002 last message repeated 1 time
2012 Nov 14 18:37:54 NX7K002 %SYSMGR-2-GSYNC_SNAPSHOT_SRVFAILED: Service "ipqosmgr" on active supervisor failed to store its snapshot (error-id 0x40480005).
2012 Nov 14 18:37:54 NX7K002 %SYSMGR-2-STANDBY_BOOT_FAILED: Standby supervisor failed to boot up.
2012 Nov 14 18:37:57 NX7K002 %PLATFORM-2-MOD_REMOVE: Module 6 removed (Serial number JAF1550ATBR)
NX7K002#
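
The log above repeats the same sequence every few minutes, which suggests the standby supervisor is stuck in a boot loop. A rough way to measure the loop from a saved log could look like the sketch below. The function names are illustrative only, the timestamp layout is assumed to match the lines shown here, and the month abbreviation is parsed with an English or C locale.

```python
from datetime import datetime

def boot_failure_times(log_text):
    """Return the timestamp of every STANDBY_BOOT_FAILED message in the log."""
    times = []
    for line in log_text.splitlines():
        if "%SYSMGR-2-STANDBY_BOOT_FAILED" not in line:
            continue
        stamp = " ".join(line.split()[:4])  # e.g. '2012 Nov 14 18:18:52'
        times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times

def retry_intervals(times):
    """Seconds between consecutive boot failures."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

# The four boot-failure lines from the log in this post.
LOG = """\
2012 Nov 14 18:18:52 NX7K002 %SYSMGR-2-STANDBY_BOOT_FAILED: Standby supervisor failed to boot up.
2012 Nov 14 18:25:33 NX7K002 %SYSMGR-2-STANDBY_BOOT_FAILED: Standby supervisor failed to boot up.
2012 Nov 14 18:32:02 NX7K002 %SYSMGR-2-STANDBY_BOOT_FAILED: Standby supervisor failed to boot up.
2012 Nov 14 18:37:54 NX7K002 %SYSMGR-2-STANDBY_BOOT_FAILED: Standby supervisor failed to boot up.
"""

print(retry_intervals(boot_failure_times(LOG)))  # [401, 389, 352]
```

Intervals that stay in the same range, as they do here, are consistent with the standby repeatedly attempting and failing the same boot sequence.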

Thanks...

Regards,

Senthil

Hi Senthil,

The standby supervisor is currently showing an unknown state:

NX7K002# sh system redundancy status
Redundancy mode
---------------
      administrative:   HA
         operational:   None

This supervisor (sup-1)
-----------------------
    Redundancy state:   Active
    Supervisor state:   Active
      Internal state:   Active with warm standby

Other supervisor (sup-2)
------------------------
    Redundancy state:   Standby
    Supervisor state:   Unknown
      Internal state:   Other

Unknown state indicates that the system is in an invalid state and requires a support call to TAC. The standby sup is not booting correctly.

Suggestion: To fix this issue, reload the complete chassis. This workaround often resolves the issue, after which the active and standby supervisors synchronize.

For an ISSU upgrade, please note the following requirements:
In a Nexus 7000 Series chassis with dual supervisors, you can use the in-service software upgrade (ISSU) feature to upgrade the system software while the system continues to forward traffic. An ISSU uses the existing features of nonstop forwarding (NSF) with stateful switchover (SSO) to perform the software upgrade with no system downtime.

In a redundant system with two supervisors, one of the supervisors is active while the other operates in the standby mode. During an ISSU, the new software is loaded onto the standby supervisor while the active supervisor continues to operate using the old software. As part of the upgrade, a switchover occurs between the active and standby supervisors, and the standby supervisor becomes active and begins running the new software. After the switchover, the new software is loaded onto the (formerly active) standby supervisor.

Please fix the standby supervisor issue first, then perform the ISSU.
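
As a pre-check before attempting the ISSU, the supervisor rows of the show module output could be inspected with something like the following sketch. The helper names are hypothetical, and the readiness rule (one supervisor active, the other ha-standby) is a simplification of the requirement described above, not an official check.

```python
def supervisor_states(show_module_output):
    """Map each supervisor slot number to its Status column from 'show module'."""
    states = {}
    for line in show_module_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit() or "Supervisor" not in fields:
            continue
        if fields[-1] == "*":  # '*' only marks the current terminal session
            fields = fields[:-1]
        states[int(fields[0])] = fields[-1]
    return states

def issu_possible(states):
    """Assumed rule: ISSU needs one 'active' and one 'ha-standby' supervisor."""
    return sorted(states.values()) == ["active", "ha-standby"]

# Rows taken from the 'sh module' output posted in this thread.
SAMPLE = """\
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
5    0      Supervisor module-1X                N7K-SUP1           active *
6    0      Supervisor module-1X                                   powered-up
7    48     10/100/1000 Mbps Ethernet XL Module N7K-M148GT-11L     ok
"""

states = supervisor_states(SAMPLE)
print(states)                 # {5: 'active', 6: 'powered-up'}
print(issu_possible(states))  # False
```

With the output from this chassis the check fails because module 6 is powered-up rather than ha-standby, which matches the advice to repair the standby first.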

Refer:

Understanding In-Service Software Upgrades

http://www.cisco.com/en/US/docs/switches/datacenter/sw/5_x/nx-os/high_availability/configuration/guide/ha_issu.html

Cisco NX-OS Software Upgrade or Downgrade

http://www.cisco.com/en/US/docs/switches/datacenter/sw/best_practices/cli_mgmt_guide/nxos_upgrade.html

Regards,

Aru

*** Please rate if the post is useful ***
