Tomas de Leon
Cisco Employee
NOTE: This article is for reference only, and the information below is from 05-04-2016. As a result, current behavior may differ from what is described here. If you have any doubts or questions about the following information, please open a Cisco TAC case for clarification.

This Tech Zone article addresses some frequently asked questions about dual supervisors and the hot standby feature on the Nexus 9508 switch running in ACI mode.

  • Are dual supervisors supported for HA on the 9508 in ACI mode?
  • Will the spine fail over to the standby supervisor if the primary supervisor fails?
  • Is the standby supervisor a HOT standby or a COLD standby supervisor?
  • Will the primary supervisor copy the ACI firmware and CERT files to the standby supervisor?
  • Does the standby supervisor require its own certificate (CERT) files?
  • What does "warm standby" mean in the output of "show system redundancy status"?


Redundancy modes for Cisco devices:

HOT STANDBY
Hot redundancy refers to a degree of resiliency where the redundant system is fully prepared to handle the traffic of the primary system. Substantial state information is saved, so the network service is continuous, and the effect on traffic flow is minimal or nil in the case of a failover.


WARM STANDBY
Warm redundancy refers to a degree of resiliency beyond the cold standby system. In this case, the redundant system is partially prepared. However, the system does not have all the state information that the primary system knows for an immediate take-over. Some additional information must be determined or gleaned from the traffic flow or the peer network devices to handle packet forwarding.


COLD STANDBY
Cold redundancy refers to the degree of resiliency that a redundant system traditionally provides. A redundant system is cold when no state information is maintained between the backup or standby system and the system it protects.


The release notes below and the output of "show system redundancy status" describe the redundancy state as "Warm" and "Warm standby". This can cause confusion if you know the three modes listed above. As of this writing, the Cisco Nexus 9508 ACI-mode switch supports COLD STANDBY. The only items mirrored between the active and standby supervisors are the ACI firmware image and the certificate files, and each must be installed independently; there is no automatic synchronization or state information exchange. A CDET (CSCuq18178) was filed to change the wording to "Cold Standby". The development team decided not to change it, and instead uses the following documentation to explain the redundancy currently supported on the Cisco Nexus 9508 ACI-mode switch.


Information from Release Notes:
The Cisco Nexus 9508 ACI-mode switch supports warm (stateless) standby where the state is not synched between the active and the standby supervisor modules. For an online insertion and removal (OIR) or reload of the active supervisor module, the standby supervisor module becomes active, but all modules in the switch are reset because the switchover is stateless. In the output of the show system redundancy status command, warm standby indicates stateless mode.


ACTIVE SUPERVISOR

spine1# show system redundancy status
Redundancy mode
---------------
administrative: Warm
operational: Warm

This supervisor (sup-27)
-----------------------
Redundancy state: Active
Supervisor state: Active
Internal state: Active with warm standby

Other supervisor (sup-28)
------------------------
Redundancy state: Standby
Supervisor state: Warm standby
Internal state: Warm standby


STANDBY SUPERVISOR


(none)# show system redundancy status
Redundancy mode
---------------
administrative: Warm
operational: Warm
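
Because the label in this output does not match the terminology above, it can help to read the mode programmatically and translate it using the release-note wording (in ACI mode, "warm" indicates stateless). The following is a minimal Python sketch, assuming the command output has been captured as plain text exactly as shown; `parse_redundancy_status` and `describe_mode` are hypothetical helpers, not part of any Cisco tool.

```python
def is_rule(line: str) -> bool:
    """True for the dashed underline printed under each section title."""
    stripped = line.strip()
    return bool(stripped) and set(stripped) == {"-"}


def parse_redundancy_status(text: str) -> dict:
    """Group 'key: value' lines under the section title that precedes them.

    A section title is any line immediately followed by a dashed underline,
    e.g. 'Redundancy mode' or 'This supervisor (sup-27)'. Lines without a
    colon (such as error messages) are ignored.
    """
    sections = {}
    current = None
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or is_rule(stripped):
            continue
        if i + 1 < len(lines) and is_rule(lines[i + 1]):
            current = stripped
            sections[current] = {}
        elif ":" in stripped and current is not None:
            key, value = stripped.split(":", 1)
            sections[current][key.strip()] = value.strip()
    return sections


def describe_mode(sections: dict) -> str:
    """Translate the operational mode using the release-note wording."""
    mode = sections.get("Redundancy mode", {}).get("operational", "unknown")
    if mode == "Warm":
        # Per the release notes: warm standby indicates stateless mode,
        # and a switchover resets all modules in the switch.
        return "stateless (no state sync; all modules reset on switchover)"
    return mode


ACTIVE_OUTPUT = """\
Redundancy mode
---------------
administrative: Warm
operational: Warm

This supervisor (sup-27)
-----------------------
Redundancy state: Active
Supervisor state: Active
Internal state: Active with warm standby

Other supervisor (sup-28)
------------------------
Redundancy state: Standby
Supervisor state: Warm standby
Internal state: Warm standby
"""

if __name__ == "__main__":
    parsed = parse_redundancy_status(ACTIVE_OUTPUT)
    print(parsed["Other supervisor (sup-28)"]["Supervisor state"])
    print(describe_mode(parsed))
```

The title detection is deliberately loose (any line followed by dashes), so the same parser also handles the shorter standby output, which only contains the "Redundancy mode" section.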


Reference Material:


Cisco NX-OS Release 11.0(1d) Release Notes for Cisco Nexus 9000 Series ACI-Mode Switches
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/1-x/release/notes/aci_nxos_rn_1...


CSCuq18178 - show system redundancy status terminology incorrect
http://cdetsweb-prd.cisco.com/apps/dumpcr?identifier=CSCuq18178


The following example output shows an N9508 with two supervisors in ACI mode. It shows that the two supervisors are NOT synchronized with certificate files. This is key to note, because the standby supervisor does NOT have a valid CERT file: the VALID CERT file is stored on the 9500 chassis and is copied and applied ONLY to the ACTIVE supervisor.


ACTIVE SUPERVISOR

spine1# show system redundancy status
Redundancy mode
---------------
administrative: Warm
operational: Warm

This supervisor (sup-27)
-----------------------
Redundancy state: Active
Supervisor state: Active
Internal state: Active with warm standby

Other supervisor (sup-28)
------------------------
Redundancy state: Standby
Supervisor state: Warm standby
Internal state: Warm standby


(none)# dir /bootflash
aci-n9000-dk9.11.0.1c.bin
auto-s
disk_log.txt
mem_log.txt
mem_log.txt.old.gz


spine1# show version
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac

Software
BIOS: version 07.08
kickstart: version 11.0(1c) [build 11.0(1c)]
system: version 11.0(1c) [build 11.0(1c)]
BIOS compile time: 03/28/2014
kickstart image file is: /bootflash/aci-n9000-dk9.11.0.1c.bin
kickstart compile time: 09/03/2014 05:48:50 [09/03/2014 05:48:50]
system image file is: /bootflash/auto-s
system compile time: 09/03/2014 05:48:50 [09/03/2014 05:48:50]


Hardware
cisco N9K-SUP-A ("supervisor")
Intel(R) Xeon(R) CPU E5-2403 0 @ 1.80GHz with 16400384 kB of memory.
Processor Board ID FGE18200AVQ

Device name: spine1
bootflash: 62522368 kB


spine1# cat /mit/sys/summary
# System
address : 192.168.0.220
childAction :
currentTime : 2014-10-17T12:09:35.712+00:00
dn : sys
fabricId : 1
fabricMAC : 00:22:BD:F8:19:FF
id : 201
inbMgmtAddr : 0.0.0.0
lcOwn : local
modTs : 2014-10-17T10:00:28.990+00:00
mode : unspecified
monPolDn : uni/fabric/monfab-default
name : spine1
oobMgmtAddr : 0.0.0.0
podId : 1
rn : sys
role : spine
serial : FGE18200AVQ
state : in-service
status :
systemUpTime : 00:02:13:42.000


spine1# cat /proc/cmdline
console=ttyS0,9600n8nn card_index=21000 loader_ver="7.08" quiet ksimg=bootflash:aci-n9000-dk9.11.0.1c.bin rw root=/dev/ram0 rdbase=0x8000000 ip=off ramdisk_size=131072 kgdboc=ttyS0,115200,B mtdparts=physmap-flash.0:512k(mtdoops),256k(RR),256k(SM_LOG),512k(KLOG),512k(EXTRA),12m(KTRACES),50m(PLOG) elevator=noop intel_idle.max_cstate=2 pcie_ports=native


spine1# cat /mnt/cfg/0/boot/grub/menu.lst.local
#
# General configuration
#
disable certificate
title bootflash:aci-n9000-dk9.11.0.1c.bin
boot bootflash:aci-n9000-dk9.11.0.1c.bin


spine1# cat /mnt/cfg/1/boot/grub/menu.lst.local
#
# General configuration
#
disable certificate
title bootflash:aci-n9000-dk9.11.0.1c.bin
boot bootflash:aci-n9000-dk9.11.0.1c.bin


CERTIFICATE VERIFICATION (Valid CERT File)


spine1# whoami
root


spine1# openssl asn1parse < /securedata/ssl/server.crt | grep PRINTABLESTRING
WARNING: can't open config file: /usr/lib/ssl/openssl.cnf
51:d=5 hl=2 l= 13 prim: PRINTABLESTRING :Cisco Systems
75:d=5 hl=2 l= 22 prim: PRINTABLESTRING :Cisco Manufacturing CA
142:d=5 hl=2 l= 28 prim: PRINTABLESTRING :PID:N9K-C9508 SN:FGE18200AVQ
181:d=5 hl=2 l= 11 prim: PRINTABLESTRING :FGE18200AVQ


INITIATE FAILOVER:

spine1# reload

This command will reload the chassis, Proceed (y/n)? [n]: y
[ 9891.651189] nvram_klm wrote rr=9 rr_str=PolicyElem Ch reload to nvramspine1#
[ 9891.726160] obfl_klm writing reset reason 9, switch reset
[ 9891.806345] Collected 8 ext4 filesystems
[ 9891.854046] Freezing filesystems
[ 9891.973546] Collected 1 ubi filesystems
[ 9892.020222] Freezing filesystems
[ 9892.060810] Done freezing filesystems
[ 9892.106536] Putting SSD in stdby
[ 9892.653211] Done putting SSD in stdby 0
[ 9892.699876] Done offlining SSD

INSIEME SPINE Ver 7.8

INSIEME SPINE Ver 7.8
Memory Size (Bytes): 0x0000000080000000 + 0x0000000380000000
Relocated to memory
Detected CISCO IOFPGA
Code Signing Results: 0x0
Using Upgrade FPGA
Booting from Primary Bios
FPGA Revison : 0x20
FPGA ID : 0x1168153
FPGA Date : 0x20140317
Reset Cause Register: 0x20
Boot Ctrl Register : 0x60ff
EventLog Register1 : 0x2000000
EventLog Register2 : 0xfbc77fff
Found Grub
Version 2.15.1236. Copyright (C) 2012 American Megatrends, Inc.
Board type 1
IOFPGA @ 0xe8000000
SLOT_ID @ 0x1b
Filesystem type is ext2fs, partition type 0x83
Trying to read config file /boot/grub/menu.lst.local from (hd0,4)
Filesystem type is ext2fs, partition type 0x83

Booting bootflash:aci-n9000-dk9.11.0.1c.bin...
Booting bootflash:aci-n9000-dk9.11.0.1c.bin
Trying diskboot
Filesystem type is ext2fs, partition type 0x83
Image valid


########################################################################



STANDBY SUPERVISOR

(none)# show system redundancy status
Redundancy mode
---------------
administrative: Warm
operational: Warm

list index out of range
Error executing command, check logs for details


spine1# dir /bootflash
aci-n9000-dk9.11.0.1c.bin
auto-s
disk_log.txt
mem_log.txt
mem_log.txt.old.gz
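
The article notes that the ACI firmware image is not synchronized between supervisors and must be installed on each one independently. One quick check is to compare the two `dir /bootflash` listings. The sketch below is a hypothetical Python helper, assuming both listings were captured as plain text and that ACI images follow the `aci-n9000-dk9.<version>.bin` naming seen in this output.

```python
def aci_images(listing: str) -> set:
    """Return the ACI image filenames found in a 'dir /bootflash' listing.

    Assumes image names follow the 'aci-n9000-dk9.<version>.bin' pattern
    seen in this article; other naming schemes would need a different filter.
    """
    names = (line.strip() for line in listing.splitlines())
    return {n for n in names if n.startswith("aci-n9000-dk9.") and n.endswith(".bin")}


def compare_images(active_listing: str, standby_listing: str) -> dict:
    """Report which ACI images are on both supervisors and which are missing."""
    active, standby = aci_images(active_listing), aci_images(standby_listing)
    return {
        "both": sorted(active & standby),
        "active_only": sorted(active - standby),
        "standby_only": sorted(standby - active),
    }


# Listing captured in this article; both supervisors returned the same files.
BOOTFLASH = """\
aci-n9000-dk9.11.0.1c.bin
auto-s
disk_log.txt
mem_log.txt
mem_log.txt.old.gz
"""

if __name__ == "__main__":
    print(compare_images(BOOTFLASH, BOOTFLASH))
```

A non-empty `active_only` or `standby_only` list would suggest that one supervisor is missing an image that the other has.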


(none)# show version
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac

Software
BIOS: version Unknown
kickstart: Unknown
system: Unknown
BIOS compile time: 12/25/2020
kickstart image file is: Unknown
kickstart compile time: 12/25/2020 12:00:00 [12/25/2020 12:00:00]
system image file is: Unknown
system compile time: 12/25/2020 12:00:00 [12/25/2020 12:00:00]


Hardware
cisco Unknown ("supervisor")
Unknown CPU with 0 kB of memory.
Processor Board ID Unknown

Device name: none
bootflash: 0 kB


(none)# cat /mit/sys/summary
# System
address : 0.0.0.0
childAction :
currentTime : 2014-10-18T03:07:06.041+00:00
dn : sys
fabricId : 1
fabricMAC : 00:22:BD:F8:19:FF
id : 0
inbMgmtAddr : 0.0.0.0
lcOwn : local
modTs : 2014-10-18T00:54:01.994+00:00
mode : unspecified
monPolDn : uni/fabric/monfab-default
name :
oobMgmtAddr : 0.0.0.0
podId : 1
rn : sys
role : unsupported
serial :
state : out-of-service
status :
systemUpTime : 00:02:14:04.000
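
Comparing the two `/mit/sys/summary` outputs field by field makes the difference between the supervisors easy to spot: the active one reports `role : spine` and `state : in-service`, while the standby reports `role : unsupported` and `state : out-of-service`. A minimal Python sketch, assuming the output is captured as `key : value` text; `parse_summary` and `diff_fields` are hypothetical names, and the samples reproduce only a subset of the fields shown above.

```python
def parse_summary(text: str) -> dict:
    """Parse 'key : value' lines from 'cat /mit/sys/summary'.

    Splits on the first colon only, because values such as fabricMAC and
    currentTime contain colons themselves. Comment lines starting with '#'
    are skipped, and keys with no value map to an empty string.
    """
    fields = {}
    for line in text.splitlines():
        if line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


def diff_fields(active: dict, standby: dict, keys=("id", "role", "serial", "state")) -> dict:
    """Return {key: (active_value, standby_value)} for the keys that differ."""
    return {
        k: (active.get(k, ""), standby.get(k, ""))
        for k in keys
        if active.get(k, "") != standby.get(k, "")
    }


ACTIVE_SUMMARY = """\
# System
fabricMAC : 00:22:BD:F8:19:FF
id : 201
role : spine
serial : FGE18200AVQ
state : in-service
status :
"""

STANDBY_SUMMARY = """\
# System
fabricMAC : 00:22:BD:F8:19:FF
id : 0
role : unsupported
serial :
state : out-of-service
status :
"""

if __name__ == "__main__":
    changes = diff_fields(parse_summary(ACTIVE_SUMMARY), parse_summary(STANDBY_SUMMARY))
    for key, (act, stby) in changes.items():
        print(f"{key}: active={act!r} standby={stby!r}")
```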


(none)# cat /proc/cmdline
console=ttyS0,9600n8nn card_index=21000 loader_ver="7.08" quiet ksimg=bootflash:aci-n9000-dk9.11.0.1c.bin rw root=/dev/ram0 rdbase=0x8000000 ip=off ramdisk_size=131072 kgdboc=ttyS0,115200,B mtdparts=physmap-flash.0:512k(mtdoops),256k(RR),256k(SM_LOG),512k(KLOG),512k(EXTRA),12m(KTRACES),50m(PLOG) elevator=noop intel_idle.max_cstate=2 pcie_ports=native


(none)# cat /mnt/cfg/0/boot/grub/menu.lst.local
#
# General configuration
#
disable certificate
title bootflash:aci-n9000-dk9.11.0.1c.bin
boot bootflash:aci-n9000-dk9.11.0.1c.bin


(none)# cat /mnt/cfg/1/boot/grub/menu.lst.local
#
# General configuration
#
disable certificate
title bootflash:aci-n9000-dk9.11.0.1c.bin
boot bootflash:aci-n9000-dk9.11.0.1c.bin


CERTIFICATE VERIFICATION (Invalid CERT File)


(none)# openssl asn1parse < /securedata/ssl/server.crt | grep PRINTABLESTRING
WARNING: can't open config file: /usr/lib/ssl/openssl.cnf
37:d=5 hl=2 l= 2 prim: PRINTABLESTRING :XX
137:d=5 hl=2 l= 2 prim: PRINTABLESTRING :US


(none)# openssl asn1parse < /securedata/ssl/server.crt | grep UTF8STRING
WARNING: can't open config file: /usr/lib/ssl/openssl.cnf
50:d=5 hl=2 l= 12 prim: UTF8STRING :Default City
73:d=5 hl=2 l= 19 prim: UTF8STRING :Default Company Ltd
150:d=5 hl=2 l= 2 prim: UTF8STRING :CA
163:d=5 hl=2 l= 7 prim: UTF8STRING :SanJose
181:d=5 hl=2 l= 16 prim: UTF8STRING :Insieme Networks
208:d=5 hl=2 l= 7 prim: UTF8STRING :Insieme


Note: When you check the CERT files on the supervisors (in this case, the standby supervisor), you will see that the standby supervisor has an invalid CERT file. Why? Both the primary and standby supervisors actually have "invalid" CERT files stored on the supervisors themselves. In the 9500 modular chassis, the VALID CERT file is stored on the chassis, and only the ACTIVE supervisor uses it. When a supervisor boots as, or becomes, the ACTIVE supervisor, the VALID CERT file is copied from the chassis to that supervisor and validated.


For example:


Standby supervisor BEFORE the supervisor switchover:


(none)# openssl asn1parse < /securedata/ssl/server.crt | grep PRINTABLESTRING
37:d=5 hl=2 l= 2 prim: PRINTABLESTRING :XX
137:d=5 hl=2 l= 2 prim: PRINTABLESTRING :US


(none)# openssl asn1parse < /securedata/ssl/server.crt | grep UTF8STRING
50:d=5 hl=2 l= 12 prim: UTF8STRING :Default City
73:d=5 hl=2 l= 19 prim: UTF8STRING :Default Company Ltd
150:d=5 hl=2 l= 2 prim: UTF8STRING :CA
163:d=5 hl=2 l= 7 prim: UTF8STRING :SanJose
181:d=5 hl=2 l= 16 prim: UTF8STRING :Insieme Networks
208:d=5 hl=2 l= 7 prim: UTF8STRING :Insieme

Former standby supervisor AFTER the supervisor switchover (now ACTIVE):


(none)# Certificate verification passed
Certificate verification passed


(none)# openssl asn1parse < /securedata/ssl/server.crt | grep UTF8STRING


(none)# openssl asn1parse < /securedata/ssl/server.crt | grep PRINTABLESTRING
51:d=5 hl=2 l= 13 prim: PRINTABLESTRING :Cisco Systems
75:d=5 hl=2 l= 22 prim: PRINTABLESTRING :Cisco Manufacturing CA
142:d=5 hl=2 l= 28 prim: PRINTABLESTRING :PID:N9K-C9508 SN:FGE18200AVQ
181:d=5 hl=2 l= 11 prim: PRINTABLESTRING :FGE18200AVQ


The NEW standby supervisor (the previously active one) will revert to the invalid "Insieme Networks" CERT file.

This can be confusing to a customer who checks the standby supervisor and sees an invalid CERT file. The key takeaway is that the VALID CERT file is stored on the chassis and is copied and applied only to the ACTIVE supervisor.
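
To tell the two certificates apart without reading the raw dump, you can scan the `openssl asn1parse` output for the strings shown above. The following hypothetical Python sketch, assuming the command output has been captured as text, treats a certificate as the chassis certificate when it contains the "Cisco Manufacturing CA" string and the chassis serial number. This is only a heuristic on decoded strings: `asn1parse` decodes the ASN.1 structure and does not verify the certificate signature.

```python
import re

# Matches the decoded string fields in 'openssl asn1parse' output, e.g.
# "75:d=5 hl=2 l= 22 prim: PRINTABLESTRING :Cisco Manufacturing CA".
# Whitespace is matched loosely because openssl pads these columns.
STRING_FIELD = re.compile(r"prim:\s*(?:PRINTABLESTRING|UTF8STRING)\s*:(.*)$")


def cert_strings(asn1parse_output: str) -> list:
    """Return the PRINTABLESTRING/UTF8STRING values, in order of appearance."""
    values = []
    for line in asn1parse_output.splitlines():
        match = STRING_FIELD.search(line)
        if match:
            values.append(match.group(1).strip())
    return values


def looks_like_chassis_cert(asn1parse_output: str, chassis_serial: str) -> bool:
    """Heuristic: Cisco manufacturing issuer string plus the chassis serial.

    This only inspects decoded strings; it does NOT verify the signature.
    """
    values = cert_strings(asn1parse_output)
    return "Cisco Manufacturing CA" in values and chassis_serial in values


# Active supervisor, PRINTABLESTRING lines as captured above.
ACTIVE_CERT = """\
WARNING: can't open config file: /usr/lib/ssl/openssl.cnf
51:d=5 hl=2 l= 13 prim: PRINTABLESTRING :Cisco Systems
75:d=5 hl=2 l= 22 prim: PRINTABLESTRING :Cisco Manufacturing CA
142:d=5 hl=2 l= 28 prim: PRINTABLESTRING :PID:N9K-C9508 SN:FGE18200AVQ
181:d=5 hl=2 l= 11 prim: PRINTABLESTRING :FGE18200AVQ
"""

# Standby supervisor: a subset of the PRINTABLESTRING and UTF8STRING lines, combined.
STANDBY_CERT = """\
37:d=5 hl=2 l= 2 prim: PRINTABLESTRING :XX
137:d=5 hl=2 l= 2 prim: PRINTABLESTRING :US
50:d=5 hl=2 l= 12 prim: UTF8STRING :Default City
181:d=5 hl=2 l= 16 prim: UTF8STRING :Insieme Networks
"""

if __name__ == "__main__":
    for label, dump in (("active", ACTIVE_CERT), ("standby", STANDBY_CERT)):
        print(label, looks_like_chassis_cert(dump, "FGE18200AVQ"))
```

Under these assumptions, the active supervisor's dump passes the check and the standby's "Insieme Networks" placeholder does not, which matches the behavior described above.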


ACTIVE SUPERVISOR INITIATED FAILOVER WITH RELOAD:

(none)#
INSIEME SPINE Ver 7.8

INSIEME SPINE Ver 7.8
Memory Size (Bytes): 0x0000000080000000 + 0x0000000380000000
Relocated to memory
Detected CISCO IOFPGA
Code Signing Results: 0x0
Using Upgrade FPGA
Booting from Primary Bios
FPGA Revison : 0x20
FPGA ID : 0x1168153
FPGA Date : 0x20140317
Reset Cause Register: 0x80000022
Boot Ctrl Register : 0x60ff
EventLog Register1 : 0x2000000
EventLog Register2 : 0xfbc77fff
Found Grub
Version 2.15.1236. Copyright (C) 2012 American Megatrends, Inc.
Board type 1
IOFPGA @ 0xe8000000
SLOT_ID @ 0x1c
Filesystem type is ext2fs, partition type 0x83
Trying to read config file /boot/grub/menu.lst.local from (hd0,4)
Filesystem type is ext2fs, partition type 0x83

Booting bootflash:aci-n9000-dk9.11.0.1c.bin...
Booting bootflash:aci-n9000-dk9.11.0.1c.bin
Trying diskboot
Filesystem type is ext2fs, partition type 0x83
Image valid


TOTD: Technote of the Day

Comments
odahlqvist
Level 4

Hi Expert 

How do you do that?

"Note: You will need to add valid CERT file to the standby supervisor."

I have dual supervisors and need to bring the standby up to "hot" by moving over the CERT!

/Ola

aleccham
Cisco Employee

Hey Ola,

Hot standby is supported in NX-OS mode, whereas in ACI mode warm standby is the only option. Tomas already covered that in his post above, but I went ahead and pulled out the important part.

Information from Release Notes:
The Cisco Nexus 9508 ACI-mode switch supports warm (stateless) standby where the state is not synched between the active and the standby supervisor modules. For an online insertion and removal (OIR) or reload of the active supervisor module, the standby supervisor module becomes active, but all modules in the switch are reset because the switchover is stateless. In the output of the show system redundancy status command, warm standby indicates stateless mode.
