Outline of NAT Box to Box High Availability Redundancy Operation (Basic Design)

CscTsWebDocs · 04-06-2015

This document explains the basic functional design of redundancy operation in NAT Box to Box High Availability (hereinafter called NAT B2B HA).

1. Redundancy Configuration of Control/Data Link

a. Control Link failure

With NAT B2B HA, if a physical failure occurs on the Control Link, redundancy can no longer be maintained and every device in the HA configuration becomes Active (hereinafter called split-brain).
To avoid split-brain caused by a Control Link failure, the network that carries the Control Link should be made as reliable as possible.
A redundant link configuration, such as Port-Channel, is an effective way to keep a single link failure on the Control Link from causing split-brain.

The sample configurations for R1 and R2 are shown below:

R1#
redundancy
 application redundancy
  group 1
   name RG1
   preempt
   control Port-channel1.1001 protocol 1
   data Port-channel1.1002
!
interface Port-channel1
 no ip address
!
interface Port-channel1.1001
 encapsulation dot1Q 1001 primary Ethernet0/1 secondary Ethernet1/1
 ip address 10.1.1.2 255.255.255.0
!
interface Port-channel1.1002
 encapsulation dot1Q 1002 primary Ethernet1/1 secondary Ethernet0/1
 ip address 10.2.2.2 255.255.255.0
!
interface Ethernet0/1
 no ip address
 channel-group 1
!
interface Ethernet1/1
 no ip address
 channel-group 1
!

R2#
redundancy
 application redundancy
  group 1
   name RG1
   preempt
   priority 200 failover threshold 150
   control Port-channel1.1001 protocol 1
   data Port-channel1.1002
   asymmetric-routing interface Port-channel1.1002
!
interface Port-channel1
 no ip address
!
interface Port-channel1.1001
 encapsulation dot1Q 1001 primary Ethernet0/1 secondary Ethernet1/1
 ip address 10.1.1.1 255.255.255.0
!
interface Port-channel1.1002
 encapsulation dot1Q 1002 primary Ethernet1/1 secondary Ethernet0/1
 ip address 10.2.2.1 255.255.255.0
!
interface Ethernet0/1
 no ip address
 channel-group 1
!
interface Ethernet1/1
 no ip address
 channel-group 1
!

When the Control Link is manually shut down, the control interface state (Ctrl Intf(s) state) of the RG on the device where it was shut down becomes AdminDown.
When the Control Link is shut down on the ACTIVE device, that device moves to INIT (protocol state Disable), which causes a failover.

R1#show redundancy application protocol group 1

RG Protocol RG 1
------------------
        Role: Init
        Negotiation: Disabled [RG-VP-Request]
        Priority: 105
        Protocol state: Disable
        Ctrl Intf(s) state: AdminDown
        Active Peer: Not exist
        Standby Peer: Not exist
        Log counters:
                role change to active: 1
                role change to standby: 1
                disable events: rg down state 1, rg shut 0
                ctrl intf events: up 1, down 0, admin_down 1
                reload events: local request 0, peer request 0

b. Data Link failure

When a failure occurs on the Data Link, the ACTIVE router continues to operate as ACTIVE, and the STANDBY router becomes INIT.
When the Data Link is manually shut down, the RG of the device where it was shut down becomes AdminDown.
When the Data Link is shut down on the ACTIVE device, that device moves to INIT (Disable), which causes a failover.
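
The link failure behavior described in this section can be summarized as a small lookup. The following Python sketch only restates the outcomes given in the text above; the function name and the string labels are illustrative assumptions and are not part of any Cisco tool or API.

```python
# Hypothetical summary of the outcomes described in section 1.
# Keys: (link, event). Values: (result on ACTIVE device, result on STANDBY device).
# None means the text above does not state an outcome for that device.
OUTCOMES = {
    ("control", "failure"): ("Active", "Active"),                  # split-brain
    ("data", "failure"): ("Active", "Init"),
    ("control", "shutdown_on_active"): ("Init (Disable)", None),   # causes failover
    ("data", "shutdown_on_active"): ("Init (Disable)", None),      # causes failover
}


def is_split_brain(link: str, event: str) -> bool:
    """Return True when both devices end up Active for the given case."""
    active_side, standby_side = OUTCOMES[(link, event)]
    return active_side == "Active" and standby_side == "Active"


if __name__ == "__main__":
    for key, roles in OUTCOMES.items():
        note = "split-brain" if is_split_brain(*key) else ""
        print(key, roles, note)
```

Only the physical Control Link failure leads to split-brain in this summary, which is why the Control Link is the one for which a redundant Port-Channel is recommended.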

2. Timer Tuning

This section describes two typical timers used to tune behavior when a NAT B2B HA failure occurs.

1. timers hellotime [msec] number holdtime [msec] number

This command changes the interval at which hello messages are sent, and the hold time after which the peer is considered unresponsive when no hello is received.

The default hellotime is 3 seconds, and the default holdtime is 10 seconds.


R1#
redundancy
 application redundancy
  group 1
  protocol 1
   timers hellotime msec 500 holdtime msec 2000 

R2#
redundancy
 application redundancy
  group 1
  protocol 1
   timers hellotime msec 1000 holdtime msec 3000

R1#show redundancy application protocol group 1

RG Protocol RG 1
------------------
        Role: Active
        Negotiation: Enabled
        Priority: 105
        Protocol state: Active
        Ctrl Intf(s) state: Up
        Active Peer: Local
        Standby Peer: address 10.0.0.2, priority 100, intf Et0/1
        Log counters:
                role change to active: 2
                role change to standby: 2
                disable events: rg down state 1, rg shut 0
                ctrl intf events: up 2, down 0, admin_down 1
                reload events: local request 0, peer request 0

RG Media Context for RG 1
--------------------------
        Ctx State: Active
        Protocol ID: 1
        Media type: Default
        Control Interface: Ethernet0/1
        Current Hello timer: 500
        Configured Hello timer: 500, Hold timer: 2000
        Peer Hello timer: 1000, Peer Hold timer: 3000
        Stats:
                Pkts 393, Bytes 24366, HA Seq 0, Seq Number 393, Pkt Loss 0
                Authentication not configured
                Authentication Failure: 0
                Reload Peer: TX 1, RX 0
                Resign: TX 0, RX 1
        Standby Peer: Present. Hold Timer: 2000
                Pkts 163, Bytes 5542, HA Seq 0, Seq Number 286186, Pkt Loss 0
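
As a rough way to compare the timer settings above, the sketch below computes how many hello intervals fit inside the hold time. It assumes that a peer is treated as unresponsive once no hello has arrived for the full hold time, as described for the timers command; the function name is hypothetical, the range checks are sanity checks of this sketch rather than documented device limits, and the result is an estimate, not output from the device.

```python
def hellos_per_holdtime(hello_ms: int, hold_ms: int) -> int:
    """Number of whole hello intervals that fit in the hold time."""
    # Sanity checks for this sketch only (not taken from the device CLI).
    if hello_ms <= 0 or hold_ms <= 0:
        raise ValueError("timers must be positive")
    if hold_ms < hello_ms:
        raise ValueError("holdtime is shorter than hellotime")
    return hold_ms // hello_ms


if __name__ == "__main__":
    # Values taken from the defaults and the sample configuration above, in msec.
    settings = {
        "default (3 sec / 10 sec)": (3000, 10000),
        "R1 (msec 500 / msec 2000)": (500, 2000),
        "R2 (msec 1000 / msec 3000)": (1000, 3000),
    }
    for name, (hello, hold) in settings.items():
        count = hellos_per_holdtime(hello, hold)
        print(f"{name}: {count} hello intervals per hold time")
```

A larger ratio tolerates more lost hellos before the peer is considered down, while a shorter hold time shortens the time needed to notice a failure.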

2. timers delay <seconds> [reload <seconds>]

This timer specifies how long the device waits before it starts RG role negotiation after recovering from a failure or after a reload.
The default delay is 10 seconds, and the default reload timer is 120 seconds.
Tune these values when negotiation needs to wait for routing protocols to recover after the device comes back from a failure.

After booting, the Negotiation Delay Timer set by the reload <seconds> value starts counting down, as shown below.
The device stays in the Pre-init state until the timer expires.

R1#show redundancy application protocol group 1
RG Protocol RG 1
------------------
        Role: Init
        Negotiation: Delayed; remaining 48 sec
        Priority: 105
        Protocol state: Pre-init
        Ctrl Intf(s) state: Up
        Active Peer: Not exist
        Standby Peer: Not exist
        Log counters:
                role change to active: 0
                role change to standby: 0
                disable events: rg down state 0, rg shut 0
                ctrl intf events: up 1, down 0, admin_down 0
                reload events: local request 0, peer request 0

The device transitions to the Standby-cold state after the Delay Timer expires.


%RG_PROTOCOL-5-ROLECHANGE: RG id 1 role change from Init to Standby

R1#show redundancy application protocol group 1
RG Protocol RG 1
------------------
        Role: Standby
        Negotiation: Enabled
        Priority: 105
        Protocol state: Standby-cold
        Ctrl Intf(s) state: Up
        Active Peer: address 10.0.0.2, priority 100, intf Et0/1
        Standby Peer: Local
        Log counters:
                role change to active: 0
                role change to standby: 1
                disable events: rg down state 0, rg shut 0
                ctrl intf events: up 1, down 0, admin_down 0
                reload events: local request 0, peer request 0

If preempt is enabled, the device then returns to the Active state.

%RG_PROTOCOL-5-ROLECHANGE: RG id 1 role change from Standby to Active

R1#show redundancy application protocol group 1
RG Protocol RG 1
------------------
        Role: Active
        Negotiation: Enabled
        Priority: 105
        Protocol state: Active
        Ctrl Intf(s) state: Up
        Active Peer: Local
        Standby Peer: Not exist
        Log counters:
                role change to active: 1
                role change to standby: 1
                disable events: rg down state 0, rg shut 0
                ctrl intf events: up 1, down 0, admin_down 0
                reload events: local request 0, peer request 0
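
When following the transition from Init through Standby to Active, it can help to extract only the role, protocol state, and negotiation fields from the command output. The Python sketch below does this with regular expressions against the text format shown in the outputs above; the function name and the dictionary keys are illustrative assumptions, and the field layout may differ between software releases.

```python
import re

# Excerpt of the Pre-init output shown earlier in this section.
SAMPLE = """\
RG Protocol RG 1
------------------
        Role: Init
        Negotiation: Delayed; remaining 48 sec
        Priority: 105
        Protocol state: Pre-init
        Ctrl Intf(s) state: Up
"""

# (dictionary key, label as printed in the output); keys are hypothetical names.
FIELDS = (
    ("role", "Role"),
    ("negotiation", "Negotiation"),
    ("priority", "Priority"),
    ("protocol_state", "Protocol state"),
    ("ctrl_intf_state", "Ctrl Intf(s) state"),
)


def parse_rg_protocol(output: str) -> dict:
    """Pull a few 'Label: value' fields out of the RG protocol output."""
    fields = {}
    for key, label in FIELDS:
        pattern = rf"^\s*{re.escape(label)}:\s*(.+?)\s*$"
        match = re.search(pattern, output, re.MULTILINE)
        if match:
            fields[key] = match.group(1)
    # Remaining negotiation delay in seconds, when the output reports one.
    remaining = re.search(r"remaining (\d+) sec", fields.get("negotiation", ""))
    fields["delay_remaining_sec"] = int(remaining.group(1)) if remaining else None
    return fields


if __name__ == "__main__":
    print(parse_rg_protocol(SAMPLE))
```

Running the same function on output captured at intervals shows the Pre-init, Standby-cold, and Active states in sequence without reading the full counters each time.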

Please refer to the following documents:

NAT Box to Box High Availability Overview

Related Information

Original Document: https://supportforums.cisco.com/ja/document/12331106
Author: Daijiro Kido
Posted on Oct 21, 2014

Outline of NAT Box to Box High Availability Redundancy Operation (Basic Design)

NAT Box to Box High Availability Basic Operation Check (Part 1)

NAT Box to Box High Availability Basic Operation Check (Part 2)

NAT Box to Box High Availability Redundancy Operation Outline (Switching Operation During Failure)

NAT Box to Box High Availability Asymmetric Routing Function Operation Outline
