Nexus 5k change peer-keepalive link

Phillip Wilson
Level 1

I have a pair of Nexus 5548UPs with some high-priority servers connected to them. The servers are ESX hosts running Nexus 1000v's. Each host has multiple connections in a vPC to both 5548s. We have been having intermittent ping loss and slow traffic to the VMs on these hosts. While poking around trying to figure out what the issue could be, I found that the peer-keepalive command was not set to send the heartbeat across the mgmt0 interface. I would like to change this so that it points across the mgmt0 interface. Can I do this live without causing any issues? Does anyone have any tips or advice on making this change with production servers running on the switches? I do not want to cause any loss to any systems when I make this change.

"Switch 2"

vpc domain 101
  role priority 22222
  peer-keepalive destination 172.27.1.18 source 172.27.1.19
  auto-recovery

"Switch 1"

vpc domain 101
  role priority 11111
  peer-keepalive destination 172.27.1.19 source 172.27.1.18
  auto-recovery

I've also just noticed tonight that we are seeing a lot of input errors on one of the 10G links going from 5548-2 back to core 6513-1. The link from 5548-1 back to core 6513-1 does not have any input errors. The log also shows that the interface keeps going down and coming back up. I suspect the peer-keepalive is the culprit behind this vPC link going down and back up, since it is not using mgmt0.

2013 Apr 20 21:47:35 GWCP0-2 %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel98: Ethernet1/1 is down

2013 Apr 20 21:47:35 GWCP0-2 %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel98: port-channel98 is down

2013 Apr 20 21:47:35 GWCP0-2 %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel98: first operational port changed from Ethernet1/1 to none

2013 Apr 20 21:47:35 GWCP0-2 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel98 is down (No operational members)

2013 Apr 20 21:47:35 GWCP0-2 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/1 is down (Initializing)

2013 Apr 20 21:47:35 GWCP0-2 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel98 is down (No operational members)

2013 Apr 20 21:47:35 GWCP0-2 %ETHPORT-5-SPEED: Interface port-channel98, operational speed changed to 10 Gbps

2013 Apr 20 21:47:35 GWCP0-2 %ETHPORT-5-IF_DUPLEX: Interface port-channel98, operational duplex mode changed to Full

2013 Apr 20 21:47:35 GWCP0-2 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface port-channel98, operational Receive Flow Control state changed to off

2013 Apr 20 21:47:35 GWCP0-2 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface port-channel98, operational Transmit Flow Control state changed to off

2013 Apr 20 21:47:39 GWCP0-2 %ETH_PORT_CHANNEL-5-PORT_UP: port-channel98: Ethernet1/1 is up

2013 Apr 20 21:47:39 GWCP0-2 %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel98: first operational port changed from none to Ethernet1/1

2013 Apr 20 21:47:39 GWCP0-2 %ETHPORT-5-IF_UP: Interface Ethernet1/1 is up in mode trunk

2013 Apr 20 21:47:39 GWCP0-2 %ETHPORT-5-IF_UP: Interface port-channel98 is up in mode trunk

Ethernet1/1 is up

Dedicated Interface

  Belongs to Po98

93 interface resets

  30 seconds input rate 118480 bits/sec, 34 packets/sec

  30 seconds output rate 61744 bits/sec, 18 packets/sec

  Load-Interval #2: 5 minute (300 seconds)

    input rate 113.92 Kbps, 28 pps; output rate 230.39 Kbps, 17 pps

  RX

    761957889 unicast packets  20849861 multicast packets  6478172 broadcast packets

    789285922 input packets  349145216994 bytes

    171626124 jumbo packets  0 storm suppression bytes

    6 runts  0 giants  3557670 CRC  0 no buffer

    3557676 input error  0 short frame  0 overrun   0 underrun  0 ignored

    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop

    0 input with dribble  0 input discard

    0 Rx pause

  TX

    336576988 unicast packets  107914 multicast packets  1665274 broadcast packets

    338350176 output packets  189154059253 bytes

    91051993 jumbo packets

    0 output errors  0 collision  0 deferred  0 late collision

    0 lost carrier  0 no carrier  0 babble 0 output discard

    0 Tx pause

Accepted Solutions

pille1234
Level 3

Hallo Phillip,

I believe something doesn't add up here. To the best of my knowledge, a vPC domain never comes up without a working peer-keepalive link, so either the keepalive link was originally there and was removed later for whatever reason, or the peer adjacency never really formed. Before you change anything, have a look at 'show vpc' and compare it with the output provided by steve-fuller.
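
A minimal set of checks for this, assuming the standard NX-OS vPC show commands are available on the 5548s (the thread itself only names 'show vpc'):

```
show vpc
show vpc peer-keepalive
show vpc role
```

In a healthy domain, "Peer status" should read "peer adjacency formed ok" and "vPC keep-alive status" should read "peer is alive", matching the output posted by steve-fuller.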

Besides, a disruption of the peer-keepalive link does not explain interface flapping, CRC errors or packet loss at all. All of that points to a layer 1 problem. Check the cabling of e1/1, replace the transceivers on both sides and the fiber if necessary, then clear the error counters and check whether they increase again. That would be my highest priority here.
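
The layer 1 checks could look roughly like the following on 5548-2. Going by the counters in the original post, about 3.56 million CRC errors against about 789 million input packets is roughly 0.45% of received frames, which is far more than a clean 10G link should show. The interface name is taken from the original post; which end of the link holds the faulty part is an open question at this point, so the same checks would need repeating on the core side with that platform's own commands.

```
! Optical levels and transceiver details for the suspect port
show interface ethernet 1/1 transceiver details

! Error counters before clearing
show interface ethernet 1/1 counters errors

! Reset the counters, wait a while, then see whether CRC climbs again
clear counters interface ethernet 1/1
show interface ethernet 1/1 | include CRC
```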

Regards

Pille

Steve Fuller
Level 9

Hi Phillip,

You should be fine to migrate the peer-keepalive link to the mgmt0 interface with no loss of connectivity on any vPC. As per the section "vPC Peer-Keepalive Failure" on page 29 of the Cisco NX-OS Virtual PortChannel: Fundamental Design Concepts with NXOS 5.0:

If connectivity of the peer-keepalive link is lost but peer-link connectivity is not changed, nothing happens; both vPC peers continue to synchronize MAC address tables, IGMP entries, and so on. The peer-keepalive link is mostly used when the peer link is lost, and the vPC peers use the peer keepalive to resolve the failure and determine which device should shut down the vPC member ports.
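
As a rough sketch of the change being discussed, the keepalive can be repointed by re-entering the peer-keepalive command under the vPC domain on each switch with the mgmt0 addresses. The addresses below are placeholders from the documentation range, not the real mgmt0 addresses (those are not given in this thread), and the `vrf management` keyword is written out on the assumption that mgmt0 sits in the management VRF:

```
! On 5548-1 (placeholder addresses, substitute the real mgmt0 IPs)
vpc domain 101
  peer-keepalive destination 192.0.2.2 source 192.0.2.1 vrf management

! On 5548-2 (mirror image of the above)
vpc domain 101
  peer-keepalive destination 192.0.2.1 source 192.0.2.2 vrf management
```

Running `show vpc peer-keepalive` on both switches afterwards should list mgmt0 under "Sent on interface" and "Received on interface" once the keepalive is re-established.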

And just to show it's OK, here's an example of what happens when I failed a vPC peer-keepalive link. Initially we can see the vPC is operational, the peer-link (Po1) is up, as are Po101 and Po102 which are vPC to my FEX:

ocs5548-1# sh vpc

Legend:

                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                   : 1

Peer status                     : peer adjacency formed ok

vPC keep-alive status           : peer is alive

Configuration consistency status: success

Per-vlan consistency status     : success

Type-2 consistency status       : success

vPC role                        : secondary

Number of vPCs configured       : 67

Peer Gateway                    : Enabled

Peer gateway excluded VLANs     : -

Dual-active excluded VLANs      : -

Graceful Consistency Check      : Enabled

vPC Peer-link status

---------------------------------------------------------------------

id   Port   Status Active vlans

--   ----   ------ --------------------------------------------------

1    Po1    up     10,171-178

vPC status

----------------------------------------------------------------------------

id     Port        Status Consistency Reason                     Active vlans

------ ----------- ------ ----------- -------------------------- -----------

101    Po101       up     success     success                    -

102    Po102       up     success     success                    -

102400 Eth101/1/1  down*  Not         Consistency Check Not      -

                          Applicable  Performed

102401 Eth101/1/2  up     success     success                    171

[..]

At this point my peer-keepalive link (via mgmt0 in this case) is operational:

ocs5548-1# sh vpc peer-keep

vPC keep-alive status           : peer is alive

--Peer is alive for             : (231853) seconds, (729) msec

--Send status                   : Success

--Last send at                  : 2013.04.21 08:37:34 620 ms

--Sent on interface             : mgmt0

--Receive status                : Success

--Last receive at               : 2013.04.21 08:37:34 620 ms

--Received on interface         : mgmt0

--Last update from peer         : (0) seconds, (346) msec

vPC Keep-alive parameters

--Destination                   : 192.168.1.6

--Keepalive interval            : 1000 msec

--Keepalive timeout             : 5 seconds

--Keepalive hold timeout        : 3 seconds

--Keepalive vrf                 : management

--Keepalive udp port            : 3200

--Keepalive tos                 : 192

When I shut the port on my out-of-band switch that connects to the mgmt0 interface I then see the peer-keepalive fail:

ocs5548-1# ter mon

ocs5548-1# 2013 Apr 21 08:39:01.068 ocs5548-1 %IM-5-IM_INTF_STATE: mgmt0 is DOWN in vdc 1

2013 Apr 21 08:39:01.600 ocs5548-1 %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 1, VPC peer keep-alive receive has failed

If I look at my peer-keepalive I can confirm it has failed, but the vPC operational state is the same as before the peer-keepalive failure:

ocs5548-1# sh vpc peer-keep

vPC keep-alive status           : peer is not reachable through peer-keepalive

--Send status                   : Success

--Last send at                  : 2013.04.21 08:39:58 620 ms

--Sent on interface             :

--Receive status                : Failed

--Last update from peer         : (62) seconds, (910) msec

vPC Keep-alive parameters

--Destination                   : 192.168.1.6

--Keepalive interval            : 1000 msec

--Keepalive timeout             : 5 seconds

--Keepalive hold timeout        : 3 seconds

--Keepalive vrf                 : management

--Keepalive udp port            : 3200

--Keepalive tos                 : 192

ocs5548-1# sh vpc

Legend:

                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                   : 1

Peer status                     : peer adjacency formed ok

vPC keep-alive status           : peer is not reachable through peer-keepalive

Configuration consistency status: success

Per-vlan consistency status     : success

Type-2 consistency status       : success

vPC role                        : secondary

Number of vPCs configured       : 67

Peer Gateway                    : Enabled

Peer gateway excluded VLANs     : -

Dual-active excluded VLANs      : -

Graceful Consistency Check      : Enabled

vPC Peer-link status

---------------------------------------------------------------------

id   Port   Status Active vlans

--   ----   ------ --------------------------------------------------

1    Po1    up     10,171-178

vPC status

----------------------------------------------------------------------------

id     Port        Status Consistency Reason                     Active vlans

------ ----------- ------ ----------- -------------------------- -----------

101    Po101       up     success     success                    -

102    Po102       up     success     success                    -

102400 Eth101/1/1  down*  Not         Consistency Check Not      -

                          Applicable  Performed

102401 Eth101/1/2  up     success     success                    171

[..]

My FEX are still on-line...

ocs5548-1# sh fex

  FEX         FEX           FEX                       FEX

Number    Description      State            Model            Serial

------------------------------------------------------------------------

101        FEX0101                Online    N2K-C2232PP-10GE   SSI155001QZ

102        FEX0102                Online    N2K-C2232PP-10GE   SSI15460AT7

And when the peer-keepalive is re-established, again we see no operational state changes to the vPC:

2013 Apr 21 08:47:19.068 ocs5548-1 %IM-5-IM_INTF_STATE: mgmt0 is UP in vdc 1

ocs5548-1# sh vpc peer-keep

vPC keep-alive status           : peer is alive

--Peer is alive for             : (29) seconds, (749) msec

--Send status                   : Success

--Last send at                  : 2013.04.21 08:47:49 30 ms

--Sent on interface             : mgmt0

--Receive status                : Success

--Last receive at               : 2013.04.21 08:47:48 763 ms

--Received on interface         : mgmt0

--Last update from peer         : (0) seconds, (719) msec

vPC Keep-alive parameters

--Destination                   : 192.168.1.6

--Keepalive interval            : 1000 msec

--Keepalive timeout             : 5 seconds

--Keepalive hold timeout        : 3 seconds

--Keepalive vrf                 : management

--Keepalive udp port            : 3200

--Keepalive tos                 : 192

Regards

I ended up shutting the interface off that had all the errors and it corrected my issues.  I traced it down to a bad X2 module in my 6513.
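
For anyone following the same path, taking the errored uplink out of service might look like the sketch below. It assumes the interface in question was Ethernet1/1 on 5548-2, which is a guess from the earlier log output; the post does not say which end was shut down.

```
configure terminal
interface ethernet 1/1
  shutdown
end

! Confirm the state of the uplink port-channel afterwards
show port-channel summary
```

Since Ethernet1/1 is the only Po98 member that appears in the posted log, shutting it would take Po98 down on that switch and leave traffic to the core on the 5548-1 uplink until the faulty module is replaced.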

Thanks for all the help!

Phil

Hello, I know this thread is quite old.

But I need your help. I've configured two N5Ks that share the same vPC domain, with the peer-keepalive attached to mgmt0. When one keepalive link fails, all vPCs (port-channels) become unavailable.

Hope someone can assist in the right direction.

Regards,

Lucas Miguel 
