09-25-2020 09:51 AM
We have a pair of Nexus 3064 switches running vPC. They have been in production for 3 years, running the same code version since the initial deployment. This year we started getting log messages for keepalive failures. I enabled debug logging, which shows the following:
2020 Sep 18 05:51:40.427 sw1.dc24 %VPC-5-PEER_KEEP_ALIVE_RECV_INT_LATEST: In domain 100, VPC peer-keepalive received on interface Vlan10
2020 Sep 18 05:51:40.427 sw1.dc24 %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 100, VPC peer keep-alive receive has failed
2020 Sep 18 05:53:16.462 sw1.dc24 %VPC-5-PEER_KEEP_ALIVE_RECV_INT_LATEST: In domain 100, VPC peer-keepalive received on interface Vlan10
2020 Sep 18 05:53:16.462 sw1.dc24 %VPC-5-PEER_KEEP_ALIVE_RECV_SUCCESS: In domain 100, vPC peer keep-alive receive is successful
2020 Sep 18 15:14:46.102 sw1.dc24 %VPC-5-PEER_KEEP_ALIVE_RECV_INT_LATEST: In domain 100, VPC peer-keepalive received on interface Vlan10
2020 Sep 18 15:14:46.102 sw1.dc24 %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 100, VPC peer keep-alive receive has failed
2020 Sep 18 15:14:57.108 sw1.dc24 %VPC-5-PEER_KEEP_ALIVE_RECV_INT_LATEST: In domain 100, VPC peer-keepalive received on interface Vlan10
2020 Sep 18 15:14:57.108 sw1.dc24 %VPC-5-PEER_KEEP_ALIVE_RECV_SUCCESS: In domain 100, vPC peer keep-alive receive is successful
2020 Sep 19 07:40:14.931 sw1.dc24 %VPC-5-PEER_KEEP_ALIVE_RECV_INT_LATEST: In domain 100, VPC peer-keepalive received on interface Vlan10
2020 Sep 19 07:40:14.931 sw1.dc24 %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 100, VPC peer keep-alive receive has failed
2020 Sep 19 07:40:25.936 sw1.dc24 %VPC-5-PEER_KEEP_ALIVE_RECV_INT_LATEST: In domain 100, VPC peer-keepalive received on interface Vlan10
2020 Sep 19 07:40:25.936 sw1.dc24 %VPC-5-PEER_KEEP_ALIVE_RECV_SUCCESS: In domain 100, vPC peer keep-alive receive is successful
2020 Sep 21 06:14:06.268 sw1.dc24 %VPC-5-PEER_KEEP_ALIVE_RECV_INT_LATEST: In domain 100, VPC peer-keepalive received on interface Vlan10
2020 Sep 21 06:14:06.268 sw1.dc24 %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 100, VPC peer keep-alive receive has failed
2020 Sep 21 06:14:34.281 sw1.dc24 %VPC-5-PEER_KEEP_ALIVE_RECV_INT_LATEST: In domain 100, VPC peer-keepalive received on interface Vlan10
2020 Sep 21 06:14:34.281 sw1.dc24 %VPC-5-PEER_KEEP_ALIVE_RECV_SUCCESS: In domain 100, vPC peer keep-alive receive is successful
-----
show vpc peer-keepalive
vPC keep-alive status : peer is alive
--Peer is alive for : (180507) seconds, (803) msec
--Send status : Success
--Last send at : 2020.09.25 10:47:01 390 ms
--Sent on interface : Vlan10
--Receive status : Success
--Last receive at : 2020.09.25 10:47:01 390 ms
--Received on interface : Vlan10
--Last update from peer : (0) seconds, (188) msec
vPC Keep-alive parameters
--Destination : 10.0.0.2
--Keepalive interval : 1000 msec
--Keepalive timeout : 5 seconds
--Keepalive hold timeout : 3 seconds
--Keepalive vrf : keepalive
--Keepalive udp port : 3200
--Keepalive tos : 192
------
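For reference, the timers shown in the keepalive output above are set on the peer-keepalive line under the vpc domain. A hedged sketch of what that configuration would look like, reusing the domain number, addresses, and VRF from the output (exact option names and ranges can vary by NX-OS release, so verify against your platform's command reference):

```text
vpc domain 100
  ! interval is in msec, timeout in seconds; the values below match
  ! what "show vpc peer-keepalive" reports on this switch
  peer-keepalive destination 10.0.0.2 source 10.0.0.1 vrf keepalive interval 1000 timeout 5
```

Raising the interval (e.g. `interval 2000 timeout 10`) would make the keepalive more tolerant of short receive gaps, at the cost of slower peer-failure detection.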
vlan 10
name KEEPALIVE
interface Vlan10
description vPC Keepalive
no shutdown
vrf member keepalive
ip address 10.0.0.1/30
-----
interface Ethernet1/43
description Keepalive Connection to sw2.dc24 e1/43
switchport mode trunk
switchport trunk allowed vlan 10
Ethernet1/43 is up
admin state is up, Dedicated Interface
Hardware: 100/1000/10000 Ethernet, address: fc99.4755.1b72 (bia fc99.4755.1b72)
Description: Keepalive Connection to sw2.dc24 e1/43
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, medium is broadcast
Port mode is trunk
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Auto-Negotiation is turned off, FEC mode is Auto
Input flow-control is off, output flow-control is off
Auto-mdix is turned off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
EEE (efficient-ethernet) : n/a
Last link flapped 109week(s) 4day(s)
Last clearing of "show interface" counters 29w0d
0 interface resets
30 seconds input rate 1232 bits/sec, 0 packets/sec
30 seconds output rate 944 bits/sec, 0 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 960 bps, 0 pps; output rate 680 bps, 0 pps
RX 17630663 unicast packets 9688393 multicast packets 962 broadcast packets
27320018 input packets 2447072885 bytes
0 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX 17629610 unicast packets 880725 multicast packets 1073 broadcast packets
18511408 output packets 1812251192 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
*** I am not sure what the original strategy was in making this a tagged VLAN on a trunk, but it logged no errors for 2 years. The VLAN is not trunked or used anywhere else on the switch.
Software
BIOS: version 4.0.0
NXOS: version 7.0(3)I4(6)
BIOS compile time: 12/06/2016
NXOS image file is: bootflash:///nxos.7.0.3.I4.6.bin
NXOS compile time: 3/9/2017 22:00:00 [03/10/2017 01:05:18]
- sw2 reports the same errors, but at different times that don't line up with the sw1 events.
- We have another deployment at a test site which has the same configuration and has never reported an error. The difference between the sites is that the test site is running older code:
Software
BIOS: version 2.6.0
loader: version N/A
kickstart: version 6.0(2)U6(8)
system: version 6.0(2)U6(8)
and also carries less traffic.
Any thoughts? My only options at this point are to remove the trunking and reconfigure the link as a straight routed port, or to increase the keepalive interval above 1000 msec.
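A sketch of the routed-port option mentioned above, reusing the existing /30 and keepalive VRF (assumes sw2 mirrors this with 10.0.0.2/30; a sketch, not a tested change plan):

```text
interface Ethernet1/43
  description Keepalive Connection to sw2.dc24 e1/43
  no switchport
  vrf member keepalive
  ip address 10.0.0.1/30
  no shutdown
vpc domain 100
  peer-keepalive destination 10.0.0.2 source 10.0.0.1 vrf keepalive
! Vlan10, the SVI, and the trunk config can be removed once the
! keepalive is confirmed up over the routed link
```

This removes the VLAN tagging and SVI from the keepalive path entirely, so the heartbeat rides directly on the routed interface.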
Solved!
09-25-2020 09:57 AM
If the kit ran for the last 3 years without any issue and you are suddenly seeing this, make sure the cables are intact and that there is no other Layer 2 problem; if so, then it looks like a bug:
https://bst.cloudapps.cisco.com/bugsearch/bug/CSCve06744/?rfs=iqvred
09-26-2020 05:17 PM - edited 09-26-2020 05:18 PM
Hi,
Not sure if this has anything to do with the issue you are having, but the keepalive is usually not configured in a VLAN on the Nexus series. You just connect the Nexus switches to a third switch and put the third switch's ports in a VLAN.
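For example, a common approach is to run the keepalive over the mgmt0 interfaces through an out-of-band switch. A sketch assuming hypothetical management addresses (192.168.1.1 on sw1, 192.168.1.2 on sw2; substitute your own):

```text
! on sw1; mgmt0 connects to an access port on the out-of-band switch
interface mgmt0
  ip address 192.168.1.1/24
vpc domain 100
  peer-keepalive destination 192.168.1.2 source 192.168.1.1 vrf management
```

The management VRF exists by default on NX-OS, so no extra VRF configuration is needed for this variant.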
HTH
09-29-2020 11:14 AM
Thank you for the insight. I am unsure whether we are running into that bug, and why would it spring up after 2 years? The next action on our end is to reconfigure the interfaces as standard routed ports and see if that jogs something loose.