Solved: Nexus 5596, weird error

Vinny · ‎08-29-2012

Hello,

I have two datacenters running the same equipment (two Nexus 5596 with FEX).

I took a look at the log to check that everything was OK, and I saw the same error message, many times, at both locations:

%SYSMGR-FEX100-5-HEARTBEAT_LOSS: Service "satctrl" heartbeat loss 2 ,max 7

I thought it was a problem with my vPC peer-keepalive connection, but I see the word FEX, so I'm not sure.

Note that at both locations, my Nexus switches are connected back to back through the management port using transceivers: a copper cable from the first Nexus goes into a transceiver, then over fiber to another transceiver, and then back to copper into the other Nexus.

Any ideas?

Thanks

Reza Sharifi · ‎08-29-2012

Hi,

Is the vPC keepalive established?

What is the output of "show vpc"?

HTH

Vinny · ‎08-30-2012

Hi,

Yes, the vPC keepalive is established.

First datacenter, running NX-OS 5.1.3.N1.1:

Primary core :

CORE1# sh vpc
Legend:
(*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                   : 1
Peer status                     : peer adjacency formed ok
vPC keep-alive status           : peer is alive
Configuration consistency status: success
Per-vlan consistency status     : success
Type-2 consistency status       : success
vPC role                        : primary
Number of vPCs configured       : 404
Peer Gateway                    : Enabled
Peer gateway excluded VLANs     : -
Dual-active excluded VLANs      : -
Graceful Consistency Check      : Enabled

Secondary core :

Core2# sh vpc
Legend:
(*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                   : 1
Peer status                     : peer adjacency formed ok
vPC keep-alive status           : peer is alive
Configuration consistency status: success
Per-vlan consistency status     : success
Type-2 consistency status       : success
vPC role                        : secondary
Number of vPCs configured       : 404
Peer Gateway                    : Enabled
Peer gateway excluded VLANs     : -
Dual-active excluded VLANs      : -
Graceful Consistency Check      : Enabled

Second datacenter, running NX-OS 5.2.1.N1.1:

Primary core :

Core1# sh vpc
Legend:
(*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 1
Peer status                       : peer adjacency formed ok
vPC keep-alive status             : peer is alive
Configuration consistency status : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                          : primary
Number of vPCs configured         : 246
Peer Gateway                      : Enabled
Peer gateway excluded VLANs       : -
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Disabled

Secondary core :

Core2# sh vpc
Legend:
(*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 1
Peer status                       : peer adjacency formed ok
vPC keep-alive status             : peer is alive
Configuration consistency status : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                          : secondary
Number of vPCs configured         : 246
Peer Gateway                      : Enabled
Peer gateway excluded VLANs       : -
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Disabled

Thank you

Diego Mauricio Cotrino Copete · ‎06-25-2013

same message here, any findings?

Vinny · ‎06-25-2013

Hello,

I worked with Cisco on this problem for over a year (a lot of debugging with them). It is supposed to be fixed in the 6.x release, but 6.x is not recommended at the moment, so they told me not to upgrade right now unless we have problems.

InayathUlla Sharieff · ‎06-25-2013

Vincent,

Error:

%SYSMGR-FEX100-5-HEARTBEAT_LOSS: Service "satctrl" heartbeat loss 2 ,max 7
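
For anyone who wants to see how often this message shows up in a saved log, and for which FEX, here is a rough Python sketch. It assumes the message always has the shape quoted above, and it reads "FEX100" as FEX number 100 and the "5" as the syslog severity, which is the usual NX-OS layout but is an assumption here. The function names are made up for the example.

```python
import re
from collections import Counter

# Pattern built from the single message quoted in this thread (assumption:
# other occurrences follow the same layout, including the odd " ,max" spacing).
PATTERN = re.compile(
    r'%SYSMGR-FEX(?P<fex>\d+)-(?P<severity>\d)-HEARTBEAT_LOSS: '
    r'Service "(?P<service>[^"]+)" heartbeat loss (?P<loss>\d+)\s*,\s*max (?P<max>\d+)'
)


def parse_heartbeat_loss(line):
    """Return a dict of fields for a HEARTBEAT_LOSS line, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "fex": int(match["fex"]),
        "severity": int(match["severity"]),
        "service": match["service"],
        "loss": int(match["loss"]),
        "max": int(match["max"]),
    }


def count_by_fex(lines):
    """Count HEARTBEAT_LOSS messages per FEX number across an iterable of lines."""
    counts = Counter()
    for line in lines:
        record = parse_heartbeat_loss(line)
        if record is not None:
            counts[record["fex"]] += 1
    return counts
```

Feeding the lines of a saved log to `count_by_fex` gives a quick per-FEX tally, which can help show whether the messages line up with NMS polling intervals.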

Heartbeat issue:
- satctrl is supposed to punch a heartbeat every 5 seconds.
- It is allowed to miss up to 7 times (35 seconds).
- After the 7th miss, it is killed by sysmgr and the FEX goes offline.
- sysmgr starts sending these syslog messages after the 2nd miss.
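
The timing in the list above can be sketched in a few lines of Python. This only illustrates the arithmetic described in this post (5-second interval, syslog from the 2nd miss, kill at the 7th); the constant and function names are made up for the example and are not part of NX-OS.

```python
HEARTBEAT_INTERVAL_S = 5   # satctrl is expected to heartbeat every 5 seconds
MAX_MISSES = 7             # sysmgr kills the service at the 7th miss
LOG_FROM_MISS = 2          # syslog messages start at the 2nd miss


def describe_miss(miss_count):
    """Return (elapsed_seconds, action) for a number of consecutive misses."""
    if miss_count < 0:
        raise ValueError("miss_count must be non-negative")
    elapsed = miss_count * HEARTBEAT_INTERVAL_S
    if miss_count >= MAX_MISSES:
        action = "killed"   # service killed by sysmgr, FEX goes offline
    elif miss_count >= LOG_FROM_MISS:
        action = "syslog"   # a HEARTBEAT_LOSS message is logged
    else:
        action = "silent"   # nothing is reported yet
    return elapsed, action


if __name__ == "__main__":
    for n in range(MAX_MISSES + 1):
        print(n, describe_miss(n))
```

On this reading, "heartbeat loss 2 ,max 7" corresponds to `describe_miss(2)`: about 10 seconds without a heartbeat, still well short of the 35 seconds at which the FEX would go offline.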

The heartbeat fails because some other process on the FEX is busy with an operation, so satctrl does not get scheduled in time to punch its heartbeat.

Cisco has a software bug filed to track this issue: when an NMS polls the FEX information, it can sometimes lead to heartbeat failure.
http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCuc39303

HTH Regards
Inayath
*Please rate if this info is helpful.

Vinny · ‎06-26-2013

Inayath,

Yes, it's exactly that. That is the bug ID that was created after I spent several months debugging with TAC.

InayathUlla Sharieff · ‎06-26-2013

Yes, please upgrade to the latest release to have this resolved, or remove the SNMP polling from the config.

HTH

Regards

Inayath

*Please rate all useful posts and close the thread as answered.

fariha zain · ‎06-26-2013

Thanks for the help, Inayath. I had the same question and it helped me.

Regards

Fari

Diego Mauricio Cotrino Copete · ‎06-26-2013

Sure, thanks. SolarWinds and n5000-uk9.5.1.3.N1.1a.bin is my case too; we will try to upgrade.