Vinny
Beginner

Nexus 5596, weird error

Hello,

I have two datacenters running the same equipment (two Nexus 5596 switches with FEX).

I took a look at the log to check that everything was OK, and I saw the same error message, repeated many times, at both locations:

%SYSMGR-FEX100-5-HEARTBEAT_LOSS: Service "satctrl" heartbeat loss 2 ,max 7

I thought it was a problem with my peer-keepalive link, but the message mentions FEX, so I'm not sure.

Note that at both locations, the Nexus switches are connected back to back through their management ports using transceivers. A copper cable runs from the first Nexus into a transceiver, then over fiber to a second transceiver, and then back over copper to the other Nexus.

Any ideas?

thanks

Reza Sharifi
Hall of Fame Expert

Hi,

Is the vPC keepalive established?

What is the output of "show vpc"?

HTH

Hi,

Yes, the vPC keepalive is established.

First datacenter, running NX-OS 5.1.3.N1.1:

Primary core :

CORE1# sh vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                   : 1
Peer status                     : peer adjacency formed ok
vPC keep-alive status           : peer is alive
Configuration consistency status: success
Per-vlan consistency status     : success
Type-2 consistency status       : success
vPC role                        : primary
Number of vPCs configured       : 404
Peer Gateway                    : Enabled
Peer gateway excluded VLANs     : -
Dual-active excluded VLANs      : -
Graceful Consistency Check      : Enabled

Secondary core :

Core2# sh vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                   : 1
Peer status                     : peer adjacency formed ok
vPC keep-alive status           : peer is alive
Configuration consistency status: success
Per-vlan consistency status     : success
Type-2 consistency status       : success
vPC role                        : secondary
Number of vPCs configured       : 404
Peer Gateway                    : Enabled
Peer gateway excluded VLANs     : -
Dual-active excluded VLANs      : -
Graceful Consistency Check      : Enabled

Second datacenter, running NX-OS 5.2.1.N1.1:

Primary core :


Core1# sh vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 1
Peer status                       : peer adjacency formed ok
vPC keep-alive status             : peer is alive
Configuration consistency status  : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                          : primary
Number of vPCs configured         : 246
Peer Gateway                      : Enabled
Peer gateway excluded VLANs       : -
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Disabled

Secondary core :


Core2# sh vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 1
Peer status                       : peer adjacency formed ok
vPC keep-alive status             : peer is alive
Configuration consistency status  : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                          : secondary
Number of vPCs configured         : 246
Peer Gateway                      : Enabled
Peer gateway excluded VLANs       : -
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Disabled

thank you

Same message here. Any findings?


Hello,

I worked with Cisco on this problem for over a year, with a lot of debugging on their side. It is supposed to be fixed in the 6.x release, but 6.x is not recommended at the moment, so they told me not to upgrade right now unless we run into problems.

InayathUlla Sharieff
Cisco Employee

Vincent,

Error:

%SYSMGR-FEX100-5-HEARTBEAT_LOSS: Service "satctrl" heartbeat loss 2 ,max 7

1. Heartbeat issue:
- satctrl is supposed to send a heartbeat every 5 seconds.
- It is allowed to miss up to 7 heartbeats (35 seconds).
- After the 7th miss, sysmgr kills it and the FEX goes offline.
- sysmgr sends these syslog messages after the 2nd miss.

The reason for this heartbeat failure is that other processes on the FEX are busy with some operation, so satctrl does not get scheduled in time to send its heartbeat.
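To make the timing concrete, here is a minimal Python sketch of the watchdog behaviour described in the list above. It is only an illustration, not Cisco's implementation: the function name `simulate` is made up, and it assumes that misses are counted consecutively (a successful heartbeat resets the counter) and that a syslog line is emitted on every miss from the 2nd onward.

```python
HEARTBEAT_INTERVAL_S = 5  # satctrl is expected to send a heartbeat every 5 s
MAX_MISSES = 7            # sysmgr kills the service at the 7th miss
LOG_FROM_MISS = 2         # syslog messages start at the 2nd miss


def simulate(heartbeats):
    """Replay a sequence of 5-second intervals.

    heartbeats: iterable of bools, True if the heartbeat arrived in that interval.
    Returns a list of event strings in the order they would occur.
    Assumption: a successful heartbeat resets the miss counter.
    """
    events = []
    misses = 0
    for tick, arrived in enumerate(heartbeats, start=1):
        elapsed = tick * HEARTBEAT_INTERVAL_S
        if arrived:
            misses = 0
            continue
        misses += 1
        if misses >= MAX_MISSES:
            events.append(f"t={elapsed}s: satctrl killed by sysmgr, FEX offline")
            break
        if misses >= LOG_FROM_MISS:
            events.append(f"t={elapsed}s: heartbeat loss {misses} ,max {MAX_MISSES}")
    return events


if __name__ == "__main__":
    # Two missed heartbeats followed by a recovery: one warning, FEX stays up.
    for event in simulate([True, False, False, True]):
        print(event)
```

Under these assumptions, an isolated "heartbeat loss 2 ,max 7" message means the FEX was still five misses (25 seconds) away from being taken offline.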

Cisco has a software bug tracking this issue: when an NMS polls the FEX for information, it can sometimes lead to a heartbeat failure.
http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCuc39303
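To see how often this happens and how close a FEX gets to the limit, the syslog lines can be parsed. The sketch below is a hypothetical helper, not a Cisco tool: the regular expression is based only on the single message quoted in this thread, and reading "FEX100" as FEX number 100 and the following digit as the syslog severity is an assumption.

```python
import re

# Pattern derived from the one message quoted in this thread (assumed format).
HEARTBEAT_LOSS = re.compile(
    r'%SYSMGR-FEX(?P<fex>\d+)-(?P<severity>\d)-HEARTBEAT_LOSS: '
    r'Service "(?P<service>[^"]+)" heartbeat loss (?P<missed>\d+)\s*,\s*max (?P<max>\d+)'
)


def parse_heartbeat_loss(line):
    """Return the fields of a HEARTBEAT_LOSS message, or None if the line does not match."""
    match = HEARTBEAT_LOSS.search(line)
    if match is None:
        return None
    missed = int(match["missed"])
    limit = int(match["max"])
    return {
        "fex": int(match["fex"]),
        "service": match["service"],
        "missed": missed,
        "max": limit,
        "remaining": limit - missed,  # misses left before sysmgr kills the service
    }


if __name__ == "__main__":
    sample = '%SYSMGR-FEX100-5-HEARTBEAT_LOSS: Service "satctrl" heartbeat loss 2 ,max 7'
    print(parse_heartbeat_loss(sample))
```

Counting matches per FEX over time, and comparing them with the NMS polling schedule, is one way to check whether the messages line up with SNMP polls.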
HTH Regards
Inayath
*Please rate if this info is helpful.

Inayath,

Yes, that's exactly it. That bug ID was created after I spent several months debugging with TAC.

Yes, please upgrade to the latest release to get this resolved, or remove the SNMP polling from the config.

HTH

Regards

Inayath

*Please rate all useful posts and close the thread as answered.

Thanks for the help, Inayath. I had the same question and this helped me.

Regards

Fari

Sure, thanks. SolarWinds and n5000-uk9.5.1.3.N1.1a.bin is my case too; we will try to upgrade.