
MTS stuck messages errors in Cisco CLI Analyzer (3.6.5)

luckydams
Level 1

I am getting the following output from CLI Analyzer:

 

Problem Symptom/Impact:
This MDS 9148S 16G 48 FC (1 Slot) device running 8.1(1a) has experienced the following problem:
Stuck MTS messages were found on the system which could lead to issues that may not be aparent nor causing issues on the switch. There might be times where MTS issues could affect tasks such as `copy running-config startup-config`, fabric locking fabric, distributing cfs and many others as there are many processes that rely on this queue not having stuck messages.
Found 177 MTS messages stuck in queues in the last 300 seconds.

Nexus switch time is: Feb 10 20:09:46 2020

Here is the summary of the MTS messages found stuck for more than 300 seconds and their descriptions (if available):
SAP    SAP Description  MTS Queue  MTS Age Time
51863  Not found        recv       12 day(s) 13 hour(s) 10 minute(s) 6 second(s)
694    Not found        recv       12 day(s) 13 hour(s) 10 minute(s) 6 second(s)
51862  Not found        recv       12 day(s) 13 hour(s) 10 minute(s) 6 second(s)
51861  Not found        recv       12 day(s) 13 hour(s) 10 minute(s) 6 second(s)
Not Found means SAP name was not found in `show system internal sysmgr service all`

 

Checking the source data, I have:

'show system internal mts buffer summary'
node sapno recv_q pers_q npers_q log_q
lc 51863 16 0 0 0
lc 51862 20 0 0 0
lc 51861 23 0 0 0
sup 25411 0 0 1 0

There are only 59 (16+20+23) messages in the recv queues. The tool reports 177 (3 x 59) because this command appears 3 times in a 'tech-support details' output, so every message is counted three times.
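The arithmetic above can be reproduced with a small sketch. It assumes the tool simply adds up the recv_q column of every 'show system internal mts buffer summary' block it finds; that is my guess from the numbers, not something confirmed by the tool's code. The function name `recv_q_total` and its `dedupe` flag are hypothetical.

```python
# Summary block as printed by the switch (copied from the output above).
SUMMARY = """\
node sapno recv_q pers_q npers_q log_q
lc 51863 16 0 0 0
lc 51862 20 0 0 0
lc 51861 23 0 0 0
sup 25411 0 0 1 0
"""


def recv_q_total(text, dedupe=True):
    """Sum the recv_q column; with dedupe, count each (node, sap) only once."""
    per_sap = {}      # (node, sap) -> last recv_q value seen
    naive_total = 0   # every row counted, repeated blocks included
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a 6-column data row.
        if len(fields) != 6 or not fields[1].isdigit():
            continue
        node, sap, recv_q = fields[0], fields[1], int(fields[2])
        naive_total += recv_q
        per_sap[(node, sap)] = recv_q
    return sum(per_sap.values()) if dedupe else naive_total


# A 'tech-support details' file containing the same block three times.
tech_support = SUMMARY * 3

print(recv_q_total(tech_support, dedupe=False))  # 177
print(recv_q_total(tech_support))                # 59
```

Keying on (node, sap) keeps the last value seen for each SAP, so repeated copies of the same command output no longer inflate the count.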

 

'show system internal mts buffer detail'
**Fast Sap Buffers are not displayed below**
Node/Sap/queue Age(ms) SrcNode SrcSAP DstNode DstSAP OPC MsgId MsgSize RRToken Offset
lc/51863/recv 6181806554 0x102 694 0x102 51863 7842 0x563246ff 4640 0xd63240a3 0xbc6004
lc/51863/recv 6181806530 0x102 694 0x102 51863 7842 0x56324702 4640 0xd63240b7 0xae4004
lc/51863/recv 6181806528 0x102 694 0x102 51863 7842 0x56324704 4640 0xd6324144 0xae8004
...

Age is given in milliseconds, so 6181806554 ms is about 71 days 13 hours, not 12 days 13 hours.

 

Understanding why CLI Analyzer reports "12 day(s) 13 hour(s) 10 minute(s) 6 second(s)" was quite tricky, until I came across this discussion on Stack Overflow. The solution given there does not work if the time interval exceeds a month, which is exactly the case here.

 

Incorrect solution (from Stack Overflow):

#! python

from datetime import datetime, timedelta

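t = timedelta(milliseconds=6181806554)
d = datetime(1,1,1) + t  # lands on 0001-03-13 13:10:06

# Bug: d.day is the day of the month, so the whole months already elapsed
# (January and February here) are silently dropped.
print('{} day(s) {} hour(s) {} min(s) {} sec(s)'.format(d.day-1, d.hour, d.minute, d.second))
# 12 day(s) 13 hour(s) 10 min(s) 6 sec(s)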

 

A correct solution would be:

#! python

from datetime import timedelta

t = timedelta(milliseconds=6181806554)
days = t.days                    # total whole days, not day of the month
hours = t.seconds // 3600        # t.seconds is the remainder within the last day
mins = t.seconds % 3600 // 60
secs = t.seconds % 60

print('{} day(s) {} hour(s) {} min(s) {} sec(s)'.format(days, hours, mins, secs))
# 71 day(s) 13 hour(s) 10 min(s) 6 sec(s)

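To see where the 12 comes from: datetime(1,1,1) plus 71 days falls on March 13 of year 1, so d.day-1 returns 12 and the days of the two elapsed months are lost: 31 (January) + 28 (February) + 12 = 71.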
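This can be checked directly. The short sketch below uses only the standard datetime module and the age value from the output above; the variable name `days_before_march` is mine.

```python
from datetime import datetime, timedelta

t = timedelta(milliseconds=6181806554)
d = datetime(1, 1, 1) + t

# The date arithmetic rolls past January and February of year 1.
print(d.month, d.day)  # 3 13

# Year 1 is not a leap year, so 31 + 28 days precede March 1.
days_before_march = 31 + 28
print(days_before_march + (d.day - 1))  # 71
print(t.days)                           # 71
```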

 

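CLI Analyzer is a great tool for detecting stuck MTS messages, but it gives incorrect results here: the message count is tripled and the reported age is wrong for anything older than a month.

(Happy Pythonistas, please review your code... ;-)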

 
