01-13-2014 06:23 AM - edited 03-07-2019 05:31 PM
All,
Recently we had an unexpected failover in our data center and noticed something interesting. In the logs of our Nexus 7000 there were double timestamps. Does anyone know the reason why? Maybe because the active sup was about to become the standby, I mean right before the failover?
2014 Jan 12 08:40:17 comet2-7000 Jan 12 08:32:04 %KERN-2-SYSTEM_MSG: [37130227.530994] node=4 sap=61249 rq=5445(961704) lq=0(0) pq=0(0) nq=0(0) sq=0(0) buf_in_transit=0, bytes_in_transit=0 - kernel
2014 Jan 12 08:39:47 comet2-7000 Jan 12 08:39:47 %KERN-2-SYSTEM_MSG: [3021309.411438] Switchover started by redundancy driver - kernel
2014 Jan 12 08:39:47 comet2-7000 %SYSMGR-2-HASWITCHOVER_PRE_START: This supervisor is becoming active (pre-start phase).
2014 Jan 12 08:39:47 comet2-7000 %SYSMGR-2-HASWITCHOVER_START: Supervisor 6 is becoming active.
2014 Jan 12 08:39:48 comet2-7000 %STP-6-FIRST_BPDU_TX: First BPDU transmitted
2014 Jan 12 08:39:48 comet2-7000 %SYSMGR-2-SWITCHOVER_OVER: Switchover completed.
As you can see, the first line shows two different timestamps. Why?
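To count how many lines in the "show logging" output carry a second timestamp, and how far apart the two values are, a small script can help. This is only a minimal sketch, assuming the line layout seen in the excerpt above (year-prefixed timestamp, hostname, optional second timestamp without a year, then the % message). The names LINE_RE and split_timestamps are hypothetical, and the script makes no claim about which of the two timestamps is the correct one.

```python
import re
from datetime import datetime

# Outer syslog timestamp, hostname, optional embedded timestamp, then the message.
# The layout is assumed from the log excerpt in this thread, not from Cisco documentation.
LINE_RE = re.compile(
    r"^(?P<outer>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?:(?P<inner>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) )?"
    r"(?P<msg>%.*)$"
)


def split_timestamps(line):
    """Return (outer, inner, msg); inner is None when there is no second timestamp."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    outer = datetime.strptime(m.group("outer"), "%Y %b %d %H:%M:%S")
    inner = None
    if m.group("inner"):
        # The embedded timestamp has no year; borrowing it from the outer one is an assumption.
        inner = datetime.strptime(
            f"{outer.year} {m.group('inner')}", "%Y %b %d %H:%M:%S"
        )
    return outer, inner, m.group("msg")


if __name__ == "__main__":
    sample = (
        "2014 Jan 12 08:40:17 comet2-7000 Jan 12 08:32:04 "
        "%KERN-2-SYSTEM_MSG: [37130227.530994] node=4 sap=61249 - kernel"
    )
    outer, inner, _ = split_timestamps(sample)
    # Difference between the two timestamps on the same line, in seconds.
    print((outer - inner).total_seconds())  # 493.0
```

For the first line of the excerpt the two timestamps are 8 minutes 13 seconds apart, while on the second line they are identical.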
Thanks,
Thiago
01-13-2014 05:20 PM
Hi Thiago,
I haven't seen anything like this before. Did you capture the above logs while they were printing on the console, or did you run the "show logging" command and check there?
Thanks,
Madhu
01-14-2014 07:08 AM
Madhu,
sh logging. I want to understand why there are double timestamps... I couldn't find any document explaining this. If you know, please let me know.
Thanks,
Thiago
01-14-2014 05:13 PM
Hi Thiago,
I don't have a 7K handy to play with. Let me see if I can grab one and try this out.
1. Do you see it every time you do a switchover, or was it a one-time occurrence?
2. And between the 2 timestamps which one is the correct one?
3. Is this the only line that shows this or many more lines?
4. What is the version you are running?
Thanks,
Madhu
01-17-2014 06:58 AM
Madhu,
Here are the responses:
1. Do you see it every time you do a switchover, or was it a one-time occurrence? --> this is the first time a switchover has occurred
2. And between the 2 timestamps which one is the correct one? --> This is what I am trying to understand
3. Is this the only line that shows this or many more lines? --> I have more lines
4. What is the version you are running? --> 6.1
HTH,
Thiago
01-17-2014 09:44 AM
Hey Thiago,
This is a bug in 6.1. Was the unexpected failover caused by a crash? From the SAP it looks to be SNMPD. Have you tried 6.2(2a) or 6.2(6)?
Please share:
show system internal mts buffers summary
show system internal mts sup sap 61249 desc
show clock
Thanks,
Richard.
01-20-2014 06:49 AM
01-20-2014 08:56 AM
Can you try to upgrade to 6.2(2a) or 6.2(6) and tell me whether you still see the same message format?
As I suspected, it is an snmpd SAP response, and you should upgrade this box.
HTH
Richard
01-20-2014 09:47 AM
Richard,
I've seen this bug, although we were thinking of an alternative way to fix it. Well, this Nexus 7k is our core, so it will take some time to get it upgraded.
What is this SAP response?
Thanks for your help!
Thanks.
Thiago
01-20-2014 09:56 AM
Hi Thiago,
So basically we use MTS to communicate between the different components of the Nexus. Think of a SAP as a number that MTS uses for the different protocols, for example SNMP. When more SNMP get requests and responses happen, more inter-process communication happens, which takes more memory from the queue and leads to congestion -> hang -> crash.
Many problems will happen if the MTS queue gets stuck. It's very important to keep the queue clean.
HTH,
Richard.
01-20-2014 11:36 AM
Richard,
Is there a way to clean the MTS queue manually, or is it an automatic process? And how can I tell if the MTS queue is full? Should CoPP take care of it?
Thanks,
Thiago
01-20-2014 11:23 PM
Hey Thiago,
There is no way to keep the MTS queue clean manually; it is freed and occupied dynamically. CoPP is different from MTS, although CoPP can prevent MTS queue-full problems.
show system internal mts buffers summary should show you the SAP number that is congesting the queue. After that, you can use the SAP number to determine which component or process it is. There are static and dynamic SAPs. For example, 23 is SNMP, i.e., there are known SAPs that we can recognize on sight, but there are many that we can't, and the only way to identify those is show system internal mts sup sap
Once we identify it, we can determine how to stop it from occupying the queue. In your case it's SNMP, so you can add an ACL to drop the SNMP packets, take a capture to find out who is sending the most SNMP requests, or stop the SNMP service completely.
It's entirely up to you.
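The kernel message from the original post already carries a SAP number and per-queue counters, so those can be pulled out of a saved log with a short script before looking the SAP up on the switch. This is a rough sketch, not an official parser: the function name parse_mts_kernel_msg is hypothetical, and reading a field such as rq=5445(961704) as "message count (bytes)" is an assumption that is not confirmed anywhere in this thread.

```python
import re

# "sap=61249" style field.
SAP_RE = re.compile(r"\bsap=(?P<sap>\d+)")
# "rq=5445(961704)" style fields: queue name, first number, number in parentheses.
QUEUE_RE = re.compile(r"\b(?P<name>rq|lq|pq|nq|sq)=(?P<first>\d+)\((?P<second>\d+)\)")


def parse_mts_kernel_msg(msg):
    """Extract the SAP number and queue counters from one kernel log message.

    Returns None when the message has no sap= field.
    """
    sap = SAP_RE.search(msg)
    if sap is None:
        return None
    queues = {
        # Assumption: first number is a message count, second is a size in bytes.
        m["name"]: (int(m["first"]), int(m["second"]))
        for m in QUEUE_RE.finditer(msg)
    }
    return {"sap": int(sap["sap"]), "queues": queues}


def busiest_queue(parsed):
    """Return the name of the queue with the largest first counter."""
    return max(parsed["queues"], key=lambda name: parsed["queues"][name][0])


if __name__ == "__main__":
    msg = (
        "%KERN-2-SYSTEM_MSG: [37130227.530994] node=4 sap=61249 "
        "rq=5445(961704) lq=0(0) pq=0(0) nq=0(0) sq=0(0) "
        "buf_in_transit=0, bytes_in_transit=0 - kernel"
    )
    parsed = parse_mts_kernel_msg(msg)
    print(parsed["sap"], busiest_queue(parsed), parsed["queues"]["rq"])
```

The SAP number it reports can then be checked with the show system internal mts sup sap command mentioned above.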
HTH,
Ricky
*Rate if this is useful
01-21-2014 07:30 AM
Richard,
Thank you very much for such a detailed explanation! I really appreciate it!
Thank you,
Thiago