Solved: Re: double timestamps in Nexus 7000

Thiago Henriques · ‎01-13-2014

All,

Recently we had an unexpected failover in our data center and noticed something interesting: in the logs of our Nexus 7000 there were double timestamps. Does anyone know the reason why? Maybe it is because the active sup was about to become the standby, I mean right before the failover?

2014 Jan 12 08:40:17 comet2-7000 Jan 12 08:32:04 %KERN-2-SYSTEM_MSG: [37130227.530994] node=4 sap=61249 rq=5445(961704) lq=0(0) pq=0(0) nq=0(0) sq=0(0) buf_in_transit=0, bytes_in_transit=0 - kernel

2014 Jan 12 08:39:47 comet2-7000 Jan 12 08:39:47 %KERN-2-SYSTEM_MSG: [3021309.411438] Switchover started by redundancy driver - kernel

2014 Jan 12 08:39:47 comet2-7000 %SYSMGR-2-HASWITCHOVER_PRE_START: This supervisor is becoming active (pre-start phase).

2014 Jan 12 08:39:47 comet2-7000 %SYSMGR-2-HASWITCHOVER_START: Supervisor 6 is becoming active.

2014 Jan 12 08:39:48 comet2-7000 %STP-6-FIRST_BPDU_TX: First BPDU transmitted

2014 Jan 12 08:39:48 comet2-7000 %SYSMGR-2-SWITCHOVER_OVER: Switchover completed.

As you can see, the first line shows a double timestamp. Why?
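For anyone who wants to check how many lines are affected, here is a minimal Python sketch that splits each "show logging" line into its outer and inner timestamp and reports the difference. The line layout is only inferred from the samples above, the function names (`parse_line`, `skew_seconds`) are made up for this illustration, and the inner timestamp has no year, so the sketch borrows the year from the outer one. It does not say which of the two timestamps is the correct one.

```python
import re
from datetime import datetime

# Layout assumed from the sample lines in this thread:
#   <YYYY Mon DD HH:MM:SS> <host> [<Mon DD HH:MM:SS>] %FACILITY-SEV-MNEMONIC: text
LINE_RE = re.compile(
    r"^(?P<outer>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?:(?P<inner>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) )?"
    r"(?P<msg>%.*)$"
)

TS_FMT = "%Y %b %d %H:%M:%S"  # %b assumes English month names (C locale)


def parse_line(line):
    """Return (outer, inner_or_None, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    outer = datetime.strptime(m["outer"], TS_FMT)
    inner = None
    if m["inner"]:
        # Assumption: the inner timestamp belongs to the same year as the outer one.
        inner = datetime.strptime(f"{outer.year} {m['inner']}", TS_FMT)
    return outer, inner, m["msg"]


def skew_seconds(line):
    """Outer minus inner timestamp in seconds; None if there is no inner timestamp."""
    parsed = parse_line(line)
    if parsed is None or parsed[1] is None:
        return None
    outer, inner, _ = parsed
    return (outer - inner).total_seconds()


samples = [
    "2014 Jan 12 08:40:17 comet2-7000 Jan 12 08:32:04 %KERN-2-SYSTEM_MSG: [37130227.530994] node=4 sap=61249 rq=5445(961704) lq=0(0) pq=0(0) nq=0(0) sq=0(0) buf_in_transit=0, bytes_in_transit=0 - kernel",
    "2014 Jan 12 08:39:47 comet2-7000 Jan 12 08:39:47 %KERN-2-SYSTEM_MSG: [3021309.411438] Switchover started by redundancy driver - kernel",
    "2014 Jan 12 08:39:47 comet2-7000 %SYSMGR-2-HASWITCHOVER_PRE_START: This supervisor is becoming active (pre-start phase).",
]

for s in samples:
    print(skew_seconds(s))  # 493.0, then 0.0, then None
```

On the three sample lines this shows that only the first one has two timestamps that disagree (by 8 minutes 13 seconds); the second carries the same time twice and the third has a single timestamp.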

Thanks,

Thiago

mtsb · ‎01-13-2014

Hi Thiago,

I haven't seen anything like this before. Did you capture the above logs while they were printing on the console, or did you run the "show logging" command and check there?

Thanks,

Madhu

Thiago Henriques · ‎01-14-2014

Madhu,

From "sh logging". I want to understand why there are double timestamps. I couldn't find any document explaining this. If you know, please let me know.

Thanks,

Thiago

mtsb · ‎01-14-2014

Hi Thiago,

I don't have a 7K handy to play with. Let me see if I can grab one and try this out.

1. Do you see it every time you do a switchover, or was it just a one-time occurrence?

2. And between the 2 timestamps which one is the correct one?

3. Is this the only line that shows this or many more lines?

4. What is the version you are running?

Thanks,

Madhu

Thiago Henriques · ‎01-17-2014

Madhu,

Here are the responses:

1. Do you see it every time you do a switchover, or was it just a one-time occurrence? --> This is the first time a switchover has occurred

2. And between the 2 timestamps which one is the correct one? --> This is what I am trying to understand

3. Is this the only line that shows this or many more lines? --> I have more lines

4. What is the version you are running? --> 6.1

HTH,

Thiago

Richard Michael · ‎01-17-2014

Hey Thiago,

This is a bug in 6.1. Was the unexpected failover caused by a crash? From the SAP it looks to be SNMPD. Have you tried 6.2(2a) or 6.2(6)?

Please share:

show system internal mts buffers summary

show system internal mts sup sap 61249 desc

show clock
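The SAP number can also be read straight from the kernel message in the original post. A rough Python sketch, assuming the message is a series of `name=value` or `name=value(extra)` fields as in that line; the thread does not say what the number in parentheses means, so it is kept as-is, and the function names are invented for this illustration:

```python
import re

# name=value, optionally followed by a second number in parentheses
FIELD_RE = re.compile(r"(\w+)=(\d+)(?:\((\d+)\))?")

QUEUE_FIELDS = ("rq", "lq", "pq", "nq", "sq")  # queue names as printed in the message


def parse_mts_kernel_msg(msg):
    """Map each field name to (value, extra); extra is None when no parentheses follow."""
    fields = {}
    for name, value, extra in FIELD_RE.findall(msg):
        fields[name] = (int(value), int(extra) if extra else None)
    return fields


def busiest_queue(fields):
    """Return (queue_name, depth) for the queue field with the largest first number."""
    present = [(name, fields[name][0]) for name in QUEUE_FIELDS if name in fields]
    return max(present, key=lambda item: item[1]) if present else None


msg = ("[37130227.530994] node=4 sap=61249 rq=5445(961704) lq=0(0) pq=0(0) "
       "nq=0(0) sq=0(0) buf_in_transit=0, bytes_in_transit=0 - kernel")

fields = parse_mts_kernel_msg(msg)
print(fields["sap"][0])       # 61249
print(busiest_queue(fields))  # ('rq', 5445)
```

Here the only non-zero queue is rq, for SAP 61249, which is the number used in the second command above.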

Thanks,

Richard.

Thiago Henriques · ‎01-20-2014

Richard,

Here is the output.

Thanks,

Thiago

Richard Michael · ‎01-20-2014

Can you try upgrading to 6.2(2a) or 6.2(6) and tell me whether you see a similar message format again?

As I suspected, it is an snmpd SAP response, and you should upgrade this box.

HTH

Richard

Thiago Henriques · ‎01-20-2014

Richard,

I've seen this bug, although we were hoping for an alternative way to fix it. This Nexus 7k is our core, so it will take some time to get it upgraded.

What is this SAP response?

Thanks for your help!

Thanks.

Thiago

Richard Michael · ‎01-20-2014

Hi Thiago,

So basically we use MTS to communicate between the different components of the Nexus. Think of a SAP as a number that MTS uses to identify different protocols, for example SNMP. When more SNMP get requests and responses happen, more inter-process communication happens, which takes more memory from the queue and leads to congestion -> hang -> crash.

Many problems will happen if the MTS queue gets stuck. It's very important to keep the queue clean.
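To picture the congestion part, here is a toy Python model of a bounded queue that fills faster than it drains. It is not how MTS is implemented; the function name, the rates, and the limit are all made-up values chosen only to show the effect.

```python
from collections import deque


def simulate(arrivals_per_tick, drained_per_tick, ticks, limit):
    """Return (tick_when_full, depth); tick_when_full is None if the queue never fills."""
    queue = deque()
    for tick in range(1, ticks + 1):
        # Requests arrive first (think of a burst of SNMP gets).
        for _ in range(arrivals_per_tick):
            if len(queue) >= limit:
                return tick, len(queue)  # queue is full: congestion
            queue.append(tick)
        # Then the receiving process drains what it can handle this tick.
        for _ in range(min(drained_per_tick, len(queue))):
            queue.popleft()
    return None, len(queue)


print(simulate(5, 3, 100, 20))  # (9, 20): arrivals outpace draining, queue fills at tick 9
print(simulate(3, 5, 100, 20))  # (None, 0): the consumer keeps up, queue stays empty
```

As long as the consumer keeps up, the queue stays empty; once requests arrive faster than they are processed, the backlog grows every tick until the limit is hit.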

HTH,

Richard.

Thiago Henriques · ‎01-20-2014

Richard,

Is there a way to clean the MTS queue manually, or is it an automatic process? And how can we tell when the MTS queue is full? Should CoPP take care of it?

Thanks,

Thiago

Richard Michael · ‎01-20-2014

Hey Thiago,

There is no way to keep the MTS queue clean manually. It is freed and occupied dynamically. CoPP is different from MTS, although CoPP can help prevent MTS queue-full problems.

show system internal mts buffers summary should show you the SAP number that is congesting the queue. You can then use that SAP number to determine which component or process it belongs to. There are static and dynamic SAPs. Some are well known and can be recognized on sight (23 is SNMP, for example), but many cannot, and the only way to identify those is show system internal mts sup sap description.

Once we have identified it, we can work out how to stop it from filling the queue. In your case it is SNMP, so you can add an ACL to drop the SNMP packets, take a capture to find out who is sending the most SNMP requests, or stop the SNMP service completely.

It is entirely up to you.

HTH,

Ricky

*Rate if this is useful

Thiago Henriques · ‎01-21-2014

Richard,

Thank you very much for such a detailed explanation! I really appreciate it!

Thank you,

Thiago