Issue on 9606R - %PLATFORM-3-ELEMENT_TMPFS_WARNING:

carl.townshend
Level 1

Hi All

We have been having some issues with WCCP and PBR on a 9606R switch, running 17.3.1r software.

Basically, when I configured some ACLs to be used for WCCP and PBR, it did not work: the ACLs simply had no matches on them, yet the same config worked fine on a 4500X.

I noticed the switch logs were full of %PLATFORM-3-ELEMENT_TMPFS_WARNING messages.

Is this some sort of memory leak? Will a reboot solve it, or do we need to upgrade the IOS? Could it possibly crash on us?

I have a feeling it's related to TCAM, which would explain why the ACLs aren't working.

Your thoughts, guys?



Leo Laohoo
Hall of Fame

Post the complete output of the following commands:

sh version
sh platform resource
sh platform software mount switch <SWITCH MEMBERS> r0 | include ^tmpfs.*tmp | exclude /tmp/

Cisco IOS XE Software, Version 17.03.01
Cisco IOS Software [Amsterdam], Catalyst L3 Switch Software (CAT9K_IOSXE), Version 17.3.1, RELEASE SOFTWARE (fc5)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2020 by Cisco Systems, Inc.
Compiled Fri 07-Aug-20 21:32 by mcpre


Cisco IOS-XE software, Copyright (c) 2005-2020 by cisco Systems, Inc.
All rights reserved. Certain components of Cisco IOS-XE software are
licensed under the GNU General Public License ("GPL") Version 2.0. The
software code licensed under GPL Version 2.0 is free software that comes
with ABSOLUTELY NO WARRANTY. You can redistribute and/or modify such
GPL code under the terms of GPL Version 2.0. For more details, see the
documentation or "License Notice" file accompanying the IOS-XE software,
or the applicable URL provided on the flyer accompanying the IOS-XE
software.


ROM: IOS-XE ROMMON
BOOTLDR: System Bootstrap, Version 17.3.1r[FC2], RELEASE SOFTWARE (P)

EUUKROC-CORE uptime is 3 years, 39 weeks, 2 days, 18 hours, 31 minutes
Uptime for this control processor is 3 years, 39 weeks, 2 days, 18 hours, 36 minutes
System returned to ROM by PowerOn
System restarted at 14:53:57 UTC Sat Feb 20 2021
System image file is "bootflash:packages.conf"
Last reload reason: PowerOn


This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to
export@cisco.com.


Technology Package License Information:

------------------------------------------------------------------------------
Technology-package                                Technology-package
Current              Type                         Next reboot
------------------------------------------------------------------------------
network-advantage    Smart License                network-advantage
dna-advantage        Subscription Smart License   dna-advantage
AIR License Level: AIR DNA Advantage
Next reload AIR license Level: AIR DNA Advantage


Smart Licensing Status: UNREGISTERED/EVAL EXPIRED

cisco C9606R (X86) processor (revision V01) with 2905019K/6147K bytes of memory.
Processor board ID FXS244300F0
323 Virtual Ethernet interface
192 TwentyFive Gigabit Ethernet interfaces
96 Ten Gigabit Ethernet interfaces
32768K bytes of non-volatile configuration memory.
16001560K bytes of physical memory.
11161600K bytes of Bootflash at bootflash:.
11161600K bytes of Bootflash at bootflash-2-0:.
1638400K bytes of Crash Files at crashinfo:.
1638400K bytes of Crash Files at crashinfo-2-0:.
937691463K bytes of SATA hard disk at disk0:.
937691463K bytes of SATA hard disk at disk0-2-0:.
11161600K bytes of Bootflash at bootflash-1-1:.
1638400K bytes of Crash Files at crashinfo-1-1:.
937691463K bytes of SATA hard disk at disk0-1-1:.
11161600K bytes of Bootflash at bootflash-2-1:.
1638400K bytes of Crash Files at crashinfo-2-1:.
937691463K bytes of SATA hard disk at disk0-2-1:.

Base Ethernet MAC Address : 54:9f:c6:22:a7:80
Motherboard Assembly Number : 4C57
Motherboard Serial Number : XXXX
Model Revision Number : V02
Motherboard Revision Number : 5
Model Number : C9606R
System Serial Number : XXX

Switch 02
---------
Base Ethernet MAC Address : 54:9f:c6:22:a8:00
Motherboard Assembly Number : 4C57
Motherboard Serial Number : XXX
Model Revision Number : V02
Motherboard Revision Number : 5
Model Number : C9606R
System Serial Number : XXX


#sh platform resource
**State Acronym: H - Healthy, W - Warning, C - Critical
Resource             Usage           Max        Warning    Critical   State
----------------------------------------------------------------------------------------------------
Control Processor    5.37%           100%       90%        95%        H
DRAM                 14305MB(91%)    15626MB    90%        95%        W
TMPFS                6800MB(43%)     15626MB    40%        50%        W

#sh platform software mount switch 1 R0 | include ^tmpfs.*tmp | exclude /tmp/
tmpfs 78584 7922196 1% /tmp


Leo Laohoo
Hall of Fame

@carl.townshend wrote:
DRAM 14305MB(91%) 15626MB 90% 95% W

@carl.townshend 

Da fuq?

Forget about the TMPFS!  You've got bigger problems!

The control plane's memory utilization is at 91% (and increasing). The Active Supervisor Card is about to crash!
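
To see why the DRAM and TMPFS rows are flagged "W", the utilization can be recomputed from the Usage, Max, Warning and Critical columns of the sh platform resource output. This is a minimal Python sketch that assumes the row layout shown in this thread, and it assumes the state is derived by comparing utilization against the Warning/Critical percentages; that mapping is inferred from the output, not documented Cisco behaviour. The names parse_row, utilisation_pct and derived_state are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class Resource(NamedTuple):
    name: str
    used_mb: int
    max_mb: int
    warning_pct: int
    critical_pct: int
    reported_state: str

# Matches rows like "DRAM 14305MB(91%) 15626MB 90% 95% W" (layout assumed from this thread).
ROW = re.compile(
    r"^(?P<name>\w+)\s+(?P<used>\d+)MB\(\d+%\)\s+(?P<max>\d+)MB\s+"
    r"(?P<warn>\d+)%\s+(?P<crit>\d+)%\s+(?P<state>[HWC])$"
)

def parse_row(line: str) -> Optional[Resource]:
    """Parse one MB-based resource row; return None for rows in another format."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    return Resource(m["name"], int(m["used"]), int(m["max"]),
                    int(m["warn"]), int(m["crit"]), m["state"])

def utilisation_pct(r: Resource) -> float:
    return 100.0 * r.used_mb / r.max_mb

def derived_state(r: Resource) -> str:
    # Assumed mapping: at or above Critical -> C, at or above Warning -> W, else H.
    pct = utilisation_pct(r)
    if pct >= r.critical_pct:
        return "C"
    if pct >= r.warning_pct:
        return "W"
    return "H"

output = """\
Control Processor    5.37%           100%       90%        95%        H
DRAM                 14305MB(91%)    15626MB    90%        95%        W
TMPFS                6800MB(43%)     15626MB    40%        50%        W"""

for line in output.splitlines():
    r = parse_row(line)
    if r is not None:  # the Control Processor row has no MB columns and is skipped
        print(f"{r.name}: {utilisation_pct(r):.1f}% -> {derived_state(r)} "
              f"(reported {r.reported_state})")
```

With these figures DRAM works out to roughly 91.5% of 15626MB, just past the 90% warning level and a few points short of the 95% critical level.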

Hi, oh dear, this doesn't sound good.

What are our options here guys?

What is causing this, is it a software bug?

Do we need to upgrade? If so, what do we upgrade to?

We run StackWise Virtual on these, with two sups in each chassis.


@carl.townshend wrote:

What are our options here guys?


Options are simple: 

1.  Reboot the affected Supervisor Card NOW

2.  Upgrade the firmware to the latest 17.3.X

3.  Wait for the Supervisor to crash


@carl.townshend wrote:

What is causing this, is it a software bug?


Of course. I've been saying that IOS-XE memory leaks like a sieve. Anything and everything can easily trigger a memory leak. The main question is WHERE the memory leak is.

I've had this discussion with the DNAC team several times over several years (and they have decided to ignore it). I have shown them multiple pieces of evidence that DNAC does not give a f*ck about memory utilization and memory leaks. The "AI" is asleep at the wheel. If DNAC were to see 91% memory utilization, it would keep saying that the health is 100% AOK, whereas other cheaper and better NMS platforms would have triggered a warning at the early onset, at 80% to 85%.

I do not trust DNAC to "have my back": we have performed multiple proactive actions because other NMS platforms (cheaper and better) have flagged issues. We have never received any "critical" messages from DNAC. Never have. Never will.

Hi Leo, some further questions, if you don't mind.

When you say reboot the primary supervisor, how will this work in a StackWise Virtual/quad-sup setup?

To do that, is it a simple reload command? Would this not reboot both switches, or would it reload only the active sup and fail over to the peer?

What happens if we reboot the active sup on chassis 1 (active)? Will the sup in the second chassis take over, and would traffic still forward on chassis 1, or will the whole of chassis 1 reload and stop forwarding?

Would redundancy force-switchover do the trick?

cheers


@carl.townshend wrote:
When you say reboot primary supervisor, how will this work in a stackwise virtual/quad sup setup?

This is a 9600, right? In a VSS, quad Sup is not yet supported, so I am going to assume each chassis has a single supervisor card.

For this memory leak, reboot the ACTIVE supervisor card because this is the card which is currently having a severe memory leak. 


@carl.townshend wrote:
Would redundancy force-switchover do the trick ?

Yes. However, if this does not work, someone needs to go in and manually reboot the ACTIVE chassis.

I cannot "guarantee" how a Supervisor Card with 91% memory utilization is going to behave (or what it will do) if the command "redundancy force-switchover" is entered. The Supervisor Card may obey the command, hang, or crash. So prepare for the worst.

Good luck.

NOTE: 

The lesson learnt is very simple: because IOS-XE, as a whole, is very buggy, ALWAYS plan a maintenance cycle and reboot every 12 months.
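
The 12-month rule above is easy to check against the uptime line from sh version. This is a minimal sketch, assuming a 365-day year and a 7-day week; the thread does not say how IOS XE itself rounds these units, so treat the result as approximate. The function names uptime_days and overdue_for_reload are hypothetical.

```python
import re

# Approximate unit lengths in minutes; the 365-day year is an assumption.
UNIT_MINUTES = {
    "year": 365 * 24 * 60,
    "week": 7 * 24 * 60,
    "day": 24 * 60,
    "hour": 60,
    "minute": 1,
}

def uptime_days(line: str) -> float:
    """Convert an uptime phrase like '3 years, 39 weeks, 2 days' into approximate days."""
    total_minutes = 0
    for value, unit in re.findall(r"(\d+)\s+(year|week|day|hour|minute)s?", line):
        total_minutes += int(value) * UNIT_MINUTES[unit]
    return total_minutes / (24 * 60)

def overdue_for_reload(line: str, max_days: int = 365) -> bool:
    # Rule of thumb from this thread: plan a reload every 12 months.
    return uptime_days(line) > max_days

line = "EUUKROC-CORE uptime is 3 years, 39 weeks, 2 days, 18 hours, 31 minutes"
print(f"{uptime_days(line):.1f} days, overdue: {overdue_for_reload(line)}")
```

For the uptime posted in this thread the estimate comes to roughly 1370 days, well over three times the 12-month interval.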

Hi Leo

These run StackWise Virtual, which is similar to VSS, and quad sup is actually supported.

Upgrading Catalyst 9600 Quad-Supervisor StackWise Virtual Setup with ISSU

My question is more: say the redundancy force-switchover command works, should both chassis still forward? And in "theory" we should have no outage? Again, in theory.

@carl.townshend

The Active Supervisor Card is already in "stress".  I would not want to take any chances by doing an upgrade via ISSU.  I can (almost) guarantee ISSU is going to fail because of the firmware version and the uptime.  

We have been told by a Cisco TAC engineer that he can "guarantee" that ISSU will fail if the current firmware is 17.3.X, 17.6.X or 17.9.X. The Quad VSS is currently on 17.3.1, and the last number is a "1". I'm afraid all the signs are against you & ISSU.

Either do the failover or get someone to reseat the Active Supervisor Card. Whatever the decision, do it ASAP.
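
The TAC engineer's rule of thumb, as reported in this post (no Cisco documentation is cited for it), can be encoded as a quick pre-check before attempting ISSU. This sketch only compares the major.minor train of a version string such as "17.03.01" against the 17.3, 17.6 and 17.9 trains named above; it says nothing about whether ISSU will actually succeed on a given system. The names ISSU_RISKY_TRAINS, parse_train and issu_reportedly_risky are hypothetical, and "17.12.1" is only an illustrative input.

```python
from typing import Tuple

# Trains that, per the TAC engineer quoted in this thread, make ISSU likely to fail.
# Reported rule of thumb only; not an official Cisco ISSU compatibility matrix.
ISSU_RISKY_TRAINS = {(17, 3), (17, 6), (17, 9)}

def parse_train(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings like '17.3.1' or '17.03.01'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def issu_reportedly_risky(version: str) -> bool:
    """True if the version belongs to one of the trains named in this thread."""
    return parse_train(version) in ISSU_RISKY_TRAINS

# "17.03.01" is the version from the sh version output; "17.12.1" is illustrative only.
for v in ("17.03.01", "17.12.1"):
    verdict = "on the reported list" if issu_reportedly_risky(v) else "not on the reported list"
    print(f"{v}: {verdict}")
```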

Leo Laohoo
Hall of Fame

@carl.townshend

If this switch has not yet been rebooted, please post the complete output of the following command:

sh processes memory platform sorted 

I want to see which processes are causing the issue.

Is this switch being monitored by DNAC?

shambhu.kumar
Spotlight

Hello carl,

Is it a Catalyst 9600 Dual Supervisor-1 Module with StackWise Virtual or a Quad Supervisor-1 Module with StackWise Virtual?

Are you managing this switch with DNAC?