08-01-2019 12:07 AM - edited 01-20-2021 08:36 PM
Terminology Used:
NOTE:
The cause of this bug is a file, named "/tmp/nyq_rsync.log", getting bloated (it just keeps getting bigger daily). Once the /tmp partition hits 100%, the physical switch crashes.
To check how full the /tmp partition is (this is where the file lives), use the command "show platform software mount switch [active | standby | 1 to 16 ] r0 | include ^tmpf.*tmp | exclude /tmp/"
Remember, deleting the nyq_rsync.log file is only a workaround. The file gets re-created and continues to grow even if it is repeatedly deleted. For a permanent fix, upgrade to a version not affected by this bug.
Ideally, the value should be <9%. If the value is >80%, call TAC.
Switch#sh platform soft mount switch 4 r0 | i ^tmpfs.*tmp | e /tmp/
tmpfs 179164 3676488 5% /tmp
The usage percentage in the output above (shown in red in the original post) is 5%. This is good (normal). Nothing to worry about.
Switch#sh platform soft mount switch 1 r0 | i ^tmpfs.*tmp | e /tmp/
tmpfs 2746588 1109064 72% /tmp
The usage percentage in the output above (shown in red in the original post) is 72%. Contact TAC when possible.
After the file has been deleted, use the command "df -h" to verify that /tmp usage has dropped and the file has been successfully erased.
IMPORTANT:
If that value is >90%, put the donut down and call TAC immediately. The switch has less than a week before it will crash.
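For anyone who wants to script the check across a lot of stacks, here is a minimal Python sketch that pulls the percentage out of the output line and maps it to the actions mentioned in this post. The function names are made up for illustration, and the wording for the band between 9% and 80% is my own assumption, since the post only gives explicit rules for <9%, >80% and >90%.

```python
import re

# Matches the tmpfs row for /tmp, e.g. "tmpfs 179164 3676488 5% /tmp".
TMPFS_ROW = re.compile(r"^tmpfs\s+(\d+)\s+(\d+)\s+(\d+)%\s+/tmp\s*$")


def tmp_usage_percent(output: str) -> int:
    """Return the usage percentage of /tmp found in the command output."""
    for line in output.splitlines():
        match = TMPFS_ROW.match(line.strip())
        if match:
            return int(match.group(3))
    raise ValueError("no tmpfs /tmp row found in output")


def classify(percent: int) -> str:
    """Map a usage percentage to the action suggested in this post."""
    if percent > 90:
        return "call TAC immediately"
    if percent > 80:
        return "call TAC"
    if percent < 9:
        return "normal"
    # Assumption: no explicit rule is given for 9-80%, so this wording
    # simply follows the 72% example (contact TAC when possible).
    return "above ideal: monitor, contact TAC when possible"
```

Feeding it the two sample outputs above, `classify(tmp_usage_percent("tmpfs 179164 3676488 5% /tmp"))` gives "normal", and the 72% line lands in the "above ideal" band.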
Help TAC: Point or direct the TAC engineer to this page so they immediately know what to do.
HIDDEN GEM: Cisco Catalyst 9200/9300 can/may support UP TO sixteen (16) switches in a stack. Use the enable-mode command "switch ?" and notice that the option lets you specify a switch member (of a stack) from 1 to 16.
CRASHINFO/CRASH LOGS:
If/When a switch crashes, go to the crashinfo directory (Command: dir crashinfo-SWITCHMEMBER) and there should be a crashlog file that follows this naming convention: system-report_SWITCHMEMBER_YEARMMDD-HHMMSS-TIMEZONE.tar.gz.
Open the file and look under the "tmp" folder and there should be another file called HOSTNAME_SWITCHMEMBER_RP_0-bootuplog-YEARMMDD-HHMMSS-TIMEZONE.log
Scroll all the way to the bottom and you should see a line that goes like this:
HOSTNAME_SWITCHMEMBER_RP_0 cmand[11955]: %CMRP-0-CHASFS_PROPERTY_SET: Failed to write chassis filesystem object env/rp/0/sensor/0 property data because No space left on device -Traceback= 1#1173ffafaf74e8cab963841cc1ed720b errmsg:7FDCB3A6C000+17C9 :5616EC5DC000+BDF5B :5616EC5DC000+C2C1C :5616EC5DC000+AF68C evlib:7FDCB450B000+A082 evlib:7FDCB450B000+8B1E evlib:7FDCB450B000+950C orchestrator_lib:7FDCA657A000+CB58 luajit:7FDC8E6FD000+62D76 luajit:7FDC8E6FD000+4B35A luajit:7FDC8E6FD000+333E9 luajit:7FDC8E6FD000+60DD7 luajit:7FDC8E6FD000+1D11D orche
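If the system report has been copied off the switch, the manual "open, look under tmp, scroll to the bottom" routine can be roughly automated. This is only a sketch: the member-name pattern is my assumption based on the naming convention described above, and `find_no_space_lines` is a hypothetical helper name, not anything Cisco ships.

```python
import re
import tarfile

# Assumption: the boot-up log sits under a "tmp" folder in the archive and
# follows the HOSTNAME_SWITCHMEMBER_RP_0-bootuplog-... naming shown above.
BOOTUPLOG = re.compile(r"(^|/)tmp/[^/]*_RP_0-bootuplog-[^/]*\.log$")
MARKER = "No space left on device"


def find_no_space_lines(report_path: str) -> list[str]:
    """Return boot-up log lines in the system report that contain MARKER."""
    hits = []
    with tarfile.open(report_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile() or not BOOTUPLOG.search(member.name):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            for raw in handle:
                line = raw.decode("utf-8", errors="replace").rstrip()
                if MARKER in line:
                    hits.append(line)
    return hits
```

An empty list means the marker was not found in any matching boot-up log, which does not by itself rule out this bug.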
NOTE:
Here are the steps TAC needs to undertake to delete the offending file:
1. Command: request consent generate shell-access auth-timeout <MINUTES>
NOTE: “MINUTES” is how long the shell access will last.
2. The stack will generate a long string (aka token) which TAC will take into an internal DB in order to counter-generate a challenge or response.
3. Command: request consent accept shell-access <TAC-generated RESPONSE TOKEN>
4. Wait for about 60 to 90 seconds for the stack to “compute” the challenge.
5. If the token was generated correctly, a line will say “% Consent token authorization success”.
NOTE:
I do not recommend using the option “switch ACTIVE” or “switch STANDBY” and would prefer using the command “switch <1 to 16>”.
6. To go to the switch, use the command “request platform software system shell switch <1 to 16> r0”. An “are you sure” question will appear. Respond with “Y” to proceed or “N” to chicken out (below).
Activity within this shell can jeopardize the functioning of the system. Are you sure you want to continue? [y/n]
7. In shell mode, please observe the banner and the cursor:
DATE TIME : Shell access was granted to user <anon>; Trace file: , /crashinfo/tracelogs/system_shell_R0-0.11225_0.YEARMODAHOUR.bin
**********************************************************************
Activity within this shell can jeopardize the functioning of the system.
Use this functionality only under supervision of Cisco Support.
Session will be logged to:
crashinfo:tracelogs/system_shell_R0-0.11225_0.YEARMODAHOUR.bin
**********************************************************************
[SWITCHNAME_RP_0:/]$
8. [OPTIONAL] In shell, check first using the command “df -h” and look at the /tmp folder (see below):
[SWITCHNAME_MEMBER_RP_0:/]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.7G  3.4G  386M  90% /tmp
9. To delete the file, issue the command “rm [-f] /tmp/nyq_rsync.log”.
WARNING:
[SWITCHNAME_MEMBER_RP_0:/]$ rm /tmp/nyq_rsync.log
10. [OPTIONAL] Verify again by using the command “df -h” and compare the before-and-after output of the /tmp folder.
[SWITCHNAME_MEMBER_RP_0:/]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.7G  133M  3.6G   4% /tmp
11. Use the command "exit" to get out of shell.
NOTE:
If there is a need to go to another stack member, then after step 11, go back to step 6 again.
12. IMPORTANT: Terminate the shell access using the command: request consent-token terminate-auth shell-access
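If the session is being logged, the before-and-after "df -h" comparison from steps 8 and 10 can be checked with a few lines of Python run off the switch against the saved output. A rough sketch, with hypothetical function names, that assumes the usual six-column "df -h" layout shown in the steps above:

```python
def tmp_row(df_output: str) -> dict:
    """Return the /tmp row of `df -h` output as a dict of column values."""
    keys = ("filesystem", "size", "used", "avail", "use_pct", "mounted_on")
    for line in df_output.splitlines():
        fields = line.split()
        # A data row has six columns and ends with the mount point; the
        # header line splits into seven words, so it is skipped.
        if len(fields) == 6 and fields[5] == "/tmp":
            return dict(zip(keys, fields))
    raise ValueError("no /tmp row found in df output")


def use_percent(df_output: str) -> int:
    """Return Use% of /tmp as an integer, e.g. 90 for '90%'."""
    return int(tmp_row(df_output)["use_pct"].rstrip("%"))


def cleanup_worked(before: str, after: str) -> bool:
    """True if Use% of /tmp dropped between the two df -h captures."""
    return use_percent(after) < use_percent(before)
```

With the two captures from steps 8 and 10 (90% before, 4% after), `cleanup_worked` returns True. Repeat the comparison for each stack member that was cleaned.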
Update:
URGENT Update: