Problem with HDD space ASR9K Series A9K-RSP-4G
09-14-2015 02:11 PM
Disk space on an ASR9K Series A9K-RSP-4G running asr9k-os-mbi-4.3.4.sp4-1.0.0 is filling up for an unknown reason. No upgrade has been performed since Nov 2014.
RP/0/RSP0/CPU0:ROUTER(admin)#sh filesystem
Sat Nov 8 23:17:08.717 EET
File Systems:
Size(b) Free(b) Type Flags Prefixes
1610596352 674449408 flash-disk rw disk0:
RP/0/RSP0/CPU0:ROUTER#show filesystem
Mon Sep 14 23:55:09.660 EET
File Systems:
Size(b) Free(b) Type Flags Prefixes
1610596352 2518016 flash-disk rw disk0:
I could not figure out what is filling up the disk. Exactly the same behaviour is observed on 6 routers with the same HW and XR version.
RP/0/RSP0/CPU0:ROUTER#dir disk0:
Tue Sep 15 00:00:26.671 EET
Directory of disk0:
6170 -r-- 393212 Thu Jan 1 02:03:08 1970 .bitmap
17 -r-- 1531904 Thu Jan 1 02:03:08 1970 .inodes
18 -rw- 0 Thu Jan 1 02:03:08 1970 .boot
19 -rw- 0 Thu Jan 1 02:03:08 1970 .altboot
6175 -r-- 65536 Thu Jan 1 02:03:08 1970 .longfilenames
6176 drwx 4096 Wed May 23 01:52:22 2012 LOST.DIR
6177 drwx 4096 Mon Sep 3 15:59:26 2012 aaa
6178 drwx 4096 Fri Jul 13 22:47:59 2012 config
6179 drwx 4096 Fri Jul 13 22:48:38 2012 snmp
6180 drwx 4096 Fri Jul 13 22:49:21 2012 eem_rdsfs
4648432 -rw- 32136 Sat Dec 6 00:41:41 2014 sysdb_shared_sc_malloc_dump_node0_RSP0_CPU0_2
6182 drwx 4096 Sun Nov 9 00:03:33 2014 instdb
24869382 -rw- 8009 Mon Dec 8 16:53:39 2014 sysdb_shared_sc_malloc_dump_node0_RSP0_CPU0_3
24869381 -rw- 8009 Wed Dec 10 23:54:57 2014 sysdb_shared_sc_malloc_dump_node0_RSP0_CPU0_4
6193 drwx 4096 Fri Jul 13 22:49:22 2012 cepki
6194 -rw- 0 Mon Jul 9 19:03:00 2012 tar_report.txt
6195 -rw- 12832 Fri Nov 7 11:44:45 2014 sam_certdb
6196 drwx 4096 Fri Jul 13 22:48:35 2012 license
6200 -rw- 126 Fri Nov 7 11:44:45 2014 sam_crldb
5006548 drwx 4096 Fri Nov 7 11:16:40 2014 asr9K-doc-supp-4.3.4
5010636 drwx 4096 Fri Nov 7 11:16:43 2014 asr9k-doc-px-4.3.4
5010642 drwx 4096 Fri Nov 7 11:18:33 2014 asr9k-fpd-4.3.4
5010757 drwx 4096 Fri Nov 7 11:18:36 2014 asr9k-fpd-px-4.3.4
5010762 drwx 4096 Fri Nov 7 11:18:46 2014 asr9k-k9sec-supp-4.3.4
5010765 drwx 4096 Fri Nov 7 11:19:11 2014 iosxr-security-4.3.4
5142016 drwx 4096 Fri Nov 7 11:19:15 2014 asr9k-k9sec-px-4.3.4
5142024 drwx 4096 Fri Nov 7 11:19:27 2014 asr9k-mcast-supp-4.3.4
5142064 drwx 4096 Fri Nov 7 11:19:57 2014 iosxr-mcast-4.3.4
7884160 drwx 4096 Thu Jun 28 23:27:34 2012 np
6024168 drwx 4096 Fri Nov 7 11:20:01 2014 asr9k-mcast-px-4.3.4
6024176 drwx 4096 Fri Nov 7 11:20:16 2014 asr9k-mgbl-supp-4.3.4
6024192 drwx 4096 Fri Nov 7 11:20:34 2014 iosxr-mgbl-4.3.4
6109252 drwx 4096 Fri Nov 7 11:20:38 2014 asr9k-mgbl-px-4.3.4
6192651 drwx 4096 Fri Nov 7 11:21:52 2014 asr9k-ce-4.3.4
6192693 drwx 4096 Fri Nov 7 11:22:16 2014 asr9k-cpp-4.3.4
6303776 drwx 4096 Fri Nov 7 11:22:20 2014 asr9k-scfclient-4.3.4
6303781 drwx 4096 Fri Nov 7 11:22:25 2014 asr9k-diags-supp-4.3.4
6303797 drwx 4096 Fri Nov 7 11:23:45 2014 asr9k-fwding-4.3.4
6742015 drwx 4096 Sat Nov 8 22:40:37 2014 asr9k-base-4.3.4
7697125 drwx 4096 Fri Nov 7 11:25:51 2014 asr9k-os-mbi-4.3.4
7697131 drwx 4096 Fri Nov 7 11:26:16 2014 iosxr-ce-4.3.4
7864609 drwx 4096 Fri Nov 7 11:26:22 2014 iosxr-diags-4.3.4
8524528 drwx 4096 Fri Nov 7 11:27:00 2014 iosxr-routing-4.3.4
9103188 drwx 4096 Fri Nov 7 11:30:48 2014 iosxr-fwding-4.3.4
8887063 drwx 4096 Fri Nov 7 11:34:47 2014 iosxr-infra-4.3.4
10212821 drwx 4096 Fri Nov 7 11:34:54 2014 asr9k-mini-px-4.3.4
10212848 drwx 4096 Fri Nov 7 11:35:36 2014 iosxr-mpls-4.3.4
10607935 drwx 4096 Fri Nov 7 11:35:42 2014 asr9k-mpls-px-4.3.4
10607940 drwx 4096 Fri Nov 7 11:36:11 2014 asr9k-fwding-4.3.4.CSCug75299-1.0.0
10608034 drwx 4096 Fri Nov 7 11:36:35 2014 asr9k-os-mbi-4.3.4.CSCug75299-1.0.0
10608040 drwx 4096 Fri Nov 7 11:36:41 2014 asr9k-px-4.3.4.CSCug75299-1.0.0
10608051 drwx 4096 Fri Nov 7 11:37:14 2014 asr9k-base-4.3.4.CSCui94441-1.0.0
11680764 drwx 4096 Fri Nov 7 11:37:39 2014 asr9k-os-mbi-4.3.4.CSCui94441-1.0.0
11680770 drwx 4096 Fri Nov 7 11:37:45 2014 asr9k-px-4.3.4.CSCui94441-1.0.0
11680781 drwx 4096 Fri Nov 7 11:38:14 2014 iosxr-infra-4.3.4.CSCul58246-1.0.0
11680874 drwx 4096 Fri Nov 7 11:38:41 2014 asr9k-os-mbi-4.3.4.CSCul58246-1.0.0
11680880 drwx 4096 Fri Nov 7 11:38:47 2014 asr9k-px-4.3.4.CSCul58246-1.0.0
11680891 drwx 4096 Fri Nov 7 11:40:22 2014 iosxr-infra-4.3.4.sp4-1.0.0
13069819 drwx 4096 Fri Nov 7 11:40:29 2014 asr9k-adv-video-supp-4.3.4.sp4-1.0.0
13069845 drwx 4096 Fri Nov 7 11:40:38 2014 asr9k-9000v-nV-supp-4.3.4.sp4-1.0.0
13069851 drwx 4096 Fri Nov 7 11:40:47 2014 iosxr-mpls-4.3.4.sp4-1.0.0
13457736 drwx 4096 Fri Nov 7 11:41:10 2014 iosxr-routing-4.3.4.sp4-1.0.0
14109085 drwx 4096 Fri Nov 7 11:41:44 2014 iosxr-fwding-4.3.4.sp4-1.0.0
14378362 drwx 4096 Fri Nov 7 11:42:11 2014 asr9k-os-mbi-4.3.4.sp4-1.0.0
14378368 drwx 4096 Fri Nov 7 11:42:22 2014 asr9k-cpp-4.3.4.sp4-1.0.0
14378398 drwx 4096 Fri Nov 7 11:42:46 2014 asr9k-fwding-4.3.4.sp4-1.0.0
15534680 drwx 4096 Fri Nov 7 11:43:02 2014 iosxr-mcast-4.3.4.sp4-1.0.0
15648000 drwx 4096 Fri Nov 7 11:43:12 2014 iosxr-ce-4.3.4.sp4-1.0.0
15648077 drwx 4096 Fri Nov 7 11:43:26 2014 iosxr-mgbl-4.3.4.sp4-1.0.0
15859258 drwx 4096 Fri Nov 7 11:43:47 2014 asr9k-base-4.3.4.sp4-1.0.0
24582229 drwx 4096 Fri Nov 7 11:44:01 2014 iosxr-bng-4.3.4.sp4-1.0.0
24582340 drwx 4096 Fri Nov 7 11:44:11 2014 iosxr-adv-video-4.3.4.sp4-1.0.0
24869335 drwx 4096 Fri Nov 7 11:44:18 2014 asr9k-px-4.3.4.sp4-1.0.0
24869384 -rw- 164277 Sat Nov 8 23:32:54 2014 sysdb_shared_sc_malloc_dump_node0_RSP0_CPU0_1
1610596352 bytes total (2518016 bytes free)
Output from "du" is in the attachment.
I was given the document https://supportforums.cisco.com/document/145991/managing-disk-space-rsp-4grsp-8g-aka-rsp2 but it does not say how to find the root cause of a full disk.
Do you have any ideas or hints, please?
Thank you.
Martin Oles
Labels: XR OS and Platforms
09-14-2015 09:11 PM
Hi Martin,
I don't think anything is wrong here; you may well have simply eaten into all your disk space. Let's do some housekeeping first. It appears that you have not repartitioned your disks: you have a 2G disk and are only making 1.6G available. Follow the instructions in the link you shared earlier on managing disk space to get an additional 350M back. Once completed, keep an eye on the disk space that is being used. You may also want to download CSM and check whether the installed SMUs are part of SP4, in which case you can remove them (if they are superseded by SP4).
https://supportforums.cisco.com/document/145991/managing-disk-space-rsp-4grsp-8g-aka-rsp2
1 Disk Space Conservation
09-15-2015 01:13 AM
Thank you for this suggestion, but housekeeping will only free up space that will then be consumed again. As I wrote before, the device was just "running": no updates, no SMUs installed. How was the device able to eat all the disk space? Maybe there is something I am missing. I would not expect a correctly configured device to run out of disk space without any obvious reason.
09-15-2015 01:39 AM
Let's do the math:
4.3.4 base is ~900m
09-15-2015 06:12 AM
Thank you, but again, it does not add up to what I am observing.
I have run "dir /recurse disk0:" and then added up the space occupied by every file. All files together take 930 Mbytes.
After the last installation actions the occupied space was 936 Mbytes. So, very roughly, those figures are close together.
I am really puzzled by this behaviour.
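For reference, the used-space figures in this thread can be reproduced from the two "show filesystem" snapshots in the first post. The Python sketch below is illustrative only. The helper name used_bytes is hypothetical, and it assumes the row layout seen in those outputs (size, free, type, flags, prefix).

```python
def used_bytes(row: str) -> int:
    """Return size minus free for one 'show filesystem' data row."""
    size, free = row.split()[:2]
    return int(size) - int(free)


# Rows copied from the two snapshots in the first post.
after_upgrade = used_bytes("1610596352 674449408 flash-disk rw disk0:")  # Nov 8 2014
disk_full = used_bytes("1610596352 2518016 flash-disk rw disk0:")        # Sep 14 2015

print(after_upgrade)              # 936146944
print(disk_full)                  # 1608078336
print(disk_full - after_upgrade)  # 671931392
```

The first figure matches the 936 Mbytes quoted above. The last one is the growth that the visible files do not account for.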

09-15-2015 07:36 AM
You are probably not accounting for hidden files, driver logs, tmp space usage and the (install) database. If you have done a lot of (large) commits, this will start to add to the overall consumption too.
With the image size that is there today, and the service packs, which are also rather large, it is highly advisable to use run repart -d to reclaim the 300M of reserved space, which gives you some breathing room.
Also, disabling mirroring between disk0 and disk1 gives you 2 independent disks to use for install operations.
cheers
xander
09-15-2015 01:29 PM
Just to clarify my problem: after installing asr9k-os-mbi-4.3.4.sp4-1.0.0 I had 670MB of free space; after a few large commits and 10 months in production the disk is full. Repartitioning, yes, will give me an additional 300MB, but in this situation that is a solution for 5 months. What is really filling the disk? It does not seem to be a hardware problem, as it is visible on 6 routers simultaneously.
How is it possible to see those hidden files and logs, please?
Thank you very much for help.
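The "solution for 5 months" estimate can be checked with a rough linear projection. This is a sketch under stated assumptions: the dates and free-space values come from the two "show filesystem" outputs in the first post, 300 MB is taken as 300 * 10**6 bytes, and consumption is assumed to be constant over time.

```python
from datetime import date

free_after_upgrade = 674449408   # bytes free on Nov 8 2014
free_now = 2518016               # bytes free on Sep 14 2015
days = (date(2015, 9, 14) - date(2014, 11, 8)).days

# Average consumption, assuming the disk filled at a constant rate.
rate = (free_after_upgrade - free_now) / days   # bytes per day

# Space expected back from repartitioning (figure quoted in the thread).
reclaimed = 300 * 10**6
days_until_full = reclaimed / rate

print(days)                    # 310
print(round(rate))             # 2167521
print(round(days_until_full))  # 138
```

About 138 days is roughly four and a half months, which is in line with the estimate above.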
09-17-2015 01:34 AM
So far, even with significant effort, no root cause for the full disk has been found.
I have confirmed that the XR upgrade which took place 10 months ago did not take all the disk space. I have also checked the standby RSP; output from both RSPs for comparison:
RP/0/RSP0/CPU0:ROUTER#show filesystem
File Systems:
Size(b) Free(b) Type Flags Prefixes
1610596352 2518016 flash-disk rw disk0:
RP/0/RSP0/CPU0:ROUTER#show filesystem location 0/RSP1/CPU0
File Systems:
Size(b) Free(b) Type Flags Prefixes
1610596352 691096064 flash-disk rw disk0:
Files on the active RSP and the backup RSP were compared; there are minor differences, but nothing significant.
Furthermore, one of the culprits could be the automated config backup system, which issues the following commands on a daily basis:
terminal width 0
copy running-config disk0:NA_config
sftp disk0:123456789.tmp username:password@10.20.30.40:123456789.tmp vrf VRF
del disk0:123456789.tmp
Investigating further, I have also found that on all affected devices the sftp_server process is running many times (298 instances in the case of this particular router):
RP/0/RSP0/CPU0:ROUTER#sh processes | include sftp_server
12345 1 1 16K 10 Receive 5868:21:50:0039 0:00:00:0065 sftp_server
12346 1 1 16K 10 Receive 5862:22:31:0770 0:00:00:0054 sftp_server
12347 1 1 16K 10 Receive 5856:22:46:0024 0:00:00:0042 sftp_server
12348 1 0 16K 10 Receive 5850:21:16:0232 0:00:00:0055 sftp_server
I have tried to kill a few processes with "process shutdown jobID", but no effect on disk space was observed. Perhaps I would need to shut down all sftp_server processes to see an effect.
So far I have not found definite proof that these findings are related. The tmp files from the backup tool are not visible on disk.
Thank you for your opinion.
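Counting the leftover processes by hand gets tedious across 6 routers. If the "show processes" output is captured to text, a small script can do the counting. The Python sketch below is illustrative only. The helper name count_process is hypothetical, and it assumes the process name is the last column, as in the rows shown above.

```python
def count_process(output: str, name: str) -> int:
    """Count rows whose last column equals the given process name."""
    count = 0
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[-1] == name:
            count += 1
    return count


# The four rows quoted in the post above.
sample = """\
12345 1 1 16K 10 Receive 5868:21:50:0039 0:00:00:0065 sftp_server
12346 1 1 16K 10 Receive 5862:22:31:0770 0:00:00:0054 sftp_server
12347 1 1 16K 10 Receive 5856:22:46:0024 0:00:00:0042 sftp_server
12348 1 0 16K 10 Receive 5850:21:16:0232 0:00:00:0055 sftp_server
"""

print(count_process(sample, "sftp_server"))  # 4
```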
09-17-2015 06:01 AM
ls -a should show hidden files.
To find large files we can do something like run du -aks /disk0: | grep -E -e '[0-9]{3,} /disk0'
To find new files we would need to take a snapshot of the entire disk and then later do a diff. We could try run ls -laR /disk0: to get this. Another option would be to check for all files modified in Sep with a | grep -E 'disk0|Sep' so that you can also see which directory the file is in.
Many files change constantly, others such as the commit DB will build over time as mentioned by others. I would recommend opening a TAC SR to check this further, or just email me and I can assist.
As you noticed there is a difference in both RSPs, a switchover will often fix the issue and clear out any tmp data. Commit data and others should be synced between the two RSPs.
Sam
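The snapshot-and-diff idea can be done offline once two "ls -laR" captures are saved as text. The Python sketch below is a minimal illustration, not a tested tool for IOS XR. It assumes the usual POSIX long-listing columns (permissions, links, owner, group, size, month, day, time or year, name) and directory header lines ending in a colon. The function names and the sample file names are hypothetical.

```python
def parse_listing(text: str) -> dict[str, int]:
    """Map full path -> size for regular files in 'ls -laR'-style text."""
    sizes = {}
    current_dir = ""
    for line in text.splitlines():
        line = line.rstrip()
        # Directory header, e.g. "/disk0:/config:"
        if line.endswith(":") and len(line.split()) == 1:
            current_dir = line[:-1]
            continue
        fields = line.split(None, 8)
        # Keep regular files only (permission string starts with "-").
        if len(fields) == 9 and fields[0].startswith("-") and fields[4].isdigit():
            sizes[f"{current_dir}/{fields[8]}"] = int(fields[4])
    return sizes


def grown_or_new(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return {path: size increase} for files that are new or larger."""
    return {p: s - before.get(p, 0) for p, s in after.items() if s > before.get(p, 0)}


# Made-up sample data, for illustration only.
old = """\
/disk0:/config:
-rw-r--r-- 1 root root 1000 Sep 01 10:00 commitdb
"""
new = """\
/disk0:/config:
-rw-r--r-- 1 root root 4000 Sep 14 10:00 commitdb
-rw-r--r-- 1 root root 250 Sep 14 10:05 extra.tmp
"""

print(grown_or_new(parse_listing(old), parse_listing(new)))
# {'/disk0:/config/commitdb': 3000, '/disk0:/config/extra.tmp': 250}
```

A diff like this only sees files that still have a directory entry, so space held by files that are no longer listed would not show up in it.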
09-17-2015 07:03 AM
Hi Sam,
I have checked the disk again via "du -aks /disk0: | grep -E -e '[0-9]{3,} /disk0'". There is no unusually big file and no excessive number of repeating similar files. The result of this count also shows that 915349 kB is occupied on the disk, but what happened to the rest?
Please note that there are 6 routers with the same behaviour. I have a service request open for this, but so far all findings are inconclusive. Simply put, 620 MB "disappeared".
My suspicion is that the sftp_server process is creating tmp data that is blocking space, but so far I am unable to find it. Maybe I will try to "kill off" all sftp_server processes to see what happens. If each process is holding an amount of data and this is swapped from memory to disk0:, then that could explain the missing space.
I am well aware that an RSP switchover might fix the full disk, that cleaning up unused files would help, and that repart would help as well. But without knowing the root cause it is only a matter of time before disk0 fills up again.
09-22-2015 02:12 PM
Root cause has been found. There is a known bug, CSCuf81591: hanging sftp_server processes were taking a huge amount of virtual memory, which is swapped to disk0.
As mentioned above, I am using an SFTP server for the daily running-config backup.
terminal width 0
copy running-config disk0:NA_config
sftp disk0:123456789.tmp username:password@10.20.30.40:123456789.tmp vrf VRF
del disk0:123456789.tmp
This is either CSCuf81591 or some form of incompatibility, resulting in sftp_server processes left running in the Receive state:
-------------------------------------------------------------------------------
Job Id: 12345
PID: 25375552
Executable name: sftp_server
Instance ID: 1
Respawn: OFF
core: TEXT SHAREDMEM MAINMEM
65936 1 1 16K 10 Receive 4693:44:45:0225 0:00:00:0082 sftp_server
-------------------------------------------------------------------------------
As shown above, the backup server then deletes the file. Because the QNX4 filesystem implements file locking, it is possible that the file is created and then marked as deleted, but never actually freed because of the lock held by the sftp_server process.
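The "deleted but never freed" effect has a well-known analogue on POSIX systems in general: a file that is unlinked while a process still holds it open disappears from directory listings, but its data stays allocated until the last descriptor is closed. The Python sketch below demonstrates this on a typical Linux host. It is an analogy only; whether the QNX4 filesystem on the RSP behaves in exactly this way is an assumption and was not verified here.

```python
import os
import tempfile

# Create a 1 MiB file and keep a descriptor open, as a stuck process would.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * 1024 * 1024)

# "Delete" the file while it is still open.
os.unlink(path)

info = os.fstat(fd)
print(os.path.exists(path))   # False: the name is gone from the directory
print(info.st_size)           # 1048576: the data is still held by the descriptor

# Only after the holder lets go can the filesystem reclaim the space.
os.close(fd)
```

This would also be consistent with "dir" and "du" showing nothing unusual while "show filesystem" reports the disk as full.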
Original status:
RP/0/RSP0/CPU0:ROUTER2#show filesystem
Thu Sep 17 22:28:50.388
File Systems:
Size(b) Free(b) Type Flags Prefixes
1610596352 527872 flash-disk rw disk0:
After "killing season", with 300 sftp_server processes removed using "process shutdown JobID":
RP/0/RSP0/CPU0:ROUTER2#sh filesystem
Thu Sep 17 22:51:24.199
File Systems:
Size(b) Free(b) Type Flags Prefixes
1610596352 673181696 flash-disk rw disk0:
This matches the situation just after the upgrade.
Thank you all for the valuable tips and advice.
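As a rough cross-check, the space recovered on ROUTER2 can be divided by the number of processes that were shut down. The Python sketch below uses only the two "show filesystem" values above and the count of 300 quoted in the post. The count is probably approximate, so the per-process figure is approximate too.

```python
free_before = 527872        # bytes free on ROUTER2 before the shutdowns
free_after = 673181696      # bytes free after the shutdowns
processes_removed = 300     # count quoted in the post

freed = free_after - free_before
per_process = freed / processes_removed

print(freed)               # 672653824
print(round(per_process))  # 2242179
```

That is a little over 2 MB held per stuck sftp_server process. With one backup per day, this adds up to the full disk seen after about 10 months.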
