Martin Oles
Beginner

Problem with HDD space ASR9K Series A9K-RSP-4G

The disk on an ASR9K Series A9K-RSP-4G running asr9k-os-mbi-4.3.4.sp4-1.0.0 is filling up for no known reason. No upgrade has been performed since Nov 2014.

RP/0/RSP0/CPU0:ROUTER(admin)#sh filesystem
Sat Nov  8 23:17:08.717 EET
File Systems:
     Size(b)     Free(b)        Type  Flags  Prefixes
  1610596352   674449408  flash-disk     rw  disk0:

 

RP/0/RSP0/CPU0:ROUTER#show filesystem
Mon Sep 14 23:55:09.660 EET
File Systems:
     Size(b)     Free(b)        Type  Flags  Prefixes
  1610596352     2518016  flash-disk     rw  disk0:

 

I could not figure out what is filling up the disk. Exactly the same behaviour is observed on 6 routers with the same HW and XR version.

RP/0/RSP0/CPU0:ROUTER#dir disk0:
Tue Sep 15 00:00:26.671 EET

Directory of disk0:

6170        -r--  393212      Thu Jan  1 02:03:08 1970  .bitmap
17          -r--  1531904     Thu Jan  1 02:03:08 1970  .inodes
18          -rw-  0           Thu Jan  1 02:03:08 1970  .boot
19          -rw-  0           Thu Jan  1 02:03:08 1970  .altboot
6175        -r--  65536       Thu Jan  1 02:03:08 1970  .longfilenames
6176        drwx  4096        Wed May 23 01:52:22 2012  LOST.DIR
6177        drwx  4096        Mon Sep  3 15:59:26 2012  aaa
6178        drwx  4096        Fri Jul 13 22:47:59 2012  config
6179        drwx  4096        Fri Jul 13 22:48:38 2012  snmp
6180        drwx  4096        Fri Jul 13 22:49:21 2012  eem_rdsfs
4648432     -rw-  32136       Sat Dec  6 00:41:41 2014  sysdb_shared_sc_malloc_dump_node0_RSP0_CPU0_2
6182        drwx  4096        Sun Nov  9 00:03:33 2014  instdb
24869382    -rw-  8009        Mon Dec  8 16:53:39 2014  sysdb_shared_sc_malloc_dump_node0_RSP0_CPU0_3
24869381    -rw-  8009        Wed Dec 10 23:54:57 2014  sysdb_shared_sc_malloc_dump_node0_RSP0_CPU0_4
6193        drwx  4096        Fri Jul 13 22:49:22 2012  cepki
6194        -rw-  0           Mon Jul  9 19:03:00 2012  tar_report.txt
6195        -rw-  12832       Fri Nov  7 11:44:45 2014  sam_certdb
6196        drwx  4096        Fri Jul 13 22:48:35 2012  license
6200        -rw-  126         Fri Nov  7 11:44:45 2014  sam_crldb
5006548     drwx  4096        Fri Nov  7 11:16:40 2014  asr9K-doc-supp-4.3.4
5010636     drwx  4096        Fri Nov  7 11:16:43 2014  asr9k-doc-px-4.3.4
5010642     drwx  4096        Fri Nov  7 11:18:33 2014  asr9k-fpd-4.3.4
5010757     drwx  4096        Fri Nov  7 11:18:36 2014  asr9k-fpd-px-4.3.4
5010762     drwx  4096        Fri Nov  7 11:18:46 2014  asr9k-k9sec-supp-4.3.4
5010765     drwx  4096        Fri Nov  7 11:19:11 2014  iosxr-security-4.3.4
5142016     drwx  4096        Fri Nov  7 11:19:15 2014  asr9k-k9sec-px-4.3.4
5142024     drwx  4096        Fri Nov  7 11:19:27 2014  asr9k-mcast-supp-4.3.4
5142064     drwx  4096        Fri Nov  7 11:19:57 2014  iosxr-mcast-4.3.4
7884160     drwx  4096        Thu Jun 28 23:27:34 2012  np
6024168     drwx  4096        Fri Nov  7 11:20:01 2014  asr9k-mcast-px-4.3.4
6024176     drwx  4096        Fri Nov  7 11:20:16 2014  asr9k-mgbl-supp-4.3.4
6024192     drwx  4096        Fri Nov  7 11:20:34 2014  iosxr-mgbl-4.3.4
6109252     drwx  4096        Fri Nov  7 11:20:38 2014  asr9k-mgbl-px-4.3.4
6192651     drwx  4096        Fri Nov  7 11:21:52 2014  asr9k-ce-4.3.4
6192693     drwx  4096        Fri Nov  7 11:22:16 2014  asr9k-cpp-4.3.4
6303776     drwx  4096        Fri Nov  7 11:22:20 2014  asr9k-scfclient-4.3.4
6303781     drwx  4096        Fri Nov  7 11:22:25 2014  asr9k-diags-supp-4.3.4
6303797     drwx  4096        Fri Nov  7 11:23:45 2014  asr9k-fwding-4.3.4
6742015     drwx  4096        Sat Nov  8 22:40:37 2014  asr9k-base-4.3.4
7697125     drwx  4096        Fri Nov  7 11:25:51 2014  asr9k-os-mbi-4.3.4
7697131     drwx  4096        Fri Nov  7 11:26:16 2014  iosxr-ce-4.3.4
7864609     drwx  4096        Fri Nov  7 11:26:22 2014  iosxr-diags-4.3.4
8524528     drwx  4096        Fri Nov  7 11:27:00 2014  iosxr-routing-4.3.4
9103188     drwx  4096        Fri Nov  7 11:30:48 2014  iosxr-fwding-4.3.4
8887063     drwx  4096        Fri Nov  7 11:34:47 2014  iosxr-infra-4.3.4
10212821    drwx  4096        Fri Nov  7 11:34:54 2014  asr9k-mini-px-4.3.4
10212848    drwx  4096        Fri Nov  7 11:35:36 2014  iosxr-mpls-4.3.4
10607935    drwx  4096        Fri Nov  7 11:35:42 2014  asr9k-mpls-px-4.3.4
10607940    drwx  4096        Fri Nov  7 11:36:11 2014  asr9k-fwding-4.3.4.CSCug75299-1.0.0
10608034    drwx  4096        Fri Nov  7 11:36:35 2014  asr9k-os-mbi-4.3.4.CSCug75299-1.0.0
10608040    drwx  4096        Fri Nov  7 11:36:41 2014  asr9k-px-4.3.4.CSCug75299-1.0.0
10608051    drwx  4096        Fri Nov  7 11:37:14 2014  asr9k-base-4.3.4.CSCui94441-1.0.0
11680764    drwx  4096        Fri Nov  7 11:37:39 2014  asr9k-os-mbi-4.3.4.CSCui94441-1.0.0
11680770    drwx  4096        Fri Nov  7 11:37:45 2014  asr9k-px-4.3.4.CSCui94441-1.0.0
11680781    drwx  4096        Fri Nov  7 11:38:14 2014  iosxr-infra-4.3.4.CSCul58246-1.0.0
11680874    drwx  4096        Fri Nov  7 11:38:41 2014  asr9k-os-mbi-4.3.4.CSCul58246-1.0.0
11680880    drwx  4096        Fri Nov  7 11:38:47 2014  asr9k-px-4.3.4.CSCul58246-1.0.0
11680891    drwx  4096        Fri Nov  7 11:40:22 2014  iosxr-infra-4.3.4.sp4-1.0.0
13069819    drwx  4096        Fri Nov  7 11:40:29 2014  asr9k-adv-video-supp-4.3.4.sp4-1.0.0
13069845    drwx  4096        Fri Nov  7 11:40:38 2014  asr9k-9000v-nV-supp-4.3.4.sp4-1.0.0
13069851    drwx  4096        Fri Nov  7 11:40:47 2014  iosxr-mpls-4.3.4.sp4-1.0.0
13457736    drwx  4096        Fri Nov  7 11:41:10 2014  iosxr-routing-4.3.4.sp4-1.0.0
14109085    drwx  4096        Fri Nov  7 11:41:44 2014  iosxr-fwding-4.3.4.sp4-1.0.0
14378362    drwx  4096        Fri Nov  7 11:42:11 2014  asr9k-os-mbi-4.3.4.sp4-1.0.0
14378368    drwx  4096        Fri Nov  7 11:42:22 2014  asr9k-cpp-4.3.4.sp4-1.0.0
14378398    drwx  4096        Fri Nov  7 11:42:46 2014  asr9k-fwding-4.3.4.sp4-1.0.0
15534680    drwx  4096        Fri Nov  7 11:43:02 2014  iosxr-mcast-4.3.4.sp4-1.0.0
15648000    drwx  4096        Fri Nov  7 11:43:12 2014  iosxr-ce-4.3.4.sp4-1.0.0
15648077    drwx  4096        Fri Nov  7 11:43:26 2014  iosxr-mgbl-4.3.4.sp4-1.0.0
15859258    drwx  4096        Fri Nov  7 11:43:47 2014  asr9k-base-4.3.4.sp4-1.0.0
24582229    drwx  4096        Fri Nov  7 11:44:01 2014  iosxr-bng-4.3.4.sp4-1.0.0
24582340    drwx  4096        Fri Nov  7 11:44:11 2014  iosxr-adv-video-4.3.4.sp4-1.0.0
24869335    drwx  4096        Fri Nov  7 11:44:18 2014  asr9k-px-4.3.4.sp4-1.0.0
24869384    -rw-  164277      Sat Nov  8 23:32:54 2014  sysdb_shared_sc_malloc_dump_node0_RSP0_CPU0_1

1610596352 bytes total (2518016 bytes free)
 

Output from "du" is in attachment.

I was given the document https://supportforums.cisco.com/document/145991/managing-disk-space-rsp-4grsp-8g-aka-rsp2 but it does not say how to find the root cause of a full disk.

 

Do you have any idea or hint please?

Thank you.

Martin Oles

10 REPLIES
Eddie Chami
Cisco Employee

Hi Martin,

 

I don't anticipate anything is wrong here; you might well have just eaten into all your disk space. Let's do some housekeeping first. It appears that you have not repartitioned your disks: you have a 2G disk and you're only making 1.6G available. Follow the instructions from the link you shared earlier on managing disk space to get an additional 350M back. Once completed, keep an eye on the disk space that is being used. You also want to download CSM and see if the installed SMUs are part of SP4, in which case you can remove the SMUs (if they are superseded by SP4).

 

https://supportforums.cisco.com/document/145991/managing-disk-space-rsp-4grsp-8g-aka-rsp2

1 Disk Space Conservation
 

Thank you for this suggestion, but housekeeping will only free up space that will then be consumed again. As I wrote before, the device was just "running": no updates, no SMUs installed. How was the device able to eat all the disk space? Maybe there is something I am missing. I would not expect a correctly configured device to run out of disk space without any obvious reason.
 

Let's do the math:

4.3.4 base is ~900M

CSCug75299 75M
CSCui94441 69M
CSCul58246 65M
SP4 382M
 
A few core files and NP data logs and you're well into the 1.5G.
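As a quick check of the arithmetic above, here is a minimal Python sketch that sums the quoted package sizes and compares them with the disk0: size from "show filesystem". The figures are the approximate ones given in this reply; treating "M" as 10^6 bytes is an assumption made only for the comparison.

```python
# Approximate package footprints in MB, as quoted in the reply above.
PACKAGES_MB = {
    "4.3.4 base": 900,
    "CSCug75299": 75,
    "CSCui94441": 69,
    "CSCul58246": 65,
    "SP4": 382,
}

# Size(b) of disk0: as reported by "show filesystem" in the original post.
DISK_SIZE_BYTES = 1610596352


def installed_total_mb(packages):
    """Sum the quoted package sizes (in MB)."""
    return sum(packages.values())


def bytes_to_mb(size_bytes):
    """Convert bytes to MB, assuming 1 MB = 10**6 bytes."""
    return size_bytes / 1_000_000


total_mb = installed_total_mb(PACKAGES_MB)
print(f"packages: ~{total_mb} MB of ~{bytes_to_mb(DISK_SIZE_BYTES):.0f} MB on disk0:")
```

The sum only covers the packages listed; core files, logs and the install database come on top of it.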
 
Repart the disk; this is a 2G internal USB. Delete the malloc_dump files, and disable logging to disk if you have it enabled. Go into the directories (not the packaging ones) and look for whatever files might be increasing in size; that's a good place to start.
 
Eddie.

Thank you, but again, this does not add up to what I am observing.

I have run "dir /recurse disk0:" and summed the space occupied by every file. All files together take 930 Mbytes.

After the last installation actions the occupied space was 936 Mbytes. So, very roughly, those figures are close together.

I am really puzzled by this behaviour.

You are probably not accounting for hidden files, driver logs, tmp space usage, and the (install) database. If you have done a lot of (large) commits, this will start to add to the overall consumption too.

With the image size that is there today, and the service packs, which are also rather large, it is highly advisable to use run repart -d to reclaim the 300M of reserved space, which gives you some breathing room.

Also, disabling mirroring between disk0 and disk1 gives you 2 independent disks to use for install operations.

cheers

xander

Just to clarify my problem: after installing asr9k-os-mbi-4.3.4.sp4-1.0.0 I had 670MB of free space; after a few large commits and 10 months in production, the disk is full. Repartitioning will give me an additional 300MB, but in this situation that is a solution for about 5 months. What is really filling the disk? It does not seem to be a hardware problem, as it is visible on 6 routers simultaneously.

How can I see those hidden files and logs, please?

Thank you very much for help.

So far, even with significant effort, no root cause for full disk has been found.

I have made sure that the XR upgrade which took place 10 months ago did not take all the disk space. I have also checked the standby RSP; output from both RSPs for comparison:

RP/0/RSP0/CPU0:ROUTER#show filesystem
File Systems:
     Size(b)     Free(b)        Type  Flags  Prefixes
  1610596352     2518016  flash-disk     rw  disk0:
 
RP/0/RSP0/CPU0:ROUTER#show filesystem location 0/RSP1/CPU0
File Systems:

     Size(b)     Free(b)        Type  Flags  Prefixes
  1610596352   691096064  flash-disk     rw  disk0:

 

Files on the active and standby RSPs were compared; there are minor differences, but nothing significant.

Furthermore, one possible culprit is the automated config backup system, which issues the following commands on a daily basis:
terminal width 0
copy running-config disk0:NA_config
sftp disk0:123456789.tmp username:password@10.20.30.40:123456789.tmp vrf VRF
del disk0:123456789.tmp

Investigating further, I also found that on all affected devices the sftp_server process is running many times (298 instances on this particular router):

RP/0/RSP0/CPU0:ROUTER#sh processes | include sftp_server
12345  1    1   16K  10 Receive     5868:21:50:0039    0:00:00:0065 sftp_server
12346  1    1   16K  10 Receive     5862:22:31:0770    0:00:00:0054 sftp_server
12347  1    1   16K  10 Receive     5856:22:46:0024    0:00:00:0042 sftp_server
12348  1    0   16K  10 Receive     5850:21:16:0232    0:00:00:0055 sftp_server

I have tried to kill a few processes with "process shutdown JobID", but observed no effect on disk space. Perhaps I would need to shut down all sftp_server processes to see an effect.
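With hundreds of instances, collecting the job IDs by hand is tedious. A minimal Python sketch that extracts them from captured "show processes | include sftp_server" output follows. It assumes that the first column is the job ID and the last column is the process name, which matches the sample lines above but is not confirmed elsewhere in this thread. The script only prints candidate commands; it does not connect to the router.

```python
# Captured output of "sh processes | include sftp_server" (sample from this post).
SAMPLE = """\
12345  1    1   16K  10 Receive     5868:21:50:0039    0:00:00:0065 sftp_server
12346  1    1   16K  10 Receive     5862:22:31:0770    0:00:00:0054 sftp_server
12347  1    1   16K  10 Receive     5856:22:46:0024    0:00:00:0042 sftp_server
12348  1    0   16K  10 Receive     5850:21:16:0232    0:00:00:0055 sftp_server
"""


def sftp_job_ids(output, name="sftp_server"):
    """Return the first-column IDs of lines whose last field is `name`.

    Assumption: the first column is the job ID, the last is the process name.
    """
    ids = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[-1] == name and fields[0].isdigit():
            ids.append(int(fields[0]))
    return ids


job_ids = sftp_job_ids(SAMPLE)
print(f"{len(job_ids)} {'sftp_server'} entries found")
for jid in job_ids:
    # Command form as quoted in this thread; review before pasting into a router.
    print(f"process shutdown {jid}")
```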

So far I have found no proof that these findings are related. The tmp files from the backup tool are not visible on disk.

Thank you for your opinion.

ls -a should show hidden files.

To find large files we can do something like run du -aks /disk0: | grep -E -e '[0-9]{3,} /disk0'

To find new files we would need to take a snapshot of the entire disk and then later do a diff. We could try run ls -laR /disk0: to get this. Another option would be to check for all files modified in Sep with a | grep -E 'disk0|Sep' so that you can also see which directory the file is in.
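The snapshot-and-diff idea can be sketched offline in Python. The example below parses two saved listings in the "dir disk0:" layout shown in the original post and reports files that are new or have grown. It assumes that a recursive listing repeats the same "Directory of ..." header and entry layout for each directory; the second snapshot, including example.tmp and the changed size of sam_certdb, is hypothetical data for illustration only.

```python
import re

# One entry line of the "dir" listing: inode, permissions, size, date, name.
ENTRY = re.compile(
    r"^(\d+)\s+([d-][rwx-]{3})\s+(\d+)\s+\w{3} \w{3}\s+\d+ [\d:]+ \d{4}\s+(\S+)$"
)


def parse_dir(output):
    """Map 'directory/name' to size in bytes for every non-directory entry."""
    sizes, current = {}, ""
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Directory of "):
            current = line[len("Directory of "):]
            continue
        match = ENTRY.match(line)
        if match and not match.group(2).startswith("d"):
            sizes[f"{current}/{match.group(4)}"] = int(match.group(3))
    return sizes


def diff_snapshots(old, new):
    """Return (files only in `new`, byte growth of files present in both)."""
    added = {path: size for path, size in new.items() if path not in old}
    grown = {path: new[path] - old[path]
             for path in new if path in old and new[path] > old[path]}
    return added, grown


OLD = """\
Directory of disk0:

6194        -rw-  0           Mon Jul  9 19:03:00 2012  tar_report.txt
6195        -rw-  12832       Fri Nov  7 11:44:45 2014  sam_certdb
6177        drwx  4096        Mon Sep  3 15:59:26 2012  aaa
"""

# Hypothetical later snapshot: sam_certdb has grown and example.tmp is new.
NEW = """\
Directory of disk0:

6194        -rw-  0           Mon Jul  9 19:03:00 2012  tar_report.txt
6195        -rw-  13000       Tue Sep 15 00:00:26 2015  sam_certdb
6177        drwx  4096        Mon Sep  3 15:59:26 2012  aaa
7000        -rw-  2048        Tue Sep 15 00:00:26 2015  example.tmp
"""

added, grown = diff_snapshots(parse_dir(OLD), parse_dir(NEW))
print("new files:", added)
print("grown files:", grown)
```

Note that a diff like this only sees files that still have a directory entry, so space held by files that are no longer listed will not show up in it.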

 

Many files change constantly, others such as the commit DB will build over time as mentioned by others. I would recommend opening a TAC SR to check this further, or just email me and I can assist.

 

As you noticed there is a difference in both RSPs, a switchover will often fix the issue and clear out any tmp data. Commit data and others should be synced between the two RSPs.

 

Sam

Hi Sam,

I have checked the disk again with "du -aks /disk0: | grep -E -e '[0-9]{3,} /disk0'". There is no unusually big file and no excessive number of repeated similar files. The total from this count shows 915349kB occupied on the disk, but what happened to the rest?

Please note that there are 6 routers with the same behaviour. I have a service request open for this, but so far all findings there are also inconclusive. Simply put, 620MB "disappeared".
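The gap between what the filesystem reports and what du can see can be computed directly. This is a minimal Python sketch using the figures posted in this thread; it assumes that du -k counts in units of 1024 bytes, so the exact result shifts somewhat if the unit is 1000 bytes.

```python
SIZE_BYTES = 1610596352   # Size(b) of disk0: from "show filesystem"
FREE_BYTES = 2518016      # Free(b) of disk0: from "show filesystem"
DU_TOTAL_KB = 915349      # total reported by du, in kB


def unaccounted_bytes(size, free, du_kb, kb=1024):
    """Space the filesystem counts as used but du does not attribute to any file.

    Assumption: du reports in units of `kb` bytes (1024 by default).
    """
    used = size - free
    return used - du_kb * kb


gap = unaccounted_bytes(SIZE_BYTES, FREE_BYTES, DU_TOTAL_KB)
print(f"used by filesystem: {SIZE_BYTES - FREE_BYTES} bytes")
print(f"visible to du:      {DU_TOTAL_KB * 1024} bytes")
print(f"unaccounted:        {gap} bytes (~{gap / 2**20:.0f} MiB)")
```

Under that assumption the gap comes out in the same range as the missing space described above.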

My suspicion is that the sftp_server process is creating tmp files that block space, but so far I have been unable to find them. Maybe I will try to "kill off" all sftp_server processes to see what happens. If each process holds some amount of data that is swapped from memory to disk0:, that could explain the missing space.

I am well aware that an RSP switchover might fix the full disk, and that cleaning up unused files or a repart would also help. But without knowing the root cause, it will only be a matter of time before disk0 fills up again.

The root cause has been found. There is a known bug, CSCuf81591: hanging sftp_server processes were taking a huge amount of virtual memory, which is swapped to disk0.

As mentioned above, I am using SFTP server for daily running-config backup.

terminal width 0
copy running-config disk0:NA_config
sftp disk0:123456789.tmp username:password@10.20.30.40:123456789.tmp vrf VRF
del disk0:123456789.tmp

Either because of CSCuf81591 or because of some form of incompatibility, the sftp_server process is left running in Receive state:

-------------------------------------------------------------------------------
                  Job Id: 12345
                     PID: 25375552
         Executable name: sftp_server
             Instance ID: 1
                 Respawn: OFF
                    core: TEXT SHAREDMEM MAINMEM
65936  1    1   16K  10 Receive     4693:44:45:0225    0:00:00:0082 sftp_server
-------------------------------------------------------------------------------

As shown above, the backup server then deletes the file. Since the QNX4 filesystem implements file locking, it is possible that the file is created and then marked as deleted, but never actually freed because of the lock held by the sftp_server process.

Original status:

RP/0/RSP0/CPU0:ROUTER2#show filesystem
Thu Sep 17 22:28:50.388
File Systems:

     Size(b)     Free(b)        Type  Flags  Prefixes
  1610596352      527872  flash-disk     rw  disk0:

After the "killing season", with 300 sftp_server processes removed using "process shutdown JobID":

 

RP/0/RSP0/CPU0:ROUTER2#sh filesystem
Thu Sep 17 22:51:24.199
File Systems:

     Size(b)     Free(b)        Type  Flags  Prefixes
  1610596352   673181696  flash-disk     rw  disk0:

This matches the situation just after the upgrade.

Thank you all for the valuable tips and advice.
