03-26-2024 04:36 AM
DNAC has been happily backing up to an SFTP server for several weeks. Yesterday it failed and, as far as I can tell, nothing has changed. I have verified that I can log in to the SFTP server with WinSCP and SSH using the credentials configured in DNAC. The server has about 1.3TB of storage in total, of which 67% is in use (834GB in use and 421GB available).
The details in the error message are:
BACKUP.PREPARE_BACKUP | SUCCESS
BACKUP.maglev-system:credentialmanager | SUCCESS
BACKUP.ndp:redis | SUCCESS
BACKUP.fusion:postgres | SUCCESS
BACKUP.app-hosting:postgres | SUCCESS
BACKUP.maglev-system:glusterfs-server | SUCCESS
BACKUP.maglev-system:mongodb | SUCCESS
BACKUP.POST_BACKUP.fusion:maintenance-service | SUCCESS
BACKUP.POST_BACKUP.ndp:pipelineadmin | SUCCESS
BACKUP.FINALIZE_BACKUP | FAILURE | EndpointConnectionError
Task error description:
Error while rsync '/usr/bin/rsync -PrltDvR --stats --no-p --no-g --chmod=ugo=rwX
100.70.1.1:/backups/dnac/rsync/ndp.redis/d51f3e34-7fb0-4b86-b3c7-d22ff2aec11a/backup-metadata.json
100.70.1.1:/backups/dnac/rsync/fusion.postgres/d51f3e34-7fb0-4b86-b3c7-d22ff2aec11a/backup-metadata.json
100.70.1.1:/backups/dnac/rsync/app-hosting.postgres/d51f3e34-7fb0-4b86-b3c7-d22ff2aec11a/backup-metadata.json
100.70.1.1:/backups/dnac/rsync/maglev-system.glusterfs/d51f3e34-7fb0-4b86-b3c7-d22ff2aec11a/backup-metadata.json
100.70.1.1:/backups/dnac/rsync/maglev-system.credentialmanager/d51f3e34-7fb0-4b86-b3c7-d22ff2aec11a/backup-metadata.json
100.70.1.1:/backups/dnac/rsync/maglev-system.mongodb/d51f3e34-7fb0-4b86-b3c7-d22ff2aec11a/backup-metadata.json /backup-metadata-tmp'.
rsync: [sender] link_stat "/backups/dnac/rsync/ndp.redis/d51f3e34-7fb0-4b86-b3c7-d22ff2aec11a/backup-metadata.json" failed: No such file or directory (2)
rsync: [sender] link_stat "/backups/dnac/rsync/fusion.postgres/d51f3e34-7fb0-4b86-b3c7-d22ff2aec11a/backup-metadata.json" failed: No such file or directory (2)
rsync: [sender] link_stat "/backups/dnac/rsync/app-hosting.postgres/d51f3e34-7fb0-4b86-b3c7-d22ff2aec11a/backup-metadata.json" failed: No such file or directory (2)
rsync: [sender] link_stat "/backups/dnac/rsync/maglev-system.glusterfs/d51f3e34-7fb0-4b86-b3c7-d22ff2aec11a/backup-metadata.json" failed: No such file or directory (2)
rsync: [sender] link_stat "/backups/dnac/rsync/maglev-system.credentialmanager/d51f3e34-7fb0-4b86-b3c7-d22ff2aec11a/backup-metadata.json" failed: No such file or directory (2)
rsync: [sender] link_stat "/backups/dnac/rsync/maglev-system.mongodb/d51f3e34-7fb0-4b86-b3c7-d22ff2aec11a/backup-metadata.json" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.1]
rsync: [Receiver] write error: Broken pipe (32)
My Linux isn't great, but it seems to indicate a network disconnection (write error: Broken pipe).
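For anyone trying to read a task error description like the one above, here is a minimal Python sketch that splits it into individual rsync messages and pulls out the missing paths and the exit code. The function name `summarize_rsync_error` and the shortened sample string are made up for illustration. Rsync exit code 23 means a partial transfer due to an error, and a `[sender] link_stat ... No such file or directory` line means the sending side could not find the file; since the sources in this command are on 100.70.1.1, that would be the SFTP server. The broken pipe might therefore be a follow-on effect rather than the root cause, but that is only a guess from the log text.

```python
import re

def summarize_rsync_error(text: str) -> dict:
    """Split a task error description into rsync messages and extract key facts."""
    # The UI prints line breaks as the two characters backslash + n; normalise them.
    normalized = text.replace("\\n", "\n")
    # The errno can end up on its own line ("... directory" / "(2)"); rejoin it.
    normalized = re.sub(r"\n\((\d+)\)", r" (\1)", normalized)
    missing = re.findall(
        r'link_stat "([^"]+)" failed: No such file or directory', normalized
    )
    code = re.search(r"rsync error: .*?\(code (\d+)\)", normalized)
    return {
        "missing_files": missing,
        "exit_code": int(code.group(1)) if code else None,
        "broken_pipe": "Broken pipe" in normalized,
    }

# Shortened sample built from lines in the error above (one missing file only).
sample = (
    'rsync: [sender] link_stat "/backups/dnac/rsync/ndp.redis/'
    'd51f3e34-7fb0-4b86-b3c7-d22ff2aec11a/backup-metadata.json" '
    "failed: No such file or directory\n"
    r"(2)\nrsync error: some files/attrs were not transferred "
    r"(see previous errors) (code 23) at main.c(1655) [Receiver=3.1.1]\n"
    r"rsync: [Receiver] write error: Broken pipe (32)\n"
)

summary = summarize_rsync_error(sample)
print(summary["exit_code"], len(summary["missing_files"]), summary["broken_pipe"])
```

If the missing paths all share one backup ID, as they do here, it may be worth checking on the SFTP server whether that directory exists at all before looking at the network.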
03-26-2024 08:01 AM
What version of rsync is your server running?
03-26-2024 08:43 AM
3.2.7 protocol version 31 according to 'rsync --version'
03-26-2024 11:20 AM - edited 03-26-2024 12:02 PM
The daily backup that started this morning at 04:00 failed after about 12.5 hours. This is the error this time:
Task Name | Status | Error Type
BACKUP.PREPARE_BACKUP | SUCCESS
BACKUP.maglev-system:credentialmanager | SUCCESS
BACKUP.ndp:redis | SUCCESS
BACKUP.fusion:postgres | SUCCESS
BACKUP.app-hosting:postgres | SUCCESS
BACKUP.maglev-system:glusterfs-server | FAILURE | Exception
Task error description:
"Error while backup() 'Shell command /usr/bin/rsync -PrltDvR --stats --no-p --no-g --chmod=ugo=rwX --exclude=_no_backup_ /mnt/volumes/default_vol/appstackdata /mnt/volumes/default_vol/commondata /mnt/volumes/default_vol/servicedata 100.70.1.1:/backups/dnac/rsync/maglev-system.glusterfs/20ee76fd-2cfb-4498-91df-dce05cce0cf2 timed out after 3607.4429023750126 seconds of inactivity'"
BACKUP.maglev-system:mongodb | FAILURE
BACKUP.POST_BACKUP.fusion:maintenance-service | FAILURE
BACKUP.POST_BACKUP.ndp:pipelineadmin | FAILURE
BACKUP.FINALIZE_BACKUP | FAILURE
The files that did get written to the SFTP server are no longer there.
03-26-2024 12:35 PM - edited 03-26-2024 12:35 PM
I have the same version running without issue for several clusters running Catalyst Center 2.3.X.X. I suspected that you were running too old or too new a version of the rsync protocol. Unfortunately, I don't have any other good theories.
03-27-2024 08:11 AM
So, after 2 days of failures, the backup completed successfully this morning; however, it took much longer than it has previously for a similar size. It usually takes less than 20 minutes, and this morning it took over 5 hours. I'm going to leave it over the long weekend and then look at it next week.
I think I'll remove the backups from January to free up space on the SFTP server to see if that improves anything. I'll also patch the Ubuntu host and give it a reboot.
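Before removing anything, it may help to list which backup directories are actually old. Below is a minimal Python sketch that could be run on the Ubuntu host. It assumes the layout `/backups/dnac/rsync/<service>/<backup-id>/` seen in the error paths above and uses each directory's modification time as a rough proxy for the backup date; both are assumptions, and the function name `backups_older_than` is made up for illustration. It only lists candidates and deletes nothing.

```python
import time
from pathlib import Path

def backups_older_than(root: str, days: float) -> list[tuple[str, float]]:
    """Return (path, age_in_days) for each <service>/<backup-id> dir older than `days`."""
    now = time.time()
    old = []
    for service_dir in sorted(Path(root).iterdir()):
        if not service_dir.is_dir():
            continue
        for backup_dir in sorted(service_dir.iterdir()):
            if not backup_dir.is_dir():
                continue
            # Modification time is only an approximation of when the backup ran.
            age_days = (now - backup_dir.stat().st_mtime) / 86400
            if age_days > days:
                old.append((str(backup_dir), age_days))
    return old

# Example call (path taken from the error output; the 60-day cutoff is arbitrary):
# for path, age in backups_older_than("/backups/dnac/rsync", days=60):
#     print(f"{age:6.0f} days  {path}")
```

Each backup ID appears under several service directories, so a given backup would show up once per service in the output.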