12-06-2022 12:15 AM
Hi,
I'm trying to set up a new backup for our DNAC and consistently get this error message.
Error during _process_backup(): Internal server error: {"error":{"root_cause":[{"type":"snapshot_creation_exception","reason":"[ndp:cba0c672-c478-49d3-b394-c4d0489cc69f.000/lSBGG91JTXC_lkwxq01JDw] failed to create snapshot"}],"type":"snapshot_creation_exception","reason":"[ndp:cba0c672-c478-49d3-b394-c4d0489cc69f.000/lSBGG91JTXC_lkwxq01JDw] failed to create snapshot","caused_by":{"type":"access_denied_exception","reason":"/var/data/es/snapshots/meta-lSBGG91JTXC_lkwxq01JDw.dat"}},"status":500}
The backup gets to 50% and does seem to copy data, but it always fails at this step. I have rebuilt the destination NFS server and also rebooted the DNAC; neither has made any difference.
Thanks
12-06-2022 01:28 AM
- FYI: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwd08262
M.
12-06-2022 02:24 AM
Thanks for the link.
I ran a recursive chmod 777 on the root of the share. It seemed to be working, as more data was passed; I can see 11 GB was transferred to the NFS server. However, rather than just failing at 50%, it flapped between 50% and 60% and then eventually failed with the same error as before.
Error during _process_backup(): Internal server error: {"error":{"root_cause":[{"type":"snapshot_creation_exception","reason":"[ndp:c2254c92-8ed4-4177-a3ab-04363c924afb.000/VzykC4S-SBGZZoKi59jKvw] failed to create snapshot"}],"type":"snapshot_creation_exception","reason":"[ndp:c2254c92-8ed4-4177-a3ab-04363c924afb.000/VzykC4S-SBGZZoKi59jKvw] failed to create snapshot","caused_by":{"type":"access_denied_exception","reason":"/var/data/es/snapshots/meta-VzykC4S-SBGZZoKi59jKvw.dat"}},"status":500}
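For reference, the permissions change I ran was roughly the following (assuming /mnt/sdb/DNAC, from my listing below, is the share root; adjust for your layout):
$ sudo chmod -R 777 /mnt/sdb/DNAC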
Here's the directory listing on the NFS server that I'm using:
administrator@nfshost:/mnt/sdb$ ls -l DNAC/
total 8
drwsrwsrwx 6 nobody nogroup 4096 Dec 6 10:08 backups
drwsrwsrwx 2 nobody nogroup 4096 Dec 6 09:32 nfs
administrator@nfshost:/mnt/sdb$ ls -l DNAC/backups/
total 16
drwxrwsrwx 5 administrator nogroup 4096 Dec 6 09:34 fusion.postgres
drwxrwsrwx 5 administrator nogroup 4096 Dec 6 09:33 maglev-system.credentialmanager
drwxrwsrwx 5 administrator nogroup 4096 Dec 6 09:33 maglev-system.glusterfs
drwxrwsrwx 5 administrator nogroup 4096 Dec 6 09:33 ndp.redis
On DNAC, /nfs is configured as the NFS path and /backups as the share under the '(Remote Host)' option. Only /nfs is exported as an NFS share.
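A quick way to double-check what the server actually exports, run from any NFS client that can reach it:
$ showmount -e nfshost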
12-06-2022 04:34 AM
- Check whether the share has enough free space. Also, on the NFS server, check the NFS-related and/or networking-related logs.
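For example, on the NFS server (paths taken from your earlier output; adjust as needed):
$ df -h /mnt/sdb
$ grep -i nfs /var/log/syslog | tail -50
$ dmesg | grep -i nfs | tail -50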
M.
12-06-2022 06:19 AM
Initially I had multipath errors; I managed to get rid of these by adding the following:
defaults {
    user_friendly_names yes
}
blacklist {
    device {
        vendor "VMware"
        product "Virtual disk"
    }
}
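(For anyone following along: on a stock Ubuntu multipath-tools install these stanzas would go in /etc/multipath.conf, and the daemon needs a restart to pick them up.)
$ sudo systemctl restart multipathd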
This removed the errors, but now when looking at the logs:
administrator@host:/mnt/sdb$ tail /var/log/syslog
Dec 6 14:13:37 host kernel: [85329.354152] NFSD: end of grace period
Dec 6 14:13:37 host kernel: [85329.354155] NFSD: laundromat_main - sleeping for 90 seconds
Dec 6 14:15:07 host kernel: [85419.464417] NFSD: laundromat service - starting
Dec 6 14:15:07 host kernel: [85419.464435] NFSD: end of grace period
Dec 6 14:15:07 host kernel: [85419.464437] NFSD: laundromat_main - sleeping for 90 seconds
Dec 6 14:15:36 host systemd-timesyncd[764]: Timed out waiting for reply from 185.125.190.56:123 (ntp.ubuntu.com).
Dec 6 14:15:36 syt-penm-ibs01 systemd-timesyncd[764]: Initial synchronization to time server 185.125.190.57:123 (ntp.ubuntu.com).
Dec 6 14:16:38 host kernel: [85509.575411] NFSD: laundromat service - starting
Dec 6 14:16:38 host kernel: [85509.575413] NFSD: end of grace period
Dec 6 14:16:38 host kernel: [85509.575415] NFSD: laundromat_main - sleeping for 90 seconds
administrator@syt-penm-ibs01:/mnt/sdb$
There isn't anything special in my /etc/exports entry:
/mnt/sdb/DNAC/nfs *(rw,all_squash,sync,no_subtree_check)
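After any change to /etc/exports, re-exporting and verifying is worth doing; roughly:
$ sudo exportfs -ra
$ sudo exportfs -v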
08-30-2023 08:43 AM
Hi jamesytn,
I'm currently facing the same challenge. Did you manage to find a solution for this?
09-21-2023 08:42 AM
James,
So, looking at your directory ownership, this may be the issue. I ran into the same symptoms you are reporting, and I had the same ownership on the NFS directory. I believe this matched the original configuration requirements in the past, but I cannot validate that since the documentation for the different releases has been scrubbed.
/mnt/sdb/DNAC/nfs
drwsrwsrwx 2 nobody nogroup 4096 Dec 6 09:32 nfs
So, I performed the following to fix the issue:
$ sudo chown nfsnobody:nfsnobody /home/cx1/nfs
$ sudo exportfs -r
$ df -h
/home/cx1/nfs
drwxr-xr-x. 2 nfsnobody nfsnobody 6 Sep 12 12:55 nfs
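Note that nfsnobody is the anonymous NFS account on RHEL/CentOS-style servers; on an Ubuntu/Debian server like the one earlier in this thread, the squashed user is nobody:nogroup, so the equivalent would be roughly:
$ sudo chown nobody:nogroup /mnt/sdb/DNAC/nfs
$ sudo exportfs -r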
Note: Since this will be the initial "All Data" backup, this task/job may take
multiple hours to complete, depending on the amount of Assurance data you have
accumulated in your cluster. You will see the backup "appear" to stall at 40-50%.
Up to that percentage, the RSYNC/Automation data is being backed up; then the
Assurance data backup starts, and that data can be large or very large.
You can monitor progress on the remote NFS server by watching the NFS directory during the backup.
For Example:
------------
$ watch -d -n 0.5 "tree /home/cx1/nfs | grep files"
You will see the file and directory counts continue to increment even while the progress percentage appears static during the backup.
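If tree isn't installed on the NFS server, watching the directory size grow works as a rough equivalent (path from the example above):
$ watch -d -n 0.5 "du -sh /home/cx1/nfs"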
02-14-2024 12:13 AM
The comment about the backup appearing to stall helped me out.
The fix for me was simply to wait: I thought it had got stuck until I read your comment. 27 hours later, it succeeded at ~820 GB.