Cisco DNAC Backup Issues

jamesytn
Level 1

Hi,

I'm trying to set up a new backup for our DNAC and consistently get this error message:

Error during _process_backup(): Internal server error: {"error":{"root_cause":[{"type":"snapshot_creation_exception","reason":"[ndp:cba0c672-c478-49d3-b394-c4d0489cc69f.000/lSBGG91JTXC_lkwxq01JDw] failed to create snapshot"}],"type":"snapshot_creation_exception","reason":"[ndp:cba0c672-c478-49d3-b394-c4d0489cc69f.000/lSBGG91JTXC_lkwxq01JDw] failed to create snapshot","caused_by":{"type":"access_denied_exception","reason":"/var/data/es/snapshots/meta-lSBGG91JTXC_lkwxq01JDw.dat"}},"status":500}

The backup gets to 50% and does seem to copy data, but it always fails at this step. I have rebuilt the destination NFS server and also rebooted the DNAC; neither has made any difference.

Thanks

7 Replies

marce1000
VIP

 

- FYI: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwd08262

 M.



-- Each morning when I wake up and look into the mirror I always say 'Why am I so brilliant?'
    The mirror will then always respond to me with 'The only thing that exceeds your brilliance is your beauty!'

Thanks for the link.

I ran a chmod 777 -R on the root of the share. It seemed to help, as more data was passed; I can see 11G was transferred to the NFS server. However, rather than just failing at 50%, it flapped between 50% and 60% and eventually failed with the same error as before.

Error during _process_backup(): Internal server error: {"error":{"root_cause":[{"type":"snapshot_creation_exception","reason":"[ndp:c2254c92-8ed4-4177-a3ab-04363c924afb.000/VzykC4S-SBGZZoKi59jKvw] failed to create snapshot"}],"type":"snapshot_creation_exception","reason":"[ndp:c2254c92-8ed4-4177-a3ab-04363c924afb.000/VzykC4S-SBGZZoKi59jKvw] failed to create snapshot","caused_by":{"type":"access_denied_exception","reason":"/var/data/es/snapshots/meta-VzykC4S-SBGZZoKi59jKvw.dat"}},"status":500}

Here's the relevant output from the NFS server that I'm using:

administrator@nfshost:/mnt/sdb$ ls -l DNAC/
total 8
drwsrwsrwx 6 nobody nogroup 4096 Dec  6 10:08 backups
drwsrwsrwx 2 nobody nogroup 4096 Dec  6 09:32 nfs

administrator@nfshost:/mnt/sdb$ ls -l DNAC/backups/
total 16
drwxrwsrwx 5 administrator nogroup 4096 Dec  6 09:34 fusion.postgres
drwxrwsrwx 5 administrator nogroup 4096 Dec  6 09:33 maglev-system.credentialmanager
drwxrwsrwx 5 administrator nogroup 4096 Dec  6 09:33 maglev-system.glusterfs
drwxrwsrwx 5 administrator nogroup 4096 Dec  6 09:33 ndp.redis

On DNAC, /nfs is the NFS path and /backups is the share configured under the '(Remote Host)' option. Only /nfs is exported as an NFS share.
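For what it's worth, a quick sanity check you can run on the NFS server before retrying. This is just a sketch: `stat -c` assumes GNU coreutils (standard on Ubuntu), and the path is taken from the listing above.

```shell
# check_export_perms: warn if a directory is not writable by "other".
# With all_squash, DNAC's writes arrive as the anonymous user (nobody/nogroup
# here), so the export directory must be other-writable or owned by that user.
check_export_perms() {
    perms=$(stat -c '%a' "$1" 2>/dev/null)   # octal mode, e.g. 777 or 755
    case "$perms" in
        *[2367]) echo "ok: $1 is writable by other (mode $perms)" ;;
        *)       echo "warn: $1 is NOT writable by other (mode $perms)" ;;
    esac
}

check_export_perms /mnt/sdb/DNAC/nfs
```

If this warns on the export directory, the snapshot writer will hit the same access_denied_exception shown in the error above.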

 

 - Check whether the share has enough free space. Also check the NFS-related and networking-related logs on the NFS server.

 M.




Initially I had multipath errors; I managed to get rid of those by adding the following to the multipath configuration (/etc/multipath.conf) on the NFS server:

defaults {
    user_friendly_names yes
}

blacklist {
    device {
        vendor "VMware"
        product "Virtual disk"
    }
}
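For anyone following along: after editing the multipath configuration, the daemon has to re-read it before the blacklist takes effect. Something along these lines should do it (commands are an assumption for a systemd-based host with multipath-tools installed, and require root):

```shell
sudo systemctl restart multipathd   # re-read /etc/multipath.conf
sudo multipath -r                   # rebuild the device maps
sudo multipath -ll                  # confirm the VMware virtual disk no longer appears
```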

This removed the errors, but the logs now show:

administrator@host:/mnt/sdb$ tail /var/log/syslog
Dec  6 14:13:37 host kernel: [85329.354152] NFSD: end of grace period
Dec  6 14:13:37 host kernel: [85329.354155] NFSD: laundromat_main - sleeping for 90 seconds
Dec  6 14:15:07 host kernel: [85419.464417] NFSD: laundromat service - starting
Dec  6 14:15:07 host kernel: [85419.464435] NFSD: end of grace period
Dec  6 14:15:07 host kernel: [85419.464437] NFSD: laundromat_main - sleeping for 90 seconds
Dec  6 14:15:36 host systemd-timesyncd[764]: Timed out waiting for reply from 185.125.190.56:123 (ntp.ubuntu.com).
Dec  6 14:15:36 syt-penm-ibs01 systemd-timesyncd[764]: Initial synchronization to time server 185.125.190.57:123 (ntp.ubuntu.com).
Dec  6 14:16:38 host kernel: [85509.575411] NFSD: laundromat service - starting
Dec  6 14:16:38 host kernel: [85509.575413] NFSD: end of grace period
Dec  6 14:16:38 host kernel: [85509.575415] NFSD: laundromat_main - sleeping for 90 seconds
administrator@syt-penm-ibs01:/mnt/sdb$

 There isn't anything special in my /etc/exports entry:

/mnt/sdb/DNAC/nfs *(rw,all_squash,sync,no_subtree_check)
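One variant worth trying (an assumption, not something confirmed in this thread): with all_squash, every client write is mapped to the anonymous UID/GID, so pinning that explicitly to the directory owner can avoid the access_denied_exception above. 65534 is nobody/nogroup on most Ubuntu hosts; check /etc/passwd on yours.

```shell
# /etc/exports -- map every squashed DNAC write to a fixed anonymous UID/GID
/mnt/sdb/DNAC/nfs *(rw,all_squash,anonuid=65534,anongid=65534,sync,no_subtree_check)
```

Then run `sudo exportfs -ra` so the server re-reads /etc/exports.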

 

Hi jamesytn,

I'm currently facing the same challenge. Did you manage to find a solution for this?

James,

What version of Cisco Catalyst Center (formerly Cisco DNA Center) are you running?

Looking at your directory ownership, this may be the issue. I ran into the same symptoms you are reporting, and I had the same ownership on the NFS directory. I believe this matched the original configuration requirements in the past, but I cannot validate that, since our documentation has been scrubbed across the different releases.


/mnt/sdb/DNAC/nfs
drwsrwsrwx 2 nobody nogroup 4096 Dec 6 09:32 nfs

Chapter: Backup and Restore
https://www.cisco.com/c/en/us/td/docs/cloud-systems-management/network-automation-and-management/dna-center/2-3-5/admin_guide/b_cisco_dna_center_admin_guide_2_3_5/b_cisco_dna_center_admin_guide_2_3_5_chapter_0110.html#Cisco_Task_in_List_GUI.dita_d361...


So, I performed the following to fix the issue:

  • Deleted existing Assurance backups, if any (optional)
  • Removed the NFS configuration settings from Cisco Catalyst Center (formerly Cisco DNA Center)
  • Removed the contents and directory from the remote NFS server
  • Added the directory back to the remote NFS server
  • Changed ownership of the directory
  • Refreshed the exportfs for the NFS directory
  • Checked and verified the available disk space
  • Added the backup configuration back to Cisco Catalyst Center (formerly Cisco DNA Center)
  • Performed a backup of type "All Data"
  • Verified a successful backup of All Data
$ sudo chown nfsnobody:nfsnobody /home/cx1/nfs   # match the squashed NFS user
$ sudo exportfs -r                               # refresh the export table
$ df -h                                          # verify available disk space

/home/cx1/nfs
drwxr-xr-x. 2 nfsnobody nfsnobody 6 Sep 12 12:55 nfs

Note: Since this will be the initial "All Data" backup, this task/job may take
multiple hours to complete, depending on the amount of Assurance data you have
accumulated in your cluster. You will see the backup "appear" to stall at 40-50%.
Up to that percentage, the RSYNC/automation data is backed up; after that we start
backing up your Assurance data, which can be large or very large.

You can monitor the progress on the Remote NFS Server by watching the NFS Directory status during the backup.

For Example:
------------
$ watch -d -n 0.5 "tree /home/cx1/nfs | grep files"

You will see the file and directory counts continue to increment even while the progress percentage stays static during the backup.
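If `tree` isn't installed on the NFS host, a plain `find` does the same job. A small sketch (the /home/cx1/nfs path is taken from the example above and is an assumption for your layout):

```shell
# count_backup_files: print how many files currently exist under a directory.
# A cheap progress indicator while the DNAC progress bar sits still.
count_backup_files() {
    find "$1" -type f 2>/dev/null | wc -l
}

# e.g. run it under watch:  watch -n 5 'find /home/cx1/nfs -type f | wc -l'
count_backup_files /home/cx1/nfs
```

A rising count tells you the snapshot copy is still moving even when the GUI percentage hasn't changed.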

 

The comment about appearing to stall helped me out

Fix for me was:

  1. Align folder permissions (assurance folder wasn't quite right)
  2. Remove Scheduled backups > Remove NFS config
  3. Reapply NFS config
  4. Run full backup

I thought it had got stuck until I read your comment. 27 hours later, it succeeded at ~820 GB.
