
HyperFlex’s new snapshot mechanism matters. Here’s what Cisco forgot

RedNectar
Advocate

This has been posted here to reach a wider audience. If you find the same information on my blog, that's OK - I've posted it in both places!


It all became possible when, in ESXi v7.0U2, VMware introduced a new parameter for VMs called snapshot.alwaysAllowNative. Then, with HyperFlex v4.5(2), Cisco took advantage of this new parameter to remove the biggest bugbear of all HyperFlex installations: the SENTINEL snapshot.

The short story

Before HyperFlex Data Platform (HXDP) v4.5(2), HyperFlex used VMware APIs to create an initial snapshot in native format because this format is much more efficient when combined with HyperFlex's pointer-based log structured file system.  This snapshot was always given the special name SENTINEL, but it took some time to create and always consumed a little extra space if it was not deleted later. And if it was deleted, there was always a chance that a VMware REDO snapshot would be taken, and the efficiencies of the HyperFlex pointer-based log structured file system would be compromised.

Now, with HyperFlex v4.5(2) coupled with ESXi v7.0U2, the HXDP only needs to set a parameter on the VM - the snapshot.alwaysAllowNative parameter. This is much more efficient and far less prone to error. Backup software can now take snapshots much faster, and no residual space is wasted.

But the old problem of a VMware REDO snapshot being taken still exists, and unfortunately, when Cisco adopted this new approach, they dropped the ball when it came to management options in the HyperFlex Connect management app, the VMware HyperFlex plugin, the HyperFlex CLI and the Intersight SaaS platform. None of these management systems lets users see which VMs have the snapshot.alwaysAllowNative parameter set, and none of them can set the parameter.

I'm calling on Cisco to add these options ASAP.  Previously it was relatively simple to see if a VM had a SENTINEL snapshot - you just needed to look at the Snapshots tab for a VM. Now you have to navigate at least six mouse-clicks to check for the snapshot.alwaysAllowNative parameter in a long list of other parameters.

In the meantime, I've created a bunch of PowerCLI scripts you can use to:

  • List all the VMs that have the snapshot.alwaysAllowNative parameter set to TRUE
  • List all the VMs that have the snapshot.alwaysAllowNative parameter set to FALSE
  • List all the VMs that do not have the snapshot.alwaysAllowNative parameter at all, with an option to
    • Set the snapshot.alwaysAllowNative parameter to TRUE on these VMs

These scripts are found at the end of this article.

The full story. Let's start with: Why does it matter?

Before I can explain WHY it matters, you need to understand a little more about VMware snapshots, and in particular, the Native snapshot, and how it is different to the regular REDO snapshot.

First of all, snapshots matter because all backup and replication software does its work by first taking a snapshot of any running VM - it's pretty obvious that you can't back up a running VM while it is potentially writing to disk. The solution is to take a snapshot, copy the snapshot, then delete it. Simple.

Secondly, you need to understand that VMware has not always created snapshots the same way. Today, VMware snapshots are based on a collection of deltas from an initial base, known as REDO files. But in the distant past a snapshot was simply a copy of the .vmdk file that was the VM. Hence the name: the native snapshot. This snapshot of course doubled the amount of disk space required for the VM in the VMware NFS file system, but it turns out that this is an ideal format for snapshot files on HyperFlex's log-structured, pointer-based file system which, instead of making a copy of a VM when a native snapshot is taken, makes a copy of the pointers instead. No additional disk space needed! Very cool. Very efficient. And very much suited to a Hyper-Converged Infrastructure (HCI).

But there's a problem.

The writers of the HyperFlex Data Platform (HXDP) had to come up with a way of forcing VMs to create snapshots in the original native format rather than the normal REDO format. Originally, they did it like this: whenever a snapshot was created via the HXDP rather than through the normal VMware methods, HyperFlex used APIs to create the first snapshot in native format and gave it a special name - SENTINEL.

Once a VM had one snapshot in the original native format, VMware would create any future snapshots in native format as well, for compatibility. Now the problem is that if a VM has had a normal VMware REDO snapshot taken before the SENTINEL had been created, or after the SENTINEL had been removed, the HXDP can't take a native format snapshot.  And that problem still exists today, even with the new snapshot.alwaysAllowNative parameter.
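Since an existing non-native snapshot is what gets in the way, it can help to list the VMs that currently have snapshots but no SENTINEL among them. The following is only a sketch, assuming a connected PowerCLI session; the function names Test-SnapshotListNeedsReview and Get-VMSnapshotReview are my own placeholders. Be aware of its limits: a snapshot's name does not prove its on-disk format, and on HXDP v4.5(2) or later a VM can have perfectly good native snapshots without any SENTINEL, so treat the output as a list of VMs to inspect, not a verdict.

```powershell
# Sketch: decide whether a list of snapshot names deserves a closer look.
# Returns $true when there is at least one snapshot and none is named SENTINEL.
function Test-SnapshotListNeedsReview {
    param([string[]]$SnapshotName)

    # Drop empty entries so "no snapshots at all" is handled uniformly.
    $names = @($SnapshotName | Where-Object { $_ })
    if ($names.Count -eq 0) { return $false }
    return ($names -notcontains 'SENTINEL')
}

# Sketch: walk all VMs (skipping the vSphere Cluster Services agent VMs)
# and emit one row per VM that has snapshots but no SENTINEL.
function Get-VMSnapshotReview {
    Get-VM |
        Where-Object { $_.Name -notlike 'vCLS*' } |
        ForEach-Object {
            $vm    = $_
            $names = @($vm | Get-Snapshot | Select-Object -ExpandProperty Name)
            if (Test-SnapshotListNeedsReview -SnapshotName $names) {
                [pscustomobject]@{
                    VM        = $vm.Name
                    Snapshots = $names -join ', '
                }
            }
        }
}

# Example:
# Get-VMSnapshotReview | Format-Table -AutoSize
```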

Missing in Action - Cisco Management Tools

It has been common practice for HyperFlex users to create an initial SENTINEL HyperFlex snapshot as soon as they deploy a VM to ensure that their integrated backup software would be able to make use of the more efficient pointer-based log-structured file system when backing up VMs.

But there was a disadvantage to this approach - over time, the SENTINEL would contain data that belonged to the originally deployed VM and was now out of date. That was fine should you ever wish to revert to the original state, but if you were getting short on space, it meant that at least a small amount of space was being reserved for an unlikely event - remembering you'd likely have a backup of the original file should you need it.

The advantage, though, was that you could easily check whether a VM had a SENTINEL snapshot: you just had to click on the Snapshot tab for the VM.

SENTINEL.jpg

But with the new more efficient snapshot.alwaysAllowNative parameter, checking if the parameter is set is much harder.

Finding_snapshot.alwaysAllowNative.gif

That's 8 clicks and two scrolls by the time you are done! And the visual challenge of finding the parameter in that long list is just not easy.
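From PowerCLI, the same check on one VM is a single pipeline. This is a minimal sketch, assuming you have already connected with Connect-VIServer as described in The PowerCLI Scripts section; the function name Get-NativeSnapshotSetting and the VM name MyVM01 are placeholders of mine, not anything Cisco or VMware ships.

```powershell
# Sketch: report snapshot.alwaysAllowNative for one VM.
# Assumes VMware.PowerCLI is installed and Connect-VIServer has been run.
function Get-NativeSnapshotSetting {
    param(
        [Parameter(Mandatory)]
        [string]$VMName
    )

    $setting = Get-VM -Name $VMName |
        Get-AdvancedSetting -Name 'snapshot.alwaysAllowNative'

    # Nothing comes back when the key is absent, so report that case
    # explicitly instead of printing an empty result.
    [pscustomobject]@{
        VM    = $VMName
        Value = if ($setting) { $setting.Value } else { 'not set' }
    }
}

# Example (placeholder VM name):
# Get-NativeSnapshotSetting -VMName 'MyVM01'
```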

Why didn't Cisco add a Native Snapshots column to the VM list in HyperFlex Connect, or perhaps better still a symbol like a * in the Snapshot column to indicate that the VM had been configured for native snapshots?

And why is there no Action option to set the snapshot.alwaysAllowNative parameter to TRUE on a VM (or a group of VMs)?

WhyDidntCiscoDoThese.jpg

I'd also expect these options to be available in Intersight and on the right-click menu in vCenter (via the plugin). It would even be nice to have some vm options in the HyperFlex Connect Web CLI - such as stcli vm list (which I'd expect to list the snapshot.alwaysAllowNative parameter among other useful information) and stcli vm [(--id ID | --name NAME)] set snapshot.alwaysAllowNative.

WhyDidntCiscoDoThisToo.jpg

Now I'm hopeful that Cisco cares enough about their User Interface to actually repair this oversight. And I was prepared to forgive the initial release. But the urge to write this article came when I recently upgraded our lab cluster to HXDP v5.0.  I really expected these features would have been attended to. But I was sadly disappointed.

So while waiting for Cisco to actually fix this faux pas, I've written some PowerCLI commands that will help you in the meantime. Feel free to cut and paste from below.

The PowerCLI Scripts

To use PowerCLI scripts you need PowerShell Core installed for your OS (Windows usually comes with PowerShell installed). Then, from the PowerShell CLI, install the VMware PowerCLI PowerShell modules like this:

 Install-Module -Name "VMware.PowerCLI" -Scope "CurrentUser" 

Next, you connect to your HXDP vCenter - but if you don't have valid certificates on your vCenter, first do this:

Set-PowerCLIConfiguration -InvalidCertificateAction:Ignore 

And when you connect to vCenter, it should look something like this:

 

PS /Users/rednectar> Connect-VIServer -Server vca.your.domain.dns

Specify Credential
Please specify server credential
User: admin@your.domain.dns
Password for user admin@your.domain.dns: *********

Name                           Port  User
----                           ----  ----
vca.your.domain.dns            443   HL.DNS\admin

And now you can issue the following commands to perform these functions:

  • List all the VMs that have the snapshot.alwaysAllowNative parameter set to TRUE
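# VMs with snapshot.alwaysAllowNative set to TRUE (vCLS agent VMs excluded).
# -eq is used for the key so the dots are not treated as regex wildcards.
Get-VM |
 where {$_.Name -notlike "vCLS*"} |
 where {($_.ExtensionData.Config.ExtraConfig |
 where {$_.Key -eq "snapshot.alwaysAllowNative"} |
 where {$_.Value -eq "TRUE"})} |
 select @{N="VMs using Native Snapshots";E={$_.Name}}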
  • List all the VMs that have the snapshot.alwaysAllowNative parameter set to FALSE
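# VMs with snapshot.alwaysAllowNative set to FALSE (vCLS agent VMs excluded).
# -eq is used for the key so the dots are not treated as regex wildcards.
Get-VM |
 where {$_.Name -notlike "vCLS*"} |
 where {($_.ExtensionData.Config.ExtraConfig |
 where {$_.Key -eq "snapshot.alwaysAllowNative"} |
 where {$_.Value -eq "FALSE"})} |
 select @{N="VMs set to NOT use Native Snapshots";E={$_.Name}}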
  • List all the VMs that do not have the snapshot.alwaysAllowNative parameter at all
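# VMs with no snapshot.alwaysAllowNative entry at all (vCLS agent VMs excluded).
# -eq is used for the key so the dots are not treated as regex wildcards.
Get-VM |
 where {$_.Name -notlike "vCLS*"} |
 where {!($_.ExtensionData.Config.ExtraConfig |
 where {$_.Key -eq "snapshot.alwaysAllowNative"})} |
 select @{N="VMs NOT configured at all for Native Snapshots";E={$_.Name}}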
  • Set the snapshot.alwaysAllowNative parameter to TRUE on these VMs
# Add snapshot.alwaysAllowNative = TRUE to every VM that lacks the entry.
# -PipelineVariable keeps the VM available so its name can be printed afterwards.
Get-VM -PipelineVariable vmname |
where {$_.Name -notlike "vCLS*"} |
where {!($_.ExtensionData.Config.ExtraConfig |
where {$_.Key -eq "snapshot.alwaysAllowNative"})} |
New-AdvancedSetting -Name snapshot.alwaysAllowNative -Value TRUE -Confirm:$false -Force |
select @{N="VMs Converted to use Native Snapshots";E={$vmname.Name}}
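The three listing commands above can also be folded into one report with a single row per VM, which is roughly what I'd hoped a Native Snapshots column would show. This is a sketch under the same assumptions as the scripts above (PowerCLI installed, Connect-VIServer already run); the function names Get-NativeSnapshotState and Get-NativeSnapshotReport and the column name NativeSnapshots are my own inventions.

```powershell
# Sketch: classify one VM's ExtraConfig entries as TRUE, FALSE or 'not set'.
function Get-NativeSnapshotState {
    param(
        # Objects with Key and Value properties,
        # e.g. $vm.ExtensionData.Config.ExtraConfig
        [object[]]$ExtraConfig
    )

    $entry = $ExtraConfig |
        Where-Object { $_.Key -eq 'snapshot.alwaysAllowNative' } |
        Select-Object -First 1

    if ($null -eq $entry) { return 'not set' }
    return ([string]$entry.Value).ToUpper()
}

# Sketch: one row per VM, skipping the vCLS agent VMs as the scripts above do.
function Get-NativeSnapshotReport {
    Get-VM |
        Where-Object { $_.Name -notlike 'vCLS*' } |
        Select-Object Name, @{
            N = 'NativeSnapshots'
            E = { Get-NativeSnapshotState -ExtraConfig $_.ExtensionData.Config.ExtraConfig }
        } |
        Sort-Object Name
}

# Example:
# Get-NativeSnapshotReport | Format-Table -AutoSize
```

Filtering the report with Where-Object NativeSnapshots -eq 'not set' gives the same list as the third command. If you'd like to preview the final command before it changes anything, adding -WhatIf to New-AdvancedSetting should print what would be set without applying it.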

 

RedNectar aka Chris Welsh. Forum Tips: 1. Paste images inline then edit>Image Size Large- don't attach. 2. Always mark helpful and correct answers, it helps others find what they need.