The main use cases of deduplication appliances are:
- reducing the storage consumed
- storing backup data for long-term retention
Integrated vs non-integrated
In general there are two types:
- Integrated. This type is preferred. It uses vendor-specific protocols such as DDBoost (DataDomain) and Catalyst (StoreOnce), or the vendor integrates the Veeam Data Mover agent directly onto the appliance (as ExaGrid and Quantum do). In this case, VBR can communicate with the storage device and send only the unique blocks instead of all of the new or changed backup data.
Synthetic operations are performed on the appliance itself, so they require minimal additional time and generate less I/O. The default backup job settings with (weekly) synthetic full backups work well.
This applies to use as both a primary and a secondary backup repository.
- Non-integrated. This type uses standard protocols such as SMB or NFS. In this case, VBR cannot communicate with the storage device directly and sends all of the new or changed backup data, resulting in more data traffic.
Avoid synthetic operations and use active full backups instead. The downside is that the entire amount of backup data must be transported on a weekly or monthly basis. Plan carefully, as this can lead to a longer backup or copy window and, for primary backups, to VM snapshots that stay open longer and take longer to commit.
This applies to use as both a primary and a secondary backup repository.
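The difference in data traffic between the two types can be illustrated with a minimal sketch. It is a simplified model, not how DDBoost, Catalyst or the Veeam Data Mover actually work internally. The block size, the function names and the hash index held in `appliance_index` are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, unrealistically small block size for the demo


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte stream into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def integrated_transfer(blocks: list[bytes], appliance_index: set[str]) -> int:
    """Integrated model: send a block only if the appliance does not know its hash."""
    sent = 0
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in appliance_index:
            appliance_index.add(digest)  # the appliance now stores this block
            sent += len(block)
    return sent


def non_integrated_transfer(blocks: list[bytes]) -> int:
    """Non-integrated model (SMB/NFS): every new or changed block is sent."""
    return sum(len(block) for block in blocks)


# New or changed backup data containing repeated blocks.
changed_data = b"AAAABBBBAAAACCCCBBBB"
blocks = split_blocks(changed_data)

# Assume the appliance already stores the block "AAAA" from an earlier backup.
appliance_index = {hashlib.sha256(b"AAAA").hexdigest()}

sent_integrated = integrated_transfer(blocks, appliance_index)
sent_non_integrated = non_integrated_transfer(blocks)
print(f"integrated: {sent_integrated} bytes, non-integrated: {sent_non_integrated} bytes")
```

In this toy run only the blocks `BBBB` and `CCCC` cross the wire in the integrated model, while the non-integrated model transfers all five blocks.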
Performance Impacts
Please read KB2660 for Performance Impacts of Deduplicated Storage Systems.
Considerations
Any choice or decision can have a significant impact on the design. Therefore, when considering the use of such an appliance, it is important to work through the following questions:
- Is immutability required?
- What are the RPO/RTO requirements?
- Which (restore) capabilities will be used?
- Is (backup file) encryption required?
- Will there be secondary backup copies to another backup repository, SOBR tiering or tape?
- Can it be used as a (metadata) extent for NAS backups?
Immutability
To protect your backup files from loss as a result of malware activity or unplanned actions, always opt for a device that offers immutability.
RPO/RTO
These appliances are optimized for write operations and therefore work well for ingesting large amounts of backup data. However, random read I/O suffers from the rehydration process required during restores (except on appliances with a landing zone).
For this reason, we recommend using inline deduplication appliances mainly as secondary targets, where parameters such as price per GB are more important than restore performance.
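Why restores pay a rehydration cost can be seen in a toy model of a deduplicated store. This is only a sketch under simplifying assumptions: the `DedupStore` class, its in-memory dictionary and the `lookups` counter are hypothetical and do not represent any vendor's on-disk layout.

```python
import hashlib


class DedupStore:
    """Toy deduplicated store: unique chunks plus a per-backup list of references."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.lookups = 0  # counts chunk reads performed during restores

    def write(self, blocks: list[bytes]) -> list[str]:
        """Ingest blocks; store each unique block once and return the backup's recipe."""
        recipe = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            self.chunks.setdefault(digest, block)  # duplicates cost no extra space
            recipe.append(digest)
        return recipe

    def rehydrate(self, recipe: list[str]) -> bytes:
        """Restore: every reference must be resolved back to its chunk."""
        data = bytearray()
        for digest in recipe:
            self.lookups += 1
            data += self.chunks[digest]
        return bytes(data)


store = DedupStore()
backup_blocks = [b"boot", b"data", b"boot", b"logs", b"data"]
recipe = store.write(backup_blocks)
restored = store.rehydrate(recipe)
print(len(store.chunks), "unique chunks,", store.lookups, "chunk reads on restore")
```

Ingest stores only three unique chunks, but the restore still has to resolve all five references. On a real appliance those chunks are scattered across disk, which is where the random read penalty comes from.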
Required restore capabilities
As mentioned above, a deduplication appliance significantly affects RTO, and with it the experience of certain (restore) capabilities, unless the appliance offers a fast non-deduplicated area for the most recent restore points. It is therefore important to qualify the main restore requirements.
For example:
- High-priority restores. You need to urgently mitigate a ransomware attack, frequently perform automated restore tests, frequently scan backups for malware, frequently spin up test environments from your backups, restore large databases or files, or restore application items very quickly.
In this case, it is highly recommended to use a non-deduplicated backup repository (such as a landing zone) to be able to meet the required SLA.
- Low-priority restores. You need to restore an entire non-critical VM, or restore a few files or application items no more than once a week. In this case, the longer restore time is acceptable and still meets the SLA.
Encryption
VBR supports encryption of backup files at rest, which helps protect against theft of backup data. However, when encrypted backup files are written to a deduplication appliance, deduplication efficiency drops because two identical blocks are no longer identical after encryption. Storage consumption is reduced far less, which turns the deduplication appliance into a very expensive backup repository.
In this case, it is better to turn off backup file encryption and enable data encryption on the deduplication appliance itself. This ensures that all data on the storage device is encrypted; note, however, that it only protects against theft of the physical device.
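The effect of encryption on deduplication can be demonstrated with a small sketch. The cipher below is a toy XOR stream cipher built from SHA-256 purely for illustration; it is not the algorithm VBR uses and must not be used for real data. The key, the nonces and all function names are assumptions.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(block: bytes, key: bytes, nonce: bytes) -> bytes:
    """XOR the block with the keystream; applying it twice decrypts."""
    stream = keystream(key, nonce, len(block))
    return bytes(a ^ b for a, b in zip(block, stream))


def unique_blocks(blocks: list[bytes]) -> int:
    """Number of distinct blocks a deduplication engine would have to store."""
    return len({hashlib.sha256(block).digest() for block in blocks})


key = b"demo-key"  # hypothetical key
# The same block appearing in two different backup files.
plain = [b"same OS block" * 4, b"same OS block" * 4]
# Each backup file is encrypted with its own nonce, as is usual for encryption at rest.
encrypted = [
    toy_encrypt(block, key, nonce)
    for block, nonce in zip(plain, [b"nonce-1", b"nonce-2"])
]

plain_unique = unique_blocks(plain)
encrypted_unique = unique_blocks(encrypted)
print(f"unique blocks before encryption: {plain_unique}, after: {encrypted_unique}")
```

Before encryption the appliance would store one block; after encryption it sees two unrelated blocks and has to store both.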
Secondary copies
When the 3-2-1 rule is part of the design, data must be read from the primary backup repository and copied (or moved) to a secondary backup repository or even to tape. This operation puts extra load on the primary backup repository.
As described earlier, random read I/O suffers from the rehydration process. If a deduplication device is used as a primary backup repository, creating secondary copies of your backup data will therefore be severely slowed down.
Use as metadata extent for NAS backups
You cannot assign the role of a cache repository to deduplicating storage appliances.
Always use a fast (SSD) repository as metadata extent and use the deduplication device as backup data extent.
Please refer to the NAS Backup Repository > Metadata extents section for more details.
Using deduplication for NAS backups can significantly reduce the capacity required compared to a standard repository, particularly when there are multiple shares or the data is retained for a long time, e.g. months or years.
Windows Server deduplication
Although it is still possible to use a Windows Server with deduplication, this is not recommended because it is non-integrated. However, if it is still in place, make sure to follow the best practices listed here.
References
- KB2660 - Performance Impacts of Deduplicated Storage Systems
- KB1745 - Deduplication Appliance Best Practices
- Veeam Alliance Partner integrations and qualifications: Backup Target - Deduplication
- Veeam Help Center - Deduplicating storage appliances
- Veeam Help Center - Scale-Out Repository with Extents in Metadata and Data Roles
- Veeam Help Center - Tape