For those diving into NAS sizing, I wanted to share this write-up to help explain the roles of the Cache Repository and Metadata. The sizing of these components varies depending on the type of backup storage you’re using. This consideration has become increasingly important with the addition of object storage support in v12.
A Cache Repository is a storage location where Veeam Backup & Replication keeps temporary cached metadata (folder level hashes) for the NAS data being protected. A Cache Repository improves incremental backup performance because it enables Veeam to quickly identify source folders that don’t have any changes via matching hashes stored in the Cache Repository.
When a NAS backup targets a disk-based repository, the cached metadata is held in memory, so very little storage is required on the Cache Repository: roughly 1 to 2 GB.
Not to be confused with the cached metadata (folder-level hashes) on the Cache Repository, Metadata is created by Veeam to describe the backup files: the source, file names, versions, and pointers to backup blobs. Most often, when performing restore, merge, or transform operations, Veeam interacts with this Metadata rather than with the backup data.
Metadata is always redundant (“meta” and “metabackup” or “metacopyv2”). The actual placement and number of Metadata copies depend on the repository configuration and type. This is important to understand because the number and placement of Metadata copies affect how much storage is required.
- For a single, simple repository, 2 copies of the Metadata are created on the same repository.
- For a SOBR with 3 or more extents, 3 copies of the Metadata are made.
- On an object-based repository, there are 2 copies of the Metadata. The primary copy resides on the Cache Repository and the secondary copy resides on the object-based repository.
- When working with dedupe appliances, the primary copy should be configured to reside on a dedicated Metadata extent (performance tier) within the SOBR. A copy of the Metadata (Metabackup) is stored on the data extent (performance extent).
Note: When using a SOBR for NAS backup, remember that although backup data is distributed across all data extents, the backup data itself is not redundant across the repository. Redundancy is achieved via secondary repositories.
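The placement rules in the list above can be summarized as a small lookup table. This is only an illustrative sketch: the layout labels and the helper name are hypothetical, chosen for this example, and the table encodes nothing beyond what the list states.

```python
# Where each Metadata copy lives, per repository layout, as listed above.
# The keys are hypothetical labels chosen for this sketch.
METADATA_PLACEMENT = {
    "simple": ("repository", "repository"),
    "sobr_3_plus_extents": ("extent", "extent", "extent"),
    "object": ("cache repository", "object repository"),
    "sobr_with_dedupe": ("metadata extent", "data extent"),
}


def metadata_copies(layout: str) -> int:
    """Number of Metadata copies kept for the given layout."""
    return len(METADATA_PLACEMENT[layout])


def uses_cache_repository(layout: str) -> bool:
    """True when one Metadata copy is stored on the Cache Repository."""
    return "cache repository" in METADATA_PLACEMENT[layout]
```

Only the object-based layout places a Metadata copy on the Cache Repository, which is why that case needs extra Cache Repository storage.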
When using a standalone repository, the repository always stores both the backup data and the Metadata. However, remember how the cached metadata is held in memory, so very little if any storage on the Cache Repository is required? This changes if the NAS backup is targeting an object-based repository, because the Cache Repository now stores the primary copy of the NAS backup Metadata. This means additional storage must be sized and provisioned on the Cache Repository.
Why does Veeam store the primary copy of the Metadata on the Cache Repository? It’s to avoid performing Metadata operations against the object storage, which would incur object store API costs and potential egress charges.
When NAS backup targets an object-based repository, Veeam recommends provisioning the Cache Repository with storage equal to 5% of the source NAS data being protected, to store the Metadata. Because the Metadata is looked up frequently, Veeam recommends faster disks, possibly SSDs, to avoid negatively impacting performance. If 5% of the source NAS data does not sound appealing, you can alternatively size based on 1.5 GB of Metadata space per 1M file versions protected by 1 backup job.
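To make the two sizing rules concrete, here is a minimal Python sketch that computes both estimates. The function names and the sample job figures are hypothetical, invented for this example; this is not a Veeam tool or API.

```python
def cache_size_by_source_gb(source_data_gb: float) -> float:
    """Estimate Cache Repository space using the 5%-of-source-data rule."""
    return source_data_gb * 0.05


def cache_size_by_versions_gb(file_versions: int) -> float:
    """Estimate using 1.5 GB per 1 million file versions in one backup job."""
    return file_versions / 1_000_000 * 1.5


if __name__ == "__main__":
    # Hypothetical job: 100,000 GB of source data, 200 million file versions.
    print(f"5% rule:       {cache_size_by_source_gb(100_000):.0f} GB")
    print(f"versions rule: {cache_size_by_versions_gb(200_000_000):.0f} GB")
```

The two rules can give very different answers for the same job, since one scales with data volume and the other with the number of file versions.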
When targeting a dedupe appliance, the primary Metadata copy does not reside on the Cache Repository; instead, the Metadata resides on the backup repository. As detailed earlier, Metadata is constantly accessed and updated during backups, so it’s important to avoid deduplicating the primary copy of the Metadata: doing so could cause unnecessary slowdowns from the constant on-the-fly rehydration of those files. For optimal performance, Veeam recommends combining a disk-based (non-dedupe) repository with the deduplication appliance repository in the same SOBR. This makes it possible to dedicate performance extents to act either as data extents or as Metadata extents.
Roles are assigned to extents with the Set-VBRRepositoryExtent cmdlet, as described in the Veeam PowerShell Reference. Configuring Metadata extents keeps the primary copy of the Metadata off deduplicated extents, which improves read access speeds.
Note: Because Metadata would be spread across all Metadata extents without any redundancy, there is no advantage to configuring multiple Metadata extents in the same SOBR.
Sizing storage for Metadata on the backup storage
We’ve discussed sizing Metadata on the Cache Repository, but it’s also important to size the storage required for the Metadata on the backup storage itself. Because the exact number of Metadata copies is determined by the repository type and architecture, different formulas are used to estimate the Metadata storage requirement on the backup repository.
- Sizing the Metadata on the disk-based repository: Metadata = Backup size * 10%
- Sizing the Metadata on the object-based repository: Metadata = Backup size * 5%
The difference between 5% and 10% is explained by the number of Metadata copies. With object-based repositories there are 2 copies of the Metadata, with only one on the object storage and the other on the Cache Repository disk.
With disk-based storage there are between 2 and 3 copies of the Metadata, and Veeam assumes a conservative 3 copies across extents, hence the 10% sizing recommendation.
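The two backup-storage formulas can be wrapped in a single helper. This is a minimal sketch: the names `METADATA_RATIO` and `repository_metadata_gb` are hypothetical, and the ratios are simply the 10% and 5% figures quoted above.

```python
# Metadata share of the backup size, per repository type, as quoted above.
METADATA_RATIO = {
    "disk": 0.10,    # assumes a conservative 3 copies across extents
    "object": 0.05,  # only one of the 2 copies lives on the object storage
}


def repository_metadata_gb(backup_size_gb: float, repo_type: str) -> float:
    """Estimate the Metadata space needed on the backup repository itself."""
    if repo_type not in METADATA_RATIO:
        raise ValueError(f"unknown repository type: {repo_type!r}")
    return backup_size_gb * METADATA_RATIO[repo_type]
```

For a hypothetical 50,000 GB backup, `repository_metadata_gb(50_000, "disk")` and `repository_metadata_gb(50_000, "object")` give the disk-based and object-based estimates respectively. The object-based figure excludes the copy held on the Cache Repository.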
Remember, when working with object-based repositories you still need to size the Metadata on the Cache Repository; this is in addition to sizing the Metadata on the object-based repository itself. Don’t mix this up with the cached metadata (folder-level hashes), which is always stored on the Cache Repository.