This section provides guidance on deploying Veeam in large-scale environments, which for the purposes of this document start at 5,000+ virtual machines (VMs).
Note that because these configurations exceed 500 VMs, a licensed edition of Microsoft SQL Server (rather than SQL Server Express) is required.
As Veeam scales, the resources required by the Veeam Backup & Replication (VBR) server will eventually reach the natural limit of what a single ESXi host can provide. The point at which this happens depends on your individual environment, as there are many variables, not least the resources available within the hypervisor. In addition, another host must have sufficient free resources to allow the VBR server to be restarted by HA.
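To make the single-host limit concrete, the sketch below checks the two conditions described above before a VBR VM is deployed: at least one host has room to run it, and a second host has room to restart it under HA. The `Host` class, the host names and the figures are hypothetical, and the check assumes vCPUs map 1:1 to physical cores with no overcommit; real HA admission control policies are more nuanced.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """One ESXi host: total capacity and what is already allocated on it (illustrative)."""
    name: str
    cpu_cores: int
    ram_gb: int
    used_cores: int = 0
    used_ram_gb: int = 0

    def free_cores(self) -> int:
        return self.cpu_cores - self.used_cores

    def free_ram_gb(self) -> int:
        return self.ram_gb - self.used_ram_gb


def vbr_vm_fits(vbr_cores: int, vbr_ram_gb: int, hosts: list) -> bool:
    """True if the VBR VM can run on one host AND another host could restart it under HA.

    Simplifying assumptions: vCPUs are compared 1:1 with physical cores (no
    overcommit), and HA admission control policies are not modelled.
    """
    hosts_with_room = [
        h for h in hosts
        if h.free_cores() >= vbr_cores and h.free_ram_gb() >= vbr_ram_gb
    ]
    # One host to run the VM plus at least one other host to fail over to.
    return len(hosts_with_room) >= 2


# Hypothetical two-host cluster: esx02 is nearly full.
cluster = [
    Host("esx01", 32, 512, used_cores=10, used_ram_gb=200),
    Host("esx02", 32, 512, used_cores=28, used_ram_gb=480),
]
print(vbr_vm_fits(16, 128, cluster))  # False: only esx01 has room, so there is no HA target
```

When this check fails for the VBR VM you need, that is the point at which the two options below come into play.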
There are two potential options for dealing with this single-host limit:
- Deploy VBR as a Physical Server
- Deploy additional VBR instances and split the jobs (scale-out)
Both options have pros and cons:
| Option | Pros | Cons |
|---|---|---|
| Physical | Dedicated CPU & RAM | Resources potentially finite |
| | Dedicated networking | Will require equivalent hardware to restore* |
| | Single management point | Single failure domain |
| | No changes to jobs or proxy/repository architecture required | |
| Scale-Out | No physical hardware required for VBR | Increased operational overhead |
| | Can take advantage of HA | Dedicated proxies and repositories required per VBR |
| | Decreased attack surface | |
| | Maintains VBR portability | |

*When restoring from a configuration backup.
Ultimately, the choice between the two options depends on how you prefer to scale the environment. Note, however, that with the Scale-Out option, Enterprise Manager provides a centralised monitoring and restore platform.
[Diagram: High-level example of a Scale-Out design]
When do I need to choose?
As mentioned previously, this depends heavily on the resources available in the hypervisor and the amount of capacity being backed up. However, we have found that these options start to need consideration at around 3,500-5,000+ VMs.
Use the best-practice sizing calculations to estimate the resource requirements for VBR and SQL Server.
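As a rough illustration of how sizing estimates feed into the scale-up versus scale-out decision, the sketch below finds the smallest number of VBR instances whose share of the load fits within the largest VBR VM you are prepared to run. It assumes the sizing calculations are linear in the number of concurrent tasks; the `SizingRule` class, its coefficients and the 32-core/96 GB ceiling are hypothetical placeholders, not Veeam figures.

```python
import math
from dataclasses import dataclass


@dataclass
class SizingRule:
    """Linear rule: base + per_block for every `block` concurrent tasks (or part thereof).

    The coefficients are placeholders, NOT Veeam figures: take the real values
    from the current best-practice sizing calculations for your version.
    """
    base: float
    per_block: float
    block: int

    def estimate(self, concurrent_tasks: int) -> float:
        return self.base + self.per_block * math.ceil(concurrent_tasks / self.block)


def instances_needed(concurrent_tasks: int, cpu_rule: SizingRule, ram_rule: SizingRule,
                     max_cores: float, max_ram_gb: float) -> int:
    """Smallest number of VBR instances whose per-instance share of the load
    stays within the largest VBR VM you are willing (or able) to run."""
    for n in range(1, max(concurrent_tasks, 1) + 1):
        share = math.ceil(concurrent_tasks / n)  # tasks handled by each instance
        if cpu_rule.estimate(share) <= max_cores and ram_rule.estimate(share) <= max_ram_gb:
            return n
    raise ValueError("ceiling is too small even for one task per instance")


# Hypothetical coefficients and ceiling, for illustration only.
cpu = SizingRule(base=4, per_block=2, block=10)
ram = SizingRule(base=8, per_block=4, block=10)
print(instances_needed(500, cpu, ram, max_cores=32, max_ram_gb=96))  # 4
```

A result of 1 means scaling up (or moving VBR to physical hardware) may still be viable; anything higher suggests the scale-out split described below.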
Using the scale-out method, the environment is split into smaller Protection Domains, which also coincide with failure domains. This approach has the following advantages:
Decreased attack surface
Splitting the VBR servers into smaller environments, combined with security best practices, means that a breach of one server is likely to affect a smaller part of the overall Veeam deployment.
Reduction in restore times
At scale, a VBR server can take some time to rebuild from a configuration backup, although this is usually still measured in minutes. Splitting the environment across several VBR servers reduces the restore time for any individual server, because each server's configuration database is smaller.
As with the VBR server, the configuration database requires significant resources, though likely at a lower rate than the VBR server itself. Here too, there will be a point at which a decision needs to be taken on whether to scale up or scale out.
Many customers with large environments already have one or more SQL Servers that could be used. If not, the options are either to deploy a SQL Server instance alongside each VBR server or to build a centralised SQL Server. The same questions as for scaling VBR apply, although the additional cost of SQL Server licenses is likely to be a deciding factor.
Note: the connection between VBR and the configuration database is latency-sensitive. Connecting a VBR server to its configuration database over a WAN link is not recommended.
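A quick way to sanity-check the network path before placing a VBR server and its configuration database is to time TCP connections to the SQL Server port, since a TCP connect completes after roughly one network round trip. This is a minimal sketch: port 1433 is only the SQL Server default, the example address is hypothetical, and the threshold you compare against is your own planning assumption, as this guide does not specify one.

```python
import socket
import statistics
import time


def tcp_connect_rtt_ms(host: str, port: int = 1433, samples: int = 5,
                       timeout: float = 2.0) -> float:
    """Median TCP connect time in milliseconds, a rough proxy for network RTT.

    1433 is the default SQL Server port; named instances or custom ports need
    a different value. Name resolution happens inside create_connection, so
    pass an IP address to keep DNS time out of the measurement.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def latency_acceptable(rtt_ms: float, threshold_ms: float) -> bool:
    """Compare a measured RTT against YOUR planning threshold (an assumption;
    no figure is given in this guide)."""
    return rtt_ms <= threshold_ms


# Example (hypothetical address of the configuration database server):
# rtt = tcp_connect_rtt_ms("10.0.0.25")
# print(f"{rtt:.2f} ms", latency_acceptable(rtt, threshold_ms=1.0))
```

The median is used rather than the mean so that a single slow handshake does not dominate a small sample.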
If Veeam ONE is in scope, a dedicated server is likely the best option, as its database requirements can be significant at this scale. See the Veeam ONE calculator.
Each VBR instance should be allocated dedicated proxy and repository resources. This is required because each VBR instance is independent and therefore unaware of another instance's resource scheduling (e.g. the allocation of task slots on a proxy or repository).
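The following toy model illustrates why shared proxies are a problem. Each VBR instance keeps its own count of the tasks it has placed on a proxy, so two instances that each respect a limit of eight concurrent tasks can together place sixteen on the same proxy. The `SchedulerView` class and the limits are hypothetical; Veeam's actual task scheduler is more sophisticated, and the model captures only the fact that instances do not share scheduling state.

```python
from dataclasses import dataclass


@dataclass
class SchedulerView:
    """One VBR instance's private view of a single proxy's task slots.

    Each instance counts only the tasks it has scheduled itself; it cannot
    see tasks that another VBR instance has placed on the same proxy.
    """
    max_tasks: int
    scheduled: int = 0

    def try_schedule(self) -> bool:
        if self.scheduled < self.max_tasks:
            self.scheduled += 1
            return True
        return False


def tasks_on_shared_proxy(max_tasks: int, instances: int, demand_per_instance: int) -> int:
    """Total tasks that end up on ONE physical proxy shared by several VBR instances."""
    views = [SchedulerView(max_tasks) for _ in range(instances)]
    return sum(
        1
        for view in views
        for _ in range(demand_per_instance)
        if view.try_schedule()
    )


print(tasks_on_shared_proxy(max_tasks=8, instances=1, demand_per_instance=10))  # 8
print(tasks_on_shared_proxy(max_tasks=8, instances=2, demand_per_instance=10))  # 16
```

With a dedicated proxy per VBR instance, each proxy has exactly one view of its slots, so the configured limit holds.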
Where possible, split the environment along natural divides in the virtual architecture. At the most basic level, this could be by site and then, where a site contains more than one cluster, by cluster.
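One way to apply these natural divides when planning the split is to group the VM inventory by site and then by cluster, and pack whole clusters into Protection Domains up to a chosen VM count per VBR instance. The sketch below uses a simple greedy packing (largest cluster first, starting a new domain when the next cluster would exceed the cap); the inventory format, site and cluster names, and the cap are hypothetical, and a cluster larger than the cap still becomes a single domain that would need to be split by hand.

```python
from collections import defaultdict


def plan_protection_domains(vms, max_vms_per_vbr):
    """Split a VM inventory into Protection Domains along site and cluster boundaries.

    vms: iterable of (vm_name, site, cluster) tuples (a hypothetical export format).
    Returns a list of (site, [clusters], vm_count) tuples. A domain never spans
    sites and a cluster is never split; a cluster larger than the cap still
    becomes one (oversized) domain and would need to be split by hand.
    """
    counts = defaultdict(lambda: defaultdict(int))  # site -> cluster -> VM count
    for _name, site, cluster in vms:
        counts[site][cluster] += 1

    domains = []
    for site in sorted(counts):
        current, total = [], 0
        # Greedy packing: largest clusters first; start a new domain when the cap would be exceeded.
        for cluster, n in sorted(counts[site].items(), key=lambda kv: (-kv[1], kv[0])):
            if current and total + n > max_vms_per_vbr:
                domains.append((site, current, total))
                current, total = [], 0
            current.append(cluster)
            total += n
        if current:
            domains.append((site, current, total))
    return domains


inventory = (
    [(f"lon-a{i}", "London", "C1") for i in range(300)]
    + [(f"lon-b{i}", "London", "C2") for i in range(200)]
    + [(f"lon-c{i}", "London", "C3") for i in range(150)]
    + [(f"par-a{i}", "Paris", "P1") for i in range(100)]
)
for domain in plan_protection_domains(inventory, max_vms_per_vbr=400):
    print(domain)
```

With a cap of 400 VMs this yields three domains: `('London', ['C1'], 300)`, `('London', ['C2', 'C3'], 350)` and `('Paris', ['P1'], 100)`, each of which would map to its own VBR instance.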
When using Backup from Storage Snapshots (BfSS), the amount of SAN connectivity required increases, because each VBR instance has its own dedicated proxies and each proxy needs its own connections to the storage. In addition, when configuring jobs, avoid having jobs that target the same datastore run at the same time, and factor the job schedules into the planning accordingly.
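To factor job schedules into the planning, the sketch below flags pairs of jobs, possibly owned by different VBR instances, whose expected run windows overlap and which touch at least one common datastore. The `JobWindow` type, instance names, job names and times are hypothetical, and the windows are assumed to be start time plus an estimated duration, so real runs that overrun their estimate will not be caught.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class JobWindow:
    vbr: str               # owning VBR instance
    job: str
    datastores: frozenset  # datastores the job's VMs reside on
    start: datetime
    end: datetime          # start + estimated duration (an assumption)


def datastore_conflicts(jobs):
    """Return (job_a, job_b, shared_datastores) for every pair of jobs whose
    run windows overlap and which touch at least one common datastore."""
    conflicts = []
    for a, b in combinations(jobs, 2):
        windows_overlap = a.start < b.end and b.start < a.end  # touching ends do not count
        shared = a.datastores & b.datastores
        if windows_overlap and shared:
            conflicts.append((f"{a.vbr}/{a.job}", f"{b.vbr}/{b.job}", sorted(shared)))
    return conflicts


jobs = [
    JobWindow("vbr-a", "Job-A1", frozenset({"DS01", "DS02"}),
              datetime(2024, 1, 1, 22, 0), datetime(2024, 1, 2, 1, 0)),
    JobWindow("vbr-b", "Job-B1", frozenset({"DS02"}),
              datetime(2024, 1, 1, 23, 0), datetime(2024, 1, 2, 2, 0)),
    JobWindow("vbr-b", "Job-B2", frozenset({"DS01"}),
              datetime(2024, 1, 2, 1, 0), datetime(2024, 1, 2, 3, 0)),
]
print(datastore_conflicts(jobs))  # [('vbr-a/Job-A1', 'vbr-b/Job-B1', ['DS02'])]
```

Job-A1 and Job-B2 share DS01 but are not flagged, because Job-B2 is scheduled to start exactly when Job-A1 is expected to finish.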
In a scale-out approach, it is important to centralise as much of the operations and monitoring as possible. Enterprise Manager is essential for centralising and delegating many operational tasks.
Veeam ONE is highly recommended, as it removes a significant amount of the operational overhead of monitoring and reporting on a solution of this scale.