Enterprise Design

Introduction

This section provides guidance on deploying Veeam in larger-scale environments which for this document will start at 5000 virtual machines.

Large Scale VBR deployments

There’s no hard limit of VM’s number in a single Veeam Backup and Replication instance, however, the recommendation is to keep it under 10000 VMs per instance. At the same time installations with 18000 VMs per instance are seen in the field as well.

When scaling in Veeam Backup server to the limit (10000-18000 VMs) following official requirements - you can face the case when size is almost equal or even over the amount of resources of your ESXi hosts.

In addition, your environment can bring additional limitations - as there are many variables not least the available resources within the Hypervisor.

There are basic approaches to deal with this situation.

For both options virtual is the recommended deployment. (more details here).

Scale-up by a gradual increase of provisioned resources.

As most deployments on large scale are not instantly coming to a maximum - consider a gradual increase of resources.
The approximate amount of required resources can be calculated using Veeam Sizing Tools

However, official recommendations are quite conservative and real results can vary on a large scale.
Still biggest cons of the approach:
- chance of VBR server requirements will be close or even over the size of the ESXi host
- single point of failure
Scale-out (split jobs between servers)

The core idea is to have multiple VBR servers, running backups for the same infrastructure (even the same vCenter). For centralized monitoring, management and restore platform - Veeam Enterprise manager or for just central reporting Veeam ONE should be used.

In this scenario, it’s recommended to start with a single VBR deployment and increase the number of VBR servers if required. That will prevent initial over provisioning and operational complications related to the increased overhead of multiple servers.

The biggest con of the approach is the requirement of dedicated proxies and repositories per VBR (more details below).

The choice between the two options is ultimately up to you how you prefer to scale the environment. However, many people have success by combining both.

High-level example of Scale-Out Design:

When do I need to choose?

As mentioned previously this is highly dependent on available resources in the hypervisor and amount of capacity being backed up. However, we have found that at approximately 5000 VMs these options start to need to be considered.

Please use the best practice calculations to estimate the resource requirements for VBR and SQL.

Protection Domains

Using the scale-out method, the environment is split down into smaller Protection Domains which also overlap with failure domains.

These have the advantage of:

Increased security

Splitting the VBR servers into smaller environments, while using security best practices, means that a breach on one server will potentially affect less of the overall Veeam deployment.

Reduction in restore times

At scale, VBR can take some time to rebuild from a configuration backup, though this is still usually measured in minutes. However, by splitting the VBR servers the restore times for an individual server will be reduced.

Configuration Database

Using PostgreSQL as the configuration database you are able to scale-up to 5000 VMs under a single VBR server. Above 5000 machines, it is recommended to use Microsoft SQL.

Many customers with large environments may have an existing SQL server(s) that could be used. If not, the options are to either include an instance of SQL with each VBR or to create a centralized server. Similar questions to the scaling of VBR need to be asked, though the additional cost of SQL licenses may likely be a defining factor.

If Veeam ONE is in-scope then a dedicated server is likely the best option as the Database requirements can be quite significant at this scale. See the Veeam ONE DB calculator.

Proxy and Repositories

Each VBR instance should be allocated dedicated proxy and repository resources. This is required because each VBR instance will be independent and therefore not aware of another instances’ resource scheduling (e.g. allocation of slots for a proxy or repository).

Where possible, natural divides in the virtual architecture should be followed. At the most basic level, this could be at the site level then down to cluster level if there is more than one in the environment.

When using Backup from Storage Snapshots (BfSS) there will be an increase in SAN network connectivity as each proxy will require its connections. Also, when configuring jobs, consideration will need to be made to not have jobs that target the same datastore at the same time. In this case the jobs schedules should be factored into the planning.

Centralising Management and Monitoring

In a scale-out approach, it will be important to centralize as much of the operations and monitoring as possible. Using Enterprise Manager or Veeam Service Provider Console is essential for centralizing and delegating many of the operations.

Veeam ONE is highly recommended as it will remove a significant amount of the operational overheads in monitoring and reporting on a solution of this scale.