Veeam Replication Job Stuck at 99%

Recently I had the opportunity to deploy Veeam B&R utilising Cloud Connect Replication for a customer to replace their existing DR solution. We ran into an issue where a couple of replication jobs sat at 99% for longer than I would expect, in some cases for several hours.

I wasn’t sure what the job was doing, as no network traffic, CPU or even disk usage could be detected on the source. The Veeam job showed no tasks currently underway, and I didn’t want to ask the Service Provider to check their end until I had verified everything was working as expected at the source, so I kept digging.

Examining the job in question shows the below,

Selecting the VM brings up the following,

So everything looked happy except for the 99% part. Was it a UI bug? Refreshing the console with ‘F5’ didn’t fix the issue.

Having a closer look at the replicas section, though, I found that the VM in question was still processing. A quick check with the Service Provider confirmed that snapshots were still being committed on their end for that VM.

So it turned out the 99% issue was just the target ESXi host processing the retention policy (VM snapshot commit). The thing to remember is that highly transactional VMs will take longer to commit the snapshot on the replica side, and this particular VM was a large, highly transactional database.

I’m going to investigate whether anything can be done to improve this snapshot commit process. I would expect that reducing the retention, and therefore the number of snapshots, would be a good place to start. Additionally, we can consider moving the VM replica to a faster tier of storage.
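To get a feel for why the storage tier matters, here is a rough back-of-envelope sketch in Python. It assumes commit time is dominated by copying the snapshot delta into the parent disk at a steady throughput, which is a simplification: it ignores ongoing writes, host load and other overhead. The function name, the 200 GB delta and the two throughput figures are all made up for illustration and were not measured in this environment.

```python
def estimate_commit_minutes(delta_gb: float, throughput_mb_s: float) -> float:
    """Rough time in minutes to merge a snapshot delta into its parent disk.

    Assumes the commit is limited purely by sustained storage throughput,
    which is a simplification of what an ESXi host really does.
    """
    if delta_gb < 0:
        raise ValueError("delta_gb must not be negative")
    if throughput_mb_s <= 0:
        raise ValueError("throughput_mb_s must be positive")
    delta_mb = delta_gb * 1024  # GB -> MB
    seconds = delta_mb / throughput_mb_s
    return seconds / 60


if __name__ == "__main__":
    # Hypothetical figures: a 200 GB delta on two imaginary storage tiers.
    for tier, mb_s in [("slow tier", 50), ("fast tier", 400)]:
        minutes = estimate_commit_minutes(200, mb_s)
        print(f"{tier}: roughly {minutes:.0f} minutes")
```

Even with such a crude model, the size of the delta and the speed of the target storage are the two levers that stand out, which lines up with the ideas above.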

2 thoughts on “Veeam Replication Job Stuck at 99%”

  1. Robert Rogers

    Did you get this any faster? I have a 1.27TB SQL database that takes ages to commit the snapshots – it only has two snapshots

    1. admin Post author

      Unfortunately, the snapshots being committed were on the Cloud Connect service provider side, so being on the tenant side we had very limited control over it. A snapshot commit that takes longer than normal can be caused by underperforming storage and/or larger than normal changes occurring during the backup/replication job.
