Question – Is VMware SRM supposed to do an automatic failover?

In a joint customer conversation, Eric Dahan and I were asked “Is VMware SRM supposed to do an automatic failover?”. Excellent question!
I highlighted the levels of protection VMware added to this new release in a recent blog post, VMware vCenter SRM – VMUG Preview.

At a high level, VMware’s Site Recovery Manager “is the premier tool to enable you to build, manage and execute reliable disaster recovery plans for your virtual environment. Taking full advantage of the encapsulation and isolation of virtual machines, Site Recovery Manager enables simplified automation of disaster recovery. It helps meet recovery time objectives, reduces costs traditionally associated with business continuance plans and achieves low-risk and predictable results for recovery of a virtual environment.”

Prior to Site Recovery Manager 5.0, the workflow capabilities covered both execution and testing of a recovery plan. With version 5.0, VMware introduced a new planned migration workflow for moving from a protected site to a recovery site. Planned migration ensures an orderly and pretested transition from a protected site to a recovery site while minimizing the risk of data loss.

To answer the question, then: a failover happens only when a disaster recovery event is triggered. During a disaster recovery event, SRM first attempts to shut down the protection group’s virtual machines and establish a final synchronization between sites. This is designed to ensure that the virtual machines are static and quiescent before the recovery plan runs, minimizing data loss where possible. If the protected site is no longer available, the recovery plan continues to execute and runs to completion even if errors are encountered.

Can a disaster recovery event be triggered automatically? Yes, but it is not recommended. Read Mike Laverick’s post on VMware Communities:

“If you looking for some thing that would allow an automatic failover – you really looking at other vendors 3rd party availability products that allow you to affectively create a “stretched cluster”. The problem with these techs is that they are NOT designed for multisite DR, but component failure – such as the death of an individual host or VM…”

That brings us to the next question: how quickly can I be up and working again? I throw the question back to you! How long can your business afford to be down (RTO), and how much data are you willing to lose (RPO)?

As for VMware’s SRM, it can handle almost any RPO that is right for you. You will need to test how many VMs you can start, and how quickly they start up, before you can determine your own RTO!
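Once you have those test numbers, a rough back-of-the-envelope estimate is easy to script. The Python sketch below assumes you have measured a boot time per VM and know how many VMs your recovery site can power on at once; the function name `estimate_recovery_seconds`, the `fixed_overhead` parameter and the sample figures are all made up for illustration and are not SRM settings. Treat the result as a starting point for your own testing, not a guarantee.

```python
import heapq


def estimate_recovery_seconds(boot_seconds, max_concurrent, fixed_overhead=0.0):
    """Rough elapsed time to start every VM, in list order, with at most
    max_concurrent VMs booting at the same time.

    boot_seconds   -- measured boot time of each VM (from your own tests)
    max_concurrent -- how many VMs can be started in parallel
    fixed_overhead -- time spent before any VM starts (assumed value,
                      e.g. storage failover), added on top
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    # Each heap entry is the time at which a power-on slot becomes free.
    slots = [0.0] * max_concurrent
    heapq.heapify(slots)
    finish = 0.0
    for duration in boot_seconds:
        start = heapq.heappop(slots)   # earliest free slot
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(slots, end)
    return fixed_overhead + finish


if __name__ == "__main__":
    measured = [120, 90, 60, 30]       # sample numbers, not real measurements
    total = estimate_recovery_seconds(measured, max_concurrent=2,
                                      fixed_overhead=300)
    print(f"Estimated recovery time: {total / 60:.1f} minutes")
```

Compare the estimate with the RTO your business can tolerate; if it is too long, the levers are fewer VMs per plan, more parallel starts, or faster boots.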