An often overlooked but critical component of disaster recovery (DR) solutions is testing. In an interview with HFMWeek, Bob Guilbert discussed DR testing, noting that “the best approach that funds can take to ensure an effective disaster recovery system is to test them periodically.” Lisa Smith, a Certified Business Continuity Planner here at Eze Castle, echoes this advice in her discussions of business continuity planning for inclement weather.
If regular testing is a critical component of an effective DR solution, why do so many firms skip it? Working on the Eze Disaster Recovery team, I have heard a variety of reasons from clients. The most common include:
a lack of time to commit to DR testing;
a lack of understanding as to how to go about testing their solutions;
and a belief that testing could disrupt normal business operations and is therefore too risky for the firm.
Here at Eze Castle, we strive to educate our clients on different testing methods so that each can find the option that best meets its unique business requirements. Current technologies allow IT providers to overcome just about any objection users may have with regard to DR testing.
As we continue to educate our hedge fund and investment firm clients on DR testing methods and the available options, we have seen a sharp increase in the number of clients that elect to implement regular testing procedures. More clients are also requiring their full user base to take part in testing. Additionally, a growing number of hedge fund clients tell us that their investors now require them to test their DR solutions and report the results. This trend is putting more pressure than ever on firms to test regularly.
Below are some frequently asked questions about DR testing, including ones we commonly hear from our hedge fund clients:
Q: Why should we test our DR system?
A: Testing helps ensure that the DR site meets your current business needs. Firms often grow, evolve, and change their production environments as they expand their businesses. The DR site needs to change in step with production in order to keep meeting business requirements. By testing their DR sites regularly, firms can confirm that these needs are met and that they are prepared to continue operations should a disaster occur.
Q: What should we test?
A: Users should verify that they have the functionality they need to work successfully from the DR site during a disaster. At Eze Castle, we typically recommend that users walk through their daily workflow when testing, which helps ensure that all critical applications and data will be available in the event of an outage. Results should be documented and shared with your DR provider so that it can help you resolve any issues that arose during the test.
Q: How often should we test our DR site?
A: We recommend testing at least twice per year. DR solution agreements typically include regular testing as part of the service package, and we strongly recommend that all firms take advantage of it.
Q: What happens if we do not test?
A: If a firm fails to test its DR systems, it runs the risk that the DR site will not meet current business requirements during a disaster. This could mean major outages for the company and severe business losses. Regular testing captures the ever-changing requirements of a business, so that gaps or issues can be addressed before a disaster strikes.
Q: Will testing our disaster recovery site disrupt our production site (and therefore normal business operations)?
A: Some firms believe that the only way to test DR systems is to do a full failover from production, work solely out of the DR site, and then return to normal production, a process that can be quite risky. In reality, many disaster recovery solutions (including Eze Castle’s!) can be tested without any disruption to the production environment. Most firms prefer the “throwaway test” method, in which any changes made during testing are overwritten once the services are stopped in DR and replication is resumed. Because production is never taken offline, this approach requires less coordination with user groups than a full failover and has minimal impact on the firm’s daily operations.
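To make the “throwaway test” idea concrete, here is a deliberately simplified Python model. It is an illustrative sketch, not a description of how Eze Castle’s or any other provider’s replication actually works. It assumes a DR copy that mirrors production while replication runs; the `DRSite` class, its methods, and the file names are hypothetical. Pausing replication, making changes at the DR site, and then resuming replication shows why test changes are discarded and production is never touched.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DRSite:
    """Hypothetical model of a DR site that receives replicated data from production."""
    data: Dict[str, str] = field(default_factory=dict)
    replicating: bool = True

    def replicate_from(self, production: Dict[str, str]) -> None:
        # While replication is running, the DR copy is overwritten to match production.
        if self.replicating:
            self.data = dict(production)

    def start_test(self) -> None:
        # Pause replication so testers can work against the DR copy; production is untouched.
        self.replicating = False

    def end_test(self, production: Dict[str, str]) -> None:
        # Stop testing and resume replication; any changes made during the test are overwritten.
        self.replicating = True
        self.replicate_from(production)


production = {"trades.db": "v1", "oms.cfg": "live"}
dr = DRSite()
dr.replicate_from(production)

dr.start_test()
dr.data["trades.db"] = "test-order"  # a user enters a test trade at the DR site
production["trades.db"] = "v2"       # production keeps changing normally
dr.replicate_from(production)        # no effect: replication is paused during the test

dr.end_test(production)
print(dr.data)  # → {'trades.db': 'v2', 'oms.cfg': 'live'}
```

The test order never reaches production, and once replication resumes the DR copy simply catches up to production again, which is why this style of test needs no cut-back step.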
Editor's Note: This article has been updated and was originally published in March 2011 by Holly Plumley (Eze Castle Integration).