Disaster Recovery Planning for Animal Model Research

An overview of best practices, technologies, and strategies for information security and recovery in animal model research

Opinion Article Contributors: Chuck Donnelly and Julie Morrison

Summary

Business continuity planning is essential for surviving disasters. Animal model research is particularly vulnerable to major loss during a disaster, since animal breeding and genetic engineering are time-intensive processes. Research programs can be set back years by the loss of animal lines or data that could easily have been protected.

Modern technologies, combined with careful planning, can mitigate some of the risk or reduce the magnitude of loss. This article discusses technologies that can be deployed to help manage risk, particularly pure cloud systems, and introduces some simple steps that can reduce setbacks if disaster strikes.

 

Pure Cloud and Mobile

In disaster recovery planning, it is critical to understand the difference between cloud-hosted and pure cloud information systems. Cloud-hosted systems are simply on-premises (on-prem) systems that have been deployed to the cloud with few or no architectural changes, leaving labs with old, restrictive, and brittle systems. Pure cloud systems are designed and built on cloud infrastructure, where they inherit and operate on the most advanced cloud technologies. Pure cloud systems typically support SMS messaging and mobile devices, which puts real-time data and communication services into the hands of technicians who need to make rapid decisions and share information with dispersed team members while securing or recovering a facility. Currently, most available systems are on-prem or cloud-hosted, but a few pure cloud offerings are coming onto the market, such as RockStep Solutions' animal informatics platform, Climb.

In most disaster scenarios, facilities that have adopted a pure cloud system and a mobile device policy will be in the best position. Information systems architected and developed in the cloud have access to advanced cloud technologies, such as mobile services, Internet of Things (IoT) monitoring, artificial intelligence (AI), and machine learning (ML). Pure cloud systems with IoT can monitor facilities remotely, using anomaly detection algorithms to alert researchers and technicians when adverse conditions are detected. Integrated mobile communication services give disaster teams rapid, context-sensitive communication channels from inside the data management system. Because information and communications are integrated, activities can be logged as teams work through each phase of disaster management. While all digital systems require power and networks, pure cloud systems can bypass internal networks and wall power entirely: battery-operated devices can connect securely through cellular towers.
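To make the anomaly-detection idea concrete, here is a minimal sketch of one common approach, a rolling z-score, applied to a stream of room-temperature readings. It is not Climb's or any vendor's algorithm; the class name, window size, threshold, minimum sigma, and temperature values are all hypothetical choices for illustration.

```python
from collections import deque
from statistics import mean, stdev


class RollingZScoreDetector:
    """Hypothetical IoT anomaly detector (illustration only, not a product API).

    A reading is anomalous when it lies more than `threshold` standard
    deviations from the mean of recent, non-anomalous readings.
    """

    def __init__(self, window=30, threshold=3.0, min_history=5, min_sigma=0.1):
        self.baseline = deque(maxlen=window)  # recent normal readings
        self.threshold = threshold
        self.min_history = min_history  # readings needed before judging
        self.min_sigma = min_sigma      # floor so a flat signal can't divide by ~0

    def update(self, value):
        """Return True if `value` should raise an alert."""
        if len(self.baseline) >= self.min_history:
            mu = mean(self.baseline)
            sigma = max(stdev(self.baseline), self.min_sigma)
            if abs(value - mu) / sigma > self.threshold:
                # Keep anomalies out of the baseline so a sustained failure
                # (e.g., HVAC loss) keeps alerting instead of becoming "normal".
                return True
        self.baseline.append(value)
        return False


# Hypothetical animal-room temperature readings in degrees Celsius.
detector = RollingZScoreDetector()
readings = [21.0, 21.2, 20.9, 21.1, 21.0, 21.3, 20.8, 21.1, 27.5]
alerts = [i for i, t in enumerate(readings) if detector.update(t)]
print(alerts)  # [8]
```

Excluding flagged readings from the baseline means the detector keeps alerting until conditions return to normal, which suits a facility alarm; the trade-off is that a deliberate setpoint change would also need the detector to be reset.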

It is also important to understand that not all cloud infrastructure is equal. There are currently three leading cloud providers: Microsoft (Azure), Google (Google Cloud), and Amazon (Amazon Web Services). Several other cloud players, such as IBM, Salesforce, and Rackspace, also offer quality cloud infrastructure, but at differing levels of service. Microsoft Azure is currently the leader in capabilities for pure cloud systems. When evaluating cloud infrastructure for security in the event of a disaster, request information about how data are distributed geographically, and make sure your data are geo-replicated in regions that are separated by significant distances and located in areas at low risk of disaster.
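One way to sanity-check a provider's answer is to compute the great-circle distance between the sites where replicas are stored. The sketch below uses the haversine formula; the site names, coordinates (approximate city centres), and the 500 km minimum separation are hypothetical assumptions, not provider data or an industry standard.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def pairs_too_close(sites, min_km):
    """Return (site, site, distance_km) for every replica pair closer than min_km."""
    names = sorted(sites)
    close = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = haversine_km(*sites[a], *sites[b])
            if d < min_km:
                close.append((a, b, round(d)))
    return close


# Hypothetical replica locations (approximate city coordinates).
replicas = {
    "boston": (42.36, -71.06),
    "new_york": (40.71, -74.01),
    "seattle": (47.61, -122.33),
}
print(pairs_too_close(replicas, min_km=500))  # flags only the boston / new_york pair
```

Distance alone is not sufficient: two distant sites can still share a hazard (for example, both on a hurricane-prone coast), so the risk profile of each location matters as well.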

Preparation Strategies

Disasters are unplanned events that cause massive disruption and damage to a local facility or geographical region. No amount of planning can guarantee against loss when disaster strikes; however, strategies can be employed that reduce risks and shorten the time to recovery.

Disasters can be related to climate and weather (environmental), dynamics in the Earth’s crust (geographical), or human causes (carelessness or malicious attacks). They can be regional or local, and they can vary in warning time and in duration. Your mitigation strategies will need to take into account both the nature of the event and its temporal characteristics.

In preparing for disasters, you need to:

  1. Consider the types of disaster your facility is most susceptible to. Are you in a flood zone or in a geologically active region?
  2. Understand your current vulnerabilities. For example, if you are in a flood zone but your institution has already constructed large berms and installed large pumps with backup power systems, you may not initially need to focus as much attention on flood mitigation strategies. However, your risk analysis may find that all of your animal rooms are located below flood elevation, in which case you probably want to develop plans for moving your animal facilities to higher ground.
  3. Create a table like Table 1 to identify your risk profile for each disaster event type.
  4. Focus your mitigation plan on higher-probability events with the most significant impact on your research.
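Steps 3 and 4 amount to a simple risk matrix. A minimal sketch, assuming you rate each event type on hypothetical 1 to 5 likelihood and impact scales and rank by their product (a common convention, not a method prescribed in this article); the ratings below are invented for a hypothetical coastal facility.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    event: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (expected)
    impact: int      # hypothetical scale: 1 (minor) to 5 (loss of animal lines)

    @property
    def score(self):
        return self.likelihood * self.impact


def prioritize(risks, top_n=3):
    """Return the top_n risks ranked by likelihood x impact (highest first)."""
    return sorted(risks, key=lambda r: r.score, reverse=True)[:top_n]


# Illustrative ratings for a hypothetical facility in a coastal flood zone.
profile = [
    Risk("Hurricane", likelihood=4, impact=4),
    Risk("Floods", likelihood=4, impact=5),
    Risk("Building fire", likelihood=2, impact=5),
    Risk("Tornado", likelihood=2, impact=3),
    Risk("Earthquake", likelihood=1, impact=5),
]

for risk in prioritize(profile):
    print(f"{risk.event}: {risk.score}")
# Floods: 20
# Hurricane: 16
# Building fire: 10
```

The product is only a ranking aid; an event with low likelihood but catastrophic impact may still deserve a plan even if it falls outside the top few.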

 

Table 1: Disaster event types, with spatial and temporal characteristics.

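| Event Type | Regional | Local | 24+ Hrs Warning Time | No/Limited Warning | Event Duration |
|---|---|---|---|---|---|
| Environmental/Climate | | | | | |
| - Hurricane | | | | | Days |
| - Tornado | | | | | Hours |
| - Floods | | | | | Days |
| Fire | | | | | |
| - Wildfire | | | | | Days |
| - Building fire | | | | | Hours |
| Human Attack | | | | | |
| - Extremist groups | | | | | Hours |
| Geographical | | | | | |
| - Earthquake | | | | | Minutes |
| - Tsunami | | | | | Minutes |
| - Volcano | | | | | Days |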

 

Resource Management Strategies

Managing regional disasters has the additional complexity of staff availability. Critical staff may be unavailable to help because they are dealing with their own personal emergencies or are unable to get to work. Regional events may also overwhelm government responders and limit their ability to assist at your facility. In some cases, your buildings might survive the event and your backup power systems might be running, but you may not be able to get staff on site to care for your animals, maintain your HVAC systems, or provide site security. Your strategies for regional disasters may require on-site housing and stockpiles of food and water for staff and animals, in addition to fuel supplies for generators.

Planning for events in high-risk areas should include, at minimum, yearly staff drills. Drills should include a review of strategies for securing critical animal resources to minimize setbacks in research programs. These drills also help maintain the general awareness required to respond to a real emergency.

Information Management Strategies

Access to information and communication systems is crucial before, during, and after an event. Here we consider three classes of information systems based on their infrastructure and their ability to provide real-time information throughout an event. Table 2 shows how each technology may perform in different phases of the disaster, from preparation to recovery.

For events with advance warning of 24 hours or more, it may be possible to relocate resources and call in staff to stay on site during the event and to assist with recovery in the days that follow.
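When warning time allows relocation, a first question is which critical animals are housed in vulnerable locations. The sketch below assumes a hypothetical cage record with a floor number and a lab-assigned `critical` flag, and lists critical cages below a chosen safe floor, grouped by room; a real colony management system would expose its own data model.

```python
from dataclasses import dataclass


@dataclass
class Cage:
    # Hypothetical record; real colony systems define their own fields.
    cage_id: str
    line: str        # animal line housed in the cage
    room: str
    floor: int       # building floor, 0 = ground level, negative = basement
    critical: bool   # lab-assigned priority flag


def relocation_plan(cages, min_safe_floor):
    """Group critical cages located below min_safe_floor by room."""
    plan = {}
    for cage in cages:
        if cage.critical and cage.floor < min_safe_floor:
            plan.setdefault(cage.room, []).append((cage.cage_id, cage.line))
    return dict(sorted(plan.items()))


cages = [
    Cage("C-101", "LineA", "B12", -1, True),
    Cage("C-102", "LineB", "B12", -1, False),
    Cage("C-201", "LineC", "G04", 0, True),
    Cage("C-301", "LineA", "305", 3, True),
]
print(relocation_plan(cages, min_safe_floor=1))
# {'B12': [('C-101', 'LineA')], 'G04': [('C-201', 'LineC')]}
```

Grouping by room lets a team clear one room at a time, and the result is only as good as the location data behind it, which is why up-to-date records matter before the event.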

For sudden events with no warning, such as earthquakes, access to real-time information systems can help with resource recovery by supporting information sharing and communication among team members working to locate and secure critical resources.


Table 2: Comparison of the three major information system types and how they perform in disaster scenarios.

 

| | Pure Cloud | On-Prem | Spreadsheets / Pen and Paper |
|---|---|---|---|
| Data storage | Automatic geo-redundant storage minimizes the risk of data loss during regional disasters. | Subject to massive data loss in any disaster, regional or local. | May be stored safely, but easily subject to loss. A single copy may be damaged. |
| Situational awareness in the period leading up to the event (if prep time is available) | Pure cloud systems integrate mobile services, providing communication channels with critical staff who may need to develop plans quickly. Communications are secure, logged, and reviewable during and after the event. Teams have instant knowledge of the location and status of critical animal resources, which can be moved quickly to safer locations. Securing resources and facilities is coordinated from within the pure cloud application. | Provides a tool for knowing where animal resources are and allows quick movement of high-priority resources to safe locations. Communications are typically handled outside the application via email or text messaging, so team members may not all be appropriately informed. | Provides minimal support for situational awareness. It may be difficult to find the correct version of a spreadsheet or notebook, and inaccurate data may lead to securing the wrong resources. As with on-prem systems, teams rely on email or text messages for communication and may miss important messages. |
| Recovery operations (after the event) | May be able to operate before power is restored to the local facility and the intranet is back online. Battery-operated devices can connect to the data system via cell towers, which are a high priority for regional responders and are thus likely to be brought back online quickly if damaged. Because data are replicated in geographically dispersed facilities, all data will be available once the system is accessible. In-application communication channels can be used. | Requires power to local machine rooms to be restored and IT staff to reboot and test systems. Overloaded IT departments may not be able to respond to all emergency data needs on campus. Ad hoc communication channels may need to be deployed (e.g., peer-to-peer text messaging). | Does not require Internet access. Files may be stored locally. Data recovery is quick if data have not been damaged or lost in the event. As with on-prem systems, ad hoc communication channels may need to be deployed. |
| Return to nominal operations | Operations can return to normal as soon as Internet access is restored. Local power and intranet access are required for the facility to be fully up and running. Data are available from any computer allowed on the network. | Power must be restored to machine rooms, and the intranet must be brought back up. If systems crashed unexpectedly, the database may need to be recovered from backup. If there is a time gap between the backup and the current state, significant effort may be required to bring the data up to date, and this may not be possible at all if data are not also tracked on paper. | Returning to nominal operations may happen quickly due to the low-tech nature of the process and its lack of reliance on infrastructure. |

 
