Office of Operations
21st Century Operations Using 21st Century Technologies

Appendix G. Technical Considerations and Vulnerability Improvements

Introduction

Intelligent transportation systems (ITS) are vulnerable to a variety of disruptions. Some are natural, such as hurricanes and earthquakes, and others are man-made, such as power outages and hazardous materials incidents. To keep ITS technologies functioning without interruption, agencies must plan for such disasters. Planning for these and other disruptions involves three elements: designing and implementing a backup system that duplicates the most important functions of the original system; planning for a large-scale, community-wide incident; and managing, testing, and documenting backup systems so that they will function if the primary system fails.

Backup System Strategies

If the primary Traffic Management Center (TMC) site becomes unavailable, operations must move to an alternate site, which should be identified in advance and documented in the backup plan. There are four types of alternate sites, each with different features and relative costs. Ordered from most to least costly, they are the redundant site, the hot site, the cold site, and the cooperative agreement. Variations of each type also exist.

A redundant site is a second operations center that always stands ready, with the hardware, software, and communications infrastructure already in place and running. This backup site can take over operations at any time without a special startup procedure. Because a redundant site is always running, part of the agency's operations may be run from it routinely. In this scenario, the only change during an outage at the primary site would be that all staff would work from the surviving operations center rather than being split between two installations. Although a redundant site offers the fastest recovery, it is normally the most expensive option. Another drawback is that a redundant site is normally located close to the main operations center, so a community-wide incident may make both the primary and redundant sites inoperable.

A hot site is an operations center that is equipped with all the needed hardware and network infrastructure but lacks the software required for normal operations. Hot sites are normally shared by several organizations and serve as backup sites on a first-come, first-served basis. As a result, a hot site may not be available during a community-wide incident. Such an incident may require a transfer to an alternate site in a different region, or may leave no site available to meet the agency's needs. If it is available, the hot site can be occupied as soon as an emergency is declared; before it becomes operational, however, the site must be loaded with the operating organization's software. Hot sites are normally subscription services: an annual fee covers testing time and the right to use the site if needed, and use of the site during an actual emergency frequently incurs an additional cost.

A cold site provides a building's basic infrastructure along with some wiring; heating, ventilation, and air conditioning (HVAC); and a private branch exchange (PBX). If a move to a cold site is necessary, the occupying organization can quickly fill the rooms with the hardware needed to run operations. Setting up operations in a cold site takes longer than in a redundant site or hot site, but a cold site is generally less costly. Depending on the exact requirements, a cold site may be delivered to the desired location by trailer or provided as a quickly constructed building. A cold site is frequently used after an initial period of operation from a hot site.

The least costly alternative is a cooperative agreement: a reciprocal arrangement with either another municipal agency in the region or an equivalent agency in an adjoining municipality. These agreements frequently involve little or no rental cost for the space. While inexpensive, cooperative agreements can be difficult to exercise in an emergency. Because many agency operations managers are asked to work with as few resources as possible while maintaining a high level of service to their customers, few operations centers have enough spare equipment and space to absorb all of the personnel and work of another operations center, possibly for a considerable period. It is also unlikely that a host agency could periodically give up space in its own operation to allow a contingency plan to be tested.

Regardless of the nature of the backup system, several issues need to be addressed:

  • All software licenses must allow the system to run at an alternate location, both for tests of emergency procedures and for actual emergencies.
  • Backup files must be easy to transport or transmit quickly to the alternate site.
  • Communications lines that normally terminate at the primary operations center must be able to be rerouted to the alternate center.
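As a planning aid only, the four site types and the three common requirements can be captured in a small checklist structure. The Python sketch below is illustrative: the numeric cost ranks, the wording of each step, and the names `AlternateSite`, `SITES`, `COMMON_REQUIREMENTS`, and `readiness_gaps` are hypothetical. The cold-site software step is an assumption; the text states only that hardware must be brought in.

```python
from dataclasses import dataclass, field


@dataclass
class AlternateSite:
    kind: str
    # Illustrative rank only: 4 = most costly, 1 = least costly (order from the text).
    relative_cost: int
    # Work needed before operations can resume at this site.
    startup_steps: list[str] = field(default_factory=list)


SITES = [
    AlternateSite("redundant site", 4),  # already running; no startup needed
    AlternateSite("hot site", 3, ["restore agency software"]),
    # Assumption: a cold site also needs software restored after hardware is installed.
    AlternateSite("cold site", 2, ["install hardware", "restore agency software"]),
    AlternateSite("cooperative agreement", 1,
                  ["move staff and work into the partner agency's center"]),
]

# Requirements that apply regardless of site type.
COMMON_REQUIREMENTS = (
    "software licenses permit use at the alternate location",
    "backup files can be quickly moved or transmitted to the site",
    "communications lines can be rerouted to the site",
)


def readiness_gaps(site: AlternateSite, satisfied: set[str]) -> list[str]:
    """List the site's startup steps followed by any common requirement not yet met."""
    unmet = [req for req in COMMON_REQUIREMENTS if req not in satisfied]
    return site.startup_steps + unmet


if __name__ == "__main__":
    hot = SITES[1]
    # Only the licensing requirement has been confirmed in this example.
    for gap in readiness_gaps(hot, {COMMON_REQUIREMENTS[0]}):
        print("-", gap)
```

Keeping the startup steps separate from the common requirements mirrors the text: the steps differ by site type, while the licensing, file transfer, and rerouting requirements apply to every option.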

Planning

While planning for a backup TMC system, the planner must expect and account for the unexpected. One example of an unexpected system failure is the Great Lakes Blackout of 2003. The blackout began when a brush fire knocked out a single power line south of Columbus, Ohio. A second power line connecting eastern and northern Ohio then failed, followed by a third line in northern Ohio that failed because of excess loading. As more power lines disconnected from the grid, the failures accelerated; five power lines between Ohio and Michigan failed about 8 minutes after the first failure. The cascade brought down the entire power system around the Great Lakes region, leaving cities from Cleveland to New York City on the East Coast in a profound blackout and a vast swath of 3,700 miles of North America without power. A set of seemingly minor failures, none of them expected and all acting in concert, thus led to the largest blackout in American history. Many community-wide disasters follow the same pattern, in which minor incidents compound into a major disaster.

During such periods of operational recovery and mitigation, most TMCs have found communications to be a significant problem. To avoid such problems, state-of-the-art communications practice for TMCs is to use multiple communications paths, ideally arranged so that no single failure causes a system outage. With multiple paths in place, the failure of a single central office or an individual line does not take the TMC out of service. Even when the telephone system remains available, calls from a TMC may not go through during emergencies and outages because the network is congested with calls from users inquiring about the nature of the incident and the welfare of people in the area. TMCs can circumvent this congestion by using the Government Emergency Telecommunications Service (GETS), which gives authorized users priority for their calls. TMCs can also keep voice telephones that do not require electricity to operate. Power supply problems can be mitigated by having multiple feeds from different power stations or grids. Finally, in case the entire grid or multiple power stations fail, TMCs can use uninterruptible power supply (UPS) systems to supply emergency power.
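The benefit of multiple communications paths can be checked mechanically: if every path passes through the same central office, that office is a single point of failure. The Python sketch below is a hypothetical illustration, not an interface to any TMC system; the path names, the central office labels, and the functions `select_path` and `single_points_of_failure` are all assumptions. It models each path only by the central office it depends on and picks the first path that survives a given set of failures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommPath:
    name: str            # hypothetical label for the line
    central_office: str  # central office the line passes through


def select_path(paths, failed_offices, failed_lines=frozenset()) -> Optional[CommPath]:
    """Return the first path whose line and central office are both up, else None."""
    for path in paths:
        if path.name not in failed_lines and path.central_office not in failed_offices:
            return path
    return None


def single_points_of_failure(paths) -> set[str]:
    """Central offices whose loss alone would cut every path to the TMC."""
    offices = {p.central_office for p in paths}
    return {office for office in offices if select_path(paths, {office}) is None}


if __name__ == "__main__":
    paths = [
        CommPath("fiber line A", "CO-North"),
        CommPath("copper line B", "CO-North"),
        CommPath("fiber line C", "CO-South"),
    ]
    print(select_path(paths, {"CO-North"}).name)  # fiber line C
    print(single_points_of_failure(paths))         # set()
    print(single_points_of_failure(paths[:2]))     # {'CO-North'}
```

In the example, losing `CO-North` still leaves `fiber line C`, while a design whose only two paths both run through `CO-North` is flagged as having a single point of failure. The sketch does not model other shared dependencies, such as building power, that could still take down diverse paths together.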

Good planning mitigates the effects of unexpected system failures. An example of good planning and interagency cooperation came during the New York City Blackout of August 2003, when the I-95 Corridor Coalition contacted member agencies that were not affected by the blackout so that they could post messages informing motorists of the problem. The notification allowed motorists to avoid the affected areas, helping to relieve traffic congestion in the New York City area. Another way to keep the system functioning during community-wide emergencies is to work from alternative sites, such as a staff member's home or an alternate office arrangement, that connect into the system. By connecting from an alternative location, staff can carry out the fundamental functions of the TMC without full access to the operations center.

Testing

Testing a backup system provides several benefits to a TMC. One of the most important is initial and ongoing validation of the TMC backup plan. Another is the opportunity to evaluate how external interfaces and changes to the TMC affect the recovery and mitigation plan.

Documentation

Recovery and mitigation documents must be confidential yet widely available to TMC staff. The documentation must have all the information needed to rapidly reconstruct the TMC.

Network layouts, security infrastructure, system complexities, internal procedures, and complete staff contact lists are some of the critical and confidential information that the plan must include. During system outages and emergency relocation of the TMC, the documentation must be available away from the operations center. Some TMCs make the documentation available through the Internet; in these cases, it must be stored on servers that are not co-located with the TMC. Other centers provide the documentation, in full or in part, in hard copy or on portable electronic media such as CDs or thumb drives.


June 2010
Publication #FHWA-HOP-09-003