Transportation Management Center Information Technology Security
Chapter 9. Guidelines for Resiliency/Data Protection and Recovery
When developing cybersecurity guidelines, it is important for agencies to work together to share past experiences and best practices. These partnerships can be between organizations within the same agency (e.g., a locality's Information Technology (IT) department and Public Works) and among agencies within the same industry (e.g., local and State Traffic Management Centers (TMC)). Working together helps build an awareness of industry risks and standards and ensures that the most innovative approaches are being utilized and expanded upon industry-wide. These partnerships also can serve as a platform for discovering and applying for Federal resources and incentives.
In 2013, the Department of Homeland Security (DHS), in collaboration with the critical infrastructure community, developed the National Infrastructure Protection Plan (NIPP), titled NIPP 2013: Partnering for Critical Infrastructure Security and Resilience, which serves as a guide for the national effort to manage risk to critical infrastructure as a collaborative effort. This document presents a shared national vision, mission, and goals with respect to managing risk and serves as a guideline for organizations or groups of organizations to develop a collaborative partnership.20
Another platform for developing partnerships that TMCs should consider is any of various applicable Information Sharing and Analysis Centers (ISAC), which are sector-based, non-profit limited liability companies (LLC) developed to help companies across industries collaborate and coordinate via the National Council of ISACs (NCI).21 Some examples that may be of interest to TMCs include:
Additionally, IT-ISAC has developed industry Special Interest Groups to combine peer member companies within similar industries, such as the Food and Agriculture industry. A TMC would benefit from being an active member of an ISAC where indicators of compromise or events of interest could be shared within its sector, and could lead the development of a TMC or Transportation Industry special interest group. When agencies/parent organizations belong to multiple ISACs, the parent organization should establish a protocol for managing communications with the various ISACs to prevent duplicate or conflicting communications.
Each of these sample ISACs was developed to help companies across each associated sector manage risks through information sharing and analysis.
Risk Management Plan
For TMCs that lack a risk management plan, the figure from NIPP Supplemental Tool: Executing A Critical Infrastructure Risk Management Approach builds upon the individual efforts described above. It provides a concise path for those beginning to implement a risk management approach to cybersecurity that incorporates aspects of the National Institute of Standards and Technology (NIST) 800-37 Guide for Applying the Risk Management Framework. It is unrealistic to expect complete prevention of all vulnerabilities. Risk analysis is used to identify where the greatest risks/weaknesses exist, and a risk management plan is used to determine courses of action to mitigate and manage those risks.
(Source: NIST 800-37 Guide for Applying the Risk Management Framework.)
Related to the figure above, the previous sections of this report reviewed the inventory and identification of hardware, software, and network elements in the TMC, which satisfies the Identify Infrastructure component of this process. For TMCs/organizations that are unfamiliar with performing risk assessments, NIST 800-30, Guide for Conducting Risk Assessments, is a useful reference.23 For the TMC environment, organizations should directly assess the risk and impact of a field equipment breach, an operator workstation breach, a server breach, and a network access breach, and should categorize the data systems that are the most critical to daily operations.
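The assessment described above can be sketched as a simple likelihood-by-impact ranking. The following Python is a minimal illustration only, not a method prescribed by NIST 800-30: the 1-to-5 scales, the placeholder scores, and the names Scenario and rank_scenarios are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Simple qualitative product; an agency may weight these differently.
        return self.likelihood * self.impact


def rank_scenarios(scenarios):
    """Return scenarios ordered from highest to lowest risk score."""
    return sorted(scenarios, key=lambda s: s.score, reverse=True)


# Placeholder scores for the breach scenarios named in the text.
scenarios = [
    Scenario("field equipment breach", likelihood=4, impact=3),
    Scenario("operator workstation breach", likelihood=3, impact=3),
    Scenario("server breach", likelihood=2, impact=5),
    Scenario("network access breach", likelihood=3, impact=4),
]

for s in rank_scenarios(scenarios):
    print(f"{s.score:>2}  {s.name}")
```

The ranked list gives management a starting order for the risk management plan; the actual scores should come from the agency's own assessment.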
The outcome of the risk analysis will enable TMC management to develop risk-based security governance for the risk management plan, set goals/objectives, and implement risk management activities. Throughout the process, information should be fed back to support follow-up activities and to engage Information Technology (IT) and Operations Technology (OT) staff in collaboration.
While the Center for Internet Security (CIS) Controls noted up to this point all play a part in an agency's overall risk management strategy, the following activities relate specifically to the Measure portion of the risk management approach process flow by evaluating the performance of organizational plans/strategies. The intent is to establish a program that is fundamentally part of the agency's routine processes.
All guidelines suggest that every agency should complete a formal risk assessment and develop its own resiliency plan. However, low levels of resources can make this challenging, especially in the preliminary stages of plan development. At a minimum, agencies should implement as many of the best practices identified above as possible, using the CIS Top 20 and the NIST Framework as a baseline.
Incident Planning and Response
It is not a matter of whether systems will encounter a threat, but when. In preparation for this occurrence, CIS Control 19: Incident Response and Management recommends the development of an Incident Response Plan. The Incident Response Plan describes an approach for responding to information security incidents. It defines the roles and responsibilities of personnel, characterization of incidents, communication plans, and reporting requirements. The goal of the Incident Response Plan is to provide guidelines to manage the response process in an effective and consistent manner.
The document also should reference the organization's Business Continuity Plan to appropriately categorize critical assets, which assists in assigning severity levels to incidents. This also will help organizations work toward the development of organization-wide incident reporting standards with respect to time allowances, means and methods, and reporting requirements (e.g., whether the incident should be reported to an individual within the organization or third-party authorities).
The document should be written down for ease of access and should be used to train incoming staff. It is intended to be a living document that is modified to incorporate industry risks and best practices as they change. As part of this ongoing revision, an industry best practice is to execute periodic tabletop exercises to test the Incident Response Plan. The results of these exercises will help organizations identify gaps and outlying vulnerabilities and fine-tune their response procedures. The CIS Controls identify the actions that are recommended for an effective Incident Response and Management control, but specific examples are not provided. This is where NIST documentation is a useful supplement to implementing the CIS Controls, since reference samples are available to be modified and tailored to an organization's unique needs and requirements.
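One way to picture how asset categories from a Business Continuity Plan could drive severity levels and reporting time allowances is the short Python sketch below. Every asset name, level, and time allowance in it is a hypothetical placeholder rather than a recommended value.

```python
from enum import IntEnum


class Criticality(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


# Hypothetical asset categories, as a Business Continuity Plan might list them.
ASSET_CRITICALITY = {
    "traffic signal central server": Criticality.HIGH,
    "operator workstation": Criticality.MODERATE,
    "lobby display": Criticality.LOW,
}

# Hypothetical reporting time allowances, in hours, per severity level.
REPORT_WITHIN_HOURS = {1: 72, 2: 24, 3: 1}


def severity(asset: str, confirmed: bool) -> int:
    """Severity 1 (lowest) to 3 (highest), derived from asset criticality.

    Unknown assets are treated as HIGH until classified; an unconfirmed
    event is rated one level lower than a confirmed one, never below 1.
    """
    level = ASSET_CRITICALITY.get(asset, Criticality.HIGH)
    return int(level) if confirmed else max(1, int(level) - 1)
```

For example, severity("traffic signal central server", True) evaluates to 3, which maps to the shortest reporting allowance in the placeholder table.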
Sample Incident Response Process: The National Institute of Standards and Technology Incident Lifecycle
The Computer Security Incident Handling Guide (NIST SP 800-61r2) identifies the Incident Response Life Cycle by expressing it in four phases:24
Preparation involves preparing to respond to potential incidents by assembling personnel, developing training, and gathering the necessary hardware and software to accommodate the incident responses. During the Preparation phase, systems, networks, and applications also are kept secure through continuous monitoring for anomalous traffic and the tracking and patching of system and application vulnerabilities.
The Detection and Analysis phase is the discovery of an event with security tools or notification by an employee, inside party, or outside party about a suspected incident. During this phase the team seeks to determine the priority, scope, risk, and root cause of the incident. This phase includes the creation of a ticket with appropriate classification to begin identifying the details of the event. It is important to identify the initial vector and what data, systems, employees, or regions were affected. A crucial step in this phase is evidence gathering and handling to help document and preserve a chain of custody for legal proceedings.
The Containment, Eradication, and Recovery phase is the triage phase, where the Incident Response Team's primary objective is to prevent further damage to the victim organization and eliminate remnants of the unauthorized activity from the affected systems. Strategies will vary depending on the incident, so it is important to have these strategies pre-determined to help facilitate decision-making.
Post-Incident Activity is critical to improving response by sharing lessons learned with all teams involved. It is important to hold these meetings within a few days of the end of an incident. Refer to NIST SP 800-61r2 section 3.4.1 Lessons Learned for more information on what the meeting discussion should include.
The purpose of developing a resiliency plan is to incrementally build a more robust process/program for resiliency, hardening systems and facilities to improve the ability to recover from an attack or breach. The technical publication by MITRE titled Cyber Resiliency Design Principles/Selective Use Throughout the Lifecycle and in Conjunction with Related Disciplines dated January 2017 is a resource for aiding in the development of a resiliency plan and design for TMC IT environments.25
The following figure from the MITRE publication shows the relationships and building blocks related to resiliency goals evolving into objectives and ultimately techniques to improve the resiliency of the TMC systems and network environment. Establishing layered defenses is a widely used resiliency strategy relevant to TMCs, particularly when coupled with segmentation/isolation strategies already discussed in this report. The technique most often used by TMCs involves Redundancy (of software, hardware, network equipment, configuration backup files, off-site recovery equipment/locations). Many of the remaining techniques are more sophisticated and thus more applicable to implementation groups 2 and 3. A Resiliency Plan is intended to look at how each of these techniques could be used to address the agency's ability to withstand and mitigate a given threat to the continuity of their operations.
(Source: Cyber Resiliency Design Principles/Selective Use Throughout the Lifecycle and in Conjunction with Related Disciplines, MITRE.)
When responding to and recovering from a breach or widespread application failure, the TMC's resiliency plan should contain a prioritization plan for restoring applications in order of greatest criticality. Recovery times and frequency of backups (discussed below) will impact storage capacity and techniques, which should in turn be folded into system upgrade plans for future upgrades.
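A restoration prioritization plan of this kind can be sketched as an ordered list. The Python below is a minimal illustration under stated assumptions: the application names, the criticality scale (1 = most critical), and the idea of recording which applications must be restored first are hypothetical additions for the example, not requirements from the source guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    name: str
    criticality: int          # assumed scale: 1 = most critical
    depends_on: tuple = ()    # applications that must be restored first


def restoration_order(apps):
    """Order applications by criticality while honoring dependencies."""
    by_name = {a.name: a for a in apps}
    ordered, visiting = [], set()

    def visit(app):
        if app.name in ordered:
            return
        if app.name in visiting:
            raise ValueError(f"circular dependency at {app.name}")
        visiting.add(app.name)
        for dep in app.depends_on:
            visit(by_name[dep])   # restore prerequisites first
        visiting.discard(app.name)
        ordered.append(app.name)

    for app in sorted(apps, key=lambda a: a.criticality):
        visit(app)
    return ordered


# Hypothetical TMC applications.
apps = [
    Application("reporting archive", 4),
    Application("traveler information website", 3, ("database server",)),
    Application("database server", 2),
    Application("traffic management system", 1, ("database server",)),
]
print(restoration_order(apps))
```

In this example the database server is restored before the more critical traffic management system because the latter cannot run without it.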
Resiliency plans for TMCs should contain strategies for central device authentication, malware protection strategies for standard devices, and network segmentation of devices that cannot be controlled at an acceptable level to achieve agency cybersecurity goals. For example, a resiliency plan could account for devices that do not support central authentication and are segmented in the network from others such that just one segment can be temporarily shut off/disconnected in the event of a breach. This type of strategy is an example of how to improve the response time and continuity of operations for an organization's resiliency to an incident.
Any opportunity to reduce the TMC's potential attack surface by reducing open ports and protocols passing between networks should be employed and actively incorporated into the TMC's Resiliency Plan. Finally, as a general guide for developing the Plan, evaluate the least level of privileged access that can be used to satisfy each objective to prevent unnecessarily high-level access to devices or systems.
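As a rough illustration of reviewing ports and protocols that pass between networks, the sketch below compares observed flows against an approved list. The approved pairs and the observed sample are hypothetical; in practice the observations would come from the agency's own firewall or scan records.

```python
# Hypothetical approved (protocol, port) pairs between two network segments.
APPROVED = {("tcp", 443), ("udp", 161)}


def unapproved_flows(observed):
    """Return observed (protocol, port) pairs that are not on the approved list."""
    return sorted(set(observed) - APPROVED)


# Hypothetical observations; duplicates are collapsed.
observed = [("tcp", 443), ("tcp", 23), ("udp", 161), ("tcp", 21), ("tcp", 23)]
print(unapproved_flows(observed))
```

Each pair reported is a candidate for closure or for a documented exception in the Resiliency Plan.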
One specific form of system and data protection that should be included in an agency's Resiliency Plan is malware protection. Part of that Plan should include monitoring known sources of credible information about known threats and vulnerabilities. The Cybersecurity and Infrastructure Security Agency (CISA) incorporates an industrial control system (ICS) element into its Computer Emergency Response Teams (CERT) and posts routine alerts and advisories on its website, which also can be subscribed to via a Really Simple Syndication (RSS) feed.26 Malicious software is a widespread threat across the cyber world that can enter through any number of points such as end-user devices, email attachments, webpages, cloud services, user actions, and removable media. For the TMC world, these entrances can include not only staff daily operations but field devices as well. This agility, paired with the speed at which attacks occur and spread, makes malware a priority risk to protect against, with a dedication to keeping the protection current.
CIS Control 8: Malware Defenses provides guidelines on controls to implement and manage malware protection, including monitoring and removal. With respect to document control, anti-malware software with anti-exploitation features should be a fundamental consideration in protecting TMCs from malware infiltration as well as the data loss that results from it. In addition to scanning emails at the server level, or network traffic passing through the firewall, implementation groups 2 and 3 at the very least also should utilize centrally managed anti-malware software to continuously monitor and defend each of the organization's workstations and servers (sub-control 8.1). All organizations also are strongly recommended to set operating system policies that prevent auto-running applications (sub-control 8.5) held on removable media devices (e.g., flash drives, camera memory cards, CDs).
Data Loss Prevention
TMCs manage copious amounts of data, from incident management logs to the massive amounts of traffic sensor data and crowd-sourced data that have both real-time and historical value for operations managers. TMC operators may have experienced the loss of that data from storage device failures, accidental deletion (by users with system privileges set too high), malware that blocks access to servers/data, and/or a corrupt field device that overwrites the central database. However, data theft is the primary focus of CIS Control 13, which is described below. Understanding the importance and value of the TMC's datasets is the first step in data loss prevention.
Data loss prevention (DLP) refers to a procedure or policy for monitoring sensitive data in order to detect and block unauthorized data breach and exfiltration attempts. CIS Control 13: Data Protection provides guidelines for DLP techniques. As a first step, it is important for organizations to identify and classify their data. Best practice is to create at least 3 and optimally no more than 5 basic data labels: Create a "General" label that is default (applied to everything) allowing the user to upgrade to a higher classification (e.g., Operations, Supervisory Control and Data Acquisition (SCADA), Sensitive/PCI), or downgrade to a lower classification with explanation.
DLP is a best practice to implement on specific use cases; however, sifting through alerts can consume considerable person hours. As an alternative to manually filtering and processing data, commercial DLP solutions are available for purchase and deployment by organizations with limited staff availability. Use cases for DLP include monitoring for specific data/file types and tags in files, or for unauthorized use of encryption of files located on the network, that have been identified as threats/vulnerabilities by ICS-CERT or others, along with monitoring significant data transmissions outside of the network. Organizations should still plan at a minimum to dedicate staff to reviewing the event logs regularly and following up on events related to attempts to transmit sensitive information without authorization. Staff also should tune the tools to suppress alerts that are of no consequence or value, which reduces the person hours required to implement DLP.
Information Rights Management (IRM) refers to features for document creation that can enforce encryption on all newly created documents and protect files from unauthorized copying, viewing, printing, forwarding, deleting, and editing as an added layer of protection. This allows the document owner to place the correct label (data classification) plus security attributes that could set a timeframe on a document to limit viewing, provide the ability to revoke the document, and prevent the ability to print or forward on to another user. To achieve further data tracking and control, organizations should implement a management system capable of tracking client/user data as well.
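The labeling rule described above (a default label, free upgrades, and downgrades only with an explanation) can be sketched in a few lines of Python. The four labels and their relative order are assumptions based on the examples in the text; an agency would define its own set.

```python
from enum import IntEnum


class Label(IntEnum):
    GENERAL = 1      # default, applied to everything
    OPERATIONS = 2
    SCADA = 3
    SENSITIVE = 4    # e.g., Sensitive/PCI


def relabel(current: Label, new: Label, explanation: str = "") -> Label:
    """Allow any upgrade; allow a downgrade only when an explanation is given."""
    if new < current and not explanation.strip():
        raise ValueError("downgrading a label requires an explanation")
    return new
```

Under these assumptions, relabel(Label.GENERAL, Label.SCADA) succeeds, while relabel(Label.SENSITIVE, Label.GENERAL) raises an error unless an explanation is supplied.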
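As one illustration of the "significant data transmissions outside of the network" use case, the Python sketch below totals bytes sent from internal hosts to each external destination and flags totals above a threshold. The internal address range, the threshold, and the flow record layout are assumptions for the example; commercial DLP tools implement this far more thoroughly.

```python
import ipaddress

# Hypothetical internal address range and alert threshold.
INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]
THRESHOLD_BYTES = 500 * 1024 * 1024  # 500 MiB per destination

MIB = 1024 * 1024


def is_internal(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL_NETS)


def large_outbound_transfers(flows):
    """flows: iterable of (source, destination, bytes_sent) tuples.

    Returns {destination: total_bytes} for external destinations whose
    total from internal sources exceeds THRESHOLD_BYTES.
    """
    totals = {}
    for src, dst, sent in flows:
        if is_internal(src) and not is_internal(dst):
            totals[dst] = totals.get(dst, 0) + sent
    return {dst: n for dst, n in totals.items() if n > THRESHOLD_BYTES}


# Made-up sample records using documentation-reserved external addresses.
flows = [
    ("10.1.2.3", "203.0.113.5", 400 * MIB),
    ("10.1.2.4", "203.0.113.5", 200 * MIB),
    ("10.1.2.3", "10.9.9.9", 900 * MIB),      # internal to internal, ignored
    ("10.1.2.3", "198.51.100.7", 10 * MIB),   # below threshold
]
print(large_outbound_transfers(flows))
```

Tuning the threshold and the list of internal ranges is part of the alert-volume management the text describes.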
Archival/Backup, Restoration, Recovery
If the organization had to start from scratch to recreate the network and digital files tomorrow, does the organization have what it takes? Nightly backups may not be frequent enough for real-time data gathering in a TMC environment. Many TMCs and business IT enterprises have recognized that the loss of certain data sets is too critical to rely solely on nightly backups. For some organizations, device configuration settings and maintenance/repair history change frequently enough that backing up the files every hour is an appropriate frequency, while for others a nightly backup is sufficient. TMC incident logs, traffic sensor data, and dynamic message sign logs/schedules are elements of operations that would suggest real-time replication to a secondary database server, preferably in a separate physical location. TMCs also should consider external users that rely on information published or transmitted by the TMC. Road closure and incident status information are both real-time indicators that motorists and third-party mapping applications now routinely rely on, and that should have an appropriate level of redundancy and frequency of backups.
Whether part of risk analysis, or a separate enterprise business impact analysis, TMCs should determine the acceptable frequency of backup recovery points and recovery times to maintain continuity of operations based on organizational objectives. As previously discussed under Incident Planning and Response, data restoration and recovery are key components of the incident response process. Doing so without continuing to compromise the network by restoring corrupt or infected data, while also maintaining as much of the data as possible and limiting data loss, is an equally important aspect of data recovery. Part of the response plan should include a procedure to test the backups for existence of malware or other corruption before restoring infected files and causing further issues. Maintaining at least one backup offsite is a technique used to isolate both the primary and the backup from being infected.
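One partial check that can precede a restore is confirming that backup files are unchanged since they were written. The Python sketch below records a SHA-256 digest per file at backup time and reports files that are missing or altered later. It is an illustration only: the function names are hypothetical, and a digest comparison detects corruption or tampering after the backup was taken, not malware that was already present in the data when it was backed up, which requires a separate scanning tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict:
    """Record a digest for every file at backup time."""
    return {
        str(p.relative_to(backup_dir)): sha256_of(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }


def verify_backup(backup_dir: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose content has changed."""
    problems = []
    for rel, expected in manifest.items():
        target = backup_dir / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

Storing the manifest separately from the backup itself, for example with the offsite copy, makes it harder for one incident to alter both.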
Step one in preparing for data restoration and recovery is data backup. Backup types include mirror, full, incremental, and differential. Organizations should analyze the pros and cons of each type of backup to determine the best or most realistic backup schedule by type. Organizations should strive for automated, routine data backup. Once a data backup schedule has been determined, it is important to remember that backup data can quickly take up space on an organization's server and should be accounted for. An organization's policies should include requirements with respect to data backup retention and disposal. Much of this will stem from regulation, legal requirements, or contractual obligations, such as PCI.
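The difference between full, differential, and incremental backups comes down to which files each one copies. The Python sketch below shows that selection logic using simple numeric timestamps; the file names and times are made up, and a mirror (an exact copy of the current state with no version history) is left out because it involves no selection.

```python
def files_to_copy(mtimes, kind, last_full, last_backup):
    """Select files for one backup run.

    mtimes: {path: modification_time}; all times are comparable numbers.
    full:         every file
    differential: files changed since the last full backup
    incremental:  files changed since the last backup of any type
    """
    if kind == "full":
        return sorted(mtimes)
    if kind == "differential":
        cutoff = last_full
    elif kind == "incremental":
        cutoff = last_backup
    else:
        raise ValueError(f"unknown backup type: {kind}")
    return sorted(path for path, t in mtimes.items() if t > cutoff)


# Made-up example: full backup at time 10, most recent backup at time 20.
mtimes = {"signs.cfg": 5, "sensors.db": 15, "incidents.db": 25}
for kind in ("full", "differential", "incremental"):
    print(kind, files_to_copy(mtimes, kind, last_full=10, last_backup=20))
```

Incremental runs copy the least data but need every run since the last full backup to restore, while a differential restore needs only the last full backup plus the latest differential; this trade-off is part of the pros and cons an organization should weigh.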
For the sake of organizational efficiency, it is important that data recovery is also completed in a timely manner. As briefly indicated under Incident Planning and Response, organizations should perform a Business Impact Analysis (BIA) prior to setting a Recovery Point Objective (RPO), the point in time before the disruption to which data must be restorable, and a Recovery Time Objective (RTO), the time allowed to restore a business service to function after the disruption. The BIA ensures that higher recovery priority is given to data with a higher impact on the organization's functionality.
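A quick arithmetic check ties these objectives to the backup schedule: the worst-case data loss equals the interval between backups, so that interval must not exceed the RPO, and the expected restore time must not exceed the RTO. The values below are hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss equals the time between backups."""
    return backup_interval <= rpo


def meets_rto(estimated_restore_time: timedelta, rto: timedelta) -> bool:
    """The expected restore duration must fit within the allowed downtime."""
    return estimated_restore_time <= rto


# Hypothetical objectives for a TMC incident log.
rpo = timedelta(hours=1)
rto = timedelta(hours=4)

print(meets_rpo(timedelta(hours=24), rpo))  # nightly backups
print(meets_rpo(timedelta(hours=1), rpo))   # hourly backups
print(meets_rto(timedelta(hours=6), rto))   # six-hour restore
```

With a one-hour RPO, nightly backups fall short while hourly backups satisfy it, and a six-hour restore would miss the four-hour RTO.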
To verify backup system integrity without risking data loss, organizations should regularly test the system's backup and restoration process on a sample of data in a test bed environment. CIS Control 10: Data Recovery Capabilities recommends performing such tests once per quarter, or whenever new backup equipment is purchased, whichever comes first.
Bacula Systems provides additional information about data backup best practices, along with examples with respect to labeling, schedules and retentions, partitioning, and recovery plans.27
Personal Privacy Information Legislation
Personal information privacy continues to gain more widespread attention. The European Union (EU) established the General Data Protection Regulation (GDPR) to govern the control that individuals have over their personal data. Similar legislation, the California Consumer Privacy Act, passed in the State of California and takes effect in January 2020. There are groups lobbying/advocating for taking privacy legislation/initiatives to the Federal level. TMCs subject to these requirements should monitor the impacts of this legislation on their own systems that contain personal information from dashboard users, 511 users, or other traveler information systems at a minimum. Data from these users should be scrubbed from backups and protected from Freedom of Information Act (FOIA) requests at a minimum. Many TMCs now get incident alerts/data from State and local police agencies. If that data has not been pre-sanitized to remove personal information before use in the TMC incident management databases, then it should be sanitized by the TMC and scrubbed from archives/backups as well. As noted above with respect to DLP, agencies should consider a category of data related to personal privacy that can be screened or quickly isolated within the broader context of the agency's wide array of data.
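Sanitizing incoming incident records before they reach TMC databases and backups can be sketched as dropping known personal fields and masking obvious identifiers in free text. The field names, the phone-number pattern, and the sample record below are all hypothetical; real records would need a review of every field an agency receives, and a pattern match alone will not catch all personal information.

```python
import re

# Hypothetical field names that carry personal information.
PERSONAL_FIELDS = {"driver_name", "license_plate", "phone"}
# Matches phone-like numbers such as 555-010-0199 in free text.
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def sanitize(record: dict) -> dict:
    """Return a copy without personal fields and with phone-like
    numbers masked in free-text values."""
    clean = {}
    for key, value in record.items():
        if key in PERSONAL_FIELDS:
            continue
        if isinstance(value, str):
            value = PHONE_PATTERN.sub("[REDACTED]", value)
        clean[key] = value
    return clean


# Made-up sample record.
record = {
    "incident_id": 17,
    "location": "I-95 NB mile 12",
    "driver_name": "Example Person",
    "notes": "Caller at 555-010-0199 reported debris",
}
print(sanitize(record))
```

Applying the same function before data is written to archives keeps personal information out of backups in the first place, which is simpler than scrubbing it afterward.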
20 Department of Homeland Security (DHS), "NIPP 2013: Partnering for Critical Infrastructure Security and Resilience," 2013. Retrieved from: https://www.dhs.gov/sites/default/files/publications/national-infrastructure-protection-plan-2013-508.pdf.
23 NIST, "SP 800-30 Rev. 1 Guide for Conducting Risk Assessments," 2012. Retrieved from: https://csrc.nist.gov/publications/detail/sp/800-30/rev-1/final.
24 NIST, "SP 800-61 Rev. 2 Computer Security Incident Handling Guide," 2012. Retrieved from: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-61r2.pdf.
25 MITRE, "Cyber Resiliency Design Principles: Selective Use Throughout the Lifecycle and in Conjunction with Related Disciplines," 2017. Retrieved from: http://www.mitre.org/sites/default/files/publications/PR%2017-0103%20Cyber%20Resiliency%20Design%20Principles%20MTR17001.pdf.
27 Bacula Systems, "Enterprise Data Backup Best Practices (Prior to installation)." Retrieved from: https://www.baculasystems.com/enterprise-data-backup-best-practices.
United States Department of Transportation - Federal Highway Administration