Considerations of Current and Emerging Transportation Management Center Data

Chapter 5. Policy Considerations

This section covers some of the many policy considerations that will impact departments of transportation (DOT) when considering a data source, business model, or procurement strategy.

Acceptable Use Terms and Conditions

The data-use agreement is a cornerstone of any public-private data partnership. This agreement governs several key components of the partnership including, but not limited to:

Who can use the data within the agency?
Who can data be shared with and in what format—raw, aggregated, anonymized, etc.?
What, if any, attribution is required when publishing data and data products?
What type of processing and publications are allowed—research, reports, public traveler information, operational use only, etc.?
What is the expected level of accuracy and reliability of data?
How is the data validated?

Acceptable use terms and conditions are new to many agencies. Traditionally, agencies procured and deployed their own infrastructure and received data as the byproduct of that deployment. Because of this, agencies had full control over data: they could store it or share it with other agencies, universities, or the public; use it for realtime operations, planning, or studies; or dispose of it if they deemed it not useful or worth the cost of storage and management. In fact, many agencies were often unaware of all the data they were collecting. As technology advanced, the value of data became clearer as it supported information-driven decisionmaking. The private sector seized the opportunity to generate and monetize transportation data to support agencies in their operations, planning, and performance management efforts.

As agencies began to work with private-sector data providers, they faced questions they had never had to think about or answer. Those questions became key components of data partnerships that could either make or break the agency's ability to achieve its transportation systems management and operations (TSMO) mission. Early partnerships often resulted in agencies paying a high price for data that they could have generated with their own infrastructure, effectively paying a vendor to access the agencies' own data. Other times agencies obtained valuable data, but found that overly restrictive data use agreements prevented them from doing anything useful with that data.

Over time, agencies learned that a successful partnership required freedom and flexibility in data use on the agency side, while protecting the proprietary nature of a private partner's data sets. Model agreements ensure the following key components:

Ability to utilize data for any purpose within the agency.
Pay once, use many times.
Share data with trusted partners or the public while protecting the proprietary nature of the data set.
Tie contract payment to vendor performance.

Successful acceptable use terms and conditions are a result of a meaningful effort to understand agency internal needs across different departments, potential uses and interactions with partner agencies, private sector partners, academia, and the public. However, it is not a good idea to develop these types of data use agreements in a vacuum. Instead, agencies must consider other successful model agreements and engage their peers in discovering best practices, lessons learned, and common pitfalls. Finally, agencies must understand that data use agreements are a two-way street, and a path to a true partnership with private sector companies requires mutual benefit and respect.

Data Verification and Validation

No data is perfect. However, it is critical for agencies to understand what data is and is not capable of, and use it in an appropriate manner. To accomplish this, agencies must understand the problem they are trying to solve, and work with the private sector to understand how a data set may help solve that problem. Similarly, private-sector companies must manage expectations and provide data of satisfactory quality. This section discusses important aspects of verification and validation and how they can or should affect procurements.

A true data partnership includes a collaborative approach to problem solving. Agencies define problems they are attempting to solve and provide a set of goals they are attempting to achieve. Private sector partners evaluate the fitness of their data set in context of agency goals and develop data packages and data use agreements that provide the capability to solve those problems and reach the end goals.

Private sector data providers continue to innovate and improve their data packages. As part of that innovation agencies must consider potential uses of new data packages and work closely with their private-sector partners.

A benchmark for data performance is necessary for a two-way partnership to work. This benchmark may consist of many different components, including:

Data delivery interval and definition of acceptable data gaps.
- If data delivery happens once a minute to support realtime operations, how many missed intervals negatively affect realtime operations? How are those data gaps treated? Are they replaced by modeled or archived data?
Geographic data coverage.
- Does data consistently provide coverage of the agreed-upon network at the required spatial granularity?
Data accuracy and responsiveness.
- How much data is allowed to deviate from ground truth? Is there a sliding scale depending on the type of the road or environment? How quickly does data need to reflect changes occurring in the field?
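
The delivery-interval component of such a benchmark can be checked mechanically. The following is a minimal sketch, assuming a feed that is expected once every 60 seconds; the function name `find_gaps` and the sample timestamps are hypothetical and not drawn from any real agreement.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval_s=60):
    """Return (gap_start, gap_end, missed_intervals) for each delivery gap.

    timestamps: datetimes at which data was actually received.
    interval_s: expected delivery interval in seconds (assumed: 60).
    """
    gaps = []
    ordered = sorted(timestamps)
    for prev, cur in zip(ordered, ordered[1:]):
        elapsed = (cur - prev).total_seconds()
        # Expected deliveries that never arrived between prev and cur.
        missed = round(elapsed / interval_s) - 1
        if missed >= 1:
            gaps.append((prev, cur, missed))
    return gaps


start = datetime(2024, 1, 1, 8, 0, 0)
# Hypothetical feed: the deliveries for minutes 3 and 4 are missing.
received = [start + timedelta(minutes=m) for m in (0, 1, 2, 5, 6)]
gaps = find_gaps(received)
total_missed = sum(missed for _, _, missed in gaps)
```

A contract could then state how many missed intervals per day or per incident are tolerated, and whether the gaps found this way may be filled with modeled or archived data.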

For these components to be verified and validated correctly, it is important that a neutral third party perform the verification and validation. While the vendor may provide internal benchmarks and the agency may define expectations in a contract, neither party to the contract is in a position to fairly evaluate the performance of the data.

Impartial third parties that can perform verification and validation can be universities that have the capabilities and expertise to analyze data and provide objective evaluation. Alternatively, other private parties could perform this evaluation, but it is imperative that those private parties avoid conflicts of interest.

Validation efforts must be comprehensive, consistent, and timely. These efforts require data analysis expertise and the ability to not only process data and compare the results to benchmarks, but also understand the trends in data to ensure that data is consistent and reasonable. Given the critical nature of this data, validation processes must be fast enough to identify potential issues early. This gives providers time to correct those issues and gives agencies time to exercise their contractual options.

Payment Terms

This section discusses how companies (and agencies) prefer to structure payment terms and conditions, and how those terms and conditions can affect the price of the data. It covers more innovative terms and conditions that incorporate payments based on the quality of the data.

Most public agencies are comfortable with the traditional infrastructure procurement process. For the most part, agencies procure technology or infrastructure through a bid process, install it in the field (either by in-house staff, vendor, or contractor), and maintain it over the life of the equipment or infrastructure.

Data procurement is different because a third party generates the data and agencies are exclusively a consumer of that data. In this arrangement, there are several approaches to procurement and payment terms.

Raw Data Procurement

Agencies can work with third-party providers to purchase raw data. In this arrangement, agencies can purchase data in several different ways:

Realtime only: Agencies obtain access to a realtime data feed that they can consume and store as they please.
Realtime and archived data: In addition to obtaining a realtime data feed, agencies can purchase some amount of archived data (e.g., one or more years) delivered in a one-time data transfer.
Aggregated data: In some cases, agencies may prefer to purchase data aggregated up to a certain level to support their existing operational capabilities. This is generally not an ideal approach as it reduces the agency's capabilities in the long term if they identify other uses for the data in the future.

The number of data points purchased often determines the payment terms for raw data procurement. For example, probe vehicle data is marketed on a segment basis. Agencies can choose to purchase a certain number of segments, often covering a certain geographic extent (State, county, or municipality) and/or network type (freeways, arterials, etc.).

In some instances, raw data procurement cost may be dependent on the number of users. For example, a State DOT may purchase data for use by a certain number of users at the DOT and for an additional cost, add partner metropolitan planning organization (MPO) users. This approach is less common as it is difficult to predict who would be using the data and when, and may be difficult for the third-party providers to enforce.

Purchasing raw data is a cost-effective approach when an agency has internal resources to process that data and make it useful. The cost of data in this arrangement may be low due to the competitive nature of the market.

Data and Service Procurement

In addition to purchasing raw data, agencies may acquire additional data services. For example, they may purchase software-based dashboards, analytics, or decision support systems. This approach reduces the effort (and potential cost) for agencies to make data useful and actionable. The tradeoff is that the service may package data in a specific way that reduces flexibility when it comes to use of that data. For example, a dashboard may only be capable of displaying several realtime operational metrics, even if the agency has additional metrics it wants to monitor that could be produced using third-party data. The good news is that, in a competitive market, third-party providers are often willing to work with agencies to develop additional capabilities and tools.

Another similar approach is through partnerships with other private companies or universities. For example, an agency may purchase raw data from the provider and a data analytics platform from a university partner who works with the provider to develop needed capabilities.

Purchasing raw data and associated services is generally more expensive than purchasing raw data alone, but may be more cost-effective as agencies often have major resource constraints or lack of specific expertise to process data. This has been one of the most common cost models over the last several years.

Single-Use Data and/or Service Package

Raw data purchase often allows agencies perpetual access to purchased data, but in some instances, providers may offer data and service for a specific single use. This may be a particular study or analysis, or it could be to support a specific operational strategy implementation. In these cases, the provider only supplies the necessary data for this specific use, often aggregated, calculated, and packaged as a final result rather than as a raw data set.

These single-use data/service packages are not common, but have been a way for agencies to "test-drive" a new data set or service or to satisfy an immediate need at a lower cost than if they procured the entire data set and analytics platform. However, the major drawback of this approach is that the procurement satisfies a very narrow need and the agency does not have the ability to leverage this investment elsewhere (e.g., for another department) or at a later time (e.g., for a similar study in the future). This approach is not sustainable or cost-effective in the long term.

Bartering

In some instances, agencies can work with third-party data providers to obtain data in exchange for advertisement or attribution, or in exchange for agency data that may be of value to the private sector partner. This has been a popular model with crowdsourced data providers, such as Waze. Waze is capable of generating data on current conditions, but has no way of informing its users of planned closures. In this arrangement, Waze partners with an agency to provide crowdsourced realtime data to support agency operations in exchange for receiving and displaying planned event and planned closure information. This approach does not involve any monetary exchange, but benefits both partners.

While this approach has some clear benefits, it does severely limit the agency's ability to control the quality, timeliness, or any other aspect of data delivery. There are no specific service-level agreements in place and partners are free to change their data interfaces as they see fit without agency approvals since there are no strict contracts or agreements in place. So far, this model has been exclusive to crowdsourced data and has not been prevalent among many third-party data providers.

Data Quality

In most procurement arrangements (except with bartering), agencies have the ability to structure payments based on data quality and performance. As part of third-party provider contracts, agencies can tie their payments to the results of the validation process to ensure that the provider performs as expected. This allows protection for agencies throughout the contract while encouraging providers to continue to innovate and improve their data products to remain competitive.
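
One way to express a quality-based payment clause is as a tiered multiplier applied to a base fee. The sketch below is illustrative only: the function name `monthly_payment`, the two quality scores, and every threshold and multiplier are hypothetical placeholders that a real contract would define for itself.

```python
def monthly_payment(base_fee, availability, accuracy_within_tolerance):
    """Scale a monthly fee by independently validated data quality.

    Both scores are fractions in the range 0..1 reported by the
    validation process. All tiers below are hypothetical.
    """
    # The weaker of the two validated scores drives the payment tier.
    score = min(availability, accuracy_within_tolerance)
    if score >= 0.95:
        multiplier = 1.00   # full payment
    elif score >= 0.90:
        multiplier = 0.90   # modest reduction
    elif score >= 0.80:
        multiplier = 0.75   # significant reduction
    else:
        multiplier = 0.00   # payment withheld pending correction
    return round(base_fee * multiplier, 2)


full = monthly_payment(10000, 0.99, 0.97)
reduced = monthly_payment(10000, 0.99, 0.92)
withheld = monthly_payment(10000, 0.50, 0.99)
```

Using the minimum of the scores is one possible design choice; it prevents strong availability from masking poor accuracy. A weighted average would be a more lenient alternative.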

Data Management and Maintenance

When it comes to data, a core asset for agencies today, agencies must consider policies surrounding short-term and long-term data management and data operations and maintenance (O&M). These policies include cost considerations, technical considerations, and agency information technology (IT) policies that might impact management.

One of the key decisions agencies must make in today's data management ecosystem is whether to store/manage data in-house or utilize one of the various hosted options. The implications of this choice can have long-term impacts on costs, control, and utility of data.

In-House Versus Hosted Options

To store and process their data, agencies previously relied on in-house systems. With the emergence of third-party provided data, agencies face a choice between hosting data in-house and using a hosting service (offered by the data provider, a trusted third-party partner, or a commercial hosting provider). There are benefits and drawbacks to each approach.

Storing and processing data, at a minimum, requires computing resources, IT management resources, and data management experts (e.g., database administrators, analysts, and developers). Traditionally, most agencies have IT departments that provide IT infrastructure management and access to computing resources. Similarly, agencies have in-house transportation experts with varying levels of expertise when it comes to data analysis. While this type of approach worked well on traditional data sets that were smaller and easier to manage, currently produced data sets are significantly larger and more complex. Traditional tools, such as Microsoft Excel and small databases, are proving to be inadequate when it comes to management and processing of today's datasets.

Table 17 outlines general pros and cons of an in-house approach versus using hosted options for today's data set types and sizes.

Table 17. Pros and cons of hosting options.
Hosting Option	Pros	Cons
In-House	Full control over data storage and processing infrastructure, strategies, algorithms, and data retention. Good understanding and tight control of budgets to support in-house storage and processing.	High cost of procuring the infrastructure, space, and expertise to manage data and processing. High investment needed in cybersecurity to ensure protection of infrastructure and data. Competitive workforce market makes it difficult to attract and retain talent. Data becomes obsolete due to inability to continue to innovate and stay current.
Commercial Cloud Hosting	Infrastructure management and cybersecurity become the responsibility of the cloud provider. Smaller initial investment since cost of storage in cloud is generally less than the cost of infrastructure and staff needed to store data in-house.	Lack of control over budget and expenses. While the cost of storage per byte may be low, the cost of data transmission may be difficult to estimate and control. Systems with a large number of transactions may end up driving cost significantly. Still requires in-house expertise.
Trusted Partner Hosting	Lower up-front investment. Lower risk due to stronger and more focused relationship between agency and the partner. Dedicated resources and expertise tailored to agency needs.	A university focused on pure research or a company with a single product offering may not be an ideal agency partner.

In-House Data Management

In-house data management solutions provide the highest level of control over data and over the storage and processing of that data. While this approach provides more control, it also requires a larger investment in workforce and physical infrastructure. For example, agencies may need to hire software developers and data experts who have strong expertise in data management and computer programming to effectively ingest and transform data into actionable TMC information. This can be a major challenge in a highly competitive market for these skills. Sometimes there may be many "data silos" even within a single agency, which multiplies the in-house data storage and management cost while still maintaining barriers when it comes to data control.

Large in-house investments in data management systems can lock agencies into a rigid system that does not adapt well to changing data sources and the emergence of new data elements and types. This can result in the agency falling behind until it procures another large-scale data management system (which is itself often doomed to become quickly obsolete).

Finally, in-house data management systems require not only an initial investment and operations and maintenance considerations, but also associated cybersecurity needs, especially in cases where personally identifiable information (PII) is collected and stored.

Commercial Cloud Hosting

As data became a core asset in many industries, commercial cloud providers emerged. These providers offer a variety of services ranging from barebones Infrastructure as a Service (IaaS), to more complex execution environments offered as Platform as a Service (PaaS), to a full-blown Software as a Service (SaaS) model.

These commercial cloud providers have identified a market and present a great option for agencies that do not have strong IT capabilities, dedicated technical staff, or sufficient up-front capital to establish their own IT infrastructure. Even agencies that do have these capabilities can benefit from scalability and elasticity of commercial cloud providers. Elasticity is the ability for infrastructure to grow or shrink in response to demand in order to manage load and cost of the system.

The biggest challenge with the use of commercial cloud hosting is that, while the costs may appear affordable when pricing data storage by number of bytes of data stored, the pricing model can get very complex and uncertain when it comes to transactions and processing of that data. For example, it may be cheap to load raw data into the cloud, but very expensive to extract it back when needed to perform after-action reviews or share that data and results with partners. If an agency has taken the plunge and invested heavily in a cloud-based system, it may find itself held hostage to that cloud provider, even when costs rise, since extracting data and capabilities and establishing them elsewhere can be prohibitively expensive or time consuming.
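
The gap between storage cost and extraction cost can be illustrated with simple arithmetic. The sketch below is a rough estimator under stated assumptions: the function name `monthly_cloud_cost` is hypothetical, and all rates are made-up placeholders supplied by the caller, not any provider's actual prices.

```python
def monthly_cloud_cost(stored_gb, egress_gb, requests,
                       storage_rate, egress_rate, request_rate_per_1k):
    """Estimate one month's cost from storage, egress, and request volume.

    Rates are dollars per GB stored, dollars per GB extracted, and
    dollars per 1,000 requests. All are caller-supplied assumptions.
    """
    storage = stored_gb * storage_rate
    egress = egress_gb * egress_rate
    transactions = (requests / 1000) * request_rate_per_1k
    return {"storage": storage, "egress": egress,
            "transactions": transactions,
            "total": storage + egress + transactions}


# Hypothetical rates chosen only to show the shape of the problem.
RATES = dict(storage_rate=0.02, egress_rate=0.10, request_rate_per_1k=0.005)

# A routine month: 5,000 GB stored, 200 GB extracted.
routine = monthly_cloud_cost(5000, 200, 2_000_000, **RATES)
# A month in which the full archive is extracted for an after-action review.
full_export = monthly_cloud_cost(5000, 5000, 2_000_000, **RATES)
```

Under these assumed rates, extracting the full archive once costs several times as much as storing it for the month, which is the kind of exposure an agency would want to estimate before committing to a platform.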

Finally, even if an agency elects to use a commercial cloud platform, the agency still must have internal expertise and capacity to manage the system and develop necessary capabilities in the cloud.

Trusted Partner Hosting

Trusted partner hosting is often a more agency-friendly solution that carries lower costs and lower risks. This approach utilizes a trusted third-party partner, such as a university, sister agency, or a trusted consultant to establish a cloud-based solution. For example, an agency may work with a university to build a system and store data at a lower cost than in-house or commercial cloud, while still having access to expertise and resources otherwise unavailable to the agency. Similarly, universities and sister agencies may be sharing internal State or local networks that provide an additional layer of security and improved performance.

Trusted partners can often provide similar levels of service as commercial cloud providers, but with a more focused mission. For example, a local university that provides a cloud storage and processing system for a TMC often is not also in the business of providing hosting solutions for major retailers and general consumers. This means that the staff is focused on building, operating, and maintaining the TMC system and addressing the needs of the agency rather than a broad spectrum of customers with potentially competing requirements and priorities.

The key to the successful partnership is a mutual understanding and appropriate partner selection. University, consultant, or sister-agency partners must be equipped and prepared to operate a production system and understand TMC needs. A university entity with a pure research focus or a company with a single product offering may not be the best option for an agency that is looking to remain flexible, innovative, and responsive to ever-changing data trends.

Disaster Recovery and Security Considerations

When deciding between in-house and cloud storage, it is important to consider data sensitivity and risk. Cloud providers usually have a different threat profile, but also different threat management mechanisms. This means that cloud providers may be attractive targets for an attacker, especially a sophisticated one such as a foreign government, while a typical agency may not have the same risk. However, cloud providers acknowledge this threat and usually have a robust security policy to both prevent and recover from cybersecurity compromises. On the other hand, agencies may have limited funds and expertise to implement robust security mechanisms.

The nature of cybersecurity dictates that no system is ever 100 percent secure. The key is to balance the risk, vulnerability, and costs. Commercial cloud providers often exercise higher security standards than some individual agencies can afford to do, but even cloud providers are not immune to cybersecurity attacks. In fact, cloud providers have become a point of failure because a sophisticated attacker can focus on taking down a single cloud provider, which in turn could impact hundreds or thousands of cloud customers. Therefore, a TMC utilizing a commercial cloud to store and process data may become collateral damage in an attack on the cloud, even if the TMC would have never been the specific target. Even if TMC data is not especially sensitive, and the risk of a data breach is not as critical, a denial-of-service attack could render a TMC inoperable, with agency staff having no power or influence over resolution.

If a TMC decides to host its data and systems internally, it becomes a slightly smaller target, but is still vulnerable to cybersecurity attacks. Agencies must focus on developing a layered security architecture that contains at least the following components:

Physical security.
- Data centers with secured door access and server rack locks.
- Limited authorized staff physical access and access audits.
- Proper disposal of old hardware.
Encryption.
- Encryption of transmitted and stored information.
Principle of least privilege.
- Any entity or component must be able to access only the information and resources necessary for its legitimate purpose.
Network and computer security.
- Regularly maintained servers, workstations, and software – frequent firmware and software updates, patches, and virus scanners.
- Secure boundaries – firewalls, routers, and intrusion prevention/detection systems, etc.
- Network segmentation – separating different components on the network to avoid compromise of one segment affecting others.
Social engineering.
- Staff education and awareness of security threats that capitalize on human fallacies and cultural and social norms, such as never disclosing passwords or other sensitive information to potential impersonators.
- Strong password policies and multi-factor authentication.
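
The principle of least privilege listed above can be sketched as a deny-by-default permission check. The roles, actions, and the function name `is_allowed` below are hypothetical and serve only to illustrate the idea; a production system would rely on its platform's access control mechanisms.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "operator": {"read_realtime"},
    "analyst": {"read_realtime", "read_archive"},
    "admin": {"read_realtime", "read_archive", "manage_users"},
}


def is_allowed(role, action):
    """Deny by default: allow only actions explicitly granted to the role."""
    return action in ROLE_PERMISSIONS.get(role, set())


operator_can_read = is_allowed("operator", "read_realtime")
operator_can_archive = is_allowed("operator", "read_archive")
unknown_role = is_allowed("contractor", "read_realtime")
```

Unknown roles and ungranted actions are both refused, so access must be added deliberately rather than removed after the fact.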

In addition to the security investment, agencies must ensure they have strong disaster recovery plans as well. With recent growth of ransomware attacks, it is critical for agencies to back up their data to several locations and be prepared to automatically switch processing to a backup location, or replace compromised data (corrupted by a hacker or encrypted by ransomware). Backups should be frequent and maintained in several geographically distributed locations that are not connected.
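
Backups are only useful if they can be trusted when needed, so agencies may want to verify copies routinely. The following is a minimal sketch that compares SHA-256 digests of backup copies against a primary file using Python's standard `hashlib` module. The file names are hypothetical, and temporary files stand in for geographically separate sites.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backups(primary, backups):
    """Map each backup path to True if its contents match the primary copy."""
    expected = sha256_of(primary)
    return {str(b): Path(b).exists() and sha256_of(b) == expected
            for b in backups}


# Self-contained demonstration using temporary files in place of real sites.
with tempfile.TemporaryDirectory() as tmp:
    primary = Path(tmp, "primary.csv")
    good = Path(tmp, "site_a.csv")
    bad = Path(tmp, "site_b.csv")
    primary.write_bytes(b"segment,speed\n101,54\n")
    good.write_bytes(b"segment,speed\n101,54\n")
    bad.write_bytes(b"segment,speed\n101,0\n")  # simulated corruption
    results = verify_backups(primary, [good, bad])
    statuses = list(results.values())
```

This sketch assumes the primary copy is intact. In a ransomware scenario the primary may itself be compromised, so digests recorded at backup time and stored separately would be a safer reference.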

Open Source Software Versus Open Data

The transportation industry is becoming more acquainted with the terms "open source software" and "open data" now that it deals with software and data as core components of its business. However, there is still considerable confusion about what each term means and how the two differ.

Open source software is software whose source code is released under a license that allows others to change and distribute the code to anyone for any purpose. Open source software allows collaborative development of software, or building of specialized functionality on top of common core software. Open source software offers numerous benefits, such as lower cost, quicker innovation, sustainability, flexibility, and improved reliability and security. However, it also has certain drawbacks, some of which have major impacts in the transportation industry. For example, advanced traffic management system (ATMS) vendors are in the business of providing proprietary service and software to agencies to allow them to manage traffic more effectively and more efficiently. If agencies require their vendors to open source their software, vendors see that as a loss of competitive advantage, but because they want to lock in a contract with an agency, they may agree to open source their software. This appears to be a win-win, when in fact it can be a detrimental arrangement. Vendors often provide the bare minimum functionality that satisfies client requirements, because they have a legitimate concern that exposing advanced capabilities, followed by adoption by other vendors, will reduce the original vendor's ability to market the product.

Open data makes data freely available to everyone to use and republish without restrictions. Open data can be generated by open source software or proprietary software. Open data can provide transparency in how an agency operates its network and systems and allows the public, which may be indirectly funding the generation of that data, to understand challenges an agency faces and how that agency prioritizes its goals. In addition, open data allows a broader community to contribute and develop innovative applications, concepts, and capabilities that individual agencies either would not have otherwise, or would not have sufficient resources to implement.

The primary challenges with open data include the following:

Privacy.
- Someone must make a decision on what data is available. Care must be taken not to expose private information or reduce the value of data by removing potentially useful elements.
Interpretation.
- "If you torture the data long enough, it will confess to anything" – Darrell Huff.
- Poorly documented or poor-quality open data can lack clear context or be difficult to understand. Misunderstanding of the underlying data, or malicious manipulation of it to advance a specific agenda, can lead to inconsistent or incorrect interpretations with serious consequences.
Cost.
- Collecting, processing, storing, managing, and disseminating data can be expensive, and if agencies cannot recover the cost, agencies and private sector entities may not be willing to make the necessary investment.
- If a public agency provides open data using public funds, then the question becomes whether it is acceptable for private sector entities to monetize that data by repackaging it or including it in their applications.
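
The privacy tradeoff noted in the list above can be made concrete with a small pre-release filter. This is a sketch under stated assumptions: the field names, the function name `prepare_for_release`, and the sample record are all hypothetical, and a real feed would have its own schema and privacy review.

```python
# Hypothetical field names; a real feed would have its own schema.
SENSITIVE_FIELDS = {"device_id", "license_plate"}


def prepare_for_release(record, coord_decimals=3):
    """Return a copy of a record with sensitive fields dropped and
    coordinates coarsened (3 decimals is roughly 100 m of latitude)."""
    public = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    for key in ("lat", "lon"):
        if key in public:
            public[key] = round(public[key], coord_decimals)
    return public


raw = {"device_id": "abc-123", "license_plate": "XYZ789",
       "lat": 38.989712, "lon": -76.937789, "speed_mph": 42}
released = prepare_for_release(raw)
```

Each choice here trades privacy against usefulness: coarser coordinates protect individuals better but make the data less valuable for segment-level analysis. Dropping identifier fields alone may also not be enough to prevent re-identification from movement patterns, so a filter like this is a starting point rather than a guarantee.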

While both open source software and open data have significant benefits, agencies must be careful to evaluate those benefits against potential drawbacks and be flexible in their requirements during procurement and contracting.
