Office of Operations
21st Century Operations Using 21st Century Technologies

Private Sector Data for Performance Management
Final Report

July 2011


CHAPTER 2. DATA REQUIREMENTS

Data Needs for Congestion Performance Measures

Several reports1, 2, 3 have cataloged numerous performance measures that could be used for congestion monitoring. Rather than reprint all of the tables from these reports, Table 1 documents the congestion performance measures that are currently being used or have been recommended in several major national performance monitoring activities.

Based on Table 1, there are three basic data requirements for congestion performance measures:

  1. Average travel time and speed data are used in nearly all reviewed monitoring activities, with a preference for direct measurement when possible. In the one program activity where it is not used, traffic volumes are used to estimate travel time-based measures. Also, several program activities use day-to-day travel time/speed data distributions for travel reliability measures.
  2. Traffic volumes are used in most monitoring activities to either calculate delay measures or for weighting purposes in averaging travel time/speed data across different roadways and time periods.
  3. Length of road segments is used for several supporting calculations, including calculation of travel times from link speeds as well as in vehicle-miles or person-miles of travel.

There are other parameters (e.g., driving population, value of lost time, congestion thresholds, lane-miles, etc.) that may be necessary for certain performance measures, but these parameters have typically been derived or adapted from existing resources or datasets. Travel time/speed, traffic volumes, and road segment length are the three primary variables that are required for all roads being monitored. Until recently, these three data elements were not universally available on all roadways of interest, so sampling and estimation were used to develop a representative nationwide estimate.
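To make the three primary variables concrete, the following sketch (not from the report; all numbers are illustrative) shows how travel time/speed, traffic volume, and segment length combine into two common congestion measures, the Travel Time Index and vehicle-hours of delay:

```python
def travel_time_index(avg_speed_mph: float, free_flow_speed_mph: float) -> float:
    # TTI = actual travel time / free-flow travel time
    # (equivalently, free-flow speed / actual speed)
    return free_flow_speed_mph / avg_speed_mph

def vehicle_hours_of_delay(avg_speed_mph: float, free_flow_speed_mph: float,
                           length_mi: float, volume_veh: float) -> float:
    # Extra travel time per vehicle (vs. free flow), scaled by traffic volume
    actual_hr = length_mi / avg_speed_mph
    free_flow_hr = length_mi / free_flow_speed_mph
    return max(actual_hr - free_flow_hr, 0.0) * volume_veh

# Hypothetical link: 2-mile segment, 40 mph observed vs. 60 mph free flow,
# 5,000 vehicles during the analysis period.
tti = travel_time_index(40.0, 60.0)                         # 1.5
delay = vehicle_hours_of_delay(40.0, 60.0, 2.0, 5000)       # ~83.3 vehicle-hours
```

Note that the volume enters only as a weight: without it, links with very different traffic loads would contribute equally to an areawide average.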

The data requirements summarized thus far relate to monitoring. Performance management involves making decisions and/or taking actions based on performance results, and effective performance management actions cannot be undertaken without understanding why performance is improving or declining. For example, performance measures can tell you that congestion has increased within the past year, but to take action requires understanding WHY congestion increased. Was it increased traffic demand? Was it an increase in the number of incidents, or an increase in incident clearance times? Have the traffic management actions been effective at reducing congestion, and if not, why not? Effective performance management, then, requires other supporting data related to management actions and other external factors (like traffic growth or development).

Table 1. Summary of Mobility Measures and Data Requirements for National Performance Monitoring Activities

FHWA Urban Congestion Reports (2010)
(https://ops.fhwa.dot.gov/perf_measurement/ucr/index.htm)
  Mobility measures: Congested Hours; Travel Time Index; Planning Time Index
  Data requirements: Average travel time/speed; Travel time/speed distribution (for reliability); Traffic volumes; Length of road segments

FHWA Freight Significant Corridors
(https://ops.fhwa.dot.gov/freight/freight_analysis/perform_meas/fpmtraveltime/index.htm)
  Mobility measures: Average operating speed; Travel Time Index; Buffer Index
  Data requirements: Average travel time/speed; Travel time/speed distribution (for reliability); Length of road segments

FHWA Border Crossings (2001)
(https://ops.fhwa.dot.gov/freight/border_crossing.htm)
  Mobility measures: Delay per truck trip; Average travel time; 95th percentile time; Buffer Index
  Data requirements: Average travel time/speed; Travel time/speed distribution (for reliability); Traffic volumes/number of truck trips

FHWA Conditions and Performance Report (2008)
(https://www.fhwa.dot.gov/policy/2008cpr/index.htm)
  Mobility measures: Daily % of vehicle miles traveled (VMT) in congestion; Travel Time Index; Annual hours of delay per capita; Average length of congested conditions
  Data requirements: Traffic volumes; Length of road segments; Population

Surface Transportation Authorization Act of 2009 (Oberstar Bill, June 22, 2009)
(http://t4america.org/docs/062209_STAA_fulltext.pdf)
  Mobility measures: Annual Total Hours of Travel Delays; Annual Hours of Delay per peak period driver; Total Annual Cost of Congestion; Speed; Travel Time Reliability; Incident-based Delays
  Data requirements: Average travel time/speed; Traffic volumes; Travel time/speed distribution (for reliability); Length of road segments; Driving population; Incident data; Value of lost time

National Transportation Operations Coalition (NTOC) Performance Measurement Initiative (July 2005)
(http://www.ntoctalks.com/ntoc/ntoc_final_report.pdf)
  Mobility measures: Customer Satisfaction; Extent of Congestion – Spatial & Temporal; Incident Duration; Recurring & Nonrecurring Delay; Speed; Throughput – Person & Vehicle; Travel Time – Link & Trip; Travel Time – Reliability (Buffer Time)
  Data requirements: Volume; Length of road segments; Average link travel time/speed; Average trip travel time/speed; Travel time/speed distribution (for reliability); Population; Incident data; Vehicle occupancy; Customer survey data

Texas Transportation Institute (TTI) Urban Mobility Report (2010)
(http://mobility.tamu.edu/ums/)
  Mobility measures: Travel Time Index; Delay per traveler; Cost of congestion; Change in congestion
  Data requirements: Average travel time/speed; Traffic volumes; Length of road segments; Population; Value of lost time

INRIX Scorecard (2009)
(http://scorecard.inrix.com/scorecard/)
  Mobility measures: Travel Time Index; Travel Time Tax; Hours of congestion
  Data requirements: Average travel time/speed; Length of road segments

Core Data Elements, Metadata, and Consistency of Different Data Sources

When considering the use of private sector data for nationwide performance monitoring, there are several scenarios in which data from more than one company may be used. For example, consider that Company A wins a two-year contract to supply nationwide data, but is replaced by Company B on the next two-year contract. Will the software built around Company A's data also work for Company B? Will there be an abrupt change in the performance measures trend line that is caused by a switch in data providers? Or, consider the possibility that different companies could provide data for different regions of the country, with Company A providing data for the eastern US and Company B providing data for the western US. With any of these scenarios involving more than one company, it is important to have consistency and interchangeability among the various providers' datasets. This section discusses the considerations necessary for private sector data sets to be consistent and interchangeable.

Location Referencing

Consistent and unambiguous location referencing is critically important when considering traffic data from multiple sources. With private sector location referencing, there is good news and bad news. The good news is that the traveler information industry has largely agreed on a consistent location referencing method called traffic message channel (TMC), which is supported by a consortium of two large mapping companies: NAVTEQ and TeleAtlas (now wholly owned by TomTom). The bad news is that, despite the TMC location references being a de facto standard in the commercial traffic information marketplace, they are not widely used or well known by most public sector agencies. Therefore, any efforts to use private sector travel time/speed data on a statewide or nationwide basis will require the integration of the TMC-referenced road network with the public sector road network.

Core Data Elements

For the purposes of performance monitoring, there are several core data elements provided by the private sector companies that are essential:

  • Date (or day of week for historical data) and time stamp
  • Roadway link identifier
  • Roadway link length
  • Roadway link travel time or speed (average and specified percentiles for historical data)

For the time stamp, a standard definition of time (such as coordinated universal time (UTC)) should be used to prevent confusion and misinterpretation across providers' data sets due to different time zones and the varying use of daylight saving time in different areas of the country.

For the roadway link identifier, a separate location table should provide supporting information, such as a qualitative description of the road link, exact latitude and longitude for link endpoints, link direction, upstream and downstream link identifiers, etc. Such a location table already exists for TMC locations and is referred to as the TMC Location Table.
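The core data elements and the UTC time stamp convention can be illustrated with a minimal record layout. This is a hypothetical schema, not a vendor format; the field names and the example TMC code are invented for illustration:

```python
from dataclasses import dataclass
from datetime import datetime, timezone, timedelta

@dataclass
class LinkSpeedRecord:
    timestamp_utc: datetime   # date/time stamp, normalized to UTC
    tmc_code: str             # roadway link identifier (TMC location code)
    length_mi: float          # roadway link length
    speed_mph: float          # average link travel speed (or travel time)

def make_record(local_dt: datetime, tmc: str,
                length_mi: float, mph: float) -> LinkSpeedRecord:
    # Convert any zone-aware local time to UTC before storage, so records
    # from different time zones and daylight saving rules are comparable
    return LinkSpeedRecord(local_dt.astimezone(timezone.utc), tmc, length_mi, mph)

# Example: 8:00 a.m. in a UTC-5 zone becomes 13:00 UTC
est = timezone(timedelta(hours=-5))
rec = make_record(datetime(2011, 7, 1, 8, 0, tzinfo=est), "110+04598", 1.2, 55.0)
```

Supporting attributes from the TMC Location Table (description, endpoint coordinates, direction, adjacent links) would be joined on the `tmc_code` field rather than stored in every record.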

Metadata

Metadata is simply "data about data" and can be any form of documentation that provides supporting information about one or more primary data elements. Metadata is especially useful when trying to understand how data was collected, how it has been processed and/or modified, and what the data represents. Metadata can be static for an entire dataset (such as documentation on the blending algorithm used) or it can be dynamic and reported at the individual record level (such as a record-level quality indicator). Although metadata may not be absolutely essential, it can substantially improve the ease of data integration and application development. Therefore, it is important to outline the metadata elements that would be preferred when using private sector data for performance monitoring.

The following metadata elements are all data quality indicators and would be useful:

  • Vehicle probe sample size – The number of vehicle probes that were used in the calculation of a travel time or speed estimate. Typically, sample size is considered to be an indicator of data quality. However, with some probe vehicle types, the sample size may not be the best indicator of data quality.
  • Vehicle probe standard deviation – The standard deviation among the vehicle probes that were used in the calculation of a travel time or speed estimate. When both the sample size and standard deviation are provided, a standard error and confidence interval can be estimated. This is a surrogate indicator of data quality, because in some situations the standard deviation may be more influenced by the variability of free-flow traffic (in which drivers can select their own speed) than by variability among a limited number of samples.
  • Confidence interval or indicator – In lieu of providing sample size and standard deviation, the data provider could choose to calculate a statistical confidence interval internally and provide that in record-level metadata. Or, a generalized confidence indicator (say on a scale of 1 to 10) could be used to indicate relative quality levels.
  • Blending indicator or ratio – A blending indicator is a binary value (YES or NO) that indicates whether a travel time or speed estimate is blended. A blending ratio is a quantitative value (e.g., 50%) that quantifies the proportion of historical vs. real-time data in the travel time or speed estimate. Similarly, the blending ratio could also quantify the proportion of data points from public agency fixed-point sensors vs. data points from probe vehicles. Both metadata could be used to filter out those data values that have unacceptable levels of blending. For monitoring purposes, the key would be to avoid a blend of data from different years.
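The metadata elements above can be put to work directly. The sketch below (illustrative, with made-up thresholds and records) computes a standard error and an approximate 95 percent confidence interval from the reported sample size and standard deviation, and filters out records whose blending ratio is unacceptably high:

```python
import math

def confidence_interval(mean_speed: float, std_dev: float, n: int,
                        z: float = 1.96) -> tuple:
    # Standard error of the mean; z = 1.96 gives an approximate 95% interval
    se = std_dev / math.sqrt(n)
    return (mean_speed - z * se, mean_speed + z * se)

def acceptable(record: dict, max_blend_ratio: float = 0.5) -> bool:
    # Keep only records based mostly on real-time probe data
    # (0.5 is an illustrative threshold, not a recommended standard)
    return record["blend_ratio"] <= max_blend_ratio

records = [
    {"speed": 52.0, "std": 8.0, "n": 25, "blend_ratio": 0.2},
    {"speed": 48.0, "std": 6.0, "n": 4,  "blend_ratio": 0.8},   # mostly historical
]
usable = [r for r in records if acceptable(r)]
lo, hi = confidence_interval(52.0, 8.0, 25)   # roughly (48.9, 55.1) mph
```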

Temporal Consistency

For monitoring mobility and reliability trends over multiple years, there will need to be consistency in the datasets used for nationwide performance monitoring. This concern about consistency and comparability is not unique to private sector data for performance monitoring and has been addressed in other national data programs through the development of statistically-based data collection and reporting guidelines, best practices, and in some cases, data quality standards.

There are proven technical means (such as standardized data dictionaries and exchange formats) to ensure consistency among several different data providers. Similarly, core data elements and preferred metadata can be defined to make data integration less difficult. However, the temporal (i.e., time) consistency issue for trend data remains an issue even with data standardization.

One approach to address time consistency for trend data is to ensure that every data provider meets certain accuracy and other data quality requirements. If each data provider meets those specified accuracy targets, then fluctuation between different companies' datasets will be less likely. The same quality assurance principle applies for addressing data blending (discussed in the next section): as long as all data providers meet specified data quality requirements, public sector agencies should be able to choose among data providers for the best value. One approach would be to consider purchasing traffic data the same way that concrete or reinforcing steel is purchased from private companies: designate certain quality requirements, test randomly, and only pay for or use that data (or material) that meets quality requirements.

Another approach to address time consistency is as follows: if a new data provider wins a procurement contract, they must provide a "calibration dataset" from 2 or 3 previous years. This calibration dataset is then compared to the previous data provider for the overlapping years, and adjustments are made as necessary to ensure smooth trend lines. This approach was used in the 2010 Urban Mobility Report, which combined 2007 to 2009 INRIX data with 1982 to 2008 FHWA Highway Performance Monitoring System (HPMS) data. In this case, the overlapping years of 2007 and 2008 were used to adjust the FHWA HPMS trend line to match the 2007 to 2009 INRIX trend line.
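The calibration-dataset approach can be sketched as follows: compute an adjustment factor from the ratio between providers in the overlapping years, then rescale the older series so the trend line joins the new provider's data smoothly. All values below are invented for illustration (they are not the Urban Mobility Report figures):

```python
# Travel Time Index trend from two hypothetical providers
old_series = {2005: 1.20, 2006: 1.22, 2007: 1.25, 2008: 1.28}   # prior provider
new_series = {2007: 1.30, 2008: 1.33, 2009: 1.31}               # new provider

# Overlapping years drive the adjustment
overlap = sorted(set(old_series) & set(new_series))             # [2007, 2008]
factor = sum(new_series[y] / old_series[y] for y in overlap) / len(overlap)

# Rescale the pre-overlap years to the new provider's level,
# then keep the new provider's values for the overlap onward
adjusted = {y: v * factor for y, v in old_series.items() if y < min(overlap)}
combined = {**adjusted, **new_series}
```

A simple average ratio is used here; a regression-based adjustment over the overlapping years would be a natural refinement when more than two or three years overlap.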

Finally, the Section 1201 (23 CFR Part 511) traveler information requirements are a very important foundation for establishing consistency and comparability, in that they establish national standards for real-time travel time accuracy in major metropolitan areas. Once these real-time data are archived, the resulting data archives will have at least the accuracy of the original real-time source data. That is a fundamental principle also included in the U.S. Department of Transportation's (USDOT) Data Capture and Management Program.

Issues Associated With "Blended" Traffic Data

A common practice among private sector traffic data providers is to combine several different data sources and/or data types with proprietary algorithms to produce an estimate of current, up-to-date traffic conditions. This practice is referred to as "blending" or "fusion" and typically each company has their own data blending or data fusion algorithm. In this section, we discuss the possible effects that two types of blending may have on the use of private sector traffic data for performance monitoring:

  1. Time or location blending: mixes average historical conditions (time blending) or nearby locations (location blending) with stale "real-time" data (e.g., from the past hour).
  2. Source blending: mixes private sector (typically probe-based) data with public agency (typically fixed-point sensor) data.

Ultimately, the best way to determine and control the effects of blending is a quality assurance program (see Chapter 4) that ensures that the blended real-time estimates do not exceed some specified level of error acceptable for performance management.

Time or Location Blending

When considering the possible effects of time or location blending, it is useful to differentiate between performance measures based on historical averages (such as travel time index or delay) and performance measures based on distributions (reliability measures such as the planning time index that use percentiles or variation). As a measure of central tendency, a sample mean (i.e., historical average) is more statistically stable than a specified percentile in situations with low sample size and high variability. What this means is as follows: time or location blending is more likely to affect reliability performance measures than measures based on historical averages. The extent to which reliability performance measures are affected is difficult to know without conducting experimental analyses with blended and non-blended data.
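The statistical point can be demonstrated with a small simulation (not from the report; the travel time distribution is invented): when repeated small samples are drawn from a distribution with occasional long delays, the 95th percentile estimate fluctuates far more from sample to sample than the mean does.

```python
import random
import statistics

random.seed(42)

def draw_sample(n: int) -> list:
    # Hypothetical link travel times (minutes): mostly ~10 minutes,
    # with roughly 10% of trips hitting an extra 5-15 minute delay
    return [random.gauss(10, 1) + (random.random() < 0.1) * random.uniform(5, 15)
            for _ in range(n)]

def p95(xs: list) -> float:
    # Simple (nearest-rank style) 95th percentile
    xs = sorted(xs)
    return xs[int(0.95 * (len(xs) - 1))]

means, p95s = [], []
for _ in range(500):
    s = draw_sample(12)              # low sample size, as in sparse probe data
    means.append(statistics.mean(s))
    p95s.append(p95(s))

# The percentile-based estimate varies much more across repeated samples
assert statistics.stdev(p95s) > statistics.stdev(means)
```

This is why blending toward historical averages tends to distort reliability measures (built from percentiles of the distribution) more than it distorts average-based measures.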

Source Blending

There is some data source blending that may be unavoidable (and not necessarily detrimental) with private sector vehicle probe data, and that is the blending of different types of vehicle probes. For example, several data providers obtain their real-time vehicle probe data from GPS-equipped commercial fleet vehicles, which could include long-haul freight trucks, package delivery vans, taxi vehicles, construction vehicles, utility/cable/phone service vehicles, etc. Commuter probe vehicles are also becoming more common with the uptake of GPS-equipped smart phones, personal navigation devices, and other mobile consumer devices. Some of these vehicle types have different operating characteristics in traffic, so when sample sizes are small, the estimate of average speed or travel time is more likely to be biased. When sample sizes are large, it is more likely that different vehicle types will be proportionally represented in the average speed, resulting in less bias. Again, this blending of different probe vehicle types is unavoidable because, at least in the near future, only a sample of all vehicles will be capable of being monitored.

The other type of source blending occurs when vehicle probe data is combined with fixed-point sensor data. Depending upon the blending algorithm, the fixed-point sensor data may be given more weight than a small number of vehicle probes (using the rationale that a fixed-point sensor measures the speed of all vehicles, rather than just a few samples with vehicle probes). However, it is important to recognize that fixed-point sensors measure the speed of all vehicles at a single point, which may not accurately represent the traffic conditions on other parts of the road link. Even if a vehicle probe reports an instantaneous speed at a single point in time, the vehicle probe reports are likely to be randomly distributed over the length of the road link, rather than at one stationary location like a fixed-point sensor, or one type of location, like adjacent to an entrance ramp meter.
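A toy example makes the weighting issue concrete. The scheme below is invented for illustration (actual vendor blending algorithms are proprietary): the fixed-point sensor is treated as equivalent to some number of probe reports, so a sensor located at a free-flowing point can pull the blended estimate well away from what the probes observed along the link.

```python
def blend_speed(probe_speeds: list, sensor_speed: float,
                sensor_weight: int = 10) -> float:
    # Treat the point sensor as equivalent to `sensor_weight` probe reports
    # (an illustrative rationale: it measures all vehicles at its location)
    total = sum(probe_speeds) + sensor_speed * sensor_weight
    return total / (len(probe_speeds) + sensor_weight)

# Three probes spread along the link report congested speeds, while the
# sensor sits near a free-flowing point at the link's upstream end.
blended = blend_speed([35.0, 38.0, 33.0], 55.0)
# With sensor_weight=10, the sensor dominates: blended is about 50.5 mph,
# well above the ~35 mph the probes observed over the full link
```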

The extent to which blending vehicle probe data with fixed-point sensor data affects the accuracy of travel times and speeds is difficult to know without conducting experimental analyses with blended and non-blended data. It may be possible to specify in a procurement of historical data that fixed-point sensor data (public or private) not be blended with vehicle probe data in the calculation of summary statistics. There is currently at least one private sector data provider that already provides this option of not including public sector fixed-point sensor data in the calculation of historical summary statistics.

Integration of Travel Time and Traffic Volume Data

The selection of the best travel time and traffic volume data sources for congestion monitoring depends on the application scale and context. For example, is the performance monitoring program nationwide in scale for all major roads (freeways and arterial streets)? If so, then the most comprehensive source for travel time data appears to be the private sector, whereas the most comprehensive source for traffic volume data will be the HPMS database or a compilation of state Department of Transportation (DOT) databases.

Now consider the example of a congestion monitoring program at the urban area level where an extensive network of operations-based fixed-point sensors operates on the freeway system. In this example, it could be that the public agency's freeway sensors are the most cost-effective dataset and could be supplemented on the arterial streets by private sector travel time data. Again, it depends on the application scale and context, as well as the quality of existing datasets.

Once the best source(s) of travel time and traffic volume data have been identified and gathered, the geospatial integration of this data will most likely be necessary. In simple terms, that means combining data sets from different sources so that a travel time/speed and a traffic volume are assigned to every roadway link. The combining of data sets, called conflation, is made more difficult by the fact that different data sources will likely have different roadway segmentation.

One of several considerations that will have to be addressed during the geographic information systems (GIS) conflation process is the roadway segmentation that will serve as the basis for the monitoring program. This could be the segmentation associated with the travel time dataset, the traffic volume dataset, or an entirely new segmentation defined specifically for performance monitoring. The choice of base segmentation depends on the application context and the particular datasets being conflated, so it is difficult to provide prescriptive guidance. However, the GIS analysis and reporting framework should allow for the base segmentation to be aggregated to several different levels of reporting, such as link, directional route, corridor (both directions combined), functional classes, subarea or relevant sub-jurisdictions, regional, and statewide.

Another consideration after spatial conflation is the temporal harmonization for datasets from different data sources. For example, the private sector travel time data could be provided as 15-minute averages for each day of the week for directional links, whereas the traffic volume data could be provided as an average daily count for both travel directions combined. In this example, 15-minute estimates of directional traffic volume should be estimated to match the travel times.
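A minimal sketch of that harmonization step follows. The daily count, directional split, and time-of-day profile are all illustrative assumptions (an agency would use local count data or profiles from continuous count stations):

```python
DAILY_COUNT = 24000          # vehicles/day, both directions combined (hypothetical)
DIRECTIONAL_SPLIT = 0.5      # assumed even split by direction

# Hypothetical share of daily traffic occurring in each hour of the day;
# a full profile would cover all 24 hours and sum to 1.0
hourly_profile = {7: 0.08, 8: 0.09, 12: 0.05, 17: 0.10}

def volume_15min(hour: int) -> float:
    # Directional 15-minute volume for a given hour: apportion the daily
    # count by direction and hour, then divide the hour into four periods
    share = hourly_profile.get(hour, 0.02)   # fallback share for other hours
    return DAILY_COUNT * DIRECTIONAL_SPLIT * share / 4

v = volume_15min(8)   # 24000 * 0.5 * 0.09 / 4 = 270 vehicles per 15 minutes
```

Each 15-minute directional volume can then be paired with the matching 15-minute average speed for delay and weighting calculations.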

Through several previous efforts4, 5, TTI researchers have developed an analytical process of estimating sub-daily traffic volumes to match the temporal resolution of private sector speed data. This analytical process was applied at the national level for TTI's 2010 Urban Mobility Report, which combined private sector travel time data with HPMS traffic volumes.6 Similar procedures could be used by state or local agencies with their detailed datasets. The main steps of this process are as follows:

  • Conflation. In GIS, establish the segmentation relationships (e.g., "crosswalk table") between the public agency roadway network and the private sector's roadway network (referred to as the traffic message channel, or TMC, network). This process is known as conflation, and the result is that speeds and traffic volumes are available for all roadway links.
  • Sub-daily volume estimation. Estimate 15-minute or hourly traffic volumes from the public agency traffic count data. This enables each average traffic speed value to have a corresponding traffic count estimate.
  • Establish free-flow travel speed. Estimate the free-flow speed using speed data from low volume time periods. In most efforts, the free-flow speed is calculated as the 85th percentile speed during off-peak periods.
  • Calculate congestion performance measures. The calculation of performance measures is straightforward once the TMC roadway network is matched with the public agency network in GIS.

FHWA-HOP-11-029