Office of Operations
21st Century Operations Using 21st Century Technologies

Data Quality White Paper

3.0 Use of Data Quality Measures in Existing Systems

As more traffic data have become available, data quality has become a greater concern. The advent of private services that provide real-time traffic information has further complicated the issue. Traditionally, the public and private sectors have held differing points of view, and consequently differing expectations, with regard to traveler information systems: the public side installs and operates roadway sensors and provides traffic information, while the private side focuses on offering detailed information to travelers. More recently, however, this relationship has become more complex. ITS technologies are evolving quickly and the cost of installation and operation has increased, so the fast-paced private sector now shares a role in many ITS projects, including real-time traveler information applications. The following sections cover data quality issues for both the public and private sectors.

3.1 Public Sector Use

The public sector's main interest has been in delivering an acceptable level of traveler information. Public agencies increasingly rely upon traveler information products to convey transportation system status to travelers, so data quality is seen as an aspect of their services that instills public confidence.

The National ITS Architecture provides a common structure for the design and implementation of ITS, including traveler information applications. It defines the functions, the interfaces and information flows, and the communication requirements for the flow of information. In addition, it identifies and specifies the requirements for standards needed to support national and regional interoperability, as well as product standards. These standards contain the formal definitions of the physical interfaces and information exchange requirements. The National ITS Architecture played a key role in establishing the initial data quality standard for ITS applications [Mitretek, Developing Traveler Information Systems Using the National ITS Architecture. 1998, Prepared for Federal Highway Administration: Washington D.C.].

511 is the nationally available telephone number that gives travelers access to traveler information in nearly every state in the country. The 511 services provided at the state and metropolitan levels deliver real-time traveler information by telephone and Internet web sites, and information quality is a major concern for each deployment. The Implementation and Operational Guidelines developed by the 511 Deployment Coalition identified accuracy, timeliness, reliability, consistency of presentation, and relevancy of information as important parameters for content quality and consistency across systems [511 Deployment Coalition, America's Travel Information Number: Implementation and Operational Guidelines for 511 Services. 2003; 511 Deployment Coalition, Deployment Assistance Report #7: Roadway Content Quality on 511 Services. 2003.]. The 511 Deployment Coalition's recommendations for each attribute are as follows:

Accuracy

It is recommended that reports contain information that matches actual conditions. If the system reports construction events that are not occurring, or worse, fails to report a construction event or road closure that is occurring, callers will start to distrust the information provided. If inaccuracies persist, callers will stop using 511.

Timeliness

Timeliness is closely related to accuracy. It is recommended that information provided by 511 be as timely as possible, given the speed at which conditions change. While non-urban areas may have more difficulty collecting, inserting, and updating information quickly, it is recommended that every attempt be made in both urban and non-urban areas to update information as soon as there is a known deviation from the current route segment or service report. Thus, the timeliest reports are driven by changing conditions rather than by updates at regular intervals.
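The advantage of event-driven updates can be illustrated with a short calculation. Under a fixed update interval T, a change that occurs just after a scheduled update is not reported until the next one, so the wait ranges from 0 up to T. Under event-driven updating, the wait is only the processing time. The sketch below assumes updates at 0, T, 2T, and so on, a 5-minute interval, zero processing time for event-driven reports, and hypothetical change times; none of these values come from the 511 guidelines.

```python
import math


def interval_delay(event_min: float, interval_min: float) -> float:
    """Minutes from a change until the next scheduled update (updates at 0, T, 2T, ...)."""
    next_update = math.ceil(event_min / interval_min) * interval_min
    return next_update - event_min


def event_driven_delay(processing_min: float = 0.0) -> float:
    """Event-driven reporting: the only wait is the (assumed) processing time."""
    return processing_min


# Hypothetical times (minutes) at which conditions change on a route segment.
change_times = [0.5, 2.0, 4.9, 7.3]
INTERVAL_MIN = 5.0

for t in change_times:
    print(f"change at {t:.1f} min: "
          f"fixed-interval wait {interval_delay(t, INTERVAL_MIN):.1f} min, "
          f"event-driven wait {event_driven_delay():.1f} min")
```

For changes spread evenly over time, the average wait under a fixed interval is about T/2, or roughly 2.5 minutes for a 5-minute interval, on top of any processing time.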

Reliability

Transportation management systems are often staffed only during normal working hours, but travelers use highways 24 hours a day, 7 days a week, and the most challenging travel conditions often occur at night and on weekends. Methods must be developed to provide callers with a reliable stream of information 24/7.

Consistency of Presentation

It is recommended that reports use the same, or similar, terminology to describe conditions. Inconsistent terminology leads to misunderstanding and confusion among callers, whereas consistent terminology makes the system more usable as users move from system to system. The use of existing and evolving message standards, such as the Traffic Management Data Dictionary (TMDD) and SAE J2354, enables this consistency.

Relevancy

The information that is provided needs to be relevant to the caller given their location, modal choice and/or actions they may need to take as a consequence of weather and road conditions or service disruptions.

Good data quality is required to provide quality information to travelers. Data for 511 generally come from the state DOT, the highway patrol and police departments, and transit agencies, and sometimes from local jurisdictions and private companies. The traffic data are generated by direct measurement and by estimation. The report Roadway Content Quality on 511 Services provides the recommended quality guidelines for 511 shown in Table 4 [511 Deployment Coalition, Deployment Assistance Report #7: Roadway Content Quality on 511 Services. 2003.]. The report also includes recommendations developed by Caltrans as part of its TMS Detection Plan: speed should be accurate to within ±5 mph over 30-second intervals with 99% availability, and travel time between detection points should be 95% accurate with 95% availability.
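The Caltrans speed criterion can be checked interval by interval. The sketch below assumes paired 30-second records of reported and ground-truth speed, with None marking intervals for which no speed was reported; it treats "within ±5 mph" as a per-interval absolute error test and availability as the share of intervals with a report. The record layout and the function name check_caltrans_speed are hypothetical. Because the recommendation as summarized here does not state what fraction of intervals must meet the ±5 mph bound, the sketch reports that fraction rather than passing or failing it.

```python
from typing import Optional, Sequence


def check_caltrans_speed(reported: Sequence[Optional[float]],
                         truth: Sequence[float],
                         tol_mph: float = 5.0,
                         min_availability: float = 0.99) -> dict:
    """Evaluate paired 30-second speed records (mph) against the Caltrans criteria.

    reported[i] is None when no speed was reported for interval i.
    """
    if not truth or len(reported) != len(truth):
        raise ValueError("need non-empty, equal-length interval series")
    available = [(r, t) for r, t in zip(reported, truth) if r is not None]
    availability = len(available) / len(truth)
    # Count available intervals whose absolute error is within +/- tol_mph.
    within = sum(abs(r - t) <= tol_mph for r, t in available)
    return {
        "availability": availability,
        "availability_ok": availability >= min_availability,
        "share_within_tolerance": within / len(available) if available else 0.0,
    }


# Four hypothetical 30-second intervals; the third has no report.
result = check_caltrans_speed([62.0, 58.0, None, 40.0], [60.0, 65.0, 55.0, 43.0])
print(result)
```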

Peirce and Lappin [Peirce, S. and J. Lappin. Why don't more people use advanced traveler information? Evidence from the Seattle area. Presented at 83rd Annual Meeting of the Transportation Research Board. 2004. Washington, D.C.] investigated end-user acceptance of traveler information using 2003 survey data collected in the Seattle metropolitan area. The study found that only 10% of travelers use traveler information accessed via Internet web sites, TV, radio, or variable message signs (VMS), and that fewer than 1% of travelers make a change in response to traffic information. The study also identified six factors affecting the decision to use traveler information: the broader regional context, awareness levels, trip characteristics, information quality, the presence of delays, and the availability of alternatives. It concluded that ATIS usage could increase if data quality and geographic coverage were improved to meet user demands.

Table 4. Data Quality Recommendations for 511

Applications

Data Quality Guideline

Traffic Data

  • Data from general purpose lanes and special purpose lanes (e.g., high occupancy vehicle lanes) should not be mixed;
  • No more than 15% mean error in reported data (e.g., a true 60 MPH average speed being reported between 51 and 69 MPH);
  • No more than a five minute delay in data (e.g., data collected at 6:00 p.m. should be available on the 511 service by 6:05 p.m.); and,
  • Data should be available for a given road segment at least 90% of the time, on average (e.g., equipment and communications failures should result in no report being available for a road segment for no more than 876 hours throughout the course of a year).

Incident/Event Data

  • No more than 10 minutes from the time an incident/event occurs to when it is available in a 511 service.
  • Incident/event reports are verified in some fashion prior to being included in 511 messages.
  • Incident/event report information (such as location, nature, severity, duration, etc.) is fully accurate in at least 85% of the reports.

Weather Data

  • Conditions (fog, dust, snow, etc.): 95% accuracy and 99% availability.
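The quantitative guidelines in Table 4 lend themselves to an automated compliance check. The sketch below is one possible reading, not the 511 Deployment Coalition's method: it interprets "mean error" as mean absolute percentage error against paired ground-truth speeds, applies the 5-minute and 10-minute limits to every record, and computes availability over an 8,760-hour year. The data layout (SegmentYear) and the function names are hypothetical. The requirement that incident reports be verified before release is procedural and is not checked.

```python
from dataclasses import dataclass
from typing import Sequence

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; 10% of this is 876 hours


@dataclass
class SegmentYear:
    """Hypothetical one-year summary of 511 traffic data for one road segment."""
    reported_mph: Sequence[float]  # reported average speeds
    true_mph: Sequence[float]      # paired ground-truth speeds
    delays_min: Sequence[float]    # collection-to-511 delay of each record
    hours_unavailable: float       # hours with no report for the segment


def mean_abs_pct_error(reported: Sequence[float], truth: Sequence[float]) -> float:
    """Mean of |reported - true| / true, an assumed reading of "mean error"."""
    if not truth or len(reported) != len(truth):
        raise ValueError("need non-empty, equal-length speed series")
    return sum(abs(r - t) / t for r, t in zip(reported, truth)) / len(truth)


def check_traffic(seg: SegmentYear) -> dict:
    """Apply the three quantitative traffic data guidelines from Table 4."""
    availability = 1 - seg.hours_unavailable / HOURS_PER_YEAR
    return {
        "mean_error_ok": mean_abs_pct_error(seg.reported_mph, seg.true_mph) <= 0.15,
        "delay_ok": max(seg.delays_min) <= 5.0,  # every record within 5 minutes
        "availability_ok": availability >= 0.90,
    }


def check_incidents(latencies_min: Sequence[float],
                    fully_accurate: Sequence[bool]) -> dict:
    """Apply the incident/event latency and accuracy guidelines from Table 4."""
    if not latencies_min or not fully_accurate:
        raise ValueError("need at least one incident report")
    return {
        "latency_ok": all(x <= 10.0 for x in latencies_min),
        "accuracy_ok": sum(fully_accurate) / len(fully_accurate) >= 0.85,
    }


seg = SegmentYear([62.0, 55.0], [60.0, 60.0], [3.0, 4.5], hours_unavailable=500.0)
print(check_traffic(seg))
print(check_incidents([4.0, 12.0], [True] * 9 + [False]))
```

Under this reading, the 15% bound means a true 60 mph speed may be reported anywhere from 51 to 69 mph, and 90% availability leaves at most 876 of 8,760 hours without a report.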

3.2 Private Sector Use

While many public transportation agencies provide traffic information to travelers, the private sector is also involved in disseminating traveler information. The convenient accessibility of their services attracts travelers to traffic information from private firms. In addition, various public-private partnerships encourage private partners to participate in traveler information system deployment. The major roles of the private sector, as an information service provider (ISP), are to collect basic traveler information from public agencies, supplement it with additional information, process and combine it for presentation in useful ways, and use it to derive information for added services [Mitretek, Developing Traveler Information Systems Using the National ITS Architecture. 1998, Prepared for Federal Highway Administration: Washington D.C.].

Although the technology for generating and providing sophisticated traveler information services exists, the marketing of these services is relatively new. Internet map service providers such as Google Maps, Yahoo Maps, MapQuest.com, and Microsoft Live Search Maps provide real-time traffic conditions along with additional traffic information. In addition, private data providers such as INRIX and Traffic.com are now being used as the principal data source in a few traffic control systems, offering more complete services.

For example, Google Maps shows real-time traffic information for major US cities in a layer that colors roads green, yellow, red, or gray according to how fast traffic is moving:

  • Green: more than 50 miles per hour
  • Yellow: 25 - 50 miles per hour
  • Red: less than 25 miles per hour
  • Gray: no data available
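The color scheme amounts to a simple threshold function on average speed. The sketch below mirrors only the thresholds listed above; how Google Maps actually computes segment speeds and assigns colors is not described here, and treating exactly 25 mph and 50 mph as yellow is an assumption.

```python
from typing import Optional


def traffic_color(speed_mph: Optional[float]) -> str:
    """Map an average road-segment speed to the color bands listed above."""
    if speed_mph is None:
        return "gray"    # no data available
    if speed_mph > 50:
        return "green"   # more than 50 mph
    if speed_mph >= 25:
        return "yellow"  # 25-50 mph (both ends assumed inclusive)
    return "red"         # less than 25 mph


for speed in (62.0, 50.0, 31.5, 12.0, None):
    print(speed, traffic_color(speed))
```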

The traffic data provided for major highways are aggregated from several sources, including road sensors as well as car and taxi fleets. Google Maps is not the only service providing real-time online traffic data: Yahoo Maps and MapQuest.com also provide real-time traffic information services, with symbols designating specific traffic incidents. In April 2008, Microsoft released a new technology called "ClearFlow" through Live Search Maps. ClearFlow, which was developed using artificial intelligence algorithms, provides real-time traffic data, including data for major arterials, to help drivers avoid congestion. ClearFlow predicts traffic patterns, taking traffic congestion into account, and then reflects the backups and their consequent spillover onto city streets [Wikipedia. Live Search Maps. 2008 [cited April 14, 2008]; Available from: http://en.wikipedia.org/wiki/Live_Search_Maps.; Markoff, J., Microsoft Introduces Tool for Avoiding Traffic Jams, in The New York Times. 2008: New York.].

INRIX, a private data service provider, aggregates and enhances data from hundreds of sources to provide comprehensive traffic information, including real-time reporting of traffic flow, with quality improved through proprietary error detection and correction of data from individual road sensors. In addition to real-time traffic information, INRIX provides a dynamic predictive flow service. INRIX's traffic speed prediction algorithms include short-term predictions (the next 2-3 hours) based on current traffic, weather forecasts, and other metadata affecting traffic; medium- and long-range predictions (days, weeks, and months ahead) based on weather forecasts and school, construction, and event schedules; and error detection and correction of real-time flow data [INRIX. INRIX: The Leading Provider of Traffic Information. 2008 [cited April 15, 2008]; Available from: http://en.wikipedia.org/wiki/Live_Search_Maps.].

Traffic.com also provides comprehensive real-time traffic information for major US metropolitan areas, including information on main arterials. In addition to traffic condition information, Traffic.com offers estimated travel times based on real-time traffic conditions, incidents, construction, events, and mass transit information. When the major route is congested, its alternate drive feature suggests an alternate route based on real-time traffic conditions, including current estimated delay information.

While numerous private companies provide traffic information via a number of media, including the Internet, cell phones, radio, satellite radio, and television, the accuracy of their information has not been systematically verified. Table 5 summarizes the data quality attributes of private sector traffic information services.

Table 5. Data Quality for Traffic Information Services from the Private Sector

Data Quality Measures

Traffic Information

Accuracy

  • No comparison with ground truth data
  • Accuracy level: 3 to 4 levels of congestion condition shown (e.g., good, mild delay, congestion)

Completeness

  • Percent complete: 100%

Validity

  • No comparison with ground truth data
  • Error detection and correction performed by INRIX, but the method has not been documented

Timeliness

  • Percent timely data: 100% (24 hr/7 days)
  • Average data delay: updated in less than 5 minutes

Coverage

  • Major U.S. cities: highways and arterials (limited service available)

Accessibility

  • Access time: Real time
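The completeness and timeliness entries in Table 5 can be computed from a log of expected and received updates. The sketch below assumes one common set of definitions: percent complete is the share of expected updates that arrived, percent timely is the share that arrived within a threshold (5 minutes here, borrowed from the Table 4 traffic data guideline), and average data delay is the mean lag of the updates that arrived. The Update record and the function name quality_measures are hypothetical. Accuracy and validity are not computed, since, as the table notes, no ground-truth comparison is available for these services.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Update:
    """One expected update from a hypothetical traffic feed (times in minutes)."""
    expected_at: float
    received_at: Optional[float]  # None if the update never arrived


def quality_measures(updates: Sequence[Update], timely_within_min: float = 5.0) -> dict:
    """Compute percent complete, percent timely, and average data delay."""
    if not updates:
        raise ValueError("no expected updates")
    # Lag of every update that actually arrived.
    delays = [u.received_at - u.expected_at
              for u in updates if u.received_at is not None]
    timely = sum(d <= timely_within_min for d in delays)
    return {
        "percent_complete": 100.0 * len(delays) / len(updates),
        "percent_timely": 100.0 * timely / len(updates),
        "average_delay_min": sum(delays) / len(delays) if delays else None,
    }


log = [Update(0.0, 2.0), Update(5.0, 12.0), Update(10.0, None), Update(15.0, 16.0)]
print(quality_measures(log))
```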

In many cases, the algorithms used by the private sector are not made public, presumably to protect competitive advantage and brand differentiation. As a result, verification and validation of the information products in this environment are extremely limited. A method is needed that protects the private sector's investment in data quality algorithms while also giving customers adequate assurance that the data and information products are indeed valid.