
Evaluation Methods and Techniques: Advanced Transportation and Congestion Management Technologies Deployment Program

Chapter 5: Technology-Specific Guidance

This chapter presents methods and lessons learned with respect to evaluating Adaptive Signal Control, Connected Vehicle technologies, and Automated Vehicle technologies.

Adaptive Signal Control

Adaptive signal control technologies (ASCTs) increase the flexibility of signalized control systems to meet changing traffic demand on key arterial corridors. A wide variety of systems have been developed which alter traffic signal timing dynamically by sensing traffic conditions in real time. These systems have widely varying capabilities and methods for allocating green time between movements. The specific algorithms and methodologies used by ASCT systems will not be examined in this chapter. Instead, this chapter focuses on the analysis techniques, performance measures, data sets, and tools needed to analyze the impacts of an adaptive signal system.

Key areas which must be considered include:

  • Analysis Approaches: What type of approach is right for testing the ASCT? Will the system be evaluated in a real-world setting, or will a simulation model be used to evaluate performance?
  • Data Collection: What types of data can or should be collected? Which data sets are required to support desired performance measures?

Analysis Approaches

The two broad categories for testing ASCT systems are real-world field studies and simulation assessments. In implementation settings, the adaptive signal system is installed and tested against actual traffic, with all the variation and idiosyncrasies which occur in demand from day to day. Such implementations necessarily give the best information about how well the system functions in a given corridor, but disentangling the impact of the signal system from other changes in conditions is more difficult because no perfect "control" scenario can be used for comparison. Instead, a reasonably large sample of data must be collected in order to capture and account for variation.

  • Field Study: Real-world implementation using before-after or on-off data.
  • Simulation Study: High-resolution traffic modeling.

Alternatively, simulation studies offer a platform for testing an ASCT system using a control-experiment setup. The exact same demand pattern can be modeled for both the adaptive system and one or several other control systems. Demand can also be varied stochastically, but in a managed and replicable way using known random distributions. Simulation further has the advantage of being a controlled environment which doesn't produce actual negative impacts if a system or methodology leads to dramatically worse outcomes than expected. Since all vehicles are tracked and modeled individually, simulations allow for highly detailed performance measures to be created, some of which would be infeasible to collect in a real-world implementation (such as trip-level performance measures for every vehicle in the simulation).
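
To make the control-experiment idea concrete, the sketch below assumes a hypothetical wrapper function, run_simulation(control, seed), that returns an average corridor travel time from a microsimulation run. Because each baseline/ASCT pair shares the same random seed, both runs see identical stochastic demand, and the paired differences isolate the effect of the control strategy. The function, the assumed benefit, and all values are placeholders, not results from any real model.

    import random
    import statistics

    def run_simulation(control: str, seed: int) -> float:
        """Stand-in for a microsimulation run; returns average corridor
        travel time in seconds. A real study would invoke the simulation
        package's batch interface here."""
        rng = random.Random(seed)                     # common seed -> identical demand draw
        demand_factor = rng.gauss(1.0, 0.10)          # stochastic day-to-day demand
        baseline_time = 240.0 * demand_factor         # seconds under the existing timing plan
        return baseline_time * (0.92 if control == "ASCT" else 1.0)  # assumed benefit

    seeds = range(1, 21)                              # 20 paired replications
    savings = [run_simulation("baseline", s) - run_simulation("ASCT", s) for s in seeds]

    print(f"mean travel-time saving: {statistics.mean(savings):.1f} s")
    print(f"std. dev. of paired differences: {statistics.stdev(savings):.1f} s")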

However, simulations are only representations of how the ASCT system will work. The random variations inherent in actual demand are hard to fully model, so performance once the system is implemented may differ from modeled results. Further, simulation studies require the underlying simulation to be validated against existing conditions. Otherwise, any results generated by the simulation cannot be trusted. Calibration and validation of simulation models is not a standardized process in traffic simulation, and has numerous pitfalls which need to be considered. As noted, including the appropriate level of variation in a simulation model is difficult, and over-calibration is a significant concern. Models which have been tuned too tightly to match a small set of data will not produce realistic results when used to forecast the impact of an experimental treatment.

Within real-world implementation studies, several experimental approaches have been used in previous research. Ideally, a control-experiment approach would be taken (as in a simulation), but this is not possible. The exact same demand is never repeated from one day to the next because daily routines are not perfectly static and needs change from day to day. Instead, some alternating approach must be taken. Two general approaches have been used in previous studies: before versus after and off versus on.

Before-after studies collect data samples under both baseline and experimental conditions, usually prior to and following implementation of the new system. Some before-after studies add further complexity by breaking up post-implementation data into multiple cohorts, creating before/after/long-after study setups in order to examine both immediate and long-term changes in traffic patterns. Before-after studies are common throughout transportation engineering. The critical concern for such studies is data collection and ensuring that sufficiently large samples have been captured to provide meaningful analysis.

Alternatively, on-off studies alternate back and forth between the old and new systems being considered. Such studies seek to more closely approximate a control-experiment study under the assumption that the traffic patterns are related to each other on a day-to-day, week-to-week, or month-to-month basis much more closely than on a year-to-year basis. Such a study may activate the new ASCT system on alternating weeks and compare those samples against one another with little or no modification, since the first and second weeks of any given month are likely to be quite similar (excluding holidays which can be easily filtered out). The primary disadvantage of the on-off approach is the inability to detect long-term changes due to the treatment. Drivers exposed to alternating traffic control systems may become highly conservative and allocate significant extra time for their journeys to accommodate the uncertainty produced by the study. If a before-after approach had been taken instead, those same drivers may have converged to a more stable, less conservative pattern after an acclimation period of several weeks. Thus, an on-off approach may provide more statistically accurate results comparing the two (or more) systems under current conditions, but may not be able to account for changes to those conditions caused by the treatment.
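
A simple way to operationalize an on-off design is to assign alternating weeks to the two conditions and exclude weeks containing holidays before any comparison is made. The start date, number of weeks, and holiday list below are purely illustrative.

    from datetime import date, timedelta

    STUDY_START = date(2024, 3, 4)            # a Monday; illustrative only
    STUDY_WEEKS = 14
    HOLIDAYS = {date(2024, 5, 27)}            # weeks containing these dates are dropped

    def week_assignments():
        """Alternate ASCT 'on' and 'off' weeks, skipping holiday weeks."""
        plan = []
        for week in range(STUDY_WEEKS):
            monday = STUDY_START + timedelta(weeks=week)
            days = {monday + timedelta(days=d) for d in range(7)}
            if days & HOLIDAYS:
                continue                      # excluded from both samples
            plan.append((monday, "on" if week % 2 == 0 else "off"))
        return plan

    for monday, condition in week_assignments():
        print(monday.isoformat(), condition)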

The following table summarizes the advantages, disadvantages, and considerations for each of the study approaches.

Table 18. Advantages and Disadvantages of Study Approaches.
Simulation Control-Experiment
  Advantages:
  • Direct control-experiment analysis
  • Easy to implement alternative plans, optimize control algorithms
  • No real-world impacts if negative outcomes are found
  Disadvantages:
  • Difficult to account for variation in traffic demand and unusual circumstances
  • Over-calibration or poor calibration can lead to unrealistic results
  Considerations:
  • Thorough and multi-faceted calibration and validation should be done to ensure that the underlying model is applicable

Field Studies: Before-After
  Advantages:
  • Allows traffic patterns to "stabilize" over time in the new system
  • Does not create confusion due to switching back and forth between control schemes
  Disadvantages:
  • Any major external changes from before to after must be accounted for
  • Travel changes from before to after must be accounted for
  Considerations:
  • Ensure sufficient data collection, especially for the pre-treatment condition (harder or impossible to get more after the fact)

Field Studies: On-Off
  Advantages:
  • Direct comparison of treatment and non-treatment options
  • Easier to ensure that sufficient data are collected for each scenario (simply rerun whichever needs more)
  Disadvantages:
  • Not able to examine long-term changes in the system due to the ASCT
  Considerations:
  • Consider the impact of frequent changes to the control system on driver behavior

Data Collection

To support holistic analyses of ASCT systems, high quality data must be collected. As will be detailed in the next section, a wide variety of performance measures have been used to explore the impacts of ASCTs. As a result, a wide variety of data sources have been used to support and produce those performance measures. Within traffic operations, data collection generally follows three patterns: fixed-sensor data, floating vehicle probe data, and trajectory data.

Fixed sensors, typically inductive loop sensors which are embedded in the pavement, provide spot-measurements and are the main data source used by ASCT systems to sense the presence of vehicles at intersections. Radar-based or camera-based sensor options are also commonly used.

Arrays of fixed sensors provide volumes (and possibly speeds) directly, and some simple modeling techniques can estimate speeds, queue lengths for individual approaches or lanes, and traffic movements through each study intersection. Fixed sensors are located close to the intersections they relate to, with advanced queue detectors sometimes placed several hundred feet upstream in any given direction. Thus, fixed sensors are unable to provide any information (other than, perhaps, average speed or travel time) for the segments between intersections. This can be a significant hurdle if there are driveways or access points between intersections where significant traffic enters and exits the roadway in locations where the sensors cannot account for it.

Even within fixed-sensor systems, there can be notable variation. Some intersections feature independent detection on every approach lane, while others aggregate data by movement. The level of aggregation may vary depending on whether the approach is the "major" or "minor" road. Many turning lanes feature upstream queue detectors, and some through and right-turn lanes also have such advanced detection to monitor queuing activity. More rarely, "exit" detectors are placed on the outgoing legs of the intersection to capture departures from the intersection. These exit detectors can be extremely valuable for determining accurate turning movements and identifying spillback queue issues in highly saturated or closely spaced intersection systems.
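
As a simple illustration of what fixed detectors provide, the sketch below tallies volume, hourly flow rate, and time occupancy from a list of detector actuation intervals for a single lane. The interval format and the 300-second analysis period are assumptions; actual controller logs use vendor-specific event formats.

    # Hypothetical actuation intervals (on-time, off-time) in seconds for one loop
    # during a 300-second analysis period.
    actuations = [(2.1, 2.6), (9.4, 9.9), (15.0, 17.8), (40.2, 40.7), (55.3, 55.9)]
    period_s = 300.0

    volume = len(actuations)                               # vehicles counted
    occupied_s = sum(off - on for on, off in actuations)   # total time the loop was occupied
    occupancy_pct = 100.0 * occupied_s / period_s
    flow_vph = volume * 3600.0 / period_s                  # expanded to an hourly rate

    print(f"volume = {volume} veh, flow = {flow_vph:.0f} veh/h, occupancy = {occupancy_pct:.1f}%")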

Traffic Ops Data Collection

Fixed-Sensor
Volumes and queues collected using inductive loop detectors, cameras, or radar sensors.

Floating Vehicle Probe
Travel times, speeds, stops, etc. based on uniquely identified vehicles (instrumented research vehicles, commercial fleets, or Wi-Fi/Bluetooth-tracked private vehicles).

Trajectory Data
High-resolution vehicle traces, usually produced by a simulation model.

The data from the signal control system itself can also be treated as a fixed-sensor source. Modern signal controllers have mechanisms for producing log files which detail the actuations and the control decisions the algorithm selects. Incorporating controller information into the analysis is necessary to identify how platoons of vehicles interact with signal phasing.

Data from probe vehicles, unlike fixed sensors, have wider geographic flexibility and can cover the interstitial areas between intersections. Vehicles with built-in GPS devices, or drivers using GPS-based applications on smartphones or dedicated navigation tools, can produce sample measures of traffic conditions along the roadway as they travel. Individual vehicles can also be traced using Wi-Fi or Bluetooth communications. Dedicated instrumented roadway vehicles often employ radar, Lidar, or camera technologies to observe conditions around the vehicle for a more complete assessment of traffic behavior. Floating probe vehicle data provide more continuous samples across an entire study segment, especially for areas with significant traffic from which to sample. However, the additional options and resolution provided by probe vehicle data come with a price tag: commercial aggregation firms collect and sell data, or sensors can be purchased and attached to vehicles or installed in managed fleets to collect data internally.
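
A minimal sketch of probe-style re-identification: hashed Bluetooth or Wi-Fi detections logged at two intersections are matched by device to produce segment travel time samples. The detection records and hashed identifiers are invented for illustration; production systems add matching logic, outlier rejection, and privacy handling.

    import statistics

    # Hypothetical detections: hashed device ID -> first detection time (seconds)
    # at an upstream and a downstream intersection along the corridor.
    upstream   = {"a1f3": 100.0, "b27c": 112.0, "c9d4": 130.0, "d001": 141.0}
    downstream = {"a1f3": 182.0, "c9d4": 215.0, "d001": 220.0, "e555": 240.0}

    # Travel time samples for devices seen at both ends of the segment.
    samples = [downstream[dev] - upstream[dev] for dev in upstream if dev in downstream]

    print(f"matched probes: {len(samples)}")
    print(f"mean segment travel time: {statistics.mean(samples):.1f} s "
          f"(median {statistics.median(samples):.1f} s)")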

To gain a complete picture of driving behavior in a corridor, full trajectories can be collected. Unlike the data options noted above, full trajectory data are not sampling specific locations or a subset of vehicles. Instead, every vehicle's full path, including location, speed, and surrounding conditions, is measured. Some real-world options exist for collecting full trajectories (helicopter or drone-based photography, for example), but only small samples are generally possible due to cost. Trajectories are, however, produced automatically by simulation models. Within the modeling environment, every vehicle is updated at extremely high resolution (usually once every 1/10th of a second), and any vehicle-specific or environmental factors can be calculated and stored for later analysis. This provides the most flexibility in terms of analysis and opens the door to highly sophisticated performance measures. Table 19 indicates the types of measures which are produced or can be modeled or estimated by each data collection technique (also see Table 9 in the Performance Measures chapter for PMs related to signalized control).

Table 19. Data Collection Techniques.
Measures             Fixed        Floating Probe   Trajectory
Volume               Yes          No               Yes
Queues               Maybe¹       Yes              Yes
Speed/Travel Time    Maybe²       Yes              Yes
Delay                Estimated    Estimated        Yes
Stops                Estimated    Yes              Yes
Arrivals             Estimated    Yes              Yes
Progression          No           Yes              Yes

¹ If upstream queue detectors are present.
² If radar- or camera-based speed measurements are used.
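
To illustrate the trajectory-level measures in Table 19, the sketch below takes one simulated vehicle's speed trace at 0.1-second resolution and derives an approximate delay (time-weighted speed deficit relative to free-flow speed) and a stop count. The free-flow speed, stop threshold, and trace values are illustrative assumptions.

    DT = 0.1                 # simulation time step (s)
    FREE_FLOW = 15.0         # free-flow speed (m/s), illustrative
    STOP_THRESHOLD = 1.0     # speed below which the vehicle is considered stopped (m/s)

    # Hypothetical speed trace (m/s): approach, slow, stop at the signal, discharge.
    speeds = [15.0] * 50 + [8.0] * 30 + [0.3] * 200 + [5.0] * 40 + [15.0] * 80

    # Delay approximated as the accumulated speed deficit relative to free flow.
    delay_s = sum((1.0 - v / FREE_FLOW) * DT for v in speeds if v < FREE_FLOW)

    # Count transitions from moving to stopped.
    stops = 0
    was_stopped = False
    for v in speeds:
        stopped = v < STOP_THRESHOLD
        if stopped and not was_stopped:
            stops += 1
        was_stopped = stopped

    print(f"approximate delay: {delay_s:.1f} s, stops: {stops}")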

Other data types are also necessary for holistic analysis of adaptive signal systems. Staying within operations, multimodal data can provide a more complete picture of total delay and movements through a signalized corridor. Bicycle detectors, information from transit systems, or measurements of pedestrian activity can be critical for exploring total multimodal person-delay occurring at each intersection in a study area, rather than limiting analysis to vehicle-delay.

Signalized control plays a significant role in safety and the number of traffic incidents which occur within a corridor. Crash reports form the basis for understanding where crashes occur and what, if any, role traffic signals may have played in each event. Collecting those for before-after or on-off studies is critical to incorporating safety aspects into analysis.

A significant concern regarding data collection is sample size. Regardless of the type of data which is collected and the technology used to collect it, sufficient data must be collected to ensure that any analysis provides an accurate assessment of the performance of the adaptive signal system. The concerns here are not particular to adaptive signals; generally accepted practices regarding data significance should be used.
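
Generally accepted practice can be illustrated with a standard power calculation. The sketch below uses the normal approximation for the per-group sample size needed to detect a given difference between two independent means; the effect size, standard deviation, significance level, and power shown are placeholders to be replaced with values appropriate to the corridor and measure being studied.

    from math import ceil
    from statistics import NormalDist

    def sample_size_two_means(delta, sigma, alpha=0.05, power=0.80):
        """Approximate per-group sample size to detect a difference `delta`
        between two independent means with common standard deviation `sigma`."""
        z = NormalDist()
        z_alpha = z.inv_cdf(1 - alpha / 2)
        z_beta = z.inv_cdf(power)
        return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

    # Example: detect a 10-second change in mean corridor travel time when the
    # day-to-day standard deviation is 25 seconds (illustrative values).
    print(sample_size_two_means(delta=10.0, sigma=25.0))   # days of data per condition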

When looking at overall performance within a cost-benefit type framework, several distinct costs arise for adaptive signal systems. Typical costs for implementing and maintaining the physical hardware are present, as with other treatments. However, adaptive signal systems are highly software-dependent, and thus generally require licenses from the vendor in order to use the product and get updates and support as the algorithms of the ASCT system are improved. Additionally, using such software requires an investment in workforce training so that operating engineers have the requisite expertise to use the system, make modifications over time, and troubleshoot issues. Collecting the necessary data to consider these costs is important to evaluating ASCT systems.


Connected Vehicle

Connected vehicles (CVs) are vehicles that communicate with equipped infrastructure and other connected vehicles while on the roadway. Often, connected vehicles use Dedicated Short Range Communication (DSRC), a two-way, low-latency, 5.9 GHz communication channel that is reserved for transportation purposes. However, connected vehicles can also use other communication channels, such as a cellular network. Connected vehicle technologies have a number of different applications, including improving vehicle safety, improving mobility, and reducing the environmental impacts of transportation. The following subsections contain best practices that relate specifically to evaluating the performance of connected vehicle technologies.

Evaluation Planning

The first set of best practices relate specifically to evaluation planning and experimental design.

Use of Traffic Simulation Models

Prior to deploying technology, it is important to understand whether the deployment will generate the amount of exposure required to evaluate the impacts of the technology being deployed. The required exposure will depend on the goals of the evaluation and the experimental design being used, and can be estimated using a power analysis.

In a traditional deployment with vehicle-based technologies, it's fairly straightforward to calculate exposure to various types of events based on the location of the demonstration, the demographics of the drivers, and the expected driving mileage. In a CV deployment, however, exposure is based on a CV being within close proximity of another CV or connected infrastructure. The best way to estimate how frequently this will occur is to run traffic simulations based on the planned deployment levels using real-world traffic data from the deployment site.
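
Even before a full microsimulation is available, a rough scaling check can frame expectations: if equipped vehicles are randomly mixed into the traffic stream, the share of vehicle-pair encounters in which both vehicles are equipped grows with the square of the penetration rate. The encounter total and penetration levels below are illustrative; a calibrated simulation of the deployment site remains the preferred tool.

    def expected_equipped_encounters(total_encounters: float, penetration: float) -> float:
        """Expected number of encounters in which both vehicles are CV-equipped,
        assuming equipped vehicles are randomly mixed into the traffic stream."""
        return total_encounters * penetration ** 2

    # Illustrative: 50,000 car-following/lane-change/intersection encounters per day
    # in the study area, evaluated at several deployment levels.
    for p in (0.01, 0.05, 0.10, 0.25):
        n = expected_equipped_encounters(50_000, p)
        print(f"penetration {p:>5.0%}: ~{n:,.0f} equipped-to-equipped encounters/day")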

Strategic Use of Recruitment

For example, select drivers who:

  • Take the same freeway during the morning commute
  • Routinely pass through a specific intersection
  • Are most likely to experience hazardous weather conditions on the road.

Simulations should take into account the following:

  • Demographics of the drivers being recruited
  • Vehicle types (e.g., passenger cars, taxis, trucks, transit buses)
  • Travel patterns of the drivers being recruited
  • The types of applications that are being deployed in the site

The results of the traffic simulation model will provide an estimate of how frequently connected vehicles will interact with other connected vehicles and equipped infrastructure in different types of driving scenarios (e.g., vehicle following, lane change, intersection crossing) in the deployment environment, and will allow the deployer to understand the impacts of changing different variables to optimize the experimental design (e.g., adding more vehicles, changing the recruitment strategy) (Barnard, 2017; Smith & Razo, 2016).

If running a traffic simulation is not possible, it is still helpful to think strategically about how to maximize CV interactions in the environment; one way to do this is through the recruitment strategy. It is also important to keep in mind that the rarer the types of events the CV system is trying to address, the larger the deployment will need to be to collect a viable sample of those events.

Within-Subject Experimental Design

There are large individual differences between drivers, so in evaluations of how humans interact with in-vehicle technologies, the most robust experimental design is a within-subjects design. This design compares each driver only to themselves, both with and without the vehicle technologies, which allows the evaluation to focus on a specific driver's changes in behavior and performance rather than on average changes across a population.

To conduct a within-subjects evaluation:

  • Assign an individual vehicle to a single participant
  • Instruct them not to let anyone else drive the vehicle during the deployment.

Of course, when connected vehicle technology is being deployed on fleet vehicles or in participants' personal vehicles, this may not be possible. As an alternative, it is helpful to mark in the data when an individual participant is driving a vehicle so that the data can still be parsed by driver.
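
A minimal sketch of the within-subjects comparison, assuming the data have already been parsed into per-driver means of some performance measure (for example, alerts or hard-braking events per 100 miles) with and without the technology. Driver labels and values are invented; a real evaluation would obtain the p-value from a t-distribution (e.g., scipy.stats) rather than reading the statistic alone.

    import statistics

    # Hypothetical per-driver means of a performance measure (events per 100 miles),
    # recorded for the same drivers without and with the CV technology.
    baseline  = {"D01": 4.2, "D02": 6.1, "D03": 3.8, "D04": 5.5, "D05": 4.9}
    treatment = {"D01": 3.6, "D02": 5.8, "D03": 3.9, "D04": 4.7, "D05": 4.1}

    diffs = [baseline[d] - treatment[d] for d in baseline]       # paired by driver
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    t_stat = mean_d / (sd_d / len(diffs) ** 0.5)                 # paired t statistic

    print(f"mean within-driver reduction: {mean_d:.2f}")
    print(f"paired t = {t_stat:.2f} with {len(diffs) - 1} degrees of freedom")
    # Compare t against a t-table (or scipy.stats.t.sf) to judge significance.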

Tracking Vehicles Using Anonymous IDs

Why Use Anonymous Vehicle IDs

1. To identify vehicles that are:

  • Not working properly
  • Transmitting bad data
  • Not getting sufficient exposure

2. To make before-after comparisons of individual vehicles

  • Design accounts for variability between vehicles
  • Design most likely to measure an effect, if it exists

In a CV deployment using DSRC technology, basic safety messages (BSMs) are used to communicate a vehicle's location and vehicle kinematics to other vehicles. In a true deployment, these BSMs are anonymized so that individual vehicles can't be identified. However, in any deployment of CV technology that is being used for research and evaluation, it is important to create an anonymized vehicle identifier.

First, vehicle identifiers make it possible to detect vehicles that are having problems so that corrective action can be taken. Second, vehicle identifiers allow a comparison of a specific vehicle both before and after the CV technology is deployed. If the evaluator does not have the ability to track specific vehicles throughout the different phases of the deployment, the evaluation will need to combine the data from all vehicles, making it much less likely that an effect will be observed.
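
One way to implement such an identifier is a keyed hash of the device or vehicle identifier: the mapping is stable across deployment phases, but the original identifier cannot be recovered from the analysis data set. The key handling shown below is a simplification; in practice the key would be held by a data steward separate from the evaluation data.

    import hashlib
    import hmac

    SECRET_KEY = b"replace-with-a-key-held-by-the-data-steward"   # assumption, not a standard

    def anonymous_vehicle_id(device_id: str) -> str:
        """Deterministic, non-reversible identifier for a deployed device."""
        digest = hmac.new(SECRET_KEY, device_id.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]      # shortened for readability in analysis files

    print(anonymous_vehicle_id("OBU-000123"))   # same input always yields the same ID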

The Impact of "Invisible" (Not Connected) Vehicles

One of the greatest challenges of evaluating a CV deployment in a real-world environment is that not all of the vehicles in that environment will be equipped with connected vehicle technology. This means that the technology will only become active in the presence of other DSRC-equipped vehicles or infrastructure, and equipped vehicles will not be able to "see" all the other unequipped vehicles in the environment. This presents a number of challenges to conducting CV evaluations that can vary based on which CV applications are being deployed, the deployment rate in the site (% of equipped vehicles), and the goals for the evaluation.

When conducting a CV demonstration, the evaluation team should carefully think through how the combination of equipped and unequipped vehicles may impact their unique deployment and evaluation. Some vehicle performance metrics for CV demonstrations may not represent the actual driving scenario if an unequipped vehicle that is not represented in the data is present in the driving scenario. For example, metrics showing how far from the crosswalk a vehicle stopped after detecting a pedestrian in a signalized crosswalk may not be accurate if there is an unequipped lead vehicle between the CV and the crosswalk. The driver may be responding to the lead vehicle slowing in front of them and not to the pedestrian or the warning, and there will be no way to determine this from the data. One way to mitigate this problem is to install additional sensors to collect data on the presence of surrounding vehicles and to validate the CV data against the data from these sensors (this will be discussed further in the section, Use of Other (i.e., Non-CV System) Data).

Other considerations due to "invisible vehicles" are described below.

  • Users of the system (e.g., drivers, pedestrians) won't be able to create a mental model of how the system works because it won't work all the time. Users generally have no way of knowing which other vehicles or infrastructure in the environment are equipped or unequipped, so they won't be able to develop an expectation of when the system will be able to support them and when it won't.
  • Often in transportation research, performance metrics are normalized by things like miles or hours of driving. These normalization metrics become irrelevant with CV evaluations because the system is not activated all of the time. In order for these metrics to be useful, the data collection strategy must provide insight into exactly when the system was active (interacting) and when it was not, compared to the entirety of the users' experience.

System Performance Testing/Validation

System Performance Data

  • Rate of false system activations
  • Rate of missed activations
  • Frequency of different types of system errors

To evaluate a demonstration and properly interpret the results of an evaluation of a CV system, it is critical to have objective data (data from non-CV sources or from controlled experiments) on the performance of the system prior to the demonstration. Each CV environment is unique, and the performance of a given application is likely to vary based on the environment where it is deployed. The impacts of a deployed system are highly dependent on how well the applications are working, and without this context it will not be possible to interpret the results of the impact analysis.

Ideally, system performance should be carefully tested and known prior to the start of the formal deployment. In situations where this is not possible, it is important to collect data that can support system and application validation during the deployment so it can be measured in hindsight and factored into the impact evaluation results.
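
Once alerts have been matched against ground-truth events (for example, from video review or reference sensors), the headline performance rates are straightforward to compute, as sketched below with illustrative tallies. The matching step itself, which is the difficult part, is assumed to have been done already.

    # Illustrative tallies after matching alerts to ground-truth events:
    true_events = 120        # ground-truth situations where an alert was warranted
    alerts_issued = 130      # total alerts the application produced
    correct_alerts = 105     # alerts that matched a true event

    missed = true_events - correct_alerts
    false_alerts = alerts_issued - correct_alerts

    print(f"missed-activation rate: {missed / true_events:.1%}")
    print(f"false-activation rate:  {false_alerts / alerts_issued:.1%}")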

Data Collection

The following set of best practices pertain to data collection and management.

Expected Versus Actual Interactions

Once a deployment location, size, and recruitment strategy have been identified and a traffic simulation model has been used to estimate the CV interactions, it is helpful to track the actual vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) interactions observed during the demonstration. First, this exercise is helpful for validating the interaction model and ensuring that the expected number of interactions is being met. If CV system engagement is lower than expected, understanding the interactions the site is generating is important to diagnosing the cause (e.g., are the CV applications simply not activating, or are they not getting any opportunities to activate because interactions are lower than expected?).

Second, data about actual interactions experienced during the demonstration can be very valuable to the evaluation. In a traditional demonstration, some evaluation metrics are normalized by driving miles or driving time to indicate exposure to a certain stimulus (e.g., number of warnings per mile driven). In a CV demonstration, overall driving miles are not relevant because the CV system is only active in the presence of other connected vehicles or infrastructure. As a result, the number of V2V or V2I interactions can serve as a surrogate measure for exposure to qualify the frequency with which an event occurs.

If interactions cannot be quantified and tracked in real-time during the demonstration, it is recommended to calculate interactions post-hoc so that this metric can be considered in the evaluation activities.
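
A rough post-hoc interaction count can be reconstructed from logged BSM positions by flagging vehicle pairs that came within a chosen range of each other. The snapshot layout, the 300-meter range, and the treatment of each pair as a single interaction are simplifications for illustration only.

    from itertools import combinations
    from math import hypot

    RANGE_M = 300.0      # nominal communication/interaction range (illustrative)

    # Hypothetical BSM snapshots: time step -> {anonymous ID: (x, y) position in meters}
    snapshots = {
        0.0: {"A": (0, 0),  "B": (450, 0), "C": (80, 60)},
        1.0: {"A": (15, 0), "B": (420, 0), "C": (95, 60)},
        2.0: {"A": (30, 0), "B": (390, 5), "C": (110, 60)},
    }

    interacting_pairs = set()
    for positions in snapshots.values():
        for (id1, p1), (id2, p2) in combinations(positions.items(), 2):
            if hypot(p1[0] - p2[0], p1[1] - p2[1]) <= RANGE_M:
                interacting_pairs.add(frozenset((id1, id2)))

    print(f"distinct V2V interactions observed: {len(interacting_pairs)}")
    # Alerts per interaction (rather than per mile) can then serve as the
    # exposure-normalized rate, e.g. alerts_issued / len(interacting_pairs).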

Monitoring CV Systems/Applications in the Field

Once CV-equipped vehicles are in the field, it is recommended to monitor the vehicles to keep track of how frequently they are having CV interactions and how frequently the applications are engaging. Devices that are getting very few interactions or system alerts (relative to other study vehicles) may have system health issues, or may not be driving in the study area enough to generate sufficient data. Additionally, vehicles that are receiving a very large number of system alerts (relative to other study vehicles) may have an issue that is causing false alerts. Monitoring and replacing problematic or low interaction devices during the demonstration can improve the evaluation results by ensuring that the largest possible amount of valuable data are collected for the evaluation.
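
A lightweight monitoring approach is to compare each device's alert (or interaction) rate against the fleet distribution and flag outliers for inspection, as sketched below with an illustrative two-standard-deviation threshold and invented weekly rates.

    import statistics

    # Alerts per week of driving, keyed by anonymous vehicle ID (illustrative).
    alert_rates = {"V01": 4, "V02": 5, "V03": 0, "V04": 6, "V05": 27, "V06": 5, "V07": 4}

    mean = statistics.mean(alert_rates.values())
    sd = statistics.stdev(alert_rates.values())

    for vid, rate in alert_rates.items():
        if abs(rate - mean) > 2 * sd:
            label = "possible false alerts" if rate > mean else "possibly inactive or unhealthy"
            print(f"{vid}: {rate} alerts/week -> flag for inspection ({label})")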

CV Application Logic

Once vehicles are deployed in a CV environment (particularly if thorough performance validation testing and application tuning were not completed prior to the demonstration), it may be tempting to make adjustments to the applications to fix problems or reduce the frequency of false alerts. If one of the goals of the study is to conduct a rigorous evaluation, it's not advised to make changes once the official experimental design period has started, since doing so will compromise the integrity of the evaluation. An exception would be a situation where there is any risk to driver safety; such cases must be addressed immediately.

If it is expected that the team will want to adjust the applications after they've been in the field, it is advised to build a tuning period into the experimental design and mark the data from the different phases accordingly so that the evaluator can account for changes made to the system during the demonstration.

Use of Other (i.e., Non-CV System) Data

CVs generate a considerable amount of data; however, the data generated by the system will likely not be sufficient to accommodate all of the evaluation goals. Most likely, additional data collection (e.g., external sensors on either the vehicles or the roadway, as well as supporting data from external sources, such as weather data) will be required. The data collected as part of the deployment should be determined by assessing the evaluation goals, objectives, and associated performance metrics, rather than by relying only on data that the system will produce.

Validating the System Performance of the CV Applications

As previously mentioned, it is critical to the evaluation to have an understanding of the performance of the CV system and applications in the actual deployed environment; that is, did the CV applications function as they were designed to function? If thorough pilot tests/system performance tests are not conducted prior to the start of the experimental design, data to support a system accuracy analysis should be collected as part of the deployment. If the technology does not result in the desired impacts, this may be due to a system performance issue, but without measuring system performance, it will not be possible to determine if this was a factor.

Data Organization and Indexing

Data collected as part of the demonstration should be organized and indexed in a way that suits the experimental design being used in the site evaluation. This organization may not be inherent in the standard CV data that the system produces, and therefore may need to be added or post-processed. For example, as discussed in the section, Tracking Vehicles Using Anonymous IDs, it is helpful to include vehicle identifiers with the data so that the evaluators can identify an individual subject or vehicle across different test phases. Additionally, all data should be indexed in a way that makes it easy to identify the data from different test groups that the evaluator is interested in (e.g., control/treatment, before/after, equipped/unequipped).
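
In practice this often means attaching a few index fields to each record during ingest, as sketched below. The phase boundaries, group assignments, and field names are placeholders defined by the evaluation plan, not by any CV data standard.

    from datetime import date

    # Hypothetical phase boundaries (ordered by start date) and group assignments.
    PHASES = [("before", date(2024, 1, 1)), ("treatment", date(2024, 4, 1))]
    GROUPS = {"V01": "equipped", "V02": "control"}

    def phase_for(record_date):
        """Return the deployment phase that a record date falls into."""
        current = None
        for name, start in PHASES:
            if record_date >= start:
                current = name
        return current

    def index_record(record):
        """Attach evaluation-design index fields to a raw data record."""
        record["anon_vehicle_id"] = record["device_id"]        # assumed anonymized upstream
        record["phase"] = phase_for(record["date"])
        record["group"] = GROUPS.get(record["device_id"], "unknown")
        return record

    sample = {"device_id": "V01", "date": date(2024, 5, 10), "speed_mps": 13.2}
    print(index_record(sample))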

CV References

Barnard, Y. (2017). D5.4 Updated Version of the FESTA Handbook. Leeds, UK: FOT-Net Data. Last accessed: December 31, 2018.

Bezzina, D., & Sayer, J. (2015). Safety Pilot Model Deployment: Test Conductor Team Report, Report No. DOT-HS-812-171, Washington, D.C.: National Highway Traffic Safety Administration, obtained from: https://www.nhtsa.gov/sites/nhtsa.dot.gov/files/812171-safetypilotmodeldeploydeltestcondrtmrep.pdf.

Gay, K., & Kniss, V. (2015). Safety Pilot Model Deployment: Lessons Learned and Recommendations for Future Connected Vehicle Activities, Washington, D.C.: Intelligent Transportation Systems Joint Program Office, obtained from: https://rosap.ntl.bts.gov/view/dot/4361.

Pecheux, K., Kennedy, J. & Strathman, J. (2015). Evaluation of Transit Bus Turn Warning Systems for Pedestrians and Cyclists, Report No. 0084, Washington, D.C.: Federal Transit Administration, obtained from: https://www.transit.dot.gov/sites/fta.dot.gov/files/FTA_Report_No._0084.pdf.

Smith, S., & Razo, M. (2016). Using Traffic Microsimulation to Assess Deployment Strategies for the Connected Vehicle Safety Pilot. Journal of Intelligent Transportation Systems, 66-74.

Automated Vehicle

This section discusses considerations for evaluating projects that include automated vehicles (AVs).1 Automation can be used in a broad variety of applications and may be used to support a variety of transportation and societal goals. Systems demonstrated in pilots may differ in critical aspects from those that would ultimately be deployed or commercialized. This should be taken into consideration when designing technical performance and user acceptance metrics. The following subsections provide considerations for evaluation design of projects including AVs.

Rapidly Evolving Technologies

Automation technologies are rapidly evolving. Many systems deployed in a pilot test or demonstration are being continuously refined and updated, such that their fundamental capabilities could significantly differ by project conclusion. This situation is intensified for a project with a long lead time or planning stage. This can be challenging to account for in planning an evaluation. If the technology is not held constant, or "frozen" during the course of the project, it may be difficult to understand the meaning of results over time. Are changes in desired metrics due to increasing user acclimation to the technology, for example, or due to changes in the technology itself?

For most technology demonstrations, the vendor is required to "freeze" the technology for the duration of the demonstration; however, for technologies such as AV, this may not always be desirable. First, there could be safety implications. As Automated Driving System (ADS) developers learn more about the performance of the hardware or software in real-world settings, they are constantly making improvements to support safe operations. Failure to incorporate these improvements could lead to unsafe operations. Second, given the rapid pace of change in this industry, evaluation results that are based on a previous generation of the automation technology may be somewhat less valuable for knowledge-sharing with other potential deployers. The evaluation team will need to consider these factors against the need for reliable evaluation data.

Human Factors and User Acceptance

Automation is very new to most host communities and users. Strong favorability ratings, or conversely, concerns, may be heavily impacted by the novelty of the technology and not merely its performance. Potential users may be unable to forecast accurately the extent to which they would use a system that is still immature and wholly novel to them. Some efforts have been made to understand the relationship of usage intention to actual future use, but this is an area where further research is needed.

Many applications are intended for eventual operation without a driver or operator on board the vehicle. However, due to both current technological limitations and State or local requirements, most demonstrations today are still staffed by a driver or attendant. The presence of a human staffer on board, even if he or she is not actively driving the vehicle, is likely to significantly influence the perception of those who interact with the vehicle and the system. While there have been creative approaches to address this problem,2 it is difficult on the whole to mimic the experience of an unstaffed system while still having a staff person on board. Evaluation design should take this into consideration.

For projects with AV systems at SAE Levels 1-3, where driver supervision is required for some part of the driving task, the evaluation should include metrics regarding driver engagement, fatigue, and other human factors issues such as mode confusion. An additional area for Level 3 systems is driver re-engagement.

Institutional Issues and Internal Capacity Building

The transportation industry is at an early stage with regard to adoption of automation. Very few public-sector agencies have any experience with these technologies. A benefit of early engagement with new technology is the ability for an organization to identify institutional issues and to build organizational capacity.

Federal, State, and local requirements may not be clearly understood at project outset, or they may change during the course of the project. Even for locations with clear AV-specific requirements in place, there are many unknowns. Procurement processes or labor issues could delay or even prevent projects. Identifying these issues through implementation of a demonstration can help an organization determine local policy positions and appropriate mitigations to achieve goals.

Similarly, there is a learning curve for agency staff in understanding what these technologies can and cannot do today, and what they might do in the future. Agencies may also need to define new ways of partnering with the private sector, as there are many new entrants to the transportation industry and accepted norms may no longer apply.

If the deployment's goals include identification of institutional issues and internal capacity building, the evaluation design should consider how to meaningfully incorporate these elements.

Identifying Critical Stakeholders

Introducing automation into motor vehicles may change the dynamic between existing stakeholders, and could elevate the importance of those previously less closely involved. For example, given the role of States in licensing drivers, Departments of Motor Vehicles have begun to engage with their State DOT counterparts in new ways, as States grapple with the question of what it means for the ADS to be the vehicle operator. These types of stakeholders should be identified early on so that impacts on them can be measured.

Relationships with Private Sector Partners

Automation research, development, and commercialization is an extremely competitive industry. The sector is also very fluid, with new companies forming and dissolving quickly and key staff moving between companies frequently. Transportation agencies may find that their relationships with private sector partners are somewhat different than for traditional transportation applications.

Private sector partners may have serious concerns about sharing or publicizing information which would be included in a standard government-sponsored evaluation, such as information about system performance and user acceptance. This concern can extend to the choice of metrics and survey design. It is therefore critical to define evaluation requirements, and the necessary data, clearly in the earliest stages of planning. Negotiation over the characterization of performance and user experience may also be required.

In order to assess desired metrics, data access is critical. Some vehicle and system data may be wholly within the control of the private sector partner in the absence of other contractual arrangements. Other data can be measured externally, for example by using infrastructure-based cameras. The evaluation plan should consider next-best alternatives if the desired data cannot be obtained.

Data Analysis and Management

The quantity of data collected by an automated vehicle is enormous. Project teams could easily be overwhelmed by the available data, in terms of both data storage and analysis. While it is generally helpful to identify core data elements early on and disregard superfluous data, new areas of interest may come to light during the course of the evaluation. Each deployment will need to consider the preferred balance between manageability and the ability to explore previously unidentified questions. Below are key types of data to consider including in an evaluation of a project involving AVs; some may not be applicable depending on the applications deployed.

Operational Design Domain (ODD)
  • Describe the specific conditions under which an ADS or feature is intended to function.
  • Identify roadway and roadside features important for operations within the ODD.
  • Describe the data that will be used to verify whether the AV is properly operating within its ODD (see the sketch following this list).
  • Describe metrics and indicators that quantify the level of safety within the ODD.
Vehicle Operational Data
  • Identify data that will be collected in crash, near-miss, malfunction, and degradation situations (e.g., Light Detection and Ranging, radar, Event Data Recorder or some other type of data acquisition system).
  • Identify what data are collected and whether the information is documented when crash mitigation technologies are triggered.
  • Describe the information to be collected in circumstances where the AV goes back to the minimal risk condition or fallback situation.
  • Describe how disengagement events will be recorded and stored.
  • Describe how the instruments used for data collection onboard the vehicle will be documented and maintained over time (i.e., AV maintenance, sensor calibrations, and equipment check documentation).
  • Describe which curbside/infrastructure elements will be recorded and documented at the locations where pick-up and drop-off will occur.
Data Processing
  • Describe how any AV sensor data will be processed.
  • Describe how the AV software updates will be documented, and the data output for that information.
  • Describe the data output of the simulation, test track, and/or on-road test that will affirm the effectiveness of the solution(s) to respond to the research questions.
Perceived Safety
  • Describe the process for collecting the experience of those who interact with the AVs.
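
As a simple illustration of the ODD verification item above, the sketch below checks a logged operating record against a small set of ODD limits (speed, weather, and a geofenced set of route segments) and reports any violations. The limits and log fields are entirely illustrative; a real ODD definition would be far more detailed.

    # Illustrative ODD limits for a hypothetical low-speed shuttle deployment.
    ODD = {
        "max_speed_mps": 11.0,                  # roughly 25 mph
        "allowed_weather": {"clear", "cloudy", "light_rain"},
        "allowed_route_segments": {"SEG-01", "SEG-02", "SEG-03"},
    }

    def odd_violations(log_entry: dict) -> list:
        """Return a list of ODD conditions violated by one operating-log entry."""
        problems = []
        if log_entry["speed_mps"] > ODD["max_speed_mps"]:
            problems.append("speed above ODD limit")
        if log_entry["weather"] not in ODD["allowed_weather"]:
            problems.append(f"weather '{log_entry['weather']}' outside ODD")
        if log_entry["segment"] not in ODD["allowed_route_segments"]:
            problems.append(f"segment '{log_entry['segment']}' outside geofence")
        return problems

    entry = {"speed_mps": 12.4, "weather": "heavy_rain", "segment": "SEG-07"}
    print(odd_violations(entry))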


1 This section discusses projects which include at least one SAE Level 1 or higher application. See: SAE Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles J3016_201806, https://www.sae.org/news/2019/01/sae-updates-j3016-automated-driving-graphic

2 For example, a "Wizard of Oz" car may have an operator in the back seat using a specially-designed control system, while passengers ride in the front, thereby approximating the experience of driverless operation.
