Weather Applications and Products Enabled Through Vehicle Infrastructure Integration (VII)

8. VII Weather Data Processing

THE WEATHER DATA TRANSLATOR: FILTERING, QUALITY CHECKING, AND PROCESSING VII DATA

VII-enabled data are complex and pose a significant challenge, particularly when it comes to measuring or deriving weather and road condition data. Data issues include:

Data volume
Timeliness
Quality
Representativeness
Format

These issues are not dissimilar to those associated with other fixed meteorological datasets, but the complexities are compounded by the fact that end users will have little knowledge about the source of the data. The NWS, FAA and other traditional providers of weather data follow stringent guidelines for instrumentation accuracy, precision, and siting. Other non-standard sources of meteorological data (e.g., Mesowest, local mesonetworks, school networks, hydrological networks, etc.) have existed for years, but have not been fully accepted or adopted by many end users including meteorological service providers because of ongoing concerns over data quality.

There is a well-founded belief in the meteorological community that "bad data is worse than no data." End users have demonstrated that they need to be very comfortable with data quality before they will utilize new datasets. What does this mean for VII-enabled weather and road condition data? It is clear that it will take a significant amount of research and outreach to demonstrate that VII-enabled data are of sufficient quality to be used for operational purposes. The uncertainties and complexity associated with raw VII-enabled data (unprocessed data from vehicles) will likely deter many end users from using the data. It is likely that most end users will not be able to handle the sheer volume of VII-enabled data, let alone deal with data quality questions.

It is anticipated that many, if not most, users of VII-enabled weather data will require processed data. In this context, processed data means vehicle data that are extracted from the VII network, quality checked, and disseminated to active data subscribers. It is also likely that, due to the volume of data, many users will prefer statistically derived data representing specific geographical areas or times.

8.1 Weather Data Translator (WDT)

The concept of a VII weather and road condition data processor (a.k.a. Weather Data Translator) has been discussed in both Clarus and VII Program meetings over the last year. In the view of the authors of this document, not only is the concept sound, but the need for such a function is critical.

In a fully functional VII-enabled environment, millions of vehicles will be acting as probes and will continuously send reports to the VII data network. Data subscribers will obtain the data they desire and use them for various applications. For many potential end users of the data, the volume of data will be overwhelming. Applications (middleware) that will facilitate the use of VII data will need to be implemented. Without such a function, the feasibility of utilizing vehicle probe data will be lower and there will be substantially more risk in its use.

The function of the proposed WDT is to extract data elements needed to derive weather and road condition information from the VII data network, filter the data to remove samples that are likely to be unrepresentative, quality check the data utilizing other local surface observations and ancillary datasets, generate statistical output for specific areas and time periods, and disseminate the quality-checked and statistically processed data to data subscribers, which may include other data processing and dissemination systems such as the Clarus System.

A conceptual illustration of the primary processing components of the proposed WDT is shown in Figure 8.1. Data from VII-enabled vehicles are communicated to the RSEs when the vehicles are within range of the receivers. The RSEs are connected to the VII telecommunications network where most VII data will flow.

The proposed WDT will include a data parser function that will extract weather and road condition relevant vehicle probe data from the VII network. The data elements selected for extraction will be determined by research results and feedback from the stakeholders. Data elements could be added or subtracted as needs vary. The data flowing out of the data parser are still considered 'raw' as they have not been processed in any way.

Data filtering algorithms could be applied to chosen data elements to remove data that are not likely to be representative of the true conditions. For example, research conducted by Mitretek (13) indicated that, in general, outside air temperature data are not representative of the true ambient conditions unless the vehicle speed is at least 25 mph. It was also found that the speed threshold is highly dependent on the location of the temperature sensors. Temperature sensors in the front bumper were more representative on average than sensors in other locations (e.g., under the hood). A test could be applied to the vehicle data to throw out all outside air temperature data measured when the vehicle speed was less than 25 mph. If information on the make and model of vehicle were available, the speed threshold could be specific to the vehicle (assuming that the vehicle information could be used to determine the location of the air temperature sensor).
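The speed-based test described above can be sketched in a few lines of Python. This is a minimal illustration, not the WDT design: the record fields, the function names, and the per-vehicle threshold table are hypothetical, and only the 25 mph default comes from the research cited in the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Default threshold taken from the text: below 25 mph, outside air
# temperature is generally not representative of ambient conditions.
DEFAULT_MIN_SPEED_MPH = 25.0

# Hypothetical per-vehicle overrides keyed by (make, model). The entry and
# its value are illustrative only and do not come from the source research.
SPEED_THRESHOLDS_MPH = {("ExampleMake", "BumperSensorModel"): 15.0}


@dataclass
class ProbeRecord:
    """Assumed minimal shape of a parsed vehicle probe record."""
    speed_mph: float
    air_temp_c: float
    make: Optional[str] = None
    model: Optional[str] = None


def min_speed_for(record: ProbeRecord) -> float:
    """Return the vehicle-specific threshold if known, else the default."""
    return SPEED_THRESHOLDS_MPH.get((record.make, record.model),
                                    DEFAULT_MIN_SPEED_MPH)


def filter_air_temperature(records: Iterable[ProbeRecord]) -> List[ProbeRecord]:
    """Keep only records measured at or above the applicable speed threshold."""
    return [r for r in records if r.speed_mph >= min_speed_for(r)]
```

Keeping the thresholds in a lookup table, rather than in the filter logic, would allow them to be revised as research on sensor placement matures.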

Filters could also be applied to data sensed at particular locations that are known to generate errors (e.g., data measurements from inside tunnels). The process of deciding when and how to filter data will need to be done with great care, as one would not want to remove data that may have some use.

A benefit of filtering data that are considered unrepresentative is that the filtering procedure will reduce the amount of data that will need to undergo more complex and, in some cases, computationally intensive quality checking procedures. The WDT processor will need to be sufficiently capable to parse, filter, and quality check the data with minimal latency. Estimates of the computational requirements for the WDT are discussed in section 8.2.

Data that 'pass' the data filtering step will be quality checked where appropriate. Quality checking tests will include many of the common tests that are applied to surface weather data and more complex tests to handle road condition data. It is proposed that the quality checking tests being developed for the Clarus System be considered for the WDT for like data elements with appropriate modifications to deal with mobile data issues and idiosyncrasies. Ancillary data will be required to conduct many of the quality tests. Ancillary data will likely include surface weather observations and analyses, satellite, radar, and climatological data, and model output statistics. Data quality flags will be applied to the raw vehicle data so that data subscribers will have the flexibility of utilizing the raw data or taking advantage of the quality checking flags. It may not be possible or appropriate to quality check many of the data elements. For example, ABS status indicates whether the ABS is activated or not. There may not be an appropriate quality checking test that can be applied to these data within the WDT. How the data are ultimately utilized by downstream applications will be determined by data subscribers.
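The idea of attaching quality flags to raw values, rather than discarding data, can be illustrated with a short Python sketch. It is an assumption-laden example: the element names, the plausibility limits, the flag strings, and the function `quality_flags` are all hypothetical, and the two tests shown (a bounds test and a comparison against an ancillary reference value) stand in for the fuller set of tests under development for the Clarus System.

```python
from typing import Dict

# Hypothetical plausibility limits per data element (illustrative values only).
BOUNDS = {
    "air_temp_c": (-60.0, 60.0),
    "pressure_hpa": (500.0, 1100.0),
}


def quality_flags(record: Dict[str, float],
                  reference: Dict[str, float],
                  max_diff: Dict[str, float]) -> Dict[str, str]:
    """Return one flag per data element; the raw values are left untouched.

    record    -- raw element values from one vehicle snapshot
    reference -- ancillary values (e.g., a nearby surface observation)
    max_diff  -- largest tolerated departure from the reference, per element
    """
    flags = {}
    for name, value in record.items():
        if name not in BOUNDS:
            # No test defined for this element (e.g., ABS status).
            flags[name] = "not_checked"
            continue
        low, high = BOUNDS[name]
        if not (low <= value <= high):
            flags[name] = "fail_bounds"
        elif (name in reference
              and abs(value - reference[name]) > max_diff.get(name, float("inf"))):
            flags[name] = "fail_reference"
        else:
            flags[name] = "pass"
    return flags
```

Because the flags travel alongside the unmodified values, a subscriber could ignore them and work with the raw data, or use them to screen records before further processing.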

One branch of the quality checked data will flow (stream) to the output queue to minimize data latency. A second branch (a subset of the full dataset) will be cached and processed to generate statistical values for given locations (grid cell and/or point) and time periods. The statistical processing should result in a more representative sample and reduce the overall data load for users that cannot handle or do not need the streaming data from individual vehicles. It is envisioned that many downstream applications will only need data on a regular grid or data representing specific time periods.

FIGURE 8.1. Conceptual illustration of the VII Weather Data Translator (WDT).

Research will be required to fully design, develop, and test the statistical processing component of the WDT. As a starting point, it is proposed that a regular grid with 2 km spacing be overlaid on the transportation network. Vehicle data that fall within the grid cells will accumulate over a period of 5 to 15 minutes and then be statistically processed. Data that can be arithmetically combined (e.g., air temperature, pressure) will be processed in that manner while system status data (e.g., wiper status) can be processed to generate information on event density, such as the number of events per grid cell per time period.

Output from the statistical data stream should be processed on a fixed temporal cycle and will need to include metrics that indicate the average, median, number of samples in the calculation, standard deviation, etc. The resulting capability, like the remainder of the WDT, will need to be flexible and extensible to adapt to the changing VII environment.
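As a starting point for discussion, the gridded statistical processing described above might look like the following Python sketch. It assumes that vehicle positions have already been projected to planar coordinates in metres and that times are in seconds; the function names and the tuple layout are hypothetical. The 2 km cell size and the 15-minute window are the values proposed in the text.

```python
import statistics
from collections import defaultdict

CELL_SIZE_M = 2000.0   # 2 km grid spacing proposed in the text
WINDOW_S = 900         # 15-minute accumulation period (upper end of 5 to 15 min)


def bin_key(x_m, y_m, t_s):
    """Map a projected position and time to (column, row, time-window) indices."""
    return (int(x_m // CELL_SIZE_M), int(y_m // CELL_SIZE_M), int(t_s // WINDOW_S))


def grid_statistics(observations):
    """Summarize arithmetic elements (e.g., air temperature) per cell and window.

    observations -- iterable of (x_m, y_m, t_s, value) tuples
    """
    bins = defaultdict(list)
    for x_m, y_m, t_s, value in observations:
        bins[bin_key(x_m, y_m, t_s)].append(value)
    summary = {}
    for key, values in bins.items():
        summary[key] = {
            "count": len(values),
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
            # Sample standard deviation is undefined for a single value.
            "stdev": statistics.stdev(values) if len(values) > 1 else None,
        }
    return summary


def event_density(events):
    """Count active status events (e.g., wipers on) per cell and window.

    events -- iterable of (x_m, y_m, t_s, active) tuples
    """
    counts = defaultdict(int)
    for x_m, y_m, t_s, active in events:
        if active:
            counts[bin_key(x_m, y_m, t_s)] += 1
    return dict(counts)
```

Reporting the sample count alongside the mean and median lets a subscriber judge how much weight a given cell value deserves.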

It is strongly recommended that the methods and techniques utilized by the WDT remain non-proprietary and open for review and discussion. It is very likely that end users will be unwilling to utilize the data if there is a lack of knowledge or understanding of the processing techniques. The WDT should be designed to provide a single national conduit for VII-related weather and road condition data. This concept does not rule out a distributed computational capability, but, like the Clarus System, end users will likely demand a single interface for the data.

8.2 WDT Processing Requirements

The sheer volume of data flowing through the VII system will require a network with substantial bandwidth and computational capacity to ensure that latency will not become a barrier to its use. Like the Internet, the technical capabilities, number of users, and applications of the system will expand with time. The design of the WDT will have to take into account the evolution of the VII network and be extensible to handle growing needs. Computer and network capabilities have consistently expanded over the last twenty years and this trend is expected to continue. The relatively slow implementation and adoption of VII across the nation will allow both VII data processing hardware and software systems to be adapted as the system evolves.

In this section, an attempt is made to estimate the computation requirements of the WDT. Several simplifying assumptions are made to constrain the problem including the following:

Input and output streams are not compressed.
Input and output record size of 40 bytes and 50 bytes, respectively.
Data archive files on disk are not compressed.
Data flowing into the WDT has been parsed to include only weather-related data elements. See Appendix C for a list of elements used in this exercise.
Only weather-related data elements are processed.
Statistical processing is done on a 2 km X 2 km grid covering the contiguous U.S. One representative value for each grid cell and each weather-related data element is computed every 15 minutes.
WDT output includes both raw vehicle data with quality checking flags and statistically processed gridded data.
Ancillary data used in statistical processing includes:
- Surface observations
- Radar data
- Satellite data
- Monthly climatology
- Model output statistics
Five basic quality checking tests are performed on each data element:
- Format
- Outlier
- Bounds
- Climate
- Barnes spatial
Periodic snapshots from 1 million vehicles are processed.
Rural data rate – one record every 20 seconds from each vehicle.
Urban data rate – one record every 4 seconds from each vehicle.

Given these assumptions, for periodic data coming from 1 million vehicles operating in a rural area, a computer node with two 3 gigahertz (GHz) dual core processors with 4 megabytes (MB) of cache, 6 gigabytes (GB) of memory, 600 GB of disk space, and two 100 megabit (Mbit) Ethernet cards would be required. The cost of this hardware would be about $6000. Data input into the WDT would average 2 megabytes per second (MB/s), and quality checked output would be on the order of 2.5 MB/s. Gridded output would be approximately 351 MB per hour. It would require 432 GB of disk space to keep a two-day archive of weather-related vehicle data including quality checking flags; a two-day archive of gridded data would require 16.8 GB of disk space.

To process urban data from 1 million vehicles, the minimal hardware requirements would include a computer node complete with two 3 GHz dual core processors with 4 MB of cache, 16 GB of memory, 2.5 terabytes (TB) of disk space, and three to four 100 Mbit Ethernet cards. The input data rate would average 10 MB/s, while quality checked output data would be roughly 12.5 MB/s. The amount of gridded data generated every hour would remain the same at 351 MB; however, it would require 11.25 GB of memory to handle the data processing needs compared to 2.25 GB for the rural case. The hardware cost for a system with these characteristics is estimated at $9000.
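The rural and urban data rates quoted above follow from simple arithmetic on the stated assumptions (1 million vehicles, 40-byte input and 50-byte output records, decimal megabytes and gigabytes). The Python sketch below reproduces them; the function name `wdt_rates` and its parameters are hypothetical and are introduced only for illustration. The `window_GB` value, 15 minutes of quality-checked output, happens to match the 2.25 GB and 11.25 GB memory figures in the text, but that reading is an inference rather than something the source states.

```python
def wdt_rates(vehicles, report_interval_s, in_bytes=40, out_bytes=50,
              window_s=900, archive_days=2):
    """Estimate uncompressed WDT data rates and storage from the stated assumptions."""
    records_per_s = vehicles / report_interval_s
    return {
        # Parsed weather-related input records arriving at the WDT
        "input_MB_per_s": records_per_s * in_bytes / 1e6,
        # Raw records with quality checking flags leaving the WDT
        "output_MB_per_s": records_per_s * out_bytes / 1e6,
        # Flagged output accumulated over one 15-minute window (inferred memory need)
        "window_GB": records_per_s * out_bytes * window_s / 1e9,
        # Two-day archive of flagged vehicle data
        "archive_GB": records_per_s * out_bytes * archive_days * 86400 / 1e9,
    }


rural = wdt_rates(vehicles=1_000_000, report_interval_s=20)
urban = wdt_rates(vehicles=1_000_000, report_interval_s=4)

# Two days of gridded output at 351 MB per hour, in GB
gridded_archive_GB = 351 * 48 / 1000
```

Under these assumptions the rural case gives 2 MB/s in, 2.5 MB/s out, and a 432 GB two-day archive, while the urban case gives 10 MB/s in and 12.5 MB/s out, with a two-day archive of about 2.16 TB that fits within the 2.5 TB of disk space quoted. The gridded archive works out to roughly 16.8 GB.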

Ancillary data used in the quality checking process would require a single node consisting of two 3 GHz dual core processors with 4 MB of cache, 16 GB of memory, 1.5 TB of storage capacity, and a 100 Mbit Ethernet card. Presently, the approximate price of this system would be about $8000.

The estimated hardware specifications, data rates, and storage requirements are supplied here in an effort to give the reader a sense of what would be needed to effectively process weather-related VII-enabled data using a WDT. These estimates are based on several assumptions. Although all of these assumptions have an impact on the estimates, two are central in terms of the amount of data processed in the WDT. The first is that data originate from 1 million vehicles in rural or urban environments. It should be noted that in the U.S. there are roughly 245 million registered vehicles¹⁰. As VII-related technologies are deployed and implemented, it is clear that the data flowing through the VII network will originate from more than 1 million vehicles. The second is that the WDT input data will include only weather-related elements, which will not be the case in practice. The WDT will have to parse raw records (snapshots) before doing any quality checking or statistical processing. This would mean that the size of input data records would be significantly larger. Increases in data volume can be handled by adding nodes.

The amount of data produced by vehicles in a mature VII environment will be considerable; however, a gradual deployment of VII is anticipated. As a result, computer storage capacity, memory, and processing speeds will likely be able to keep pace with the growth and development of VII. The main factors that may constrain the use of VII-enabled data are network and device I/O (input/output). Testing using simulated data from 10 million vehicles revealed that the time needed for disk I/O (roughly 7-8 minutes) was 3 to 4 times that required for gridded data processing and quality control. As more and more vehicle data are generated, the capacity needed to acquire, distribute, read, and write data will increase.

Key Points:

The quantity of data that will result from VII deployment and implementation will be immense; in addition, the data quality will vary considerably. These factors could potentially overwhelm many prospective end users.

Techniques and methods that will facilitate the use of VII data should be explored and implemented. One such application that has been proposed herein is the Weather Data Translator, an algorithm-based software application designed to extract weather-related data elements from the VII data stream, remove unrepresentative elements, quality check data, generate statistical output, and disseminate data to end users or other data systems such as Clarus. End users that desire raw, unprocessed probe data would still be able to access those data.

WDT processed data will increase the feasibility of utilizing vehicle probe data as well as reduce the risks and uncertainties associated with using these data.

Developing a WDT to process VII-enabled data is viable using present day computational capabilities.

¹⁰ http://www.bts.gov (Bureau of Transportation Statistics)
