Office of Operations
21st Century Operations Using 21st Century Technologies

Behavioral/Agent-Based Supply Chain Modeling Research Synthesis and Guide

CHAPTER 3. FREIGHT MODEL DATA AND RESOURCES

Acquiring the data and finding the resources required to support a behavioral supply chain model is often challenging, given the privacy and confidentiality issues surrounding supply chain data. This chapter describes the data required to support such a model. It also describes the data and resources assessment for model development.

COMMON DATASETS

Many data sources exist for estimating, calibrating, validating, and forecasting a freight modeling system. Advanced freight travel demand modeling in the United States often uses publicly available data sources to model freight movements. Publicly available data are free of charge but require some effort to process and clean before use. This section describes common data sources used by public agencies to build advanced freight travel demand models.

Commodity Flow Data

Many models rely on Freight Analysis Framework (FAF) data as input. FAF is a publicly available commodity flow dataset used in supply chain models. The FAF integrates data from several sources to create a comprehensive picture of freight movement among states and major metropolitan areas by all modes of transportation. With data from the 2012 Commodity Flow Survey (CFS) and additional sources, FAF version 4 (FAF4) provides estimates for tonnage, value, and domestic ton-miles by region of origin and destination, commodity type, and mode for 2015 (the most recent year) and forecasts through 2045. Also included are state-to-state flows, summary statistics, and flows by truck assigned to the highway network for 2012 and 2040. FAF flows are also used as control totals for freight moving into and out of the region being modeled and for projected freight movements to exterior zones outside of the modeling region.

The structure of the FAF consists of 132 CFS regions (or FAF zones) divided into the following subsets: regions defined by metropolitan areas; regions representing a State’s territory outside metropolitan regions; and regions identified as entire States, within which no FAF metropolitan regions exist. Metropolitan regions do not cross State boundaries. Eight international trade regions model U.S. exports and imports. Figure 10 and Figure 11 show FAF domestic and international zones.

Map of FAF zones in the U.S. Each zone is colored differently for easier visibility and the colors do not have any other meaning.

Figure 10. FAF Domestic Zones.
Source: (Federal Highway Administration, 2017)
Note that Hawaii and Alaska are included as FAF zones, but are not shown on this map.

Map of international FAF zones in the world. Each zone is a combination of countries or one country (e.g. Canada is one FAF zone and Mexico is one FAF zone). Zones are colored differently for easier visibility and do not have any other meaning.

Figure 11. FAF International Zones.
Source: (Federal Highway Administration, 2017)

The FAF4 has several data products, including a regional database and a network database with highway flow assignment. The principal dimensions of the flow matrix are origin, destination, commodity, and mode. The 2012 freight flow matrix is used as the starting point for future-year forecasts, projecting volumes out to 2045. The FAF4 makes extensive use of the CFS data, but also relies on other data sources. FAF4 reports annual tonnage and dollar-valued freight flows using the 43 two-digit Standard Classification of Transported Goods (SCTG) commodity classes used by the 2012 CFS. FAF4 modes include truck, rail, water, air, multiple modes, pipeline, and unknown.
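
To illustrate how a flow matrix with these four dimensions can yield control totals for a modeled region, the following is a minimal sketch. The zone labels, tonnages, and the `control_totals` helper are hypothetical and chosen only for illustration; they are not FAF values or part of any FAF tool.

```python
from collections import defaultdict

# Illustrative records only: (origin zone, destination zone, SCTG code, mode, kilotons).
# Zone labels and tonnages are made up for this sketch; they are not FAF values.
flows = [
    ("A", "B", "02", "truck", 120.0),
    ("A", "C", "02", "rail", 80.0),
    ("B", "A", "17", "truck", 50.0),
    ("C", "A", "17", "truck", 30.0),
    ("B", "C", "02", "truck", 10.0),
]


def control_totals(records, region):
    """Sum tonnage entering and leaving `region`, keyed by (commodity, mode).

    Flows that start and end inside the region, and flows that never touch it,
    are excluded from both totals.
    """
    inbound = defaultdict(float)
    outbound = defaultdict(float)
    for origin, dest, sctg, mode, ktons in records:
        if origin == region and dest != region:
            outbound[(sctg, mode)] += ktons
        elif dest == region and origin != region:
            inbound[(sctg, mode)] += ktons
    return dict(inbound), dict(outbound)


inbound_a, outbound_a = control_totals(flows, "A")
```

Keying the totals by commodity and mode keeps the same dimensions as the source matrix, so the totals can be compared directly with modeled flows for each commodity and mode combination.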

Employment and Establishment Data

County Business Patterns (CBP) data contain the number of establishments in each size category, defined by industry and number of employees, for each county in the United States. Industry is defined based on the North American Industry Classification System (NAICS) six-digit classifications. The dataset is an annual series that provides county-level economic data by industry type. This series includes the number of establishments, employment, and annual payroll. CBP data have been used to develop advanced freight models. The data are used to synthesize establishments in the supply chain model before allocating them to analysis zones and selecting supplier/buyer pairs. CBP data do not contain foreign employment data, yet foreign firms need to be represented in the model. The primary objective of including foreign establishments in the model is to ensure that international flows between the region and foreign countries can be allocated to either buyer or supplier firms at the foreign country end.

Another publicly available dataset is the Longitudinal Employer-Household Dynamics (LEHD) data, a national longitudinal job frame that combines data from State and Federal sources to create a linked employer-employee dataset. These data are collated by the U.S. Census Bureau and cover approximately 90% of employed persons. LEHD data can be used to disaggregate business locations for the regions of interest from the counties in the CBP data to the more detailed Traffic Analysis Zones (TAZs) used in the model.
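
The two steps described above (synthesizing establishments from county counts, then placing them in TAZs) can be sketched as follows. This is a minimal illustration under stated assumptions: the counts, zone names, employment figures, and the `synthesize_establishments` function are all hypothetical, and allocating in proportion to zone employment is one simple choice rather than the method of any particular model.

```python
import random

# Hypothetical county-level counts in the style of CBP:
# (NAICS code, employee size class) -> number of establishments.
cbp_counts = {("311", "1-19"): 3, ("311", "20-99"): 1, ("484", "1-19"): 2}

# Hypothetical zone employment by NAICS code, in the style of LEHD-derived totals.
taz_employment = {
    "311": {"TAZ1": 40, "TAZ2": 160},
    "484": {"TAZ1": 25, "TAZ3": 75},
}


def synthesize_establishments(counts, employment, seed=0):
    """Create one record per establishment and draw a TAZ for each.

    The TAZ is sampled with probability proportional to that zone's share
    of the county's employment in the establishment's industry.
    """
    rng = random.Random(seed)  # fixed seed so the synthetic list is reproducible
    establishments = []
    for (naics, size_class), n in sorted(counts.items()):
        zones = sorted(employment[naics])
        weights = [employment[naics][z] for z in zones]
        for taz in rng.choices(zones, weights=weights, k=n):
            establishments.append({"naics": naics, "size": size_class, "taz": taz})
    return establishments


establishments = synthesize_establishments(cbp_counts, taz_employment)
```

The total number of synthetic establishments always matches the county counts; only their zone assignment is random.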

LEHD and CBP data provide summaries of establishment-based employment, aggregated by U.S. county and organized by NAICS industry codes and establishment-size groupings. Like the commercial datasets, CBP and LEHD do not cover agriculture, construction, and public administration. Due to this lack of coverage, State and metropolitan transportation planning agencies in the United States typically supplement these data using State or local employment estimates, filling in missing sectoral employment and reconciling known local discrepancies.

For instance, agriculture data on farms (by size and sales) can be derived from U.S. Department of Agriculture (USDA) Census of Agriculture data to provide supplemental data for understanding agricultural production locations. Agricultural establishments and employment are not well represented in CBP and LEHD data, and synthetic farms can be generated from other available data sources. Various tables from the USDA Census of Agriculture can be used to develop the number of farms by county and NAICS industry code, which can be appended to the CBP table to cover the missing agricultural industry establishment data. Employment on military bases is also not collected as part of the Economic Census, but it can be obtained on a case-by-case basis.

Industry Economic Accounts Data

The U.S. Bureau of Economic Analysis (BEA) produces Input-Output (IO) "Make" and "Use" tables that report the value of goods consumed by each buyer industry. This can be used to identify the most important consumed commodities (by dollar value) and their associated supplier industries. These tables show production relationships among nearly 400 industries and commodities. Table 2 is an example of the information content of a "Use" table. These tables can also be used to spatially apportion commodity flows by commodity type and industry.
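
As a minimal sketch of how a "Use" table can identify the most important consumed commodities for a buyer industry, the example below ranks inputs by producer value. The rows are the sample rows shown in Table 2; the `top_inputs` function name is hypothetical, and because only sample rows are included, the ranking reflects this sample rather than each industry's full input list.

```python
# Sample rows from Table 2: (commodity code, buyer industry code, producer value).
use_rows = [
    ("1111A0", "1111A0", 1025.2),
    ("325320", "1111A0", 508.4),
    ("324110", "1111A0", 413.4),
    ("1111B0", "1111A0", 320.4),
    ("325310", "1111A0", 269.8),
    ("212100", "212100", 1199.4),
    ("333120", "212100", 628.7),
]


def top_inputs(rows, industry, n):
    """Return the n commodities with the largest producer value used by `industry`."""
    inputs = [(commodity, value) for commodity, buyer, value in rows if buyer == industry]
    inputs.sort(key=lambda pair: pair[1], reverse=True)
    return inputs[:n]


oilseed_top = top_inputs(use_rows, "1111A0", 2)
coal_top = top_inputs(use_rows, "212100", 2)
```

Each commodity code in the result also names the supplier industry, which is how the ranked list points to the supplier industries a buyer is most likely to source from.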

DATA AND RESOURCE ASSESSMENT

There are multiple potential data sources used by agencies to estimate, calibrate, and validate the forecasting of a freight modeling system. Table 3 summarizes primary data sources used for behavioral supply chain freight models and includes details on each data source. The table does not include observed data (e.g., truck counts, Weigh-in-Motion [WIM] data) or local survey data available from local agencies.

Table 2. Example IO Accounts "USE" Table (Sample Data View).
| Commodity | Commodity Description | Industry | Industry Description | Producer Value | Purchaser Value |
|---|---|---|---|---|---|
| 1111A0 | Oilseed farming | 1111A0 | Oilseed farming | 1025.2 | 1137.6 |
| 325320 | Agricultural chemical manufacturing | 1111A0 | Oilseed farming | 508.4 | 702.9 |
| 324110 | Petroleum refineries | 1111A0 | Oilseed farming | 413.4 | 462.4 |
| 1111B0 | Grain farming | 1111A0 | Oilseed farming | 320.4 | 320.4 |
| 325310 | Fertilizer manufacturing | 1111A0 | Oilseed farming | 269.8 | 316.6 |
| 212100 | Coal mining | 212100 | Coal mining | 1199.4 | 1970.7 |
| 333120 | Construction machinery manufacturing | 212100 | Coal mining | 628.7 | 760.7 |

Source: (United States Bureau of Economic Analysis, 2017)

Table 3. Primary Data Sources by Modeling Needs and Availability.
| Data Source | Availability (latest available) | Spatial | Temporal | Modes | Industry Detail | Commodity Code | Model Inputs | For Model Estimation | For Model Calibration | For Model Validation |
|---|---|---|---|---|---|---|---|---|---|---|
| CBP | Public, 2014 | County | Annual | N/A | Two- to six-digit NAICS codes | N/A | | | | |
| BEA IO Accounts | Public, 2015 [1] | National | Annual | N/A | Two- to six-digit NAICS codes | N/A | | | | |
| FAF | Public, 2015 | FAF Zone | Annual [2] | All modes | N/A | Two-digit SCTG Commodities | | | | |
| CFS | Public, 2012 | BEA Zone | Every five years | All modes | N/A | Two-digit SCTG Commodities | | | | |
| Vehicle Inventory and Use Survey (VIUS) | Public, 2002 | State | Every five years | Truck | N/A | Two-digit VIUS Commodities | | | | |
| TRANSEARCH | Private, 2015 | County | Annual | All modes | N/A | Four-digit Standard Transportation Commodity Code (STCC) Commodities | | | | |
| Surface Transportation Board Waybill | Public, 2014 | BEA Zones | Annual | Rail | N/A | Four-digit STCC Commodities | | | | |
| T-100 | Public, 2016 [3] | Airport | Annual | Air | N/A | N/A | | | | |
| Port Import/Export Reporting Service (PIERS) | Private, 2015 | Port | Annual | Water | N/A | Two-digit Harmonized System Commodities | | | | |
| ATRI | Private, 2017 | Truck O-D | Daily | Truck | N/A | N/A | | | | |
| National Transportation Atlas Database (NTAD) | Public, 2015 | Facility Location | Annual | N/A | N/A | N/A | | | | |

[1] The latest detailed (by six-digit NAICS) Input-Output table available is for 2007.
[2] Major updates to the FAF data are performed using the CFS data (every five years) and the latest is available for 2012.
[3] Private version of the waybill data includes more coverage and is often included in the TRANSEARCH data.

Data Available for Model Inputs

Behavioral supply chain freight models often use the following eight types of model input data:

  • Zone systems for behavioral supply chain models are tiered so that the model can operate at a national scale (with limited international zones), a regional or statewide scale, and at a Transportation Analysis Zone scale. The Transportation Analysis Zone system represents the study area of interest, the regional or statewide scale represents less detail in adjacent States (often counties), and the national scale represents states or metropolitan regions in the remainder of the United States.
  • Network systems represent multimodal networks supporting the movement of goods. Typically, modal networks include highway and rail; more advanced models also include water and air. Pipeline networks are being developed by RSG as part of the development of a behavioral national freight supply chain model for Federal Highway Administration (FHWA) and may be available in the future for others to include. National Transportation Atlas Database is a national source of modal network data and is typically combined with local network data sources.
  • Employment data are developed at the Transportation Analysis Zone level using locally sourced employment datasets, often derived from the Quarterly Census of Employment and Wages. In addition, County Business Patterns offers marginal distributions of employment by size and industry at the county level.
  • Transfer facilities include intermodal terminals, warehouses, and distribution and consolidation centers. Data for transfer facilities location can be found in the National Transportation Atlas Database merged with data on employment from the Transportation Analysis Zone-level employment dataset.
  • Economic data represent the value of commodities exchanged between industries, also called IO Make and Use Tables. Economic growth rates are also required for forecast models.
  • Freight flows are developed from the Commodity Flow Survey and products like the FAF provide a useful processed and cleaned dataset of freight flows sorted by mode at the national scale. These data must first be disaggregated to the local level. Freight flows are a primary input to most of the behavioral supply chain models—except in Chicago, Illinois, where the procurement market model produces freight flows instead of allocating freight flows to a smaller geography.
  • Freight Surveys can include commodity flow surveys, establishment surveys, truck diary surveys, and vehicle use surveys. They are used to estimate parameters for mode, shipment size, distribution channel, and buyer-supplier matching models.
  • Truck Global Positioning System (GPS) Data are used to estimate parameters for truck time-of-day models and vehicle origin-destination patterns.
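
The disaggregation of zone-level freight flows to the local level, mentioned in the freight flows item above, can be sketched as a proportional allocation. This is a minimal example under stated assumptions: the `disaggregate_flow` function, the TAZ names, and the weights are hypothetical, and splitting by employment shares at each end is a simplification rather than the supplier/buyer matching that behavioral models apply.

```python
def disaggregate_flow(tons, origin_weights, dest_weights):
    """Split one zone-to-zone flow across TAZ pairs.

    Each TAZ pair receives tons * (origin share) * (destination share), where
    shares come from hypothetical weights such as producing-industry employment
    at the origin end and consuming-industry employment at the destination end.
    """
    o_total = sum(origin_weights.values())
    d_total = sum(dest_weights.values())
    return {
        (o, d): tons * (ow / o_total) * (dw / d_total)
        for o, ow in origin_weights.items()
        for d, dw in dest_weights.items()
    }


# Hypothetical example: 1,000 tons between two large zones, split across TAZ pairs.
taz_flows = disaggregate_flow(
    1000.0,
    origin_weights={"TAZ1": 30, "TAZ2": 70},
    dest_weights={"TAZ7": 50, "TAZ8": 50},
)
```

Because the shares at each end sum to one, the TAZ-level flows add back up to the original zone-level tonnage, which preserves the control total.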

Freight flows have typically been included as a model input because publicly available datasets exist (e.g., the Freight Analysis Framework). The Freight Analysis Framework and other freight flow datasets provide a single version of the future based on a specific economic forecast. Including a procurement market model that produces freight flow forecasts based on economic and infrastructure forecasts introduces more transparency and sensitivity into the freight forecasting process.

Availability of Data for Estimating Model Parameters

National Data

Estimating behavioral supply chain freight model parameters requires disaggregate data, which are difficult to obtain. Data from national, State, or regional surveys are difficult and costly to collect, so they are collected infrequently or with small sample sizes. National surveys commonly used for model estimation by agencies include the following:

  • The CFS is collected every five years (1997, 2002, 2007, 2012) and includes a large sample size. Unfortunately, these data are not available in disaggregate form and are not useful for estimating model parameters (the 2012 CFS Public Use Microdata [PUM] has been explored as an option for disaggregated data, but there are issues with data suppression).
  • The Freight Activity Microsimulation Estimator survey (2009–11, in three waves) is an establishment survey with a small sample, but it contains information on transfer facilities, mode, and commodity that has supported estimation of distribution channel models.
  • The Vehicle Inventory and Use Survey was last collected in 2002, but is still being used as the best source for truck payload factors.

Establishment Surveys

Several State and regional transportation agencies have conducted establishment surveys, but only the establishment surveys that are combined with commercial vehicle diary surveys can be used effectively to estimate model parameters for behavioral supply chain models. Agencies contacted as part of this synthesis have used the following surveys of this type:

  • The Ohio statewide survey (2004).
  • Five regional surveys conducted in Texas (2001–2006).
  • The Maricopa Association of Governments (MAG) regional survey (2016).
  • The Portland Metro regional survey (2016).

Both the MAG and Portland surveys employed smartphone mobile applications to collect data, which provided more detailed and accurate truck travel data. Challenges around recruitment of establishments and drivers to participate in these surveys continue, which is the primary reason for high survey costs.

GPS Data

Passively collected GPS data offer a partial solution to the challenge of collecting data on commercial vehicles. GPS data typically include travel time, origin-destination, and time of travel. Private vendors (e.g., ATRI, Streetlight) offer large samples of GPS data with these attributes. Other private vendors (e.g., EROADS, INRIX) provide additional attributes on commercial vehicle travel, such as truck type, commodity or industry group, and weight. Private firms also collect their own data to monitor fleets, and transportation agencies can request these data. Many private firms will not share their proprietary data, but where firms do share them, these data offer a low-cost solution to the commercial vehicle data challenges and contain additional attributes beyond those in the larger samples provided by GPS data vendors.

Data for Model Calibration and Validation

Travel demand modeling best practice includes selecting different data sources for model calibration and validation than those used in model estimation. This practice has not always been possible given limited data availability for the development of behavioral supply chain freight models. Available data sources identified for model calibration and validation typically fall into five categories:

  • Freight Surveys are used for shipment sizes, distribution channels and freight flows. These are typically collected locally by the agency developing the behavioral supply chain model, but could also be conducted as national surveys.
  • Freight Flow Data are used to compare the distribution of shipment sizes by commodity and modal freight flows.
  • Truck GPS Data are used to compare truck trip distributions, time of day, and volumes, and are available through private vendors for a specific State or region.
  • Weight Data are used to adjust vehicle loading factors used to convert shipment tonnages to truck trips.
  • Modal Volumes are used to compare observed and modeled volumes. These are used to develop calibration weights for the mode and transfers model and the estimation of import and export volumes for each port.
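
Two of the comparisons in the list above lend themselves to a short worked sketch: converting shipment tonnage to truck trips with a loading (payload) factor, and comparing observed and modeled volumes. The payload, tonnage, and count values below are hypothetical, and percent root-mean-square error is used here as one common validation statistic, not one prescribed by this guide. Empty truck trips are ignored in this simplification.

```python
import math


def truck_trips(annual_tons, payload_tons_per_truck):
    """Convert shipment tonnage to loaded truck trips using an average payload."""
    return annual_tons / payload_tons_per_truck


def percent_rmse(observed, modeled):
    """Root-mean-square error expressed as a percentage of the mean observed volume."""
    n = len(observed)
    rmse = math.sqrt(sum((m - o) ** 2 for o, m in zip(observed, modeled)) / n)
    return 100.0 * rmse / (sum(observed) / n)


# Hypothetical values: 48,000 tons moved at an assumed 16 tons per truck.
trips = truck_trips(48000.0, 16.0)

# Hypothetical truck counts at three locations versus modeled volumes.
observed = [1000.0, 2000.0, 3000.0]
modeled = [1100.0, 1900.0, 3000.0]
error = percent_rmse(observed, modeled)
```

If weight data suggest a different average payload, only the `payload_tons_per_truck` argument changes, and the trip estimate scales inversely with it.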