Office of Operations
21st Century Operations Using 21st Century Technologies

Mid-America Regional Council Pilot of the Data Business Plan for State and Local Departments of Transportation: Data Business Plan

Appendix H. External Data Sharing Case Studies

This appendix explains the purpose and benefits of data sharing, particularly on an open data platform. It presents several data format options, followed by an outline of the types of portals that can be used to publish open data. It then describes national resources that guide agencies in establishing open data policies and portals. Several examples of State and local best practices are provided, along with case studies in which multiple transportation agencies have engaged in data-sharing activities focused on volume and speed data. In most cases, the agency in charge processes the data as needed and then makes it available for public access via Web tools. For each example, resources are provided for more information.

Purpose, Benefits and Common Platforms for Open Data

Open Knowledge International published the Open Data Handbook,5 which outlines the legal, social, and technical aspects of open data. This handbook can be used as a reference by anyone who is seeking to open up data. Government is one type of organization that collects a broad range of data in order to perform its tasks. Because of the centrality of the data that government collects and the laws that make it open to the public, this data is a largely untapped resource. The handbook lists several areas where open government data has the potential to create value, either for government itself or for other groups of people and organizations, namely:

  • Transparency and democratic control.
  • Participation.
  • Self-empowerment.
  • Improved or new private products and services.
  • Innovation.
  • Improved efficiency of government services.
  • Improved effectiveness of government services.
  • Impact measurement of policies.
  • New knowledge from combined data sources and patterns in large data volumes.

For data to be considered "open data," the file formats in which they are published must have openly available specifications, so that anyone can build software that reads and reuses the data without legal, financial, or technological restrictions. Open file formats allow developers to produce software packages and applications using these formats. The downside of using proprietary file formats without publishing the format specification is dependence on third-party software or file format license holders, which can become prohibitively expensive or obsolete over time.

Open data is a key component for achieving interoperability. Interoperability is the ability of different information technology systems and software applications to communicate, exchange data, and use the information that has been exchanged. Combining different datasets together to develop new applications within large, complex systems is where the real value of interoperability lies.
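As a minimal sketch of what combining datasets can look like in practice, the following example joins two small CSV files on a shared key. The file contents, the column names (segment_id, daily_volume, avg_speed_mph), and the function names are hypothetical and are used only for illustration; real agency datasets would need an agreed-upon common identifier for this to work.

```python
import csv
import io

# Hypothetical sample data: two agencies publish CSV files keyed by the same segment ID.
VOLUMES_CSV = """segment_id,daily_volume
A1,12000
B2,8500
C3,4300
"""

SPEEDS_CSV = """segment_id,avg_speed_mph
A1,42.5
B2,31.0
D4,55.0
"""


def load_by_key(text, key):
    """Parse CSV text into a dict mapping each row's key value to the full row."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def join_on_segment(volumes_text, speeds_text):
    """Combine volume and speed records for segments present in both datasets."""
    volumes = load_by_key(volumes_text, "segment_id")
    speeds = load_by_key(speeds_text, "segment_id")
    combined = []
    # Only segments that appear in both sources can be combined.
    for segment_id in sorted(volumes.keys() & speeds.keys()):
        combined.append({
            "segment_id": segment_id,
            "daily_volume": int(volumes[segment_id]["daily_volume"]),
            "avg_speed_mph": float(speeds[segment_id]["avg_speed_mph"]),
        })
    return combined


if __name__ == "__main__":
    for record in join_on_segment(VOLUMES_CSV, SPEEDS_CSV):
        print(record)
```

Segments C3 and D4 are dropped because each appears in only one source, which illustrates why shared identifiers matter for interoperability.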

The most effective way to turn data into useful information is through visualization, analysis, or summarization. The U.S. General Services Administration, which manages Data.gov, recommends that government agencies release their data in a format that facilitates processing. In other words, data published in machine-readable formats is likely to be more useful for application development than data published in purely human-readable formats. Table 10 provides several examples of data formats that can be applied to open data.

Table 10. Example data formats.
Format | Human-Readability | Machine-Readability
PDF (Portable Document Format) | Primary document format used to make government information available to the public. | To make a PDF machine-readable, Optical Character Recognition (OCR) is needed. Metadata on the document's author or nature of its contents can be included.
CSV (Comma-Separated Values) | The most common machine-readable format, which can be produced using many standard database and spreadsheet tools. | Data is stored in a tabular, text-based format that is easily exchanged by machines, but it is difficult for computers to find common elements between datasets.
XML (Extensible Markup Language) | Popular format/language for data exchange because of the ability to structure the data with tags that can be interpreted by humans. | Developed to make the metadata of documents more readily available, which is essential for search tools to find a particular document in response to particular queries.
JSON (JavaScript Object Notation) | JSON is a text-based, human-readable format for representing simple data structures and associative arrays (otherwise known as objects). | A machine-readable data format derived from the JavaScript language used on many Web sites. Easily read by any programming language.
RDF (Resource Description Framework) | RDF is a general-purpose language for representing information on the Web. Less human-readable than the other formats listed in this table. | A data language used to represent data and information as Web resources, so that they can be "linked" together. It allows common terms to be linked between datasets.
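To make the difference between two of these formats concrete, the sketch below converts a small CSV table into a JSON array of objects using only the Python standard library. The sample rows and column names (station_id, count_year, aadt) are hypothetical and do not come from any agency dataset.

```python
import csv
import io
import json

# Hypothetical traffic-count rows; the column names are illustrative only.
COUNTS_CSV = """station_id,count_year,aadt
1001,2015,18250
1002,2015,7400
"""


def csv_to_json(text):
    """Convert CSV text to a JSON array of objects, typing numeric fields."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "station_id": row["station_id"],   # kept as text: it is an identifier
            "count_year": int(row["count_year"]),
            "aadt": int(row["aadt"]),
        })
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    print(csv_to_json(COUNTS_CSV))
```

CSV carries every value as text, so the conversion has to decide which fields are numbers. JSON records that distinction in the file itself, which is one reason it is convenient for application developers.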

Further information, including a guide on how to begin opening up data, can be found at http://opendatahandbook.org/ and https://www.data.gov/developers/blog/primer-machine-readability-online-documents-and-data.

Picking the most effective data format for publishing is crucial, and picking the right portal to make open data accessible is just as important. While simple, already structured, or static data that does not need visualization can be posted in any number of ways, other datasets need special handling in order to be useful. Below are several types of commonly used and adaptable open data portals that are available to the public sector.

Enterprise Open Source

The Comprehensive Knowledge Archive Network (CKAN) is an open-source data portal that offers helpful tools for streamlining, publishing, sharing, finding, and using large enterprise datasets. CKAN has more than 300 open-source data management extensions that are constantly evolving. Features include a fast search experience, easy data uploading, and the ability to plot geographic data in an interactive map. For Data.gov, CKAN works as a data harvester, pulling data from other agencies like the Department of Agriculture and National Aeronautics and Space Administration (NASA), federating the data into one searchable catalog. Drupal Knowledge Archive Network (DKAN), a derivative of CKAN, offers a plugin for Drupal, an open-source content management system with the option for cloud‑hosting. It is simple to deploy and maintain, and can be self-hosted through GitHub.
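CKAN portals expose a Web API for searching the catalog. The sketch below builds a search URL for CKAN's package_search action and summarizes a response of the shape that action returns. The portal address https://catalog.data.gov is an assumption used for illustration, the sample response is hand-written rather than real data, and the helper function names are hypothetical.

```python
import json
from urllib.parse import urlencode


def build_search_url(portal_url, query, rows=5):
    """Build a CKAN Action API package_search URL for a portal."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize_results(response_text):
    """Return (title, [resource formats]) pairs from a package_search response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    summary = []
    for dataset in payload["result"]["results"]:
        formats = [res.get("format", "") for res in dataset.get("resources", [])]
        summary.append((dataset.get("title", ""), formats))
    return summary


# Hand-written sample shaped like a package_search response (not real data).
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [{
            "title": "Example Traffic Volumes",
            "resources": [{"format": "CSV"}, {"format": "JSON"}],
        }],
    },
})

if __name__ == "__main__":
    # Assumed portal address; substitute the CKAN portal of interest.
    print(build_search_url("https://catalog.data.gov", "traffic volume"))
    print(summarize_results(SAMPLE_RESPONSE))
```

The example deliberately stops short of making a network request, so it can be run offline. Fetching the URL with any HTTP client and passing the response body to summarize_results would complete the round trip.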

Map-Based Portals

ArcGIS Open Data is a go-to solution for Esri software users because the open data portal builds directly on top of already published ArcGIS services. ArcGIS Server and ArcGIS Online allow the configuration and federation of geodata into an open data portal. Data and metadata can be viewed in the browser, and users can interact with the data and download it in several formats. ArcGIS offers a wealth of mapping options for geodata but lacks other advanced visualization tools. It does, however, provide ways to create charts and simple tools to view and interact with the datasets, and its advanced search and filtering options are user-friendly.

Advanced Data Visualization Services

Organizations that want more data visualization should consider services like Junar, Socrata, and OpenDataSoft.

Junar is an easy-to-use, software-as-a-service open data cloud platform that focuses on powerful analysis and visualizations. It offers Application Programming Interfaces (API), that is, routines, protocols, and tools for building software applications, which enable developers and users to integrate data back into their own applications. Junar is currently used for open data portals by the Cities of Sacramento and Palo Alto.

Socrata can host very large datasets. Users can publish to Socrata using a desktop sync tool or APIs; data can also be uploaded natively as CSV files, Excel files, or Tab Separated Values (TSV) files. The portal supports geospatial formats as well (e.g., shapefiles, Keyhole Markup Language (KML), KML Zipped (KMZ), and GeoJSON). Socrata has tools structured around metadata management and workflow, including tools to filter the information, export data, conduct analytics, create visualizations such as charts and map overlays, and view the data from a spatial perspective. The City of Chicago uses Socrata for its public data portal of 5.8 million records of crime data dating back to 2001. The New York Police Department also uses Socrata to publish and publicly display crash and collision data.

OpenDataSoft also allows for interaction and visualization through automated API generation. The platform is easy to use, works well with large datasets, supports geospatial formats, and leverages Elasticsearch to provide near real-time search and analysis. Live dashboards make it easy to publish and manage data, and the OpenDataSoft interface is designed for display on mobile devices.

Further information:

https://gcn.com/articles/2015/07/10/open-data-portal.aspx
https://ckan.org/
https://getdkan.org/
https://hub.arcgis.com/pages/open-data
https://socrata.com/

Git is a distributed version control system that is used by services such as GitHub, BitBucket, GitLab, and Gitorious. The advantage of a distributed version control system over non-distributed systems, such as Subversion or CVS, is that when a user clones a project, the clone includes the entire project history. This allows a developer to commit, branch, and tag changes on a local machine without interacting with a server. Among open-source projects, GitHub is the most widely used service for managing project code. It stores a copy of the project's repository and allows developers to fork a project's repository to use as their own centralized repository. GitHub also has user-friendly documentation functionality.

Further information:

https://github.com/
https://www.unleashed-technologies.com/blog/2014/08/01/what-github-and-how-can-it-benefit-your-development-team

National Initiatives

Project Open Data

The White House developed Project Open Data, a collection of code, tools, and case studies, to help agencies adopt the Open Data Policy and unlock the potential of government data. Project Open Data has evolved over time as a community resource to facilitate adoption of open data practices. It is published on GitHub as a collaborative, open-source project for Federal employees, as well as members of the public. Because policy cannot keep pace with technological advancement, Project Open Data was designed as a living document whose technology components are continually updated as open data best practices change. By contrast, the Project Open Data Metadata Schema and the Open Data Policy (M-13-13), linked below, have tightly regulated release cycles.

Further information:

https://project-open-data.cio.gov/
https://project-open-data.cio.gov/schema/
https://project-open-data.cio.gov/policy-memo/

Data.gov (The Home of the U.S. Government's Open Data)

In accordance with the 2013 Federal Open Data Policy, Data.gov is managed and hosted by the U.S. General Services Administration. It allows governmental agencies to share data for public access on various topics. Just like Project Open Data, it is an open-source project that is developed publicly on GitHub. Data.gov does not host data directly, but rather aggregates metadata about open data resources in one centralized location. Therefore, data sets displayed on Data.gov must follow the Project Open Data metadata schema. Once an open data source meets the necessary format and metadata requirements, the Data.gov team can pull directly from it as a Harvest Source, synchronizing that source's metadata on Data.gov as often as every 24 hours.
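As a rough illustration of what following the metadata schema involves, the sketch below checks a single dataset entry for the fields that version 1.1 of the Project Open Data Metadata Schema lists as always required. The field list reflects one reading of that schema and should be confirmed against the published version; Federal agencies have additional required fields that are not checked here. The example entry, its agency name, and its contact details are placeholders.

```python
# Fields treated as required for every dataset entry (assumption: schema v1.1).
REQUIRED_FIELDS = (
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
)
# Access levels defined by the schema.
ACCESS_LEVELS = {"public", "restricted public", "non-public"}


def find_problems(dataset):
    """Return a list of human-readable problems found in one dataset entry."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if not dataset.get(name)]
    level = dataset.get("accessLevel")
    if level and level not in ACCESS_LEVELS:
        problems.append(f"unexpected accessLevel: {level}")
    return problems


# Hypothetical entry; agency name, identifier, and contact are placeholders.
EXAMPLE_ENTRY = {
    "title": "Example Traffic Counts",
    "description": "Daily traffic counts at permanent count stations.",
    "keyword": ["traffic", "volume"],
    "modified": "2016-01-15",
    "publisher": {"name": "Example Department of Transportation"},
    "contactPoint": {"fn": "Data Steward", "hasEmail": "mailto:data@example.gov"},
    "identifier": "example-traffic-counts",
    "accessLevel": "public",
}

if __name__ == "__main__":
    print(find_problems(EXAMPLE_ENTRY))            # complete entry
    incomplete = dict(EXAMPLE_ENTRY, keyword=[])   # empty keyword list
    print(find_problems(incomplete))
```

A check of this kind is only a first pass. It confirms that fields are present, not that their values are well formed, so it does not replace validation against the full schema.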

Further information:

https://www.data.gov/

Public Safety Open Data Portal

The Police Foundation's Public Safety Open Data Portal is intended to serve as a central clearinghouse for accessing, visualizing, and analyzing local and national law enforcement and public safety open datasets. The portal currently contains select datasets from agencies participating in the White House's Police Data Initiative (PDI), as well as national data to provide context for the local data.

Further information:

https://www.policedatainitiative.org/

State and Local Open Data Portals

In 2014, the Center for Data Innovation ranked each State's progress in creating open data policies and portals (see https://www.datainnovation.org/2014/08/state-open-data-policies-and-portals/). The top-scoring States in terms of quality of open data policies and quality of data portals were Hawaii, Illinois, Maryland, New York, Oklahoma, and Utah. The following case studies present several examples of portals that contain extensive catalogs of open data, are relatively simple to navigate, and provide data in machine-readable formats. The portals also provide links to APIs for downloading particular data and offer other information designed specifically for developers looking to build applications using the data.

Maryland

One of the major strengths of Maryland's open data efforts is its Council on Open Data, a group that comprises 37 government, academic, and private-sector leaders in Maryland. The group meets at least twice a year to discuss recommendations to the State's Legislature and to improve transparency in the State. Senate Bill 644 mandates that open data be released to the public in multiple machine-readable formats. The State's public datasets are hosted on the Socrata Open Data Platform. Nearly 400 datasets are transportation related, including traffic volumes, vehicle miles of travel, port cargo, transit ridership, incident locations, and road network performance measures.

Further information:

https://data.maryland.gov/
http://www.govtech.com/data/Maryland-Legislation-Creates-Council-on-Open-Data.html
https://technical.ly/baltimore/2013/05/13/data-maryland-gov-launches/

City of Chicago

The City of Chicago's Data Portal is dedicated to promoting access to government data, and encouraging the development of creative tools to engage and serve Chicago's diverse community. The Socrata-powered site hosts over 600 datasets presented in easy-to-use, machine-readable formats about City departments, services, facilities and performance. Among these are average daily traffic counts, taxi trips, Divvy bikeshare trips, Chicago Transit Authority (CTA) bus speeds, and transportation system performance metrics. Datasets published on the Data Portal are fed into WindyGrid, the City of Chicago's internal situational awareness platform. Recently, the City released OpenGrid (see http://opengrid.io/), a new interface into the Data Portal, which allows members of the public who may not have access to Geographic Information Systems (GIS) or other data visualization tools to layer data on top of other datasets. This open‑source, low-cost business intelligence tool allows governments, nonprofits, and corporations to enable real-time situational awareness.

Further information:

https://data.cityofchicago.org/
https://socrata.com/case-study/chicago-growing-open-data-economy/

New York City

As part of an initiative to improve the accessibility, transparency, and accountability of City government, NYC Open Data offers access to a repository of government-produced, machine-readable datasets, also hosted on Socrata (see https://opendata.cityofnewyork.us/). One of the areas within NYC Open Data is real-time traffic speed data. Real-time speed data are collected by speed detectors belonging to different City and State agencies. NYCDOT's Traffic Management Center (TMC) gathers this data from certain locations, mostly on major arterials and highways, to create the Traffic Speeds Map (available for public access at https://webcams.nyctmc.org/). NYCDOT also uses this information for emergency response and management.

Further information:

https://data.cityofnewyork.us/Transportation/Real-Time-Traffic-Speed-Data/xsat-x5sa/data
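Socrata-hosted portals such as NYC Open Data also expose datasets through the Socrata Open Data API (SODA). The sketch below builds a SODA query URL from the domain and dataset identifier that appear in the link above; whether that identifier still resolves has not been verified here. The function name is hypothetical, and the "speed" column used in the filter is an assumption made only to show the syntax.

```python
from urllib.parse import urlencode


def build_soda_url(domain, dataset_id, limit=5, where=None):
    """Build a SODA query URL that returns JSON rows from a Socrata dataset."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where  # SoQL filter expression
    # Keep "$" unescaped so the SoQL parameter names stay readable.
    query = urlencode(params, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


if __name__ == "__main__":
    # Domain and dataset ID are taken from the URL listed above.
    print(build_soda_url("data.cityofnewyork.us", "xsat-x5sa"))
    # "speed" is a hypothetical column name used only to show a filter.
    print(build_soda_url("data.cityofnewyork.us", "xsat-x5sa", limit=10,
                         where="speed < 20"))
```

Limiting the number of rows is a sensible default for real-time feeds, which can be large. The column names actually available for filtering are listed on the dataset's page in the portal.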

Miami-Dade County

Miami-Dade County's transportation-related data is provided through a GIS open data site as a public service to its residents and visitors. This open data portal is powered by Socrata. The County is continually editing and updating GIS data to improve positional accuracy and information. Data can be previewed in the map and downloaded as a spreadsheet, shapefile, or KML file, or linked via API. Currently, there are nearly 200 GIS datasets available for download. However, no volume or speed data is available on this site.

Further information:

https://opendata.miamidade.gov/

Traffic Monitoring Programs Case Studies

Case studies on statewide traffic monitoring were conducted by the Federal Highway Administration's (FHWA) Office of Highway Policy Information (https://www.fhwa.dot.gov/policyinformation/tmguide/tmg_2013/compendium-of-designing.cfm).

Regional Integrated Multi-Modal Information Sharing (RIMIS)

The Delaware Valley Regional Planning Commission (DVRPC) is the Federally designated Metropolitan Planning Organization (MPO) that serves the greater Philadelphia region, including nine counties. Agencies in the region share their traffic data and resources through the RIMIS Project, whose primary objective is to provide information about incidents, maintenance and construction activity, and special events that impact the transportation system. In addition to event information, RIMIS is a common platform for distributing closed-circuit television (CCTV) images, variable message sign (VMS) messages, and traffic speeds.

Further information:

https://www.dvrpc.org/Transportation/TSMO/RIMIS/
https://www.dvrpc.org/operations/pdf/2009-02_RIMIS.pdf

Internet Traffic Monitoring System (iTMS)

The Bureau of Planning and Research (BPR) in the Pennsylvania DOT (PennDOT) partners with Metropolitan Planning Organizations (MPO), Rural Planning Organizations (RPO), PennDOT Engineering Districts, and vendors to carry out traffic counting programs. The traffic data shared between these agencies will eventually be made available to public users through iTMS. The types of information provided by this tool include annual average daily traffic (AADT), count frequency, count year, and latitude/longitude for any given site location.

Further information:

https://www.dot7.state.pa.us/tire

Traffic Count Database System

The system, which is part of the Mid-Ohio Regional Planning Commission (MORPC) Transportation Data Management System, is the result of a multi-jurisdictional effort to modernize traffic count data sharing in the Central Ohio region. Five agencies (Franklin County, City of Columbus, Delaware County, Licking County Area Transportation Study, and Ohio Department of Transportation (ODOT)) directly input traffic counts into the system, and MORPC collects and inputs traffic counts from private consultants and other local governments across the region. The data are then shared with the public immediately. Users can retrieve traffic count data by entering specific criteria or by clicking a location on the built-in Google Map.

Further information:

http://www.morpc.org/data-maps/transportation/index.aspx
http://www.ms2soft.com/wp-content/uploads/2014/12/25_CaseStudy-MORPCTrafficCountDatabase51.pdf
