How we assemble election results
This guide outlines how we (the team at Deck) assemble precinct-level election results.
The finished product we’re hoping for will include:
1. Clean precinct-level election results for all state and federal elections since 2012
2. A system for linking those results to specific geographies
3. A system for linking the results to specific voters based on their voting address at the time of each election
Unfortunately, publicly available data on precinct-level election results is a mess.
Precinct-level election results are published by state and county governments in non-uniform formats that frequently contain strange formatting and inconsistencies. Open Elections has posted hundreds of examples through their Twitter account. In Election Data Transparency: Obtaining Precinct-Level Election Returns, Willis, Merivaki, and Ziogas detail several widespread issues with the availability and usability of precinct-level election results in the 2016 and 2018 elections.
At the same time… the results are arguably the easy part! As outlined in Mismatched: The Trouble with Making a National Precinct Return Shapefile by the MIT Election Data + Science Lab, the geographies associated with those precinct-level results are practically impossible to accurately pin down and associate with voters in a systematic way at a national scale.
To explain why, we need to introduce four concepts:
- Map precincts are usually maintained by county governments. These are theoretically the lowest level of electoral geography, which are then used to construct larger districts. Some offices, such as Precinct Captain, are even elected to represent these districts. However, precincts can be changed incrementally without much (if any) public notice. As a result, the relationships between precincts and larger electoral districts can fall out of sync.
- Registration precincts (a.k.a. voter file precincts) are the precincts listed in voters’ registration records. In theory, a person’s registration precinct should match their map precinct, but human error or bad geocoding can lead to a mismatch. Further, updates to map precincts are not always reflected in registration records. Sometimes, a person’s registration precinct is not updated to match their latest map precinct until they update their registration record.
- Voting precincts (a.k.a. consolidated precincts) are the smallest geographic units that report election results. In many cases, voting precincts are the same as map precincts. However, they are often combinations of map precincts that share a single polling place on Election Day. This means that voting precincts can have a one-to-many relationship with map precincts. However, those relationships are rarely documented, leaving us to deduce what map precincts have been joined together to form a voting precinct — usually with very limited information.
- Voting tabulation districts (a.k.a. VTDs) are maintained by the U.S. Census Bureau through a partnership with state governments. The Census Bureau is required to help states with redistricting by linking a state’s precincts to official Census geographies — which then allows states to more easily conduct demographic analyses of proposed districts so they can ensure compliance with the Voting Rights Act. The whole process and history is laid out here. In theory, this means that VTDs give us a clean, once-per-decade snapshot of a state’s map precincts. However, this isn’t always the case. First, several states (Kentucky, Montana, Oregon, Rhode Island, and sometimes California) do not fully cooperate with the VTD program. Second, when the Census revises a state’s precinct map to align with Census geographies, the state has no obligation to adopt those revisions. Third, it appears that states report a mix of map precincts and voting precincts for conversion into VTDs. And fourth, the names given to VTDs are often very different from the names used to identify precincts elsewhere. (This is the case because the purpose of VTDs is not to serve as a user-friendly precinct map, but rather to help states with redistricting.)
In short, state and county governments define small electoral geographies called precincts that are meant to serve as the atomic geographic unit for reporting results and constructing larger electoral districts. However, precinct boundaries are revised over time without clear public documentation and are often joined together in poorly-documented groupings for the reporting of results.
While the voter file gives us a theoretical window into each voter’s precinct assignment at a given time, these records are not always up-to-date. And while Census VTDs give us a theoretical national precinct map, state cooperation with the VTD program varies and the maps are only updated once every ten years.
So, making sense of precinct-level results and mapping them to the correct geographies and voters is challenging. But progress is possible!
U.C. Berkeley’s Statewide Database has documented the relationships between California’s map, registration, and voting precincts since 1990 in order to build an accurate statewide election result resource for all state and federal contests. However, as their amusingly complex diagrams of precinct relationships, conceptual outlines, and thorough FAQ make clear, doing it right is complex work and it’s unlikely that the final product will ever be perfect.
Acknowledging that, our process aims for thoroughness, but also accepts that there will be imperfections (which we hope to identify, measure, and limit) if we want national-scale coverage.
This approach involves:
- Gathering as much high-quality raw data on precinct results and geographies as possible.
- Standardizing all of the spatial data on precincts and maintaining as many variations of each year’s map, registration, and voting precincts as possible — acknowledging that credible sources may disagree on the correct boundaries.
- Adding metadata to the precinct shapefiles and results that will assist with matching (such as alternate names and name components, vote totals, and all of the overlapping districts contained in a given precinct).
- Using tiered matching logic to join precinct-level election results to each plausible precinct shapefile.
- Disaggregating results to the Census block level — using voting method and overlapping districts to guide proportional allocation. (With quality-based weighted averages when multiple precincts’ results match a given block for a given year.)
- Reassembling blocks into two sets of precinct-level results: one made up of VTDs and another made up of registration precincts from the voter file.
- Maintaining systems for ongoing validation and transparency.
While national-level, spatially-matched precinct results do not exist, many wonderful people and organizations have assembled most of the necessary building blocks.
Despite the issues with map precincts referenced above, a handful of good samaritan states are doing the lord’s work by publishing their map precincts as public shapefiles. In most cases, these shapefiles only capture current boundaries. But in others (e.g., NC and WA), annual snapshots are provided.
The states currently publishing map precincts include Arkansas, Iowa, Michigan, Minnesota, North Carolina (plus a bonus source), Texas, and Washington.
Additionally, Nathaniel Kelso’s election-geodata is an open source project focused on bringing precinct shapefiles from numerous different sources together into a single national precinct map. While this is a helpful resource for filling gaps, the project has not been maintained since 2019.
We are relying on archived voter file records from the DNC and TargetSmart to construct annual snapshots of registration precincts. For each year, we identify which precinct is assigned to registered voters most often in a given Census block.
We then assign that precinct to the given Census block and union all of the blocks associated with a single precinct into a polygon.
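A minimal sketch of this block-to-precinct assignment, assuming the archived voter file has already been reduced to one `(block_id, precinct_id)` pair per registered voter for a given year. The function names and the alphabetical tie-break are illustrative assumptions, and the final polygon union is left to a GIS library.

```python
from collections import Counter, defaultdict


def modal_precinct_by_block(voter_rows):
    """Map each Census block to the precinct most often assigned to its voters.

    voter_rows: iterable of (block_id, precinct_id) pairs, one per registered voter.
    """
    counts = defaultdict(Counter)
    for block_id, precinct_id in voter_rows:
        counts[block_id][precinct_id] += 1

    assignment = {}
    for block_id, counter in counts.items():
        # Highest voter count wins; ties go to the alphabetically first
        # precinct so the result is deterministic (an assumption).
        assignment[block_id] = min(counter, key=lambda p: (-counter[p], p))
    return assignment


def blocks_by_precinct(assignment):
    """Invert the block -> precinct map: the blocks to union for each precinct."""
    grouped = defaultdict(list)
    for block_id, precinct_id in assignment.items():
        grouped[precinct_id].append(block_id)
    return {precinct: sorted(blocks) for precinct, blocks in grouped.items()}
```

Each list returned by `blocks_by_precinct` is the set of block polygons that would be unioned into one registration precinct shape.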
While several election result sources provide combined spatial and results data (see the following section), we still need non-spatial results to fill in the gaps. We gather this data from three sources.
The first is Open Elections, an open source project that seeks to assemble precinct-level, county-level, and district-level results for as many elections as possible. Open Elections relies on volunteers to gather and process results for specific states or counties in stages: first collecting source data, then wrangling that data into a format compatible with the standard Open Elections format, then undergoing QA and being finalized into the standard format. Open Elections maintains three distinct GitHub repos for each state based on this staging system. For example, the FL repos are openelections-sources-fl (raw source data), openelections-data-fl (data after initial processing), and openelections-results-fl (finalized results ready for publication). Unfortunately, most of the content in the “data” repos has not been QA’d for transfer to a “results” repo. And since this is an open source project, that means there may be meaningful quality gaps. Still, it’s great that this resource exists as a fallback when more reliable results data is unavailable.
The second source we rely on is results.wa.gov (while WA provides a map precinct shapefile, as noted above, it unfortunately does not come pre-linked to results). While we mostly rely on other initiatives to gather and process raw results data, getting WA directly is important because the state administers top-two elections, which means a general election can include just two Democrats or just two Republicans. This adds a wrinkle in our result-to-campaign matching logic, which usually relies heavily on party affiliation. (Many sources fail to include candidate names or only include limited name information, such as last name or candidate initials.) Because of that, we grab this state’s results directly. (Given the spread of ranked choice voting and the similar challenges it presents, we may begin collecting results directly from other jurisdictions as well in the near future.)
Finally, our most significant source for processed standalone election results is the MIT Election Data + Science Lab – a truly great resource that covers recent elections (since 2016) with much higher quality (re: depth and standardization) than Open Elections.
Oh! And one more thing: in order to match these results to our precinct shapefiles, we need to identify unique “voting precincts” in these records. These will not have corresponding shapes, but they will have names, overlapping districts, and vote counts that can be used in a matching scheme (detailed below).
Finally, we’re lucky enough that three organizations — the Metric Geometry and Gerrymandering Group (MGGG), the Voting and Election Science Team (VEST), and Statewide Database (SWDB) — have built products that attempt to merge voting precinct shapefiles with published precinct-level election results.
MGGG is a project of Tufts University that has assembled precinct shapefiles and election results in 34 states for elections between 2012 and 2018. This data has been assembled to power their Districtr web app, which is meant to help researchers explore the impact of different approaches to redistricting. In most cases, the precincts represented in MGGG shapefiles are map precincts acquired from state and county governments. In others, they might be variations on Census VTDs or manually combined Census blocks. Much of the reported MGGG source data is either no longer available on the internet or was *never* publicly available, so their maps are very valuable. (The Redistricting Data Hub has documented the availability and lineage of all the source data reported by MGGG.)
VEST is maintained by professors from the University of Florida and Wichita State University and published by the Harvard Dataverse. As with MGGG, VEST precinct shapefiles are formed from a mix of state, county, and Census data. VEST’s election results come with detailed READMEs documenting the source and matching process for each county — including where, how, and why they may have manually adjusted shapefiles gathered from public sources (here’s an example from 2016). Unlike MGGG, VEST continues to actively gather data for recent elections. Their Twitter account announces new releases regularly.
SWDB, a project of U.C. Berkeley, works directly with county governments to link map precincts, voting precincts, registration precincts, and Census blocks to election results through a seemingly painstaking (and thoroughly documented) process. However, their work is limited to California, as the project was originally conceived to improve the state’s decennial redistricting process.
With this data in hand, we now have to standardize it so all the various shapefiles are compatible. Since the lowest level of reliable voter-level geocoding we have is a voter’s Census block, we want to enforce clean precinct-to-block relationships.
First, we identify the blocks that each map and registration precinct intersects with.
For registration precincts, this involves grouping archived snapshots of the voter file by block and identifying the most common precinct associated with a given block in a given year (sorted by registered voter count).
For map precincts, this involves identifying the blocks that intersect with a given precinct, calculating the intersection area, and assigning each block to the precinct with which it shares the largest area of intersection. The precinct that covers most of a block’s area is not always the precinct where most of its population lives, but since we don’t have reliable national lat-long geocoding of registered voters, this is our best option.
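The largest-intersection rule can be sketched as follows, assuming the intersection areas have already been computed by a GIS tool and exported as `(block_id, precinct_id, area)` rows. The function name and input layout are hypothetical.

```python
def assign_blocks_by_area(intersections):
    """Build a block -> precinct crosswalk from precomputed intersection areas.

    intersections: iterable of (block_id, precinct_id, area) rows, where area
    is the overlap between that block and that precinct in any consistent unit.
    """
    best = {}  # block_id -> (precinct_id, largest area seen so far)
    for block_id, precinct_id, area in intersections:
        if area <= 0:
            continue  # touching edges with no real overlap are ignored
        current = best.get(block_id)
        if current is None or area > current[1]:
            best[block_id] = (precinct_id, area)
    return {block_id: precinct_id for block_id, (precinct_id, _) in best.items()}
```

Every block ends up in exactly one precinct, which is what keeps the precinct-to-block relationships clean.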
Then, using the crosswalks, we construct new standardized polygons by unioning each precinct’s block polygons together. We do this separately for map and registration precincts.
Before we can successfully match results without spatial attributes to the precinct shapefiles we’ve prepared, we need to generate features to ensure accuracy in the matching process.
Precinct names in two different sources do not always (or often!) match each other — and even when they do, it could be a false positive. By double checking the voter counts and district overlaps between two sources, we expect to get more accurate and plentiful matches.
First, we need to clean up precinct names. Across all of our raw sources, precinct names are littered with inconsistencies — leading zeros, accents, special characters, abbreviations, and more. Additionally, the same precinct might have entirely different names across multiple sources. It could be named for its polling place, for its VTD FIPS code, for a code assigned by its county government, or using some other convention altogether.
To manage this, we break a name up into a set of features: the numeric components (without leading zeros), the text components (in standardized unicode, all caps, and with no special characters), and tokenizations of all text components. This then allows us to match between arrays of name components among all the precincts in a given county, and then identify the precincts with the largest number of matching components (as a share of all constructed components).
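As an illustration of the name features described above, here is a small sketch using only the Python standard library. The function names are hypothetical, and treating "share of all constructed components" as shared components divided by the combined set of components from both names is an assumption.

```python
import re
import unicodedata


def name_components(name):
    """Split a precinct name into normalized numeric and text components."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Upper-case and turn every special character into a separator.
    cleaned = re.sub(r"[^A-Z0-9]+", " ", plain.upper())

    components = set()
    for token in re.findall(r"[A-Z]+|[0-9]+", cleaned):
        if token.isdigit():
            components.add(str(int(token)))  # removes leading zeros
        else:
            components.add(token)
    return components


def component_match_share(name_a, name_b):
    """Shared components as a share of all components built from both names."""
    a, b = name_components(name_a), name_components(name_b)
    combined = a | b
    if not combined:
        return 0.0
    return len(a & b) / len(combined)
```

Under these assumptions, "PCT 012" and "Pct. 12" produce identical component sets, while "Ward 3" and "Ward 4" share only one of three components.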
Precinct results and shapefiles each indicate a set of related districts. In some cases, the sets of districts shared between sources can be a powerful matching key. In this step, we identify all of the districts associated with each voting precinct (in our results data) and all of the districts associated with map and registration precincts (in our geographic data). Given shifting boundaries, it is common for a precinct to represent multiple districts for the same office, so we make an effort to capture all precinct/district relationships.
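One way to compare district sets between a voting precinct and a candidate map or registration precinct is sketched below. It assumes each side is a dictionary from office type to a set of district identifiers; the office-type labels and function names are made up for the example.

```python
def overlap_by_type(result_districts, shape_districts):
    """For each office type present in both sources, return the shared districts."""
    office_types = set(result_districts) & set(shape_districts)
    return {
        office: result_districts[office] & shape_districts[office]
        for office in sorted(office_types)
    }


def shares_one_of_each_type(result_districts, shape_districts):
    """True when the two sources share at least one district for every office
    type they both report (one possible reading of "at least one of each type")."""
    overlaps = overlap_by_type(result_districts, shape_districts)
    return bool(overlaps) and all(overlaps.values())
```

Storing a set per office type, rather than a single district, leaves room for precincts that represent multiple districts for the same office.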
Finally, when name matches cannot be determined with high confidence, or multiple matches are found, we can use counts of voters to prioritize likely fuzzy matches and break ties. Here, for each precinct/year combination, we gather the registered population, general election voting population (overall and by method), and mean partisanship scores.
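A tie-break on voter counts might look like the sketch below, which ranks candidate precincts by how close their general election voter count is to the vote total reported in the results. Measuring the gap relative to the larger of the two counts is an assumption, as are the function names.

```python
def relative_gap(a, b):
    """Difference between two counts as a share of the larger one."""
    larger = max(a, b)
    return 0.0 if larger == 0 else abs(a - b) / larger


def rank_by_voter_count(result_votes, candidates):
    """Order candidate precinct ids from closest to farthest voter count.

    candidates: dict of precinct_id -> general election voter count.
    Equal gaps fall back to precinct id so the ordering is stable.
    """
    return sorted(
        candidates,
        key=lambda pid: (relative_gap(result_votes, candidates[pid]), pid),
    )
```

For example, a result with 500 votes ranks a 520-voter candidate ahead of a 900-voter one.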
With all the precinct shapes standardized and matching metadata prepped and ready to go, it’s MATCHING TIME!!!
Using the matching fields detailed above, we join map and registration precincts to voting precincts (and their associated results) with a tiering system built from the following criteria:
- Name match: > 90% match, > 75% match, or > 50% match
- District overlap: at least one of each type (tiers 2, 4, 6, 8, 10)
- Voter counts: vote count within 50% OR partisanship within 20% (tiers 5, 6, 7, 8, 9, 10)
In this system, a given voting precinct and map/registration precinct can be present in multiple matched pairs. The goal here is to cast as wide a net as possible and document what factors were incorporated in identifying a given match, and what sources were used on either side of the match.
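To document which factors supported a given match, each candidate pair can carry a small evidence record. The sketch below only evaluates the individual criteria (name-match bands of 90%, 75%, and 50%, district overlap, and the vote count or partisanship check) and does not assign tier numbers. It assumes "within 50%" is a relative gap against the larger vote count and "within 20%" is an absolute gap between partisanship scores on a 0 to 1 scale; all names are hypothetical.

```python
NAME_THRESHOLDS = (0.90, 0.75, 0.50)  # name-match bands, highest first


def name_band(name_share):
    """Return the highest threshold the name-match share exceeds, or None."""
    for threshold in NAME_THRESHOLDS:
        if name_share > threshold:
            return threshold
    return None


def counts_agree(votes_a, votes_b, partisanship_a, partisanship_b):
    """Vote count within 50% OR partisanship within 20% (see assumptions above)."""
    larger = max(votes_a, votes_b)
    vote_gap = abs(votes_a - votes_b) / larger if larger else 0.0
    return vote_gap <= 0.50 or abs(partisanship_a - partisanship_b) <= 0.20


def match_evidence(name_share, districts_ok, votes_a, votes_b,
                   partisanship_a, partisanship_b):
    """Record which criteria a candidate pair satisfies."""
    return {
        "name_band": name_band(name_share),
        "districts_ok": districts_ok,
        "counts_ok": counts_agree(votes_a, votes_b, partisanship_a, partisanship_b),
    }
```

Keeping the evidence as separate fields means the same pair can be re-tiered later without rerunning the comparison.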
After matching across our various sources, we now have multiple overlapping results for many parts of the country in a given election. In a perfect world, these results would match precisely. But given the inconsistencies in precinct data, they often do not.
So we then use all of the matches above to disaggregate precinct results to the Census block level, and create a single results value for each block (by method where available). In averaging together matched results from multiple sources overlaying a single block, we apply a quality-based weighting system.
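A simplified sketch of this step is shown below. It assumes a precinct's votes are split across its blocks in proportion to each block's voter count for the relevant voting method, and it uses placeholder quality weights. The real allocation also takes overlapping districts into account, and the weight values here are purely hypothetical.

```python
from collections import defaultdict


def disaggregate(precinct_votes, block_voters):
    """Split a precinct's votes across its blocks in proportion to block voters.

    block_voters: dict of block_id -> voters in that block (for one method).
    """
    total = sum(block_voters.values())
    if total == 0:
        return {block_id: 0.0 for block_id in block_voters}
    return {
        block_id: precinct_votes * voters / total
        for block_id, voters in block_voters.items()
    }


def blend_block_estimates(estimates):
    """Combine several estimates per block into one weighted average.

    estimates: iterable of (block_id, votes, weight) rows, one per matched
    source. The weight stands in for match quality (hypothetical values).
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for block_id, votes, weight in estimates:
        weighted_sum[block_id] += votes * weight
        weight_total[block_id] += weight
    return {
        block_id: weighted_sum[block_id] / weight_total[block_id]
        for block_id in weighted_sum
        if weight_total[block_id] > 0
    }
```

With two sources estimating 80 and 60 votes for the same block at weights 3 and 1, the blended value is 75.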
Finally, we can then use the crosswalks built in Step 2 to generate results for the map and registration precincts we’ve identified across our various sources. This is done by simply joining the block results above with the block/precinct crosswalks and block population count tables, then grouping by precinct (and method, where available) and generating a precinct-level weighted average (weighted on voting population for a given election) of the block-level results.
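The reassembly step can be sketched as a grouped weighted average. The example assumes block-level results are stored as a rate (for example, a candidate's vote share) and that the crosswalk and voter counts are plain dictionaries; the function and argument names are hypothetical.

```python
from collections import defaultdict


def precinct_weighted_average(block_results, crosswalk, block_voters):
    """Roll block-level results up to precincts, weighted by voting population.

    block_results: dict of block_id -> result value (e.g. vote share)
    crosswalk:     dict of block_id -> precinct_id
    block_voters:  dict of block_id -> voting population for the election
    """
    weighted_sum = defaultdict(float)
    voter_total = defaultdict(float)
    for block_id, value in block_results.items():
        precinct_id = crosswalk.get(block_id)
        voters = block_voters.get(block_id, 0)
        if precinct_id is None or voters <= 0:
            continue  # blocks with no precinct or no voters contribute nothing
        weighted_sum[precinct_id] += value * voters
        voter_total[precinct_id] += voters
    return {
        precinct_id: weighted_sum[precinct_id] / voter_total[precinct_id]
        for precinct_id in weighted_sum
    }
```

Running this once with the VTD crosswalk and once with the registration precinct crosswalk would yield the two result sets described earlier.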
This is a messy-ass process with a lot of moving parts! We’re going to be iterating on validation and transparency measures over time. To start, we will:
1. Roll up results to the county and district level to verify that we see the expected alignment.
2. Lean on a Slack bot that will spit out 3 random precinct-level results every day for us to manually spot-check.
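Both checks are simple enough to sketch. The example below assumes official county totals are available for comparison and uses a hypothetical 1% tolerance; it only picks the precincts to review and leaves posting to Slack out of scope.

```python
import random
from collections import defaultdict


def county_rollup_gaps(precinct_votes, precinct_county, official_county_votes,
                       tolerance=0.01):
    """Return counties whose rolled-up precinct votes miss the official total
    by more than `tolerance` (a hypothetical 1% default), with the relative gap."""
    totals = defaultdict(float)
    for precinct_id, votes in precinct_votes.items():
        totals[precinct_county[precinct_id]] += votes

    gaps = {}
    for county, official in official_county_votes.items():
        ours = totals.get(county, 0.0)
        gap = abs(ours - official) / official if official else float(ours != 0)
        if gap > tolerance:
            gaps[county] = gap
    return gaps


def daily_spot_check(precinct_ids, k=3, seed=None):
    """Pick k random precincts for manual review; a seed makes the pick repeatable."""
    rng = random.Random(seed)
    ids = sorted(precinct_ids)
    return rng.sample(ids, min(k, len(ids)))
```

The same roll-up works at the district level by swapping the precinct-to-county map for a precinct-to-district map.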