How we created Hubble Voter
At Deck, we pull together data from many different sources and build models to help campaigns and political organizations understand voters and their preferences more deeply. Because of this, we are in a unique position to assemble a person table that brings the best available information about voters into one place. The table also lets users connect person-level data to robust election, geography, and results data, as well as to individual-level scores in Hubble.
- This person table contains all the information we have about voters in human-readable form
- Features are represented as a mix of numeric and categorical columns depending on what makes the most sense for the column
- This table contains all voter information represented as numeric features
- Categorical features are one-hot-encoded into binary columns
- Date columns are converted to features that are useful in models, e.g. year registered rather than registration date
- The same as deck_person_modeling, except that all missing data is imputed
- Core person data including name, demographics, and geographic info
- TargetSmart scores are included
- Turnout info with vote method from primary, presidential primary, and general elections going back to 2008
- Census tract-level data from the 2020 Census
- Race and ethnicity, education, and socioeconomic indicators
- Support scores for generic Democrats at federal, state, and local levels
- Turnout scores for general and primary presidential, midterm, and odd-year elections
- Retrospective support models for previous presidential campaigns and generic US House candidates
- Media consumption of different news and social media sources
- Contactability scores for texting, phones, and doors
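The difference between the human-readable person table and the numeric modeling tables can be sketched in a few lines of Python. This is a minimal illustration, not Deck's pipeline: the column names (`party`, `registration_date`, `age`) are hypothetical, and median imputation is an assumed strategy chosen for the example, since the tables above do not say how missing values are imputed.

```python
from datetime import date
from statistics import median

# Hypothetical human-readable rows; column names are illustrative, not the real schema.
PEOPLE = [
    {"party": "DEM", "registration_date": date(2012, 5, 1), "age": 34},
    {"party": "REP", "registration_date": date(2019, 10, 15), "age": None},
    {"party": None, "registration_date": None, "age": 58},
]


def to_modeling_row(row, categories):
    """Convert one human-readable row into numeric features."""
    out = {}
    # One-hot encode the categorical column: one binary column per category.
    for cat in categories:
        out[f"party_{cat}"] = 1 if row["party"] == cat else 0
    # Replace the raw date with a model-friendly feature (year registered).
    reg = row["registration_date"]
    out["year_registered"] = reg.year if reg is not None else None
    out["age"] = row["age"]
    return out


def impute_median(rows):
    """Fill missing numeric values with the column median (one simple strategy)."""
    filled = [dict(r) for r in rows]
    for col in rows[0].keys():
        observed = [r[col] for r in rows if r[col] is not None]
        fill = median(observed) if observed else 0
        for r in filled:
            if r[col] is None:
                r[col] = fill
    return filled


categories = sorted({r["party"] for r in PEOPLE if r["party"] is not None})
modeling = [to_modeling_row(r, categories) for r in PEOPLE]
imputed = impute_median(modeling)
print(imputed[2])
# {'party_DEM': 0, 'party_REP': 0, 'year_registered': 2015.5, 'age': 58}
```

Keeping the missing values in the non-imputed version and filling them only in a separate table lets modelers choose their own imputation when the default does not suit them.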
All scores use a consistent 0 to 1 scale. Some features have been renamed to be more intuitive and consistent. Key geographic features, including congressional and state legislative districts, are included directly, and additional features such as local districts can be linked from other Hubble data.
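Putting scores on a common 0 to 1 scale is typically a linear (min-max) rescaling. The sketch below assumes a hypothetical vendor score delivered on a 0 to 100 scale; the actual source ranges and conversion used for Hubble are not stated here.

```python
def rescale(value, lo, hi):
    """Linearly map a score from [lo, hi] onto [0, 1] (min-max scaling)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (value - lo) / (hi - lo)
    # Clamp so out-of-range inputs still land inside [0, 1].
    return min(1.0, max(0.0, scaled))


# Hypothetical example: a vendor score delivered on a 0-100 scale.
print(rescale(73, 0, 100))  # 0.73
```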
One of the core challenges of building a national voterfile is identifying when a single voter has records in multiple states. Users' needs differ: some want to count every possible unique voter and avoid incorrectly linking two different voters at all costs, while others want to minimize the chance of accidentally contacting the same person twice. To give users maximum flexibility, we've linked as many records as possible using combinations of name, date of birth, gender, and contact information. A match_likelihood column indicates how confident each match is, from 0 to 1, so users can apply whatever threshold fits their use case. No matches with match_likelihood below 0.4 are included in deck_person.
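To illustrate how a threshold on match_likelihood trades off the two use cases, the sketch below groups records into people with a union-find structure, keeping only links at or above a chosen cutoff. It assumes the links are available as (record, record, likelihood) pairs with hypothetical record IDs; this layout is an assumption, and the sketch does not reproduce how Deck scores matches.

```python
# Hypothetical candidate links between voterfile record IDs, each with a
# match_likelihood in [0, 1]. In deck_person every link is already >= 0.4.
LINKS = [
    ("NC-001", "VA-417", 0.95),
    ("VA-417", "MD-220", 0.55),
    ("TX-900", "CA-133", 0.42),
]


def cluster_records(record_ids, links, min_likelihood):
    """Group records into people, keeping only links at or above the threshold."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Path halving keeps lookups fast on long chains.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, likelihood in links:
        if likelihood >= min_likelihood:
            parent[find(a)] = find(b)

    groups = {}
    for rid in record_ids:
        groups.setdefault(find(rid), set()).add(rid)
    return sorted(groups.values(), key=lambda g: sorted(g))


ids = ["NC-001", "VA-417", "MD-220", "TX-900", "CA-133"]
# Conservative: only link when very confident, so more distinct people remain.
strict = cluster_records(ids, LINKS, 0.9)
# Deduplication-focused: accept every link in the table.
loose = cluster_records(ids, LINKS, 0.4)
print(len(strict), len(loose))  # 4 2
```

Union-find merges links transitively, so a chain of moderate links can join records that were never compared directly; a conservative user may want to judge a cluster by its weakest link rather than its strongest.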
Using the records linked through entity resolution, we pull in reliable data such as voterfile race, party registration, and complete date of birth (DOB) whenever an earlier record has more reliable or complete data than the most recent one. This improves our voterfile for millions of voters on some of the most important voter traits.
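A minimal sketch of this backfilling step, assuming linked records carry an `updated` date and that "complete" means non-missing (and, for DOB, a full year-month-day). The field names and records are hypothetical, and the check below models completeness only, not the reliability judgment the text also mentions.

```python
from datetime import date

# Hypothetical linked records for one person; field names are illustrative only.
RECORDS = [
    {"state": "VA", "updated": date(2024, 3, 1), "race": None,
     "party": None, "dob": "1980"},
    {"state": "NC", "updated": date(2019, 6, 1), "race": "Black",
     "party": "DEM", "dob": "1980-07-14"},
]

BACKFILL_FIELDS = ("race", "party", "dob")


def is_complete(field, value):
    """Treat missing values, and DOBs without month and day, as incomplete."""
    if value is None:
        return False
    if field == "dob":
        return len(value) == len("YYYY-MM-DD")
    return True


def backfill(records):
    """Start from the newest record; fill incomplete fields from older ones."""
    ordered = sorted(records, key=lambda r: r["updated"], reverse=True)
    merged = dict(ordered[0])  # copy, so the input records are not mutated
    for field in BACKFILL_FIELDS:
        if is_complete(field, merged[field]):
            continue
        for older in ordered[1:]:
            if is_complete(field, older[field]):
                merged[field] = older[field]
                break
    return merged


person = backfill(RECORDS)
print(person["state"], person["race"], person["party"], person["dob"])
# VA Black DEM 1980-07-14
```

Using the newest record as the base keeps current fields such as state up to date, while traits that rarely change come from whichever linked record has them.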