

This website is meant to organize, more or less rationally, the bookmarks and other materials I have collected over time about Google Insights and politically related search trends. To be honest, I wanted to build this website several years ago but, due to a chronic lack of free time, I only managed to do so at the end of summer 2012.



A growing number of analysts and journalists have reported evidence that election results can be predicted from the relative frequency of Google search terms:



- "The Google poll"
- "Can Google call elections?"
- "Predicting Election Results With Google"
- "Searching your way to the ballot box"
- "Google Predicts Obama for President"
- "What Voters Google Before Heading to the Polls"
- "UK Election 2010: Google Insights for Search Tracks Interest In Campaign"
- "Is Google the ultimate UK political pollster?"
- "Bill White leads Rick Perry in one poll -- more people search Google for him"



Access to Google search data has stimulated considerable interest in various disciplines. Among these, political forecasting is probably the most recent, while important results have already been achieved in the economic, financial and medical fields. [What follows is an article that I wrote and that appeared as a guest post here.]

The main obstacle that has so far limited development in political forecasting is the so-called self-selection bias: the individuals in the analyzed sample are not chosen randomly but decide themselves to enter the sample, which makes it biased and not representative of the entire population. Unfortunately, this is exactly what happens with Google Insights and Google Trends (GTI), which report data only for those individuals who actively searched on Google.
For instance, GTI contains only a small number of political searches by people over 60 years old, who use the internet much less than younger people, yet this group is the most active when it comes to voting. Similarly, if the internet penetration rate in the examined population is not very high, the self-selection bias will be large. See the links below for more details and references:



http://en.wikipedia.org/wiki/Sampling_bias
http://en.wikipedia.org/wiki/Selection_bias
http://en.wikipedia.org/wiki/Self-selection_bias
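
To make the self-selection problem described above more concrete, here is a small sketch in Python. All the numbers (group shares, search probabilities, support levels) and the function names are invented for illustration only; they are not estimates taken from GTI or from any real election.

```python
# Hypothetical electorate: every figure below is made up for illustration.
# Each group maps to (share of voters, probability of searching online,
# share of the group that supports candidate A).
GROUPS = {
    "under_60": (0.70, 0.80, 0.55),
    "over_60": (0.30, 0.20, 0.40),
}


def true_support(groups):
    """Support for candidate A in the whole electorate."""
    return sum(share * support for share, _, support in groups.values())


def observed_support(groups):
    """Support for candidate A among people who self-select by searching."""
    searchers = sum(share * p_search for share, p_search, _ in groups.values())
    a_searchers = sum(
        share * p_search * support for share, p_search, support in groups.values()
    )
    return a_searchers / searchers


if __name__ == "__main__":
    print(f"true support:     {true_support(GROUPS):.3f}")
    print(f"observed support: {observed_support(GROUPS):.3f}")
```

With these made-up inputs the searchers overstate the support for candidate A (about 0.535 against a true value of 0.505), simply because the group that searches less is also the group that supports A less.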


This is a well-known problem for people who conduct online surveys with the technique known as Computer-Assisted Web Interviewing (CAWI):


http://en.wikipedia.org/wiki/Computer-assisted_web_interviewing


In this case, however, it is possible to know (at least approximately) which groups of individuals are under-represented and which are over-represented, by comparing the qualitative characteristics of the online survey sample with those of the population. Appropriate sample weights can then be created to correct the bias in the original online sample.
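
The re-weighting idea for online surveys can be sketched as follows. This is only a minimal illustration of simple post-stratification on a single characteristic (age group), which is one common way to build such weights; the group names, shares and support levels are hypothetical, and real surveys usually weight on several characteristics at once.

```python
def poststratification_weights(population_shares, sample_shares):
    """Weight for each group = population share / sample share."""
    return {g: population_shares[g] / sample_shares[g] for g in population_shares}


def weighted_estimate(sample_shares, group_support, weights):
    """Weighted average of the group-level support observed in the sample."""
    numerator = sum(
        sample_shares[g] * weights[g] * group_support[g] for g in sample_shares
    )
    denominator = sum(sample_shares[g] * weights[g] for g in sample_shares)
    return numerator / denominator


# Hypothetical figures, invented for this example.
POPULATION_SHARES = {"under_60": 0.70, "over_60": 0.30}  # known from a census
SAMPLE_SHARES = {"under_60": 0.90, "over_60": 0.10}      # online respondents
GROUP_SUPPORT = {"under_60": 0.55, "over_60": 0.40}      # support for candidate A

WEIGHTS = poststratification_weights(POPULATION_SHARES, SAMPLE_SHARES)
UNWEIGHTED = sum(SAMPLE_SHARES[g] * GROUP_SUPPORT[g] for g in SAMPLE_SHARES)
WEIGHTED = weighted_estimate(SAMPLE_SHARES, GROUP_SUPPORT, WEIGHTS)

if __name__ == "__main__":
    print(f"unweighted estimate: {UNWEIGHTED:.3f}")
    print(f"weighted estimate:   {WEIGHTED:.3f}")
```

In this toy example the over-60 respondents receive a weight of about 3 and the under-60 respondents a weight below 1, which moves the estimate from about 0.535 back to 0.505, the value implied by the population shares. The correction works only because the age group of each respondent is known.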


Unfortunately, GTI data do NOT come with this kind of qualitative information, so it is not possible to re-balance the search data for a candidate or a political party to obtain a representative sample of the population. That said, we have four possibilities:

1) Use the simple GTI data with no corrections: this option is currently viable only for a limited number of elections, exclusively at the national level, with a very large turnout (several million voters) and a very high internet penetration rate. One such case is the U.S. presidential election:
