By Joseph Young
I have a (at least one) secret sin. I enjoy the blog sites/posts called “What (fill in the blank of a stereotypical person you’d like to parody; Example 1, Example 2) Like.” In this spirit, I can tell you one thing that Washington, DC think tanks currently like: Data. Just in the international politics realm, for example, there’s a failed states index put out by the Fund for Peace, Transparency International* has a cross-national corruption index, the Heritage Foundation produces an Economic Freedom Index (not to be confused with the political freedoms index put out by Freedom House), and the Institute for Economics & Peace just released a Global Terrorism Index**.
This is, I think, mostly positive. The turn towards Big Data, as Nate Silver and others have shown in the popular realm, is a welcome improvement over pundits and expert judgments alone.
The data turn has its dark side too (as Lionel Beehner and I begin to outline in our critique of the failed states index). If the index is generating more noise than signal, we may be worse off than with no data. Why? Anyone unfortunate enough to have used Apple’s new maps program might understand this intuitively. We need data that helps us explain and predict the world around us. Some indexes help with this task, while some confuse it or tell us to drive into a lake.
What is an index for? At its base, an index is an indicator constructed to proxy an underlying or latent — read: not directly observable — concept. It is an aggregation of different data to get at this underlying concept (e.g., corruption, violence, or peace). Creating an index requires at least three steps.
- Select data. The data should, on its face, all relate to the proposed underlying concept, with all variables relating to a single dimension. Otherwise, we need a separate index. For example, many scholars of democracy have coalesced around the notion that there are at least two dimensions of democracy necessitating two separate indices (to be fair this is a contentious issue).
- Examine relationships among the data. Are the distinct pieces of data related? Examining correlations among two or more variables is one way to check; a more complex mathematical technique (factor analysis, PCA, etc.) is another.
- Assign actual numbers. If an index is an aggregation of separate sets of data, one has to decide how to put them all together. Weighting is often used. In the new Global Terrorism Index, for example, the components of the index are numbers of terrorist attacks, numbers of fatalities, numbers of injuries, and property damage. Each of these components is given a weight (fatalities are weighted more heavily than attacks, 3 to 1). These choices can be fairly arbitrary. A naïve critique would be to drop the whole endeavor of assigning numbers. A more nuanced critique suggests adjusting these weights, examining how each adjustment influences the rank ordering of the cases, and choosing the weighting that best fits theory/expectations. In other words, if the Failed States Index ranks Brazil as more likely to fail than Greece, then we might want to consider adjusting some of these aggregation rules (which is in fact the case: Greece is 138 vs. Brazil at 123. Syria and North Korea also defy face validity tests, with North Korea at 22 while Syria is at 23).
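To make steps two and three concrete, here is a minimal sketch in Python. Everything in it is illustrative: the three countries and their counts are made up, the function names are my own, and only the 3-to-1 fatalities-to-attacks ratio comes from the description above (the injuries and property damage weights are placeholders, not the Global Terrorism Index's actual values). It checks a correlation between two components, builds a weighted score, and then re-ranks the cases under a different weight.

```python
from math import sqrt

# Hypothetical counts for three made-up countries (not real GTI or GTD data).
DATA = {
    "A": {"attacks": 20, "fatalities": 1, "injuries": 4, "damage": 2},
    "B": {"attacks": 5, "fatalities": 8, "injuries": 10, "damage": 1},
    "C": {"attacks": 10, "fatalities": 3, "injuries": 2, "damage": 0},
}

# Only the 3-to-1 fatalities-to-attacks ratio comes from the text;
# the injuries and damage weights are placeholder assumptions.
BASE_WEIGHTS = {"attacks": 1.0, "fatalities": 3.0, "injuries": 0.5, "damage": 2.0}


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def score(row, weights):
    """Weighted sum of one country's components."""
    return sum(weights[name] * row[name] for name in weights)


def rank_countries(data, weights):
    """Country names ordered from highest to lowest index score."""
    return sorted(data, key=lambda c: score(data[c], weights), reverse=True)


# Step two: are two of the components related?
fatalities = [row["fatalities"] for row in DATA.values()]
injuries = [row["injuries"] for row in DATA.values()]
print("fatalities vs. injuries:", round(pearson(fatalities, injuries), 2))

# Step three: aggregate with the baseline weights.
baseline = rank_countries(DATA, BASE_WEIGHTS)
print("baseline ranking:", baseline)  # ['B', 'A', 'C']

# Sensitivity check: weight fatalities the same as attacks and re-rank.
equal_weights = dict(BASE_WEIGHTS, fatalities=1.0)
reweighted = rank_countries(DATA, equal_weights)
print("equal-weight ranking:", reweighted)  # ['A', 'B', 'C']
```

In this toy example, a single weighting choice is enough to swap the top two cases, which is the sort of fragility worth knowing about before publishing a rank ordering.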
This discussion brings up an often overlooked and important fourth step: validation. Is the index measuring what we think it is measuring? Can it meaningfully rank which countries are most corrupt, allow the most economic freedom, or are most affected by terrorism? This is a necessary final beta test before an index goes live.
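One crude way to run such a test is to write down a few orderings we expect on theoretical grounds and see whether the index reproduces them. The sketch below uses only the four Failed States Index ranks quoted above. It assumes that a lower rank number means closer to failure, and the expected orderings are my reading of the face validity examples, not an official benchmark. The function name is hypothetical.

```python
# Ranks quoted in the post; assumed convention: lower number = closer to failure.
FSI_RANK = {"Greece": 138, "Brazil": 123, "North Korea": 22, "Syria": 23}

# Each pair is (expected to be more fragile, expected to be less fragile).
# These expectations are illustrative assumptions drawn from the examples above.
EXPECTED = [("Greece", "Brazil"), ("Syria", "North Korea")]


def face_validity_violations(ranks, expected):
    """Return the expected orderings that the index ranks the other way around."""
    return [
        (more_fragile, less_fragile)
        for more_fragile, less_fragile in expected
        if ranks[more_fragile] > ranks[less_fragile]
    ]


violations = face_validity_violations(FSI_RANK, EXPECTED)
for more_fragile, less_fragile in violations:
    print(f"Index ranks {less_fragile} as closer to failure than {more_fragile}")
```

A handful of violated expectations does not sink an index on its own, but it is a signal to revisit the aggregation rules before going live.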
As we social scientists get more involved in actually trying to predict future outcomes, we need better and more detailed (microlevel, geolocated) data (see Mike Ward and Nils Metternich’s recent Foreign Policy piece for an upbeat assessment of predicting political conflict). To the extent that these indexes take advantage of this revolution in data by aggregating this information and attempting to forecast political instability, freedom, etc., this is a positive change. Even better, these organizations could allow users to make their own index by changing some of the coding rules and aggregating based on their own theories/predictions. AidData, for example, provides foreign aid data at a granular, geocoded level and allows users to aggregate and disaggregate the data in any way they’d like (see my piece with Mike Findley doing just that). The Global Terrorism Database (GTD) hosted by START has become the gold standard for projects examining terrorism (the Global Terrorism Index reformats the GTD to produce its index). Part of the reason for this is that the GTD allows users to filter cases, create their own data, and reformulate based on theory or the specific needs of a project.
To the extent that some of these data projects are PR moves to sell magazines, host receptions, or generate donations, less information might be better than more indexes.
* Yes, I know Transparency International is not based in DC.
**The Institute for Economics & Peace also puts out a Global Peace Index.