Survey Format and the Trade-Off Between Internal and External Validity

By Sarah Kreps and Stephen Roblin.

Over the last couple of decades, survey experiments have increased in prevalence and prominence in the field of political science. International relations scholars’ increasing reliance on survey experiments has followed suit, as researchers look to experiments to study the subfield’s core questions, whether about audience costs, the democratic peace, or the (non-) use of nuclear weapons.

The growing popularity of survey experiments is largely due to one advantage they have over observational data: internal validity. Experiments are attractive because they help circumvent methodological concerns such as selection effects that often plague observational data. Experiments can also address endogeneity concerns—whether democracy causes peace or peaceful circumstances allow for democratic institutions to take root is difficult to parse from observational data.

To exploit the advantages of experiments, IR scholars typically embed key variables within vignettes, “short descriptions of a person or a social situation which contain precise references to what are thought to be the most important factors in the decision-making or judgment-making processes of respondents” (Alexander and Becker 1978). The vignette presents a short scenario that focuses primarily on the relevant theoretical factors, such as regime type, and holds other factors constant in order to isolate the effect of treatment on public opinion.

One of the most common critiques leveled against survey experiments, however, is their lack of external validity. Some IR scholars have tried to mitigate this concern by taking a cue from colleagues in American politics who embed variables within mock news stories, which ostensibly retain internal validity while offering more external validity, on the grounds that the mock news design better simulates the circumstances under which people consume information about international affairs.

In a study forthcoming in International Interactions, we perform the first test of whether the choice of survey format—namely short vignettes or mock news stories—entails a trade-off between internal and external validity. On the one hand, short vignettes may improve internal validity by isolating key variables without overloading respondents with information, thereby mitigating satisficing—economizing on time and responding without fully comprehending the details of the scenario or question—and improving data quality.

In contrast, mock news stories may strain the attentiveness of respondents and therefore increase satisficing by embedding more information, such as contextual details and controls for confounders, in stories that have the style and appearance of real news. For this reason, mock news stories may increase noncompliance rates, thereby sacrificing data quality. On the other hand, mock news stories may increase the external validity of the scenario by more faithfully representing the way in which individuals consume information in the real world. For this reason, mock news stories may enhance the generalizability of findings vis-à-vis short vignettes.

To study the potential trade-off between the internal and external validity of different experimental techniques, we conducted a survey experiment on Amazon Mechanical Turk, with a sample of 1,400 respondents, in January 2018. Our empirical study was nested in the context of the well-established democratic peace theory. In their widely cited article, Tomz and Weeks (2013) produced an experimental test of the democratic peace using a vignette design in which a democratic or autocratic regime was developing nuclear weapons and the US was considering an attack to prevent the country from acquiring those weapons. We created three versions of their baseline scenario: a short vignette, a mock news story, and a long vignette.

  1. In the short vignette, we used Tomz and Weeks’ baseline scenario in order to simulate survey approaches common to other IR studies in which scholars design scenarios that manipulate the causal variable of interest (for example, regime type) without embedding controls.
  2. We then incorporated the scenario into a mock news story that resembles an online news article. We included the controls in Tomz and Weeks’ scenarios, in particular the country’s economic and security ties to the US, and, consistent with Press et al. (2013), we specified the region of the country (Eastern Europe). The headline specified the country’s regime type, and pull-quotes reinforced the strength of security and economic ties to the US.
  3. To avoid conflating format with length, we created a long vignette that contained the same information as the mock news story but excluded the headline, pull-quotes, and trappings of an online news story.

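The three-condition design above, crossed with the regime-type treatment, can be sketched as a simple random-assignment routine. This is a hypothetical illustration: the condition labels, the seed, and the assignment mechanics are our assumptions for exposition, not the authors' actual survey implementation.

```python
import random

# Illustrative condition labels (not the authors' internal names).
FORMATS = ["short_vignette", "mock_news", "long_vignette"]
REGIMES = ["democracy", "autocracy"]

def assign_conditions(n_respondents, seed=42):
    """Randomly assign each respondent one survey format and one regime-type
    treatment, independently and with equal probability."""
    rng = random.Random(seed)
    return [(rng.choice(FORMATS), rng.choice(REGIMES))
            for _ in range(n_respondents)]

# The study's sample size was 1,400 respondents.
assignments = assign_conditions(1400)
```

Independent random assignment of format and treatment is what licenses the comparison of treatment effects across formats: any format-by-treatment cell differs from the others only by chance.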
To test whether mock news stories undermine internal validity, we gave respondents two manipulation checks to determine if they could recall the treatment, specifically the country’s regime type. To test whether mock news stories enhance external validity, we asked respondents whether they felt the scenario was believable, accurate, and authentic and used the results to create a credibility index.
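The credibility index could be constructed along the following lines. The three item names come from the post; the 1-to-5 response scale, the equal weighting, and the 0-to-1 rescaling are assumptions for illustration, not the authors' actual coding.

```python
def credibility_index(believable, accurate, authentic):
    """Average three Likert-type items (assumed 1-5 scale) and rescale
    the result to the 0-1 interval, weighting each item equally."""
    items = (believable, accurate, authentic)
    return sum((x - 1) / 4 for x in items) / len(items)
```

For example, a respondent rating the scenario 3, 4, and 5 on the three items would score 0.75 on this version of the index.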

Our findings showed that individuals’ responses to the manipulation checks and credibility questions do not hinge on the survey format they receive. In short, we found no evidence for the trade-off between internal and external validity.

We did find stronger treatment effects in the short vignettes. Consistent with Dafoe, Zhang, and Caughey’s (forthcoming) replication of the Tomz and Weeks study, the findings suggest that this was due to confounding effects, not, as we had theorized, the greater internal validity of short vignettes. Respondents who received the short vignette with the autocratic regime treatment were least likely to believe the country had a majority White and Christian population, raising the possibility that the relatively strong treatment effects in the short vignette groups were due to respondents’ greater willingness to attack non-White and non-Christian countries.

Overall, our results suggest that while existing studies in IR have used different experimental formats, primarily vignettes and mock news, the findings across formats may be comparable. The choice of survey format, we believe, should depend on the research question. Likewise, whether to account for confounders also depends on the research question. For example, scholars interested in understanding why regime type influences support for war should control for confounders, whereas scholars focused on the effect of political labels on public opinion need not include controls.

Moving forward, researchers should investigate whether there is a threshold effect in which the quantity and complexity of information jeopardizes the internal validity of mock news stories. Scholars should also explore alternative methods of measuring external validity beyond credibility. Finally, extensions of our study could test the robustness of our findings within nationally representative samples.

Sarah Kreps is Professor of Government and Adjunct Professor of Law at Cornell University and a regular contributor at PV@Glance. Stephen Roblin is a PhD student at Cornell University. 
