Taking Note: Even if You First Succeed, Try Again!
One of my colleagues at the NEA, Bill O'Brien, is fond of saying that a key difference between art and science is that artists don't need to replicate their findings. This observation in turn recalls for me a quote by Philip Larkin: "Poetry is not like surgery, a technique that can be copied: every operation the poet performs is unique, and need never be done again."
But scientists, even those examining the impact of arts and culture, need validation of their work if it is to prove original or, better still, generalizable. A little over a month ago, Science magazine published a shocker by Brian Nosek, a University of Virginia psychologist and executive director of the Center for Open Science. Nosek and his research team tried to replicate 100 experimental and correlational studies whose findings had been published in three psychology journals.
The result? While 97 percent of the original studies reported statistically significant findings, only 36 percent of the replications did. More damningly, the effect sizes (i.e., the relative strength of the phenomena being tested) in the replications were only about half those of the original studies.
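To get a feel for why halved effect sizes translate into so few successful replications, consider a minimal simulation sketch. This is my own illustration, not an analysis from the Science paper: the effect sizes of 0.6 and 0.3 and the sample size of 30 per group are assumptions chosen for convenience, and the code assumes Python with numpy and scipy available.

```python
# Illustrative sketch: if a replication is run at the same sample size but the
# true effect is half the published one, the chance of reaching p < .05 drops
# sharply. All numbers here are assumed for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N_PER_GROUP = 30     # assumed typical sample size per group
TRIALS = 10_000      # number of simulated studies per condition

def significance_rate(true_d):
    """Fraction of two-sample t-tests reaching p < .05 at a given true effect."""
    hits = 0
    for _ in range(TRIALS):
        control = rng.normal(0.0, 1.0, N_PER_GROUP)
        treated = rng.normal(true_d, 1.0, N_PER_GROUP)
        if stats.ttest_ind(treated, control).pvalue < 0.05:
            hits += 1
    return hits / TRIALS

print(f"power at d = 0.60: {significance_rate(0.60):.0%}")  # roughly 60-65%
print(f"power at d = 0.30: {significance_rate(0.30):.0%}")  # roughly 20-25%
```

In other words, even if every original finding were real but overstated by a factor of two, a same-sized replication would often come up empty.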
The single worst conclusion to draw is that the problem is somehow limited to psychological studies. Critics have long maintained that publication in peer-reviewed scientific journals is biased toward reporting only positive results, which tempts some researchers to dismiss or downplay counter-findings. This bias also discourages additional, time-consuming research to see whether initial results are reproducible.
Although behavioral and social scientists have not been alone in encountering these flaws, they have been among the most eloquent in combating them. (Brian Nosek is himself a psychologist.) Nearly three years before Nosek’s article appeared in Science, Nobel laureate and behavioral-economics superstar Daniel Kahneman chided researchers studying the phenomenon of “social priming” for not insisting on greater rigor. In an open letter, Kahneman wrote: “Your problem is not with the few people who have actively challenged the validity of some priming results. It is with the much larger population of colleagues who in the past accepted your surprising results as facts when they were published.”
“These people have attached a question mark to the field, and it is your responsibility to remove it,” he added, calling for a “daisy chain of labs” that would replicate results from social-priming studies.
But the most thorough and, in my opinion, elegant statement of the problem is a May 2015 report commissioned by the National Science Foundation from a group of behavioral and social scientists. The report distinguishes usefully among the virtues of reproducibility, replicability, and generalizability. It also identifies several “questionable research practices.” (Examples include: “failing to report analyses of all the measures collected in a study and describing only those that yield desired findings”; “deciding whether to collect more data after determining whether obtained results with a smaller sample document desired results”; and “reporting an unexpected finding as if it had been predicted a priori and thereby increasing its apparent plausibility.”)
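The second of those practices, sometimes called optional stopping, is easy to demonstrate. Below is a rough sketch of my own, not material from the NSF report; the batch sizes and the stopping rule are assumptions made for the example. It shows that repeatedly testing a growing sample and stopping at the first significant result inflates the false-positive rate even when no effect exists at all.

```python
# Illustrative sketch of "peeking": test after each batch of data and stop as
# soon as p < .05. The null is true throughout, so every "finding" is false.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
TRIALS = 5_000
START_N, STEP, MAX_N = 20, 10, 100   # assumed batch sizes for illustration

false_positives = 0
for _ in range(TRIALS):
    data = list(rng.normal(0.0, 1.0, START_N))   # true mean is 0
    while True:
        p = stats.ttest_1samp(data, 0.0).pvalue
        if p < 0.05:                  # stop and "publish" as soon as p dips
            false_positives += 1
            break
        if len(data) >= MAX_N:        # give up once the sample budget runs out
            break
        data.extend(rng.normal(0.0, 1.0, STEP))  # otherwise, collect more data

print(f"nominal alpha: 5%, observed: {false_positives / TRIALS:.0%}")
```

Under these settings the observed false-positive rate typically runs two to three times the nominal 5 percent, which is exactly why the report flags the practice.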
At this point, if you haven’t already, you may be asking why such topics are being treated in an Art Works blog. Back in July, I used this space to examine the behavioral and social scientific (including economic) basis of many studies about the arts, and how new developments in the social sciences can have consequences for arts-related research. Attending to concerns about replicability will only improve the reliability and generalizability of studies about the arts. That is especially important when we begin to discuss effects (or even correlations) observed in studies of the arts’ plausible benefits.
Partly to encourage more researchers to wade into arts-related data—even if study findings already have been widely reported—the NEA’s Office of Research & Analysis has established a National Archive of Data on Arts & Culture, where it will upload datasets annually. In addition, we ask applicants to our research grants program (Research: Art Works) to include a data management-and-sharing plan in their proposals, for the benefit of other researchers.
Academic researchers can be expected over time to conduct more replication studies if funders and journals demand it. But even when researchers do not seek government funding or journal publication—but rather are assisting arts organizations or agencies as internal or external consultants to study a program or intervention—caveats about generalizability are worth keeping in mind.
At a panel event celebrating the NEA’s 50th anniversary, Chairman Jane Chu remarked that she has visited 96 communities in 31 states since arriving at the agency in the summer of 2014. The first thing she learns with each new visit, she said, is that every community is different; no two are alike. We researchers would do well to remember this when aggregating, analyzing, and reporting data about the arts’ relationships to individual- and community-level variables.