Taking Note: Parsing Rigor in Arts Education Studies


By Steven Holochwost, Principal and Director of Research for Youth & Families at WolfBrown and Research Scientist at Science of Learning Institute, Johns Hopkins University
In a recent post on this blog, Patricia Shaffer, PhD, the deputy director of the National Endowment for the Arts’ Office of Research & Analysis, upheld the need for more rigorous research in the field of arts education. Dr. Shaffer based her conclusions in part on a report released last year by the American Institutes for Research (AIR). I heartily agree with Dr. Shaffer’s call for rigor and deeply appreciate the Arts Endowment’s commitment to fund such work through its Research: Art Works grants. However, I also think it’s important to take the opportunity afforded by her post to consider carefully what constitutes rigor and to reflect on the status of the field in using rigorous designs.

Rigorous designs are often implicitly understood to mean randomized-control designs (or RCDs). But what makes RCDs rigorous? In the parlance of research, it is their capacity to protect against threats to internal validity; in plain English, RCDs allow a researcher to rule out the possibility that some alternative explanation accounts for the observed effects of a program. One alternative explanation that looms large in arts education research is self-selection bias, or the tendency for individuals with certain traits (or certain forms of privilege) to pursue arts education. If people choose to study piano because they are smart, it would be incorrect to infer that piano instruction makes people smarter. However, if a researcher randomly assigns people to piano lessons and observes that those who took lessons are smarter than those who did not, then he or she may conclude that piano lessons made people smarter.

The operative word is “may.” In the example provided, it is entirely possible that the people assigned at random to piano lessons were smarter to begin with. In large samples, such failures of randomization are rare, but in small samples (e.g., the sample sizes often available to researchers in arts education) they are more likely.
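The point about sample size can be illustrated with a short simulation. What follows is a minimal sketch under stated assumptions, not an analysis of any real study: the trait distribution (mean 100, standard deviation 15), the 5-point threshold for calling groups "imbalanced," and the function names are all hypothetical choices made for illustration.

```python
import random
import statistics


def baseline_gap(n, rng):
    """Randomly split n simulated participants into two equal groups and
    return the absolute difference in their mean baseline scores."""
    # Hypothetical IQ-like trait, measured before any lessons are given.
    scores = [rng.gauss(100, 15) for _ in range(n)]
    rng.shuffle(scores)  # random assignment to lessons vs. no lessons
    half = n // 2
    lessons, no_lessons = scores[:half], scores[half:]
    return abs(statistics.fmean(lessons) - statistics.fmean(no_lessons))


def imbalance_rate(n, threshold=5.0, trials=1000, seed=0):
    """Share of simulated randomizations in which the two groups differ
    at baseline by more than `threshold` points purely by chance."""
    rng = random.Random(seed)
    hits = sum(baseline_gap(n, rng) > threshold for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (20, 200, 2000):
        print(f"sample size {n}: imbalance rate {imbalance_rate(n)}")
```

Under these assumptions, a sizable share of the 20-person randomizations should produce groups that already differ noticeably before any lessons begin, while such chance imbalances should become rare at 200 participants and all but vanish at 2,000.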
For observed traits such as intelligence, it may be possible to stratify the sample by that trait prior to randomization or to control for initial levels of it in the analysis. But by definition these approaches cannot account for ways that groups may differ on unobserved traits.

The point here is not to question the rigor of RCDs or the findings that have resulted from them, but rather to point out that labeling RCDs as the “gold standard” for rigorous research in all cases may not be justified. One result of this labeling is that RCDs may be adopted to the exclusion of other designs that are just as effective (e.g., single-case designs) or nearly as effective (e.g., regression discontinuity designs, carefully executed quasi-experimental designs) in mitigating threats to internal validity. This is unfortunate, given that such designs are often well-suited to situations in which an RCD is either impossible or unethical. Whole-school models of program implementation do not allow for the assignment of students to a control group, and even when this assignment is feasible, program staff may be unwilling to withhold a program from students on philosophical or ethical grounds.

All of that said, findings from rigorous research designs on the impact of arts education on human development continue to accumulate. As the Arts Endowment’s Dr. Melissa Menzer noted in the introduction to a special issue of Early Childhood Research Quarterly, evidence from RCDs remains rare, but it does exist. Sometimes this evidence goes overlooked because it is covered not as arts education research, but as more “basic” research. For example, National Endowment for the Arts research grantee Eleanor Brown employed an RCD to demonstrate that participating in a program of arts-integrated early education led to reductions in levels of the stress hormone cortisol among disadvantaged children.
But the excitement about the broad implications of these findings for children in poverty somewhat overshadowed the fact that, at its core, this was a rigorous study of the effects of arts education on children’s development.

In summary, there is much to celebrate when we consider how far the fields of arts and arts education have come in understanding and employing rigorous research designs. That fact becomes particularly apparent if we employ a more nuanced understanding of what constitutes rigorous research. A very recent report issued by AIR takes this approach by grouping studies into tiers of evidence afforded by the designs employed. The future of arts education research is bright, and can be brighter still if we maintain a clear and open mind about what constitutes rigorous evidence.