Enabling Reproducibility in Computational and Data-enabled Science
As computation becomes central to scientific research and discovery, new questions arise regarding the implementation, dissemination, and evaluation of the computational- and data-enabled methods that underlie scientific claims. Reproducibility in research can be interpreted most narrowly as a simple trace of the computational steps that generate scientific findings, and most expansively as an independent re-implementation of an experiment testing the same hypothesis. In this talk I present a new framework for conceptualizing the affordances that support scientific inference, including computational reproducibility, transparency, and generalizability of findings, illustrated by two recent sets of research results on reproducibility. The first study evaluates the publication standards of the journal Science, which require data and code sharing, and finds a computational reproducibility rate of 26%. The second evaluates reproducibility for articles published in the Journal of Computational Physics, which encourages data and code sharing. In the second study, no articles were fully reproducible: 82% of the articles in our sample had no supporting data and code available, and where such artifacts were available, inadequate documentation and missing information, such as software input settings and licensing information, hampered our reproducibility attempts. Finally, I discuss how the findings from these two studies suggest future research directions on 1) ethics and incentives for engaging in new research practices that support computational and data-enabled research and 2) cyberinfrastructure design for scientific discovery.