Hands on reproducible analysis of neuroimaging data: Nov. 2-3, UCSD: Reference

Key Points

Introduction
Reproducibility Basics
Shell: Getting around the “black box”
  • A command-line shell is a powerful tool, and learning additional ‘tricks’ can make its use more efficient, less error-prone, and thus more reproducible

  • Shell scripting is the most accessible tool for automating the execution of an arbitrary set of commands. Keeping the commands in a script avoids manual retyping and, in turn, typos and erroneous analyses (see the sketch below)
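
For example, a minimal shell script (the subject labels, directories, and the copy command are hypothetical placeholders, not commands from the workshop) that records a set of commands once so they can be re-run instead of retyped:

    #!/bin/bash
    # Fail fast: abort on errors, undefined variables, and failures inside pipes
    set -eu -o pipefail

    # Apply the same (placeholder) step to every subject; the exact commands are
    # recorded in the script itself rather than retyped interactively
    for subj in sub-01 sub-02 sub-03; do
        echo "Processing ${subj}"
        mkdir -p derivatives/"${subj}"
        # placeholder command -- substitute the actual tool used in the analysis
        cp -r rawdata/"${subj}" derivatives/"${subj}"/
    done

Saving such a script alongside the data means the same sequence of commands can be reviewed, corrected, and re-executed later.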

(Neuro)Debian/Git/GitAnnex/DataLad: Distributions and Version Control
  • Software distributions and version control systems allow for the efficient creation of tightly version-controlled computational environments

  • DataLad assists in creating a complete record of changes
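
A minimal sketch (assuming DataLad is installed; the dataset and file names are placeholders):

    # Create a new, empty DataLad dataset (a Git/git-annex repository underneath)
    datalad create my-analysis
    cd my-analysis

    # Add or change content, then record the state with a descriptive message
    echo "step 1 notes" > notes.txt
    datalad save -m "Add initial analysis notes"

    # The complete record of changes is available through Git
    git log --oneline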

ReproEnv: Virtual machines/Containers, Neurodocker
Break
  • Coffee is essential!

  • Food cannot be controlled/distributed by Git, but recipes could/should be

FAIR Data
  • Slides with solutions can be found in the sfn2018-training repository: https://github.com/ReproNim/sfn2018-training/tree/master/FAIR

Overview: Data and the FAIR Principles
  • There are a number of practical guidelines and best practices for ensuring data supports reproducible research

  • This module is in line with our overall goal of making science (including scientific training) more open by ensuring that data is made FAIR (Findable, Accessible, Interoperable, and Reusable).

  • There are a number of tools and standards to assist in making data FAIR.
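
As a brief, hedged sketch (assuming the bids-validator command-line tool and DataLad are installed; the paths and URL are placeholders): standardizing and validating a dataset makes it more interoperable and reusable, and consumers can obtain a published DataLad dataset and fetch only the content they need.

    # Check that a dataset follows the BIDS standard (Interoperability, Reusability)
    bids-validator /path/to/bids-dataset

    # Obtain a published DataLad dataset and fetch only the files that are needed
    datalad install <dataset-url>
    datalad get <installed-dataset>/sub-01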

Lunch
  • Food is necessary for our survival

  • Food cannot be controlled/distributed by Git, but recipes could/should be

Data Processing
Neuroimaging Workflows
ReproIn/DataLad: A complete portable and reproducible fMRI study from scratch
  • It is possible and easy to create valid BIDS datasets straight from the scanner with little upfront planning (see the sketch after this list)

  • We can implement a complete imaging study using DataLad datasets to represent units of data processing

  • Each unit comprehensively captures all inputs and data processing leading up to it

  • This comprehensive capture facilitates re-use of units and enables computational reproducibility

  • Carefully validated intermediate results (captured as a DataLad dataset) are a candidate for publication with minimal additional effort
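
A hedged sketch of the conversion step (assuming HeuDiConv with the ReproIn heuristic and DataLad are installed; paths, dataset names, and subject labels are placeholders rather than the exact workshop commands):

    # Create a DataLad dataset to hold the BIDS-organized study
    datalad create -c text2git bids-study
    cd bids-study

    # Convert DICOMs named according to the ReproIn convention into BIDS,
    # recording the command and its outputs in the dataset's history
    datalad run -m "Convert sub-01 DICOMs to BIDS" \
        heudiconv -f reproin --bids -o . --files /path/to/dicoms/sub-01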

ReproIn/DataLad: A Reproducible GLM Demo Analysis
  • We can implement a complete imaging study using DataLad datasets to represent units of data processing

  • Each unit comprehensively captures all inputs and data processing leading up to it

  • This comprehensive capture facilitates re-use of units and enables computational reproducibility (see the sketch after this list)

  • Carefully validated intermediate results (captured as a DataLad dataset) are a candidate for publication with minimal additional effort
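
A minimal sketch of how such a unit is captured and reproduced (the analysis script and paths are hypothetical placeholders; any command can be wrapped the same way):

    # Capture one processing step together with its exact inputs and outputs
    datalad run -m "Run first-level GLM for sub-01" \
        --input "sub-01/func/" \
        --output "derivatives/glm/sub-01" \
        ./code/run_glm.sh sub-01

    # Anyone who obtains the dataset can re-execute the last captured command
    # and check that it reproduces the recorded results
    datalad rerun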

Break
  • Coffee is essential!

  • Food cannot be controlled/distributed by Git, but recipes could/should be

Statistics
An introduction to the Statistics in reproducibility module
  • The reproducibility of an analysis is directly affected by the statistical analyses that are performed.

  • Reproducible research requires understanding the notions of sampling, testing, power, and model selection.
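
As one illustration of why power matters (a textbook normal-approximation sketch, not material from the workshop slides): for a two-sample comparison with n subjects per group, standardized effect size d, and two-sided significance level alpha, the power is approximately

    1 - \beta \approx \Phi\!\left( d\,\sqrt{n/2} - z_{1-\alpha/2} \right)

where \Phi is the standard normal cumulative distribution function and z_{1-\alpha/2} the corresponding quantile. Underpowered designs (small n or small d) produce findings that are unlikely to replicate, which directly undermines reproducibility.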

FIXME: more reference material.