Computational basis and ReproIn/DataLad

These lectures and hands-on exercises are part of the training curriculum from the OHBM 2018 training course run by the ReproNim (Reproducible Neuroimaging) Center. Selected materials are tailored for this course and cover only some sections of the full-day, and otherwise heavily compressed, event schedule. Please visit ReproNim: Teach for more materials, which we also reference within specific lessons here.

Introduction

The term “reproducibility” conjures a mental image of dedicated systems conducting automated and repeatable computations. However, you can embrace reproducibility as a principle to apply to your day-to-day research activities. Neuroimaging is a heavily data- and software-driven field of science. As a result, by learning more tricks and techniques for the tools that you already use daily, you will discover ways not only to improve efficiency but also to increase the reproducibility of your research.

To some degree, reproducibility requires knowing what was done in a particular analysis, when, and how. The lessons in this module therefore focus on helping to answer those questions. They progress from how the “black box” shell can provide a valuable record of your activities, to the use of complete computational environments in which the version and origin of each component is either exactly prescribed or can at least be identified, and finally to an entire (simple but complete) data analysis starting from raw data, while maintaining complete and unambiguous provenance of all actions and access to all components of the study (code, data, computational environments).

Schedule

10:15 Computational Basis
10:15 Shell: Getting around the “black box”
    Why and how does using the command line/shell efficiently increase reproducibility of neuroimaging studies?
    How can we ensure that our scripts do the right thing?
10:35 (Neuro)Debian/Git/GitAnnex/DataLad: Distributions and Version Control
    What are the best ways to obtain and track information about software, code, and data used or produced in the study?
11:15 ReproEnv: Virtual machines/Containers, Neurodocker
    How to encapsulate complete computational environments into redistributable/reusable containers?
11:45 Lunch
    What and where?
13:15 Neuroimaging Workflows
    Principles of re-executable processing
13:45 ReproIn/DataLad: A complete portable and reproducible fMRI study from scratch
    How to implement a basic neuroimaging study with complete and unambiguous provenance tracking of all actions?
14:15 Finish