Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • Why should I care?

  • Who is this module for?

  • How should I study it?

Objectives
  • To think about reproducibility as an inherent part of research activities.

  • To use lesson materials as soon as possible.

Introduction

The term “reproducibility” conjures a mental image of dedicated systems conducting automated and repeatable computations. However, you can embrace reproducibility as a principle to apply to your day-to-day research activities. Neuroimaging is a heavily data- and software-driven field of science. As a result, by learning more tricks and techniques for the tools that you already use daily, you will discover ways not only to improve your efficiency but also to increase the reproducibility of your research.

To some degree, reproducibility requires knowing what analysis was carried out, when, and how. The lessons in this module therefore focus on helping you answer those questions. Before addressing these specific questions, each lesson references external materials (tutorials, lessons, etc.) that provide a more generic and thorough presentation of the topics.
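As a rough illustration of the "what, when, and how" idea, the sketch below records those three pieces of information for a single analysis step. It is not part of the lesson materials: the function name `provenance_record` and the field names are hypothetical, and only the Python standard library is assumed.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def provenance_record(description, command):
    """Collect a minimal what/when/how record for one analysis step.

    Both the function name and the record layout are illustrative
    assumptions, not a standard format.
    """
    return {
        # What: a human-readable description of the analysis step.
        "what": description,
        # When: the current time in UTC, as an ISO 8601 string.
        "when": datetime.now(timezone.utc).isoformat(),
        # How: the command that was run and the environment it ran in.
        "how": {
            "command": list(command),
            "python": platform.python_version(),
            "platform": platform.platform(),
        },
    }


if __name__ == "__main__":
    # Record the invocation of this script itself as an example.
    record = provenance_record("hypothetical analysis step", sys.argv)
    print(json.dumps(record, indent=2))
```

Saving such a record next to each result is one simple way to make the three questions answerable later; the tools covered in the lessons (shell history and scripts, version control, package managers) capture the same information more systematically.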

Who is this module for?

The module is for any scientist, researcher, or student who uses software for data analysis, writes custom code, or edits documents.

Prerequisites

Depending on your level of competence in any particular topic, you might like to go through additional materials that will be referenced in each particular lesson. Even if you feel that you are very proficient in all of those topics, we hope that you can still learn some new “tricks” or are willing to recommend or contribute new materials to the lessons.

How much time should this take?

That depends primarily on your familiarity and experience with working in the command line/shell, using version control systems, managing software environments, providing constructive feedback about defects you find in the software you use, etc. These topics seem independent but are closely related, so you may have some familiarity with all of them, or know just one of them well. If you have no experience with any of these topics, this may take you a long time: for instance, 5 to 7 full days. If you are experienced, some of the information may be redundant, and it may take you only a few hours to go through this material in detail. In each lesson, we provide an estimate of the time it would take to learn the lesson, assuming you have a basic understanding of the topic.

How should the acquired knowledge be used?

It is important to apply the knowledge gained from the lessons to your day-to-day activities as soon as possible! Start using the shell, or use it more efficiently through shortcuts, scripting, making those scripts robust, etc. Use version control systems for anything you change (code, data, documents) and make your collaborative exchanges more “traceable”, even if the exchange is just between you on computer 1 and you on computer 2. Get curious and check what you are using; start provisioning your own computation environments. Report the problems you run into rather than leaving them unresolved.

An efficient approach to learning the materials is to first skim through all of them, noting the key concepts and applying them right away. Then, after gaining experience and stumbling upon some problems, review the relevant topic in greater detail, concentrating on the aspects that matter for your problem.

The least efficient approach would be to spend a week “learning it” only to forget all of it by not using any of the learned tools or recommended practices.

What are the lessons in this module?

This module guides you through three somewhat independent topics, which are at the heart of establishing and efficiently using common generic resources: command line shell, version control systems (for code and data), and distribution package managers, plus a few additional aspects such as bug reporting and licensing. It is very unlikely that you have managed to completely avoid using those tools in past research activities, but it is possible that you have under-utilized their capabilities. Gaining additional skills in any of these topics can not only help your day-to-day research activities become more efficient, but also lay the foundation for establishing habits that will make your work more reproducible. Moreover, these topics are the foundation of future modules in the ReproNim curriculum.

Key Points