Reproducible and Publicly Visible Data Analysis

Nils Ratnaweera

2024-03-05

Today

1 Reproducibility in Data Science

I believe history will see RMarkdown as a turning point in the replication crisis.

Lack of reproducibility is a problem

Quarto, Git & GitHub are (part of) the solution

2 Literate Programming and Quarto

When was the last time you spent a pleasant evening in a comfortable chair, reading a good R Script?

– Adapted from Jon Bentley (1986), “Communications of the ACM”


Donald Knuth 1984


I’ve stumbled across a method of composing programs that excites me very much.

Let us change our traditional attitude to the construction of programs:

Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

Exercise: Literate Programming 🛠️

Make your R-Script literate

  1. Download the folder GOT.zip from Moodle (Game of Thrones dataset)
  2. Unzip the file in a reasonable location
  3. Open the .Rproj file with RStudio
  4. Run the R script got-1.R and try to understand what is happening
  5. Make the R script literate by explaining to human beings what we want a computer to do

3 Markdown

Why Markdown?

What is Markdown?

  • A lightweight markup language
  • Easy to learn
  • Machine readable and human readable

Markdown Syntax

# A story of a fox

The *quick* brown fox **jumps** over the lazy dog.

![](images/fox-over-dog.jpg)

A story of a fox

The quick brown fox jumps over the lazy dog.

Markdown exercise 🛠️

  1. Grab your smartphone (if you have one)
  2. Open your favorite messenger: Threema, Telegram or WhatsApp (not Signal)
  3. Send the following text message to your favorite geek:

Threema / WhatsApp:

Did you know? Threema and WhatsApp support markdown formatting.
*Bold*, _italic_ and ~strikethrough~!

Telegram:

Did you know? Telegram supports markdown formatting.
**Bold**, *italic* and ~~strikethrough~~.

If that was Markdown, what’s Markup?

  • Markup is a generic term for a language which structures text in a machine readable way (use of tags / symbols)
  • Markdown is a markup language
  • Other markup languages include: HTML and LaTeX
  • Here is an example of making bold text
    • Markdown: **Hello World**
    • HTML: <b>Hello World</b>
    • LaTeX: \textbf{Hello World}

4 Quarto

What is Quarto?

  • Combines Markdown with R to generate documents
  • Also works with Python, Julia…
  • Can create PDFs, websites, books or slides of your data analysis
  • Is open source
  • Is used via the command line / terminal
  • Workflow:

Quarto Exercise 1 🛠️

  1. Check if Quarto is installed by running the following command in your terminal:

    quarto --version
  2. Install Quarto if necessary (➲ quarto.org)

Quarto Exercise 2 🛠️

  1. Create a new Quarto file

    • Remove the checkbox Use visual markdown editor
    • Click on Create empty document
  2. Save as got-2.qmd

  3. Write some prose on the analysis done in got-1.R (no R code yet)

  4. Render the file to HTML by running the following command in your terminal

    quarto render got-2.qmd

Quarto Exercise 3 🛠️

  • Iterative development with render: 🫤

  • Iterative development with preview: 🥰

    # quarto render markdown.md
    quarto preview markdown.md

Quarto Exercise 4 🛠️

  • Markdown talks to humans
  • R code talks to humans & the computer
  • Let’s insert an R code chunk with Ctrl+Alt+I
  • This adds a “fence” where we can add R code
  • Add the R code from got-1.R into the code block
  • Check your updated preview
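
A minimal sketch of what such a fence could look like in got-2.qmd. The prose line and the placeholder expression are illustrative assumptions; the real content is the code from got-1.R.

````markdown
This paragraph is Markdown and explains the analysis to human readers.

```{r}
# Placeholder (assumption): replace this line with the code from got-1.R
1 + 1
```
````

Everything between the opening and closing backticks is executed by R when the document is rendered; everything outside is treated as prose.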

YAML Header

  • YAML: YAML Ain’t Markup Language

  • A machine- and human-readable way of storing structured data

  • An example:

    title: Analysis of GOT Series
    author: Nils Ratnaweera
  • In Quarto: Record metadata (e.g. title, author and date)

  • Is inserted at the beginning of the document and enclosed with ---

YAML Header Exercise 1 🛠️

  • Add metadata to your Quarto file using a YAML header
  • Inspect the output (quarto preview)
---
title: Analysis of GOT Series
author: Your Name
---

YAML Header Exercise 2 🛠️

  • Add a format specification to your YAML header: either typst (recommended) or pdf
  • This renders your file to PDF
  • If you use pdf, you might need to install TinyTeX (see the message in the terminal)
---
title: Analysis of GOT Series
author: Nils Ratnaweera 
format: typst
---

YAML Header Exercise 3 🛠️

---
format: 
  pdf:            # pdf-specific options:
    toc: true     # - should a Table of Contents be published?
    toc-depth: 1  # - How many layers should be displayed in the TOC?
---

5 Quarto Advanced

Quarto Advanced Exercise 1 🛠️

  • Insert one of the images included in your folder into your document
  • (quarto.org → Guide → Authoring → Figures)

Quarto Advanced Exercise 2 🛠️

  • Insert a cross-reference to this image, e.g. see Figure 1
  • (quarto.org → Guide → Authoring → Scholarly Writing → Cross-References)
Figure 1: The coat of arms of House Baratheon from A Song of Ice and Fire
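
One possible shape of the solution, as a sketch: the file name images/baratheon.png and the label fig-baratheon are assumptions chosen for illustration. Quarto only treats the label as a figure cross-reference if it starts with `fig-`.

```markdown
![The coat of arms of House Baratheon](images/baratheon.png){#fig-baratheon}

The coat of arms is shown in @fig-baratheon.
```

When rendered, `@fig-baratheon` is replaced by the figure number and links to the figure.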

Quarto Advanced Exercise 3 🛠️

  • Insert a cross-reference to a section, e.g. see Section 5.3
  • (quarto.org → Guide → Authoring → Scholarly Writing → Cross-References)
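
A sketch of a section cross-reference. The heading names and the label sec-data are hypothetical; the label must start with `sec-`, and section numbering has to be switched on with `number-sections: true` for the reference to resolve to a number.

```markdown
---
title: Analysis of GOT Series
number-sections: true
---

# Data {#sec-data}

A description of the dataset.

# Results

The dataset is described in @sec-data.
```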

Quarto Advanced Exercise 4 🛠

  • In Quarto, figures can consist of subfigures
  • See: quarto.org → Guide → Authoring → Figures → Subfigures
  • Create a subfigure layout similar to the following:
(a) House Baratheon
(b) House Baelish
(c) House Arryn
(d) House Bolton
Figure 2: A collection of different coats of arms from the book ‘A Song of Ice and Fire’, created by dezzzart, published on deviantart.com
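
A sketch of how such a layout could be written. The image paths and all labels are assumptions; `layout-ncol=2` arranges the four subfigures in a two-by-two grid, and the last paragraph inside the block becomes the main caption.

```markdown
::: {#fig-houses layout-ncol=2}

![House Baratheon](images/baratheon.png){#fig-baratheon}

![House Baelish](images/baelish.png){#fig-baelish}

![House Arryn](images/arryn.png){#fig-arryn}

![House Bolton](images/bolton.png){#fig-bolton}

A collection of different coats of arms
:::

See @fig-houses for all houses, or @fig-arryn for a single subfigure.
```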

Quarto Advanced Exercise 5 🛠

  • Add a caption and a cross reference to your ggplot-figure
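
For figures produced by code, caption and label are set as chunk options. A sketch, assuming the data read in got-1.R is available as a data frame called `got` (a hypothetical name) with the columns episodes and screentime; the plot itself is only a stand-in for your own ggplot.

````markdown
```{r}
#| label: fig-screentime
#| fig-cap: "Screen time versus number of episodes per character"

library(ggplot2)

# `got` is a hypothetical data frame holding the data read in got-1.R
ggplot(got, aes(x = episodes, y = screentime)) +
  geom_point()
```

@fig-screentime shows how screen time relates to the number of episodes.
````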

Quarto Advanced Exercise 6 🛠

  • Display a table with the top 10 characters by screen time, showing only the columns name, screentime and episodes.
  • Reference this table in your text
  • Add a caption to your table
  • quarto.org → Guide → Authoring → Tables → Cross References
  • Hint: use knitr::kable()
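
A sketch of one way to approach this in base R, again assuming a data frame called `got` (a hypothetical name) with the columns name, screentime and episodes. The label must start with `tbl-` to be usable as a table cross-reference.

````markdown
```{r}
#| label: tbl-top10
#| tbl-cap: "Top 10 characters by screen time"

# `got` is a hypothetical data frame holding the data read in got-1.R
# Sort by screen time (descending) and keep only the three columns of interest
top10 <- got[order(got$screentime, decreasing = TRUE), c("name", "screentime", "episodes")]
top10 <- head(top10, 10)

# row.names = FALSE hides the original row indices left over from sorting
knitr::kable(top10, row.names = FALSE)
```

The characters with the most screen time are listed in @tbl-top10.
````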

Quarto Advanced Exercise 7 🛠

  • Add a (dummy) abstract
  • Make sure this abstract is not numbered and not in the TOC
  • quarto.org → Guide → Documents → HTML → HTML Basics
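
A sketch using heading classes: `.unnumbered` removes the section number and `.unlisted` keeps the heading out of the table of contents. The abstract text and the following heading are placeholders.

```markdown
---
title: Analysis of GOT Series
toc: true
number-sections: true
---

# Abstract {.unnumbered .unlisted}

A short dummy abstract.

# Introduction
```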

Quarto Advanced Exercise 8 🛠

Let’s add a citation!

  1. Go to scholar.google.com
  2. Get the BibTeX entry of a scientific paper
  3. Create a new text file named bibliography.bib
  4. Add the BibTeX entry to this file
  5. Include the file in the YAML header: bibliography: bibliography.bib
  6. Now you can reference this entry using:
    • @knuth1984 for Knuth (1984)
    • see [@knuth1984] for see (Knuth 1984)
  • For more information see quarto.org → Guide → Authoring → Scholarly Writing → Citations
  • To include Zotero, see quarto.org → Guide → Tools → RStudio IDE → Visual Editor → Technical Writing → Citations → Citations from Zotero

6 Publishing HTML / Websites

Publishing HTML Files

  • HTML can be shared publicly by creating a website
  • Some options:
    • RPubs
    • Quarto Pub
    • GitHub Pages
    • Netlify

Exercise Publish HTML with RPubs 🛠️

  • Create a new account on rpubs.com
  • Click on the blue Publish button at the top right of your script
  • Choose RPubs
  • Click on Publish

7 Version Control with Git and GitHub

GIT LETS YOU TELL THE STORY OF YOUR PROJECT

You use Git to take snapshots of all the files in a folder.

This folder is called a repository or repo.

When you want to take a snapshot of a file or files, you create a commit.

Without version control:

  • Final.doc
  • Final_rev.2.doc
  • FINAL_rev.6.COMMENTS.doc
  • FINAL_rev.8.comments6.CORRECTIONS.doc
  • FINAL_rev.18.comments7.corrections9.MORE.30.doc
  • FINAL_rev.22.comments49.corrections.10.#@$%WHYDIDICOMETOGRADSCHOOL????.doc

With Git:

  • 💥 commit (c84ef9)
  • 💥 commit (12e6d8)
  • 💥 commit (be60d0)
  • 💥 commit (597dfe)
  • 💥 commit (f79a85)
  • 💥 commit (cf9253)

Each commit records: Who committed? When was it committed? What changed? (Why?)

Convinced?

Want to learn Git?

→ Do the optional exercises on computationalmovementanalysis.github.io

(Week 3 → Exercise A and Exercise B)