title: "Version control and reproducible research"
subtitle: "PHS 7045: Advanced Programming"
author: "George G. Vega Yon, Ph.D."
date: 2024-11-06
date-modified: 2024-11-07
format:
  html:
    embed-resources: true

Note

The content of this lab is based on UofU’s PHS 7045 (Advanced Programming with R and HPC) (link).

Preamble

Today’s lesson

  1. We will learn about version control and GitHub.

  2. Set up git and GitHub (make sure it works).

Part I: intro

Brief review of technologies

Throughout the course, we will be using the following tools:

  • R (duh!)
  • GitHub Copilot: An AI-powered pair programmer (when OK; more on this later).

What is version control?

[I]s the management of changes to documents […] Changes are usually identified by a number or letter code, termed the “revision number”, “revision level”, or simply “revision”. For example, an initial set of files is “revision 1”. When the first change is made, the resulting set is “revision 2”, and so on. Each revision is associated with a timestamp and the person making the change. Revisions can be compared, restored, and with some types of files, merged. – Wiki

Why do we care?

Have you ever:

  • Made a change to code, realised it was a mistake and wanted to revert back?
  • Lost code or had a backup that was too old?
  • Had to maintain multiple versions of a product?
  • Wanted to see the difference between two (or more) versions of your code?
  • Wanted to prove that a particular change broke or fixed a piece of code?
  • Wanted to review the history of some code?
  • Wanted to submit a change to someone else’s code?
  • Wanted to share your code, or let other people work on your code?
  • Wanted to see how much work is being done, where, when, and by whom?
  • Wanted to experiment with a new feature without interfering with working code?

In these cases, and no doubt others, a version control system should make your life easier.

Stack Overflow (by si618)

Git: The stupid content tracker

During this class (and perhaps the entire program), we will be using Git.

  • A great reference about the tool can be found here
  • More on what’s stupid about Git here.

How can I use Git?

There are several ways to include Git in your work pipeline. A few are:

  • Through the command line

  • Through one of the available Git GUIs:

More alternatives here.

A Common workflow

Git workflow

Git has a ton of features, but the daily workflow relies on only a handful of commands: git pull, git add, git commit, and git push.

  1. Start the session by pulling (possible) updates: git pull
  2. Make changes

    1. (optional) Add untracked/new files: git add [target file]

    2. (optional) Stage modified files: git add [target file]

    3. (optional) Revert changes: git checkout [target file]

  3. Move changes to the staging area (optional): git add [target file]
  4. Commit:

    1. If everything you want to commit is already staged: git commit -m "Your comments go here."

    2. If modifications to tracked files are not staged: git commit -a -m "Your comments go here."

  5. Upload the commit to the remote repo: git push
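
The steps above can be tried end to end without a GitHub account. The following is a minimal sketch, assuming a throwaway directory created with mktemp and a local bare repository (remote.git, a hypothetical name) standing in for the remote; the file names and commit messages are placeholders too. The first commit is pushed before the daily loop starts because git pull has nothing to fetch from a brand-new, empty remote.

```shell
# Scratch area so nothing outside it is touched (all names are hypothetical)
work="$(mktemp -d)"
cd "$work"

# A bare repository plays the role of the remote (e.g., GitHub)
git init --quiet --bare remote.git

# Cloning sets up "origin" (Git warns that the repository is empty)
git clone --quiet remote.git project
cd project

# Identity for this scratch repository only (same values as in Hands-on 0)
git config user.name "Juan Perez"
git config user.email "jperez@treschanchitos.edu"

# Seed the remote with a first commit; HEAD means "the current branch"
echo "# Demo project" > README.md
git add README.md
git commit --quiet -m "Add README"
git push --quiet -u origin HEAD

# --- The daily loop ---
git pull --quiet                      # 1. get (possible) updates
echo "A new line" >> README.md        # 2. modify a tracked file...
echo "Some notes" > notes.txt         #    ...and create a new one
git add notes.txt                     # 3. new files must be added explicitly
git commit --quiet -a -m "Update README and add notes"   # 4. -a stages tracked files
git push --quiet                      # 5. upload the commit

git log --oneline
```

Note that git commit -a picks up the change to README.md only because that file is already tracked; notes.txt still needs its own git add.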

Part 2: Hands-on local git repo

Hands-on 0: Introduce yourself

Set up your Git installation with git config. Start by telling Git who you are:

$ git config --global user.name "Juan Perez"
$ git config --global user.email "jperez@treschanchitos.edu"

Try it yourself (5 minutes) (more on how to configure git here)
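
To check what Git actually recorded, the values can be read back with git config. The sketch below is a hedged illustration that uses a scratch repository (created with mktemp) and drops --global, so the identity is stored only in that repository and your real settings are left alone.

```shell
# Scratch repository (hypothetical location) to try the settings safely
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Without --global, the values go into this repository's .git/config
git config user.name "Juan Perez"
git config user.email "jperez@treschanchitos.edu"

# Read back the values Git will use for commits made here
git config user.name
git config user.email

# List only the settings stored in this repository
git config --local --list
```

A repository-level value takes precedence over the global one, which is handy when one project needs a different email address.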

Hands-on 1: Local repository

We will start by working on our very first project. To do so, you are required to start using Git and GitHub so you can share your code with your team. For now, you can keep things local and skip GitHub. For this exercise, you need to:

  a. Create a new folder with the name of your project (you can try foresite-project).
  b. Initialize Git with the git init command.
  c. Create a README.md file and write a brief description of your project.
  d. Add the file to the tree using the git add command, and check the status.
  e. Make the first commit using the git commit command, adding a message, e.g.

    $ git commit -m "My first commit ever!"

    And use git log to see the history.

Note 1: We are assuming that you already installed Git on your system.

Note 2: Need a text editor? Check out this website link.

Hands-on 1: Local repository (solution)

The following code is fully executable (copy-pastable)

# (a) Creating the folder for the project (and getting in there)
mkdir ~/foresite-project
cd ~/foresite-project

# (b) Initializing git
git init

# (c) Creating the Readme file
echo An empty line > README.md

# (d) Adding the file to the tree
git add README.md
git status

# (e) Committing and checking the history
git commit -m "My first commit ever!"
git log
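
As a possible next step after the first commit, the sketch below makes a second change and compares revisions. It is an illustration only: it works in a scratch directory rather than ~/foresite-project, sets a repository-level identity so the commits succeed on a fresh machine, and the second commit message is made up.

```shell
# Scratch repository (hypothetical location)
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Juan Perez"
git config user.email "jperez@treschanchitos.edu"

# First commit, as in the solution above
echo "An empty line" > README.md
git add README.md
git commit --quiet -m "My first commit ever!"

# Change the file and see what differs from the last commit
echo "A second line" >> README.md
git diff

# Stage and commit the change, then list the history compactly
git add README.md
git commit --quiet -m "Add a second line to the README"
git log --oneline

# Compare the two revisions directly
git diff HEAD~1 HEAD
```

Here HEAD is the latest commit and HEAD~1 is its parent, so the last command shows exactly what the second commit introduced.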

Hands-on 1: Local repository

Oops! It seems that I added the wrong file to the tree. You can remove files from the tree using git rm --cached. For example, imagine that you added the file class-notes.docx (which you are not supposed to track); you can then remove it using

$ git rm --cached class-notes.docx

This will remove the file from the tree but not from your computer. You can go further and ask Git to ignore all .docx files using a .gitignore file.
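
Both steps can be sketched together. The example below assumes a scratch repository in which class-notes.docx was committed by mistake; the file contents and commit messages are placeholders.

```shell
# Scratch repository (hypothetical location)
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Juan Perez"
git config user.email "jperez@treschanchitos.edu"

# Commit a README and, by mistake, the class notes
echo "My project" > README.md
echo "placeholder" > class-notes.docx
git add README.md class-notes.docx
git commit --quiet -m "Add README (and, by mistake, the notes)"

# Stop tracking the file; the copy on disk stays where it is
git rm --quiet --cached class-notes.docx

# Ask Git to ignore every .docx file from now on
echo "*.docx" > .gitignore
git add .gitignore
git commit --quiet -m "Stop tracking class notes and ignore .docx files"

# Tracked files, followed by whatever Git still considers pending
git ls-files
git status --short
```

The file remains in earlier commits; git rm --cached only stops tracking it from this point on.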

Part 3: Hands-on cloud

Hands-on 2: Remote repository

Now that you have something to share, your teammates are asking you to share the code with them. Since you are smart, you know you can do this using something like GitLab or GitHub. So you now need to:

  a. Create an online repository (empty) for your project using GitHub.

  b. Add the remote using git remote add, in particular

$ git remote add origin https://github.com/[your user name]/foresite-project.git

Then, use the commands git status and git remote -v to see what’s going on.

  c. Push the changes to the remote using git push like this:

$ git push -u origin master

You should also check the status of the project using git status to see what Git tells you about it. Here, origin is the name given to the remote repo, while master is the name of the current branch of your repo (if git status reports a different branch name, such as main, push that one instead).

Recommended: Complete GitHub’s Training team “Uploading your project to GitHub”

Hands-on 2: Remote repository (solutions a)

[Screenshots: New GitHub repo 1 and New GitHub repo 2]

Hands-on 2: Remote repository (solutions b)

For part (b), there are a couple of solutions. First, you could use your SSH key (if you set it up):

# (b)
git remote add origin git@github.com:gvegayon/foresite-project.git
git remote -v
git status

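Otherwise, you can use the HTTPS URL. Unless a credential helper is configured, Git will prompt for your credentials every time you push (and pull, if the repository is private). Note that GitHub expects a personal access token here rather than your account password.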

# (b)
git remote add origin https://github.com/gvegayon/foresite-project.git
git remote -v
git status
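
If you start with one form of the URL and later prefer the other, the remote does not need to be removed and re-added. A minimal sketch, assuming a scratch repository and the placeholder account name your-user (not a real account); neither command contacts the network.

```shell
# Scratch repository (hypothetical location)
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Register the remote with the HTTPS form of the URL
git remote add origin https://github.com/your-user/foresite-project.git
git remote -v

# Switch the same remote to the SSH form
git remote set-url origin git@github.com:your-user/foresite-project.git
git remote get-url origin
```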

Hands-on 2: Remote repository (solutions c)

For the first git push, you need to specify the source (master) and target (origin) and set the upstream (the -u option):

# (c)
git push -u origin master
git status

The --set-upstream option (short form -u) sets the tracking reference that later git pull and git push calls use by default.
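
To see what the tracking reference looks like, the sketch below repeats the push against a local bare repository (remote.git, a hypothetical stand-in for GitHub) and then asks Git which remote branch the current branch follows. The branch name is read from the repository instead of being hard-coded, since it may be master or main depending on your Git configuration.

```shell
# Scratch area with a bare repository standing in for the remote
work="$(mktemp -d)"
cd "$work"
git init --quiet --bare remote.git
git init --quiet project
cd project
git config user.name "Juan Perez"
git config user.email "jperez@treschanchitos.edu"

echo "An empty line" > README.md
git add README.md
git commit --quiet -m "My first commit ever!"

# Add the remote and push the current branch, setting the upstream
git remote add origin "$work/remote.git"
branch="$(git rev-parse --abbrev-ref HEAD)"
git push --quiet -u origin "$branch"

# Which remote branch does the current branch track?
git rev-parse --abbrev-ref "@{upstream}"

# With the upstream set, push (and pull) need no further arguments
git push --quiet
git status --short --branch
```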

Example for .gitignore

Example extracted directly from Pro-Git (link).

# ignore all .a files
*.a

# but do track lib.a, even though you're ignoring .a files above
!lib.a

# only ignore the TODO file in the current directory, not subdir/TODO
/TODO

# ignore all files in any directory named build
build/

# ignore doc/notes.txt, but not doc/server/arch.txt
doc/*.txt

# ignore all .pdf files in the doc/ directory and any of its subdirectories
doc/**/*.pdf
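
One way to convince yourself of what these patterns do is to try them on a few files. The following is a sketch under assumptions: it uses a scratch repository, and every file name is made up to hit one of the rules above. After git add ., only the files that were not ignored show up in git ls-files.

```shell
# Scratch repository (hypothetical location)
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# The patterns from the example above, without the comments
cat > .gitignore <<'EOF'
*.a
!lib.a
/TODO
build/
doc/*.txt
doc/**/*.pdf
EOF

# Made-up files, one or two per rule
mkdir -p subdir build doc/server doc/deep
touch foo.a lib.a TODO subdir/TODO build/out.o
touch doc/notes.txt doc/server/arch.txt doc/deep/paper.pdf

# Stage everything that is not ignored and list what got tracked
git add .
git ls-files

# Ask Git which pattern is responsible for ignoring a given path
git check-ignore -v doc/notes.txt
```

With these files, the tracked set should be .gitignore, doc/server/arch.txt, lib.a, and subdir/TODO; everything else is matched by one of the ignore rules.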

Resources

  • Git’s everyday commands: type man giteveryday in your terminal/command line. See also the very nice cheatsheet.

  • My personal choice for nightstand book: The Pro-git book (free online) (link)

  • Github’s website of resources (link)

  • The “Happy Git with R” book (link)

  • Roger Peng’s Mastering Software Development Book Section 3.9 Version control and Github (link)

  • Git exercises by Wojciech Frącz and Jacek Dajda (link)

  • Check out GitHub’s Training YouTube Channel (link)

ForeSITE