116 changes: 66 additions & 50 deletions includes/_git-basics.qmd
@@ -1,93 +1,109 @@
In our work lives, we regularly work with files, either creating,
editing, moving, copying, or deleting them. These files can be anything
from text documents, to images, to code. When we work on these files, we
often make changes to them, and sometimes many changes. We might want to
keep track of how our files change over time or "save" specific versions
of the files. This tracking of file changes over time is known as
*version control*.

It can be useful to keep track of changes to files for many reasons. For
example, we might want to keep track of changes to a file so we can
revert back to a previous version of the file if we make a mistake or so
we can see how the file has changed over time. This is especially useful
when we are collaborating with others on a project, as we might want to
keep track of changes made or feedback given by different people.

Tracking file changes is, however, also useful when we are working
mostly alone on a project, since we humans tend to forget things. For
example, we might forget why we made a certain change or what a file
looked like at an earlier point in time, which matters if we ever want
to go back to that version of the file.

If you use software that can internally "track changes", like Word, you
may have used that feature before, perhaps when getting feedback from
others. At the file level, you may have "tracked changes" informally by
saving multiple versions of a file with different names, like in the
example image below.

![File naming in a commonly used *informal* 'version
control'.](/images/informal-version-control.jpg)

Does this way of saving files and keeping track of versions look
familiar? The above image may exaggerate what some people's versioning
looks like, but there is some truth to it: It is the most common
approach to "version control".

This "informal" version control isn't ideal because it involves multiple
copies of the same file, which makes it difficult to keep track of
specific changes and to find the right version of a file. It also
highlights how hard it is to keep track of file changes manually, and
why a more systematic approach to version control is needed.

Luckily for us, there exist "formal" version control systems that
automatically track changes to files. One of the world's most popular
version control systems is called
[Git](https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3F).
Git is used by millions of people around the world, including thousands
of organisations and researchers.

With Git you can create snapshots of file changes, known as *commits*.
Each commit captures:

- *What* specific changes were made to the file or files.
- *Who* made the changes to the files.
- *When* they made the changes to the files.

Each commit also has a short message attached to it that can describe
*why* the changes were made.

Git stores these commits in a history log. The history log allows you to
quickly go back and explore the changes made to files, along with a
message describing the changes. This is extremely useful when you
revisit your own work after a long time and when you work in groups or
with collaborators.
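
To make the *what*, *who*, *when*, and *why* of a commit more concrete,
here is a minimal Python sketch of commits and a history log. It is a
simplified, hypothetical model and not how Git actually stores commits
(a real Git commit records a snapshot of the whole repository, a link to
the previous commit, and is identified by a hash). The `Commit` class,
the `commit()` function, the file name, and the author names are all
made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Commit:
    """A simplified, hypothetical stand-in for a Git commit."""

    changes: dict[str, str]  # *what*: file name -> new file content
    author: str  # *who*
    timestamp: datetime  # *when*
    message: str  # *why*


# The history log: every commit ever made, oldest first
history: list[Commit] = []


def commit(changes: dict[str, str], author: str, message: str) -> Commit:
    """Record a snapshot of changes and append it to the history log."""
    new_commit = Commit(changes, author, datetime.now(timezone.utc), message)
    history.append(new_commit)
    return new_commit


commit({"notes.txt": "Sample size: 100\n"}, "Ada", "Add first draft of notes")
commit({"notes.txt": "Sample size: 120\n"}, "Grace", "Update sample size after recruitment")

# Browse the log newest first, similar in spirit to reading a history log
for c in reversed(history):
    print(f"{c.timestamp:%Y-%m-%d} {c.author}: {c.message}")
```

Because each entry keeps the author, time, and message alongside the
change itself, you can later answer "who changed the sample size, when,
and why?" by reading the log rather than relying on memory.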

Git only tracks changes to files *within a specific folder* (and its
sub-folders). In Git terminology, this folder is called a **repository**
(or a *repo* for short). The best way to use a repository is to store
all files related to a specific project, like a research project, in
this repository (this folder). This way, you can track all changes made
to all files in the project. It keeps things more organised and
self-contained, since everything related to a project is in one place.

Any type of file can be stored in a repository, including both code and
non-code files like Word documents or images. However, Git has more
features and tools for tracking specific changes when the file is
text-based, like a `.txt` file, a `.csv` file, or code. Since these
text-based files are literally only text characters, it is easier to
track the changes to exact lines of text. Images and Word documents, on
the other hand, aren't just text, so they have no "lines" to track
changes on.
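
As a rough illustration of line-level change tracking, Python's standard
`difflib` module can compare two versions of a text file line by line.
This is not Git's own implementation, and the file contents below are
made up, but the output uses the unified diff format, which is similar
to how Git displays changes.

```python
import difflib

# Two versions of a small, hypothetical text file, as lists of lines
old = ["Title: My analysis\n", "Sample size: 100\n", "Method: t-test\n"]
new = ["Title: My analysis\n", "Sample size: 120\n", "Method: t-test\n"]

# Compare the two versions line by line and keep only the diff output
diff = list(difflib.unified_diff(old, new, fromfile="before.txt", tofile="after.txt"))
print("".join(diff))
```

In the printed diff, unchanged lines start with a space, the old version
of the changed line starts with `-`, and the new version starts with
`+`. A binary file like an image has no such lines, so a tool can only
report that the file as a whole has changed.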

To understand how powerful formal version control like Git is, consider
these questions:

- How many different versions of a scientific document or thesis do you
  have lying around after getting feedback from your supervisor or
  co-authors?
- Have you ever wanted to test an analysis in a file but ended up
creating a new one to avoid modifying the original?
- Have you ever deleted something and wished you hadn't?
- Have you ever forgotten what you were doing on a project, or why you
chose a particular strategy or analysis?

All these problems can be fixed by using formal version control! There
are many good reasons to use version control, especially in science:

- Organized files and folders, since there is one single project
folder and one single version of each file, rather than multiple
versions of the same file.
- Ability to see previous versions of files using the history log.
- Less time spent finding things related to your projects, because
  everything is organized and in one place.
- Easier collaboration, because you can work on a single file/folder
in a single central location rather than emailing file versions
around.
- Transparency of work done to demonstrate or substantiate your
scientific claim and protect against accusations of fraud.
- Claim to first discovery, since you have a time-stamped history of
your work.
- Evidence of contributions and work, since who does what is tracked.
36 changes: 23 additions & 13 deletions includes/_github-basics.qmd
@@ -18,9 +18,9 @@
that use Git (meaning, Git repositories). This means that your Git
repositories can be stored on GitHub, and you can manage your files and
projects using Git through GitHub's web interface.

Everything we do in this workshop (including storing and managing files
and folders) will be done through the GitHub website, which uses Git
behind the scenes to track changes.

In the simplest terms, GitHub is a company and website while Git is
software. GitHub is a website that hosts Git repositories and builds on
@@ -33,7 +33,7 @@
For instance:
- Viewing the changes you've made in files is easier and nicer on
GitHub than with Git.
- Viewing the history log of changes is easier and more pleasant on
GitHub than only using Git.

GitHub offers many tools to help you manage your project if its files
are stored there. For instance, GitHub has "Issues" that allow you to
@@ -47,28 +47,38 @@
benefit of being faster (you do work locally, so you don't need to wait for
the internet) and more flexible (you can do more things with Git on your
computer than on GitHub). Then you can use GitHub as a place to keep
backups of your repository, to track tasks, and to make use of the other
features GitHub has. How you would use Git locally with GitHub would
look something like the figure below.

```{mermaid}
%%| label: fig-git-sync-github
%%| fig-cap: "How Git and GitHub can work together by synchronising changes between GitHub and your computer."
%%| fig-alt: "A diagram showing two boxes, one of a Git repository on your computer and another of a Git repository on GitHub, along with an arrow between each box showing them synchronise between both."
graph
github(Git repository<br>on GitHub) <-- Sync --> git(Git repository<br>on your computer)
```

Using GitHub on its own is a great way to get started with Git. It
allows you to learn the concepts of version control and Git without
needing to install anything on your computer and without needing to
learn some of the more technical details of Git. Since GitHub is a
website, it also makes it easier to share your work and collaborate with
others, which is one of the main reasons why GitHub is so popular.

::: callout-note
You may notice that GitHub sounds a bit like file syncing tools such as
OneDrive or Dropbox. So how is GitHub different? Unlike OneDrive or
Dropbox, GitHub (via Git) tracks line-level changes to files, not just
file-level changes. This means you can see the specific changes made in
a file, not just that it was changed. The messages you attach to commits
can also help you keep track of *why* the changes were made.

OneDrive and Dropbox also use a simple way of handling conflicts when
syncing between the cloud and your computer: they either create a new
copy of the file with some details appended to its name or overwrite
the file with whichever version is newer. GitHub, on the other hand,
handles conflicts more carefully by showing you the conflicting changes
and asking you to resolve them.
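
The difference between these two ways of handling conflicts can be
sketched in Python. This is a deliberately simplified, hypothetical
model: `sync_newest_wins()` stands in for the "keep whichever version is
newer" approach (it does not reproduce OneDrive's or Dropbox's actual
behaviour), and `merge_lines()` is a toy line-by-line three-way merge
that only handles edited lines. Git's real merge also handles added and
removed lines and marks conflicts directly in the file. The `CONFLICT:`
text below is a made-up marker, not Git's format.

```python
def sync_newest_wins(mine: list[str], theirs: list[str], mine_is_newer: bool) -> list[str]:
    """File-sync style: keep the newer whole file and discard the other."""
    return mine if mine_is_newer else theirs


def merge_lines(
    base: list[str], mine: list[str], theirs: list[str]
) -> tuple[list[str], list[int]]:
    """Toy three-way merge, comparing each version against a shared base.

    Assumes all three versions have the same number of lines (lines were
    edited, not added or removed). Returns the merged lines and the
    1-based line numbers that both sides changed in different ways.
    """
    merged: list[str] = []
    conflicts: list[int] = []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t or t == b:  # both sides agree, or only my side changed it
            merged.append(m)
        elif m == b:  # only their side changed it
            merged.append(t)
        else:  # both sides changed it differently: a human has to decide
            merged.append(f"CONFLICT: mine={m!r} theirs={t!r}")
            conflicts.append(number)
    return merged, conflicts


base = ["n = 100", "method = t-test", "alpha = 0.05"]
mine = ["n = 120", "method = t-test", "alpha = 0.01"]
theirs = ["n = 100", "method = wilcoxon", "alpha = 0.10"]

# Newest-wins: their file replaces mine, so my edits are silently lost
print(sync_newest_wins(mine, theirs, mine_is_newer=False))

# Three-way merge: non-overlapping edits are combined, clashes are flagged
merged, conflicts = merge_lines(base, mine, theirs)
print(merged)
print("Lines needing a decision:", conflicts)
```

With the newest-wins approach, the edit to `n` disappears without any
warning. The line-level merge keeps both people's non-overlapping edits
and only asks for a decision on the one line both of them changed.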

File syncing tools are really good for easily sharing files within a
team or group, but they aren't as good for collaboratively working
7 changes: 5 additions & 2 deletions pre-workshop/git-and-github.qmd
@@ -1,7 +1,9 @@
# Version control with Git and GitHub {#sec-git-and-github}

This reading is meant as a primer to the workshop. It will introduce you
to the concepts of *version control*, *Git*, and *GitHub*, which are
central to the workshop and to working with files on GitHub in general.

## What is version control and Git? {#sec-what-is-version-control}

@@ -17,7 +19,8 @@
track of changes to your files and projects.
- A Git *repository* is a place where you store all the files for your
project along with their history.
- GitHub is a website that hosts Git repositories, allowing you to
store and share your files and projects online.
- Through GitHub you can manage your files and projects using Git.

<!-- TODO: Move the definition list below to a glossary repo? -->
2 changes: 1 addition & 1 deletion pre-workshop/pre-survey.qmd
@@ -4,7 +4,7 @@
If you haven't read the sections under "Workshop overview" and
"Pre-workshop tasks", **please read them now**.

Also make sure to read the [Code of
conduct](https://guides.rostools.org/conduct.html), since the survey
involves a question about it. We want to make sure this workshop is a
supportive and safe environment for learning, so this is
quite important.