Git From Scratch


This is an introduction to git. I originally designed it for non-coders, and have tried to make it useful for anyone who wants to learn the fundamentals of git. We’ll do everything locally, so feel free to follow along.

Hopefully this introduction will show you how git can be used, and how it fits into things like GitHub.

I will use the command line for the most part, but don’t worry if you don’t know how to use the terminal: it’s not about remembering the commands, it’s about understanding git. You can also use git through a graphical interface, such as GitHub Desktop, or a plugin for your text editor (VSCode comes with git integration).

Git Fundamentals

You may see git described as an “SCM”, or Source Code Management, tool. At its core, git is a tool for keeping track of changes in files. Most often these files are text, and quite commonly this text is “source code”. Git can also handle any file type, including images or binary downloads, but by far the most popular use is managing code.

Because we can use git with just text files, I’ll use English for the examples. You’ll see that git doesn’t care about specific programming languages. All it cares about is text, and changes!

Installing git

You’ll need to have git installed. If you use macOS, you’ll most likely already have git, although it’s probably an older version. Many Linux distributions also ship with git.

To know if you have git installed you can use this command in a terminal:

$ git --version

It should print something like: git version 2.39.3 (Apple Git-146).

If you see an error instead, git isn’t installed. To check for the latest version, or to install or update git, head to git’s official website.

And here are some commands you can use to install or update git from the command line:

$ brew install git # macOS
$ sudo apt-get install git # Ubuntu
$ winget install --id Git.Git -e --source winget # Windows

Creating a Git Repository

With git installed, we can now create a git repository. The first step is to create a directory somewhere on our computer so we can play around with git. You can create the directory any way you like. Using the command line, you can do it with the mkdir command:

$ mkdir git-training

Now we go into that directory and create a new repository:

$ cd git-training
$ git init

With the command line interface, we use git’s init command to create a new repository in the current directory. The init command creates a .git directory. It’s hidden by default, so it most likely won’t show up in your file explorer or when you use the ls command. We can see it by passing the -a flag to ls:

$ ls -a

You’ll see we have a .git directory:

.         ..        .git

A repository is just that, a directory with a .git folder inside. All the information about the different versions, and everything else git needs to work with, is stored inside that .git folder.
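If you’re curious, here is a quick peek inside .git. This is a minimal sketch that runs in a throwaway directory created with mktemp, so it won’t touch your git-training folder. The exact list of files inside .git varies between git versions.

```shell
# Make a throwaway repository so we don't touch git-training.
repo=$(mktemp -d)
cd "$repo"
git init -q

# The hidden .git directory holds everything git knows about the repository.
ls .git

# HEAD records what is currently checked out (usually a branch name).
cat .git/HEAD

# objects/ stores file contents, directory listings and commits;
# refs/heads/ stores one entry per branch (empty until the first commit).
ls .git/objects .git/refs/heads
```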

If you delete that folder, the git repository is removed along with it. To see this, first run:

$ git status

And you’ll see something like:

On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

Now if we delete the .git folder:

$ rm -rf .git

And try again:

$ git status

We get:

fatal: not a git repository (or any of the parent directories): .git

When you run a git command, git looks for a .git directory in the current directory. If it can’t find one, it checks the parent directory, then that directory’s parent, and so on up to the root of your file system. If no .git directory is found anywhere along the way, it fails with the error above.
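You can watch that upward search happen from a nested folder. This is a minimal sketch in a throwaway repository, and notes/drafts is a hypothetical folder name. git rev-parse --show-cdup prints the relative path from the current folder back up to the top of the repository.

```shell
# Throwaway repository with a nested folder (notes/drafts is hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p notes/drafts
cd notes/drafts

# There is no .git in notes/drafts, but git walks up the parent
# directories until it finds the one at the top of the repository.
git rev-parse --is-inside-work-tree   # prints: true
git rev-parse --show-cdup             # prints: ../../ (the way back to the top)
```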

We can re-create the repository with:

$ git init

Tracking files

By default, git won’t keep track of anything unless you tell it to. To make git track a file, you add it to the repository. From then on, whenever a tracked file changes, git will notice, and you can see what changed, add the changes, and commit (save) them.

Let’s start by creating a new file to test with. We’ll create a new file called haiku.txt, with the following content:

An old silent pond

If we run the status command to see what’s changed:

$ git status

We see this:

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    haiku.txt

nothing added to commit but untracked files present (use "git add" to track)

The message is pretty helpful. We are not tracking anything yet, but git did notice we have an untracked file, and suggests adding it using git add. We’ll do just that:

$ git add haiku.txt

Most of the time you’ll want to add all the changes you made to a commit, and if you changed multiple files, it can be annoying to type all of those file names one by one. Instead, you can just do:

$ git add .

To add all files in the current directory (including subdirectories). If we run git status now, we get a different output:

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
    new file:   haiku.txt

Below “Changes to be committed”, we see haiku.txt recognized as a new file. We can now commit these changes with the commit command:

$ git commit -m "Initial commit"

Now if we run git status, we get:

On branch main
nothing to commit, working tree clean

Meaning there are no new changes since our last commit, so everything looks good!
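Here is a sketch of staging files by name versus staging everything with git add ., run in a throwaway repository. The file author.txt is a hypothetical second file. git diff --cached --name-only lists what is currently staged, and it also works before the first commit.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'An old silent pond\n' > haiku.txt
printf 'Matsuo Basho\n' > author.txt      # hypothetical second file

# Stage one file by name: only haiku.txt is listed as staged.
git add haiku.txt
git diff --cached --name-only

# Stage everything under the current directory: both files are listed.
git add .
git diff --cached --name-only
```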

Configuring your git user

Every commit records its author. If you haven’t configured your author information yet, the commit command above will fail instead of creating a commit.

You’ll see an error looking something like this:

Author identity unknown

*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: unable to auto-detect email address

You can follow those instructions to configure your username and email and then try doing the commit again.
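The --global option stores your name and email for every repository on your computer. If you leave --global out, the values are stored only in the current repository, which is handy for experimenting. Here is a minimal sketch in a throwaway repository, using the placeholder name and email from the message above.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global, these settings live only in this repository's .git/config.
git config user.name "Your Name"
git config user.email "your_email@example.com"

# Read a value back, then make an empty commit to see the author it records.
git config user.name
git commit -q --allow-empty -m "Check author"
git log -1 --format='%an <%ae>'
```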

Making changes to a tracked file

Now that we have a tracked file, let’s make changes to it. Let’s change the text to:

An old silent pond
A frog jumps into the pond -

Once we save those changes into our haiku.txt file, git status will tell us the following:

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   haiku.txt

It noticed we modified haiku.txt but it says “Changes not staged for commit”. We need to tell git that we want to add those changes to the staging area.

In git, the working directory is your actual folder on your computer, the one with the .git directory inside. In our case, that’s git-training.

The staging area, on the other hand, is a temporary zone where you choose which specific changes from your working directory will be included in your next commit.

It’s common not to include all the changes in your working directory in a single commit. Maybe you changed file1.txt and file2.txt and want to record the changes in two separate commits instead of one.
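Here is a sketch of that scenario in a throwaway repository. Both files change, but each one goes into its own commit because we stage them one at a time. The file contents, commit messages, and identity are placeholders made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'first\n' > file1.txt
printf 'second\n' > file2.txt

# Stage and commit only file1.txt; file2.txt stays in the working directory.
git add file1.txt
git commit -q -m "Add file1"

# Now stage and commit file2.txt on its own.
git add file2.txt
git commit -q -m "Add file2"

git log --oneline   # two commits, newest first
```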

We can also use the diff command to see the changes we haven’t staged yet (right now, that’s everything since the last commit):

$ git diff

We get:

diff --git a/haiku.txt b/haiku.txt
index 73efcca..ad5f3cd 100644
--- a/haiku.txt
+++ b/haiku.txt
@@ -1 +1,2 @@
 An old silent pond
+A frog jumps into the pond -

The output is a bit terse, but a - before a line means that line was deleted, and a + means that line was added. In this case, we added a line at the end of the file. If the output opens in a pager, press q to exit back to your terminal.

It’s a good idea to run the git status and git diff commands often, as they will give you a glimpse of the repository and they are safe, non-destructive actions.
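When git status gets long, git status --short (or -s) prints one line per file. In the two-letter code before each path, the first column describes the staging area and the second describes the working directory. Here is a minimal sketch in a throwaway repository with a placeholder identity.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
printf 'An old silent pond\n' > haiku.txt
git add haiku.txt
git commit -q -m "Initial commit"

# Change the file without staging it: " M haiku.txt" (M in the second column).
printf 'A frog jumps into the pond -\n' >> haiku.txt
git status --short

# Stage it: "M  haiku.txt" (M moves to the first column).
git add haiku.txt
git status --short
```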

To add the changes we made to the staging area, so they are included in our next commit, we use the add command:

$ git add .

Now if we run the status command:

$ git status

We get:

On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    modified:   haiku.txt

If we wanted to unstage the changes (remove them from the staging area), we could do git restore --staged haiku.txt. But we’re not going to do it in this case 🙂

We commit the changes with the commit command:

$ git commit -m "Added second line"

Notice the -m option. That is the message we’ll use for the commit. It should be a short message, a single line describing the change.

You can use multiple lines if you want, but the -m option is meant for short messages. If you don’t pass the -m option, git will open your default text editor instead (usually vim or nano) and wait until you write a commit message, save the file, and close it. This way you can write multiple lines rather than just one.
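If you want a longer message without leaving the command line, you can repeat -m. Git turns each -m into a separate paragraph, and the first one becomes the subject line. Here is a minimal sketch in a throwaway repository; the body text and identity are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
printf 'An old silent pond\n' > haiku.txt
git add haiku.txt

# The first -m is the subject line; each extra -m becomes a body paragraph.
git commit -q -m "Initial commit" -m "Start the haiku with its first line."

git log -1 --format=%s   # Initial commit
git log -1 --format=%b   # Start the haiku with its first line.
```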

Inspecting the history

We have two commits so far, and we can already take a look at our version history with the log command:

$ git log

This is what it looks like for me:

commit 8dfa4b45a0b1413578287543d16e8c937ed93af9 (HEAD -> main)
Author: Federico Ramirez <federico_r@beezwax.net>
Date:   Wed Apr 24 14:14:36 2024 -0700

    Added second line

commit b07ea31670a86b892e895e1ec44d5c1a627523d7
Author: Federico Ramirez <federico_r@beezwax.net>
Date:   Wed Apr 24 14:02:19 2024 -0700

    Initial commit

It displays a list of all the commits, with the newest commits on top.

Each commit has a hash, which is a unique string representing that particular commit, and it looks like this: 8dfa4b45a0b1413578287543d16e8c937ed93af9.

The hash is very important and we can use it to reference a particular commit any time we need to. The log command also gives you the author, date, and commit message.

We can see more details about a commit with the show command. If you are repeating the steps on your computer, you’ll have different commit hashes than I do. Try using the show command to see the detailed information for your latest commit. In my case, this is the command I need to run:

$ git show 8dfa4b45a0b1413578287543d16e8c937ed93af9

This is what I see:

commit 8dfa4b45a0b1413578287543d16e8c937ed93af9 (HEAD -> main)
Author: Federico Ramirez <federico_r@beezwax.net>
Date:   Wed Apr 24 14:14:36 2024 -0700

    Added second line

diff --git a/haiku.txt b/haiku.txt
index 73efcca..ad5f3cd 100644
--- a/haiku.txt
+++ b/haiku.txt
@@ -1 +1,2 @@
 An old silent pond
+A frog jumps into the pond -

So we get the same information we got in the log command, as well as the diff for that commit.
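Typing 40-character hashes gets old quickly. The following sketch shows two shortcuts in a throwaway repository: an abbreviated hash (git accepts it as long as it’s unambiguous) and HEAD~1, which means “the parent of the commit HEAD points to”. The identity is a placeholder.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
printf 'An old silent pond\n' > haiku.txt
git add haiku.txt
git commit -q -m "Initial commit"
printf 'A frog jumps into the pond -\n' >> haiku.txt
git commit -q -am "Added second line"      # -a stages changes to tracked files

# One line per commit: abbreviated hash and message, newest first.
git log --oneline

# An abbreviated hash works wherever the full one does, if it's unambiguous.
git show --stat "$(git rev-parse --short HEAD)"

# HEAD~1 is the parent of the current commit.
git show --stat HEAD~1
```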

Navigating the history

Besides just inspecting tracked files, we can make them look exactly as they did at any point in history, effectively restoring old versions in case something breaks.

To understand how navigation works, let’s first talk about git’s concept of HEAD a bit. This is an oversimplification, but you can think of the HEAD as a reference that holds the branch or commit that you are currently looking at. It’s like a “YOU ARE HERE” arrow.

Git uses HEAD to know things such as which commit to start from when you run the log command, or what the parent of a new commit will be when you run the commit command.

To illustrate this, consider our commit history so far. I simplified the commit hashes to just A and B to make it clearer:

  A --- B (HEAD -> main)

Most of the time, HEAD will be pointing to the latest commit of your current branch. Here, HEAD points to main, and main points to B, our latest commit.
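You can see this “YOU ARE HERE” arrow directly. Here is a minimal sketch in a throwaway repository. It assumes git 2.28 or newer for `git init -b main`, which names the first branch main regardless of your git defaults. The identity is a placeholder.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main                        # -b names the first branch main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
printf 'An old silent pond\n' > haiku.txt
git add haiku.txt
git commit -q -m "Initial commit"

# HEAD usually stores a branch name, not a commit hash.
cat .git/HEAD            # ref: refs/heads/main
git symbolic-ref HEAD    # refs/heads/main

# So HEAD and main resolve to the same commit.
git rev-parse HEAD
git rev-parse main
```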

To navigate in history, we have to move the HEAD to the commit (or branch) we want to navigate to. In this case, we want to move it to A. We can do so with the checkout command:

$ git checkout b07ea31670a86b892e895e1ec44d5c1a627523d7

You can assume b07ea31670a86b892e895e1ec44d5c1a627523d7 represents A.

You’ll see output like this:

Note: switching to 'b07ea31670a86b892e895e1ec44d5c1a627523d7'.

You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at b07ea31 Initial commit

Don’t worry about the detached HEAD stuff for now, we’ll come back to it when we talk about branches. For now, just know that it’s fine, nothing is broken, this is the way it’s supposed to work. Git is not famous for its UX!

When we navigate history using checkout, git will sync the tracked files in our current directory with the version of the files in that commit.

Let’s take a look at our haiku.txt file to see if it really worked:

An old silent pond

Indeed, it removed the second line, so it’s now as it was in the previous version. This is incredibly useful, as when things break, you can always go back in history until you find a version that works!

That is why I like to think of commits as “checkpoints”. You should try to always leave them in a working state, and relatively small.

To go back to our latest commit, let’s do:

$ git checkout main

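And we are back at the latest commit of the main branch, which is the default branch in our repository. Depending on your git version and configuration, git may name the default branch master instead; git status tells you which one you have.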
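The whole round trip can be scripted. This sketch runs in a throwaway repository with a placeholder identity and assumes git 2.28+ for `git init -b main`. It checks out the first commit (A), reads the file, and returns to main. `git symbolic-ref -q HEAD` fails silently when HEAD is detached, which is why the script prints a note in that case.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
printf 'An old silent pond\n' > haiku.txt
git add haiku.txt
git commit -q -m "Initial commit"
printf 'A frog jumps into the pond -\n' >> haiku.txt
git commit -q -am "Added second line"

# Find the first commit ("A") and check it out: HEAD is now detached.
first=$(git rev-list --max-parents=0 HEAD)
git checkout -q "$first"
cat haiku.txt                                    # only the first line
git symbolic-ref -q HEAD || echo "detached HEAD"

# Go back to the latest commit on main.
git checkout -q main
cat haiku.txt                                    # both lines again
```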

Branching

You can think of a branch as a sandbox you can use to create new commits in, do some work, and then merge the changes back into the base branch. This is particularly useful so that you…

  • don’t risk breaking the main branch while developing
  • can collaborate with other people, with multiple people working with the
    files at the same time without breaking things

Let’s start by creating a new branch called my-new-branch. Branch names are commonly written in kebab-case: lowercase words separated by hyphens.

$ git checkout -b my-new-branch

When you pass the -b option to checkout, you can give it the name of a new branch and it will create that branch for you based on your current HEAD, and switch to it by pointing the HEAD to this new branch.

If we do git status now, we’ll get:

On branch my-new-branch
nothing to commit, working tree clean

We are still looking at the latest commit, but instead of being in main, we are now in my-new-branch. This is our sandbox and here, we can create new commits, make all the changes we need, and we could even discard this branch and nothing will ever happen to main.
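Here is a minimal sketch of creating and listing branches in a throwaway repository. It assumes git 2.28+ for `git init -b main` and git 2.23+ for `git switch`. `git switch -c` is a newer equivalent of `git checkout -b`, and another-branch is a hypothetical name.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
printf 'An old silent pond\n' > haiku.txt
git add haiku.txt
git commit -q -m "Initial commit"

# Create my-new-branch from HEAD and switch to it in one step.
git checkout -q -b my-new-branch
git branch --show-current      # my-new-branch

# git switch -c does the same (another-branch is a hypothetical name).
git switch -q main
git switch -q -c another-branch
git branch                     # lists all branches; * marks the current one
```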

Let’s add a new line to haiku.txt:

An old silent pond
A frog jumps into the pond -
The sound of water.

The process for creating commits is the same as before:

$ git add .

Before running a commit, it’s always a good idea to check the status:

$ git status

We get:

On branch my-new-branch
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    modified:   haiku.txt

That looks good. But what if we want to see what will go into this commit? Plain git diff compares your working directory with the staging area, so once a change is staged, it no longer shows up there. Running:

$ git diff

prints nothing. To compare the staging area with the last commit instead, we can use the --cached flag (--staged is a synonym):

$ git diff --cached

Now we see it:

diff --git a/haiku.txt b/haiku.txt
index ad5f3cd..f1ea5b9 100644
--- a/haiku.txt
+++ b/haiku.txt
@@ -1,2 +1,3 @@
 An old silent pond
 A frog jumps into the pond -
+The sound of water.

Great! We can now commit with:

$ git commit -m "Add third line"

And running git status now reports everything is fine:

On branch my-new-branch
nothing to commit, working tree clean
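To recap the three comparisons we’ve used:

- git diff compares the working directory with the staging area.
- git diff --cached compares the staging area with the last commit.
- git diff HEAD compares the working directory with the last commit, whether the changes are staged or not.

Here is a minimal sketch in a throwaway repository with a placeholder identity.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
printf 'An old silent pond\nA frog jumps into the pond -\n' > haiku.txt
git add haiku.txt
git commit -q -m "Initial commit"

printf 'The sound of water.\n' >> haiku.txt
git add haiku.txt

git diff                    # prints nothing: working directory matches the staging area
git diff --cached           # shows the staged third line
git diff --name-only HEAD   # haiku.txt: it differs from the last commit
```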

By now, our history looks like this:

  A --- B --- C (HEAD -> my-new-branch)
        ^
        main

Merging

Once we’ve finished our work, we want to merge those changes back into the main branch.

When merging, git takes the latest commit of the branch you want to merge (my-new-branch) and the latest commit of the branch you’re merging into (main), and creates a new commit that combines the changes from both, reconciling them automatically whenever it can.

We can merge with the merge command. That command will merge whatever branch you tell it to into your current branch.

So, if we want to merge my-new-branch into main, we have to do so from main:

$ git checkout main # make sure we are in the main branch
$ git merge my-new-branch # merge `my-new-branch` in

When merging two branches, git will normally create a new commit, called a merge commit, whose two parents are the latest commits of each branch.

When that’s the case, it will open your default text editor and wait for a commit message. Most of the time the default message is fine, and you can just save the file and exit your editor.

If you want to change the default editor, you can do so with this command:

$ git config --global core.editor "your_editor"

If you use VSCode, you can use "code --wait" there. The --wait flag makes git wait until you close the file in VSCode before continuing.

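In this case, though, main hasn’t moved since we created my-new-branch: B, the latest commit on main, is the parent of C. So git is smart enough to simply move main forward to C (HEAD comes along, since it points to main) without creating a new commit:

  A --- B --- C (HEAD -> main, my-new-branch)

This is called a “fast-forward” merge in git-land.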
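Here is a sketch that contrasts the two outcomes in a throwaway repository. It assumes git 2.28+ for `git init -b main`, uses a placeholder identity, and another-branch is a hypothetical name. --ff-only refuses to merge unless a fast-forward is possible. --no-ff forces a merge commit even when a fast-forward would work, and --no-edit accepts the default message without opening an editor.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
printf 'An old silent pond\n' > haiku.txt
git add haiku.txt
git commit -q -m "Initial commit"

# A branch with one new commit while main stays put.
git checkout -q -b my-new-branch
printf 'A frog jumps into the pond -\n' >> haiku.txt
git commit -q -am "Add second line"
git checkout -q main

# Fast-forward: main simply moves to my-new-branch's commit.
git merge -q --ff-only my-new-branch
git rev-list --count --merges HEAD   # 0: no merge commit

# Same situation, but force a merge commit with --no-ff.
git checkout -q -b another-branch
printf 'The sound of water.\n' >> haiku.txt
git commit -q -am "Add third line"
git checkout -q main
git merge -q --no-ff --no-edit another-branch
git rev-list --count --merges HEAD   # 1
```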

Fixing Merge Conflicts

Sometimes, when two branches change the same line in the same file, git has no way to know which one should be used when merging.

It needs you to tell it which version to keep, and won’t complete the merge until you fix all the conflicts.

Let’s create a conflict so we can see what it looks like and how to deal with it. We’ll create two branches, alice and bob, representing two people changing the same file at the same time.

$ git checkout -b alice

Remember we use the -b option to create a new branch and navigate into it all at once.

Let’s start with Alice’s changes. She updates the first line of haiku.txt, changing “An old silent pond” to “A new silent pond”.

A new silent pond
A frog jumps into the pond -
The sound of water.

If we run git diff after saving the file, we can see what changed:

diff --git a/haiku.txt b/haiku.txt
index f1ea5b9..e611674 100644
--- a/haiku.txt
+++ b/haiku.txt
@@ -1,3 +1,3 @@
-An old silent pond
+A new silent pond
 A frog jumps into the pond -
 The sound of water.

Notice that even though we only changed part of a line, git considers the whole line to have changed: it shows the old line as removed and a new one added in its place.
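If you want to see which words changed inside a line, git diff --word-diff can show you. It only changes how the diff is displayed; the change is still counted as one removed line and one added line. Here is a minimal sketch in a throwaway repository with a placeholder identity.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
printf 'An old silent pond\nA frog jumps into the pond -\nThe sound of water.\n' > haiku.txt
git add haiku.txt
git commit -q -m "Add third line"

# Alice's edit: only the start of the first line changes.
printf 'A new silent pond\nA frog jumps into the pond -\nThe sound of water.\n' > haiku.txt

git diff --numstat     # 1 added, 1 deleted: still a whole-line change
git diff --word-diff   # marks removed words with [-...-] and added ones with {+...+}
```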

Let’s now save those changes by creating a new commit:

$ git add .
$ git commit -m "Alice's changes"

Now let’s make the changes for Bob. If we were to create a new branch at this time, the new branch would be created from the alice branch, which is not what we want.

So to create Bob’s branch, we have to go back to main and create the branch from there:

$ git checkout main
$ git checkout -b bob

Now let’s say Bob also changes the first line:

An old loud pond
A frog jumps into the pond -
The sound of water.

And he also commits his changes:

$ git add .
$ git commit -m "Bob's changes"

Now our git history looks like this, with D as Alice’s commit and E as Bob’s:

                D (alice)
               /
  A --- B --- C (main)
               \
                E (HEAD -> bob)

HEAD is currently on Bob’s branch, as git status confirms:

On branch bob
nothing to commit, working tree clean

Now, this is how the conflict will happen:

  1. Alice merges her changes first
  2. Because it has no conflicts with main, the merge will just work
  3. Now Bob wants to merge his work into main
  4. Because Alice’s changes are in main, and they both changed the same line,
    there will be merge conflicts

Let’s see how this works in practice. We start by merging Alice’s branch first into main:

$ git checkout main
$ git merge alice

We get:

Updating b50f6c0..92873f6
Fast-forward
 haiku.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

It worked! Now let’s bring in Bob’s changes:

$ git merge bob

Oops, we get:

Auto-merging haiku.txt
CONFLICT (content): Merge conflict in haiku.txt
Automatic merge failed; fix conflicts and then commit the result.

For each file with conflicts, we’ll get a CONFLICT line in the output of the merge command. In this case, it’s only haiku.txt. If we open the file, we’ll see it looks like this:

<<<<<<< HEAD
A new silent pond
=======
An old loud pond
>>>>>>> bob
A frog jumps into the pond -
The sound of water.

What’s with that weird stuff? Well, it’s git’s way of telling you that everything below <<<<<<< HEAD up to ======= is the version from your current HEAD (main in this case). Below that, and up to >>>>>>> bob, is the version from the other branch (bob).

This makes sense, because both Alice and Bob changed the first line. Alice’s changes were “A new silent pond”, while Bob’s changes were “An old loud pond”.

We can fix the conflict by choosing either one version or the other. We could also discard both versions and create a new one, which is what we’ll do:

An old boring pond
A frog jumps into the pond -
The sound of water.

Notice we removed all of the weird markers git added for us. We can now add those changes to continue the merge process:

$ git add .
$ git merge --continue

Because git can’t use the fast-forward strategy now, we need to give this new merge commit a message. The default one is fine, so you can just save and close the file git opens for you.

If we run git log now, we’ll see our new merge commit:

commit d1d2e2e9364427dd99a2758113364f5482893129 (HEAD -> main)
Merge: 92873f6 ee57f11
Author: Federico Ramirez <federico_r@beezwax.net>
Date:   Thu Apr 25 12:15:01 2024 -0700

    Merge branch 'bob'

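And this is our new history, where M is the merge commit:

  A --- B --- C --- D --- M (HEAD -> main)
               \         /
                --- E ---

D is Alice’s commit, E is Bob’s, and M has both D and E as its parents.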
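The whole Alice and Bob exercise can be replayed as one script in a throwaway repository. This is a sketch, not a required workflow. It assumes git 2.28+ for `git init -b main` and uses a placeholder identity. `git diff --name-only --diff-filter=U` lists files that still have unresolved conflicts. Setting GIT_EDITOR=true stands in for saving the editor unchanged, so git merge --continue keeps the default message.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
printf 'An old silent pond\nA frog jumps into the pond -\nThe sound of water.\n' > haiku.txt
git add haiku.txt
git commit -q -m "Add third line"

# Alice and Bob both change the first line, each on a branch made from main.
git checkout -q -b alice
printf 'A new silent pond\nA frog jumps into the pond -\nThe sound of water.\n' > haiku.txt
git commit -q -am "Alice's changes"
git checkout -q main
git checkout -q -b bob
printf 'An old loud pond\nA frog jumps into the pond -\nThe sound of water.\n' > haiku.txt
git commit -q -am "Bob's changes"

# Alice's merge fast-forwards; Bob's stops with a conflict (non-zero exit).
git checkout -q main
git merge -q alice
git merge bob || echo "merge stopped with conflicts"
git diff --name-only --diff-filter=U   # haiku.txt

# Resolve by writing the final text (no markers left), stage it, and finish.
printf 'An old boring pond\nA frog jumps into the pond -\nThe sound of water.\n' > haiku.txt
git add haiku.txt
GIT_EDITOR=true git merge --continue   # "true" accepts the default message
git log --oneline --graph
```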

Aborting a merge

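If you want to abandon a merge that is still in progress, for example while you’re in the middle of resolving conflicts, you can return to the state before the merge started with the command below. This works reliably as long as your working directory was clean when you started the merge: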

$ git merge --abort
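Here is a sketch that shows --abort undoing a conflicted merge, in a throwaway repository. It assumes git 2.28+ for `git init -b main` and uses a placeholder identity. To keep it short, Alice’s change is committed directly on main.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
printf 'An old silent pond\n' > haiku.txt
git add haiku.txt
git commit -q -m "Initial commit"

# bob changes the line on a branch; main changes the same line differently.
git checkout -q -b bob
printf 'An old loud pond\n' > haiku.txt
git commit -q -am "Bob's changes"
git checkout -q main
printf 'A new silent pond\n' > haiku.txt
git commit -q -am "Alice's changes"

before=$(git rev-parse HEAD)
git merge bob || echo "merge stopped with conflicts"

# Abort: the conflict markers disappear and main is back where it was.
git merge --abort
git rev-parse HEAD     # same hash as $before
cat haiku.txt          # A new silent pond
git status --short     # prints nothing: the working directory is clean
```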

Conclusion

That was a lot! We’ve only scratched the surface of git. But, still, we covered some sophisticated use cases for git, such as what branches and merges are, why merge conflicts happen and how to fix them, how to navigate our project’s history, etc.

In a follow-up post, I plan to cover how to use GitHub as a remote repository rather than doing everything locally. It’s worth noting that one big advantage of git is that it runs locally and doesn’t need a remote server for you to be productive writing code! Everyone has a fully working copy of the repository, so you could use git without a remote if your use case is simple.

For most real-world projects, though, you’ll want to collaborate with others, and GitHub does just that: it makes it easy to collaborate with other people using git, and it gives you lots of extra tools, such as a way to track and manage issues, review changes before they are merged, see differences between commits and branches, and even run actions on your code, such as running a test suite every time a change is pushed to the remote.
