Git Internals
Git can seem like hidden computer magic if you don’t understand how it works inside. Let’s break it down in simple terms to build a basic picture of it:
.git folder
Where does Git store all of its history? It can be really hard to guess if the IDE you use hides hidden folders.
Every time you initialize a repo with the “git init” command, a hidden folder named “.git” is created within your repo. This is the warehouse where Git stores everything about your project.
Initially, it looks something like this:
$ ls -1a .git
./
../
config
description
FETCH_HEAD
HEAD
hooks/
info/
objects/
refs/
We don’t yet know what these directories and files are, so let’s look at each one and see how they allow Git to track all the modifications so smoothly.
Config
It contains configuration specific to your current repo, which overrides your global and system-wide Git configuration.
Description
A short text description of your project. It is used mainly by the GitWeb interface.
HEAD
It is a pointer to the branch you’re working on right now. In a “detached HEAD” state, it holds a commit hash directly instead of a branch name.
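To make the pointer idea concrete, here is a minimal sketch of following HEAD to a commit hash. The function name `resolve_head` and the throwaway demo directory are made up for illustration, and the sketch assumes loose ref files only (it ignores refs that Git has moved into the packed-refs file).

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir):
    """Return (ref_name, commit_hash) for the HEAD of a .git directory."""
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Normal case: HEAD names a branch, e.g. "ref: refs/heads/main".
        ref_name = head[len("ref: "):]
        ref_file = git_dir / ref_name
        # The ref file does not exist until the branch has its first commit.
        commit = ref_file.read_text().strip() if ref_file.exists() else None
        return ref_name, commit
    # Detached HEAD: the file holds a commit hash directly.
    return None, head


# Build a throwaway directory that mimics the two files involved.
demo_git_dir = Path(tempfile.mkdtemp()) / ".git"
(demo_git_dir / "refs" / "heads").mkdir(parents=True)
(demo_git_dir / "HEAD").write_text("ref: refs/heads/main\n")
(demo_git_dir / "refs" / "heads" / "main").write_text("a" * 40 + "\n")

print(resolve_head(demo_git_dir))
```

Pointing the function at a real repo’s .git folder should work the same way, as long as the branch ref has not been packed.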
Hooks
You can put executable scripts here (shell scripts, for example) that run before or after a Git event is triggered. Some of these events are pre-commit, pre-rebase, post-update and more.
Info
Its “exclude” file is used to tell Git to ignore some files locally that you don’t wish to list in the shared .gitignore file.
Objects
It’s the directory that stores all Git objects, in subdirectories and files named after each object’s hash: the first two characters of the hash name the subdirectory, and the remaining characters name the file. We’ll talk more about Git objects further on.
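As a small illustration of that naming scheme, the sketch below maps a hash to the path where a loose object would live. The helper name `loose_object_path` is made up for this example, and it assumes a 40-character SHA-1 hash and an object that has not been moved into a pack file.

```python
from pathlib import Path


def loose_object_path(git_dir, object_hash):
    """Map an object hash to its loose-object path under .git/objects."""
    if len(object_hash) != 40:
        raise ValueError("expected a full 40-character SHA-1 hash")
    # First two hex characters -> subdirectory, remaining 38 -> file name.
    return Path(git_dir) / "objects" / object_hash[:2] / object_hash[2:]


empty_blob = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
print(loose_object_path(".git", empty_blob).as_posix())
```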
Refs
Contains references: small files that store the commit hash each branch (under refs/heads) and each tag (under refs/tags) points to.
Logs
Because we just initialized the repo and haven’t done anything yet, this directory is not shown above. It appears as soon as you make a commit, and it tracks all movements of HEAD and of each branch along with the commit hashes involved.
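Each line in a log file such as .git/logs/HEAD records one movement: the old hash, the new hash, who made the change, a timestamp with timezone, then a tab and a message. Here is a rough sketch of parsing one such line. The sample line and the name `parse_reflog_line` are invented for illustration.

```python
def parse_reflog_line(line):
    """Split one reflog line into its fields."""
    # Everything before the tab is metadata; everything after is the message.
    meta, _, message = line.rstrip("\n").partition("\t")
    old_hash, new_hash, rest = meta.split(" ", 2)
    # The identity may contain spaces, so peel the last two fields off the end.
    identity, timestamp, timezone = rest.rsplit(" ", 2)
    return {
        "old": old_hash,
        "new": new_hash,
        "who": identity,
        "timestamp": int(timestamp),
        "timezone": timezone,
        "message": message,
    }


# A made-up entry for a first commit: the old hash is all zeros.
sample = (
    "0" * 40 + " " + "a" * 40
    + " Jane Doe <jane@example.com> 1700000000 +0530"
    + "\tcommit (initial): first commit\n"
)
entry = parse_reflog_line(sample)
print(entry["who"], "->", entry["message"])
```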
Index
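This file is also missing from the listing above; it appears once you stage a change. It holds the staging area: the list of file paths and blob hashes from which the next commit’s tree will be built. We’ll get to trees just below.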
Git Objects
Blobs
Blobs hold the content of your files. A blob’s hash is calculated from that content alone; the file name is not part of it.
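Here is a minimal sketch of how a blob’s hash can be reproduced by hand: Git hashes a short header (“blob”, the content length, and a NUL byte) followed by the content. The function name `blob_hash` is just an illustrative choice.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Compute the SHA-1 hash Git assigns to a blob with this content."""
    # Header is "blob <size in bytes>" followed by a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two files with the same content map to one and the same blob,
# because the file name never enters the hash.
readme_copy_a = b"test content\n"
readme_copy_b = b"test content\n"
print(blob_hash(readme_copy_a) == blob_hash(readme_copy_b))  # True
print(blob_hash(b""))  # hash of the empty blob
```

If Git is installed, running “git hash-object” on a file with the same bytes should give a matching value.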
Trees
Trees are the directories of your project. A tree lists the blobs and sub-trees it contains, recording each entry’s mode, name, and hash. And yes, a tree has a hash of its own as well.
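A rough sketch of how a tree could be assembled and hashed is shown here. Each entry is written as the mode, a space, the name, a NUL byte, and the 20 raw bytes of the entry’s hash. The helper names and the tiny example project (a README and a src folder) are made up for illustration.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>\\0<body>' the way Git does for any object."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_hash) entries into a tree object's body."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, treating directories as "name/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_hash in sorted(entries, key=sort_key):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_hash)
    return body


# A sub-tree "src" holding one file, then a root tree holding README and src.
main_py = hash_object("blob", b"print('hi')\n")
src_tree = hash_object("tree", tree_body([("100644", "main.py", main_py)]))
readme = hash_object("blob", b"hello\n")
root_tree = hash_object(
    "tree", tree_body([("100644", "README", readme), ("40000", "src", src_tree)])
)
print(root_tree)
```

Because the root tree embeds the hash of src, changing a single file deep inside the project changes every tree hash above it.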
Commits
Commit objects reference the tree that holds the current snapshot of your repo. A commit also records more information, such as its parent commit(s), the author’s info, and the commit message, and its hash is calculated from all of this.
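As a sketch of what goes into a commit, the snippet below builds the text of a commit object and hashes it. The author, timestamps, and function name `commit_body` are placeholders, and the tree used is the well-known hash of an empty tree.

```python
import hashlib


def commit_body(tree, parents, author, timestamp, message, tz="+0000"):
    """Build the text of a commit object (author doubles as committer here)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # none for the very first commit
    lines.append(f"author {author} {timestamp} {tz}")
    lines.append(f"committer {author} {timestamp} {tz}")
    # A blank line separates the headers from the commit message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


def commit_hash(body: bytes) -> str:
    header = f"commit {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
who = "Jane Doe <jane@example.com>"  # placeholder identity

first = commit_hash(commit_body(EMPTY_TREE, [], who, 1700000000, "first commit"))
second = commit_hash(commit_body(EMPTY_TREE, [first], who, 1700000060, "second commit"))
print(first)
print(second)
```

Since the parent’s hash is part of the text being hashed, each commit’s hash depends on the whole history before it.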
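You can’t open Git objects like normal files because they are stored compressed, but the “git cat-file” command lets you look inside an object and check its content: “git cat-file -t <hash>” prints the object’s type and “git cat-file -p <hash>” prints its content.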
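To show roughly what such a lookup involves, here is a sketch that writes a loose object into a temporary folder and reads it back by decompressing it with zlib and splitting off the header. It is not how “git cat-file” is implemented; the function names are made up, and it only handles loose objects, not ones stored in pack files.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(git_dir, obj_type, body):
    """Store an object as zlib-compressed '<type> <size>\\0<body>'."""
    raw = f"{obj_type} {len(body)}".encode("ascii") + b"\0" + body
    object_hash = hashlib.sha1(raw).hexdigest()
    path = Path(git_dir) / "objects" / object_hash[:2] / object_hash[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return object_hash


def read_loose_object(git_dir, object_hash):
    """Return (type, body) for a loose object."""
    path = Path(git_dir) / "objects" / object_hash[:2] / object_hash[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match the body")
    return obj_type, body


demo_git_dir = Path(tempfile.mkdtemp()) / ".git"
blob = write_loose_object(demo_git_dir, "blob", b"hello\n")
print(read_loose_object(demo_git_dir, blob))
```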
Git Hashing
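Git uses SHA-1 hashing to generate an effectively unique hash for every Git object. The hash is calculated from the object’s content (prefixed with a short header giving its type and size), so identical content always produces the same hash. This is what allows Git to track and distinguish between different objects.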
Workflow on how a commit is saved
git add
Local → Blob → Index

git commit
Index → Tree → Commit → HEAD
Staging Changes
- Git reads the file content and saves it as blobs in the objects directory.
- The index is updated to record each staged file’s path together with the hash of its blob.
Committing Changes
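- Git writes tree objects from the index, capturing the staged snapshot.
- A commit object is created pointing to the root tree, along with the parent commit, the author’s info, the commit message, and a timestamp.
- The branch that HEAD points to is moved to the new commit, and the changes are saved.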
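The whole flow above can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions: plain dictionaries stand in for the objects directory, the index, and the refs, only files at the top level are supported (no sub-trees), and the author identity is a placeholder. The function names `add` and `commit` mirror the Git commands but are not Git’s API.

```python
import hashlib

objects = {}                        # hash -> (type, body); stands in for .git/objects
index = {}                          # path -> blob hash; stands in for .git/index
refs = {"HEAD": "refs/heads/main"}  # HEAD names the current branch


def store(obj_type, body):
    """Hash an object Git-style and keep it in the object store."""
    raw = f"{obj_type} {len(body)}".encode("ascii") + b"\0" + body
    object_hash = hashlib.sha1(raw).hexdigest()
    objects[object_hash] = (obj_type, body)
    return object_hash


def add(path, content):
    """Staging: save the content as a blob, then record it in the index."""
    index[path] = store("blob", content)


def commit(message, author="Jane Doe <jane@example.com>", timestamp=0):
    """Committing: index -> tree -> commit -> move the current branch."""
    tree_body = b"".join(
        b"100644 " + path.encode() + b"\0" + bytes.fromhex(blob)
        for path, blob in sorted(index.items())
    )
    tree = store("tree", tree_body)
    branch = refs["HEAD"]
    parent = refs.get(branch)       # None for the first commit
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += [
        f"author {author} {timestamp} +0000",
        f"committer {author} {timestamp} +0000",
        "",
        message,
        "",
    ]
    new_commit = store("commit", "\n".join(lines).encode())
    refs[branch] = new_commit       # the branch HEAD points to now moves
    return new_commit


add("README", b"hello\n")
first = commit("first commit")
print(first, "->", refs["HEAD"])
```

Note that HEAD itself never changes here; only the branch it names is updated, which matches the last step in the list.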