How Git stores your data

This overview covers the core concepts behind how Git organizes and stores data: blobs, trees, and commits. It is aimed at readers who are still learning Git.

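At the core of Git’s data storage model is the blob. Git tracks file history by storing the full contents of each file version as a blob, rather than storing only the differences between versions. Each blob is identified by the SHA-1 hash of its contents, written as 40 hexadecimal characters. Because the name is derived from the contents, two different files will in practice never share a name. The full hash looks cryptic, but an abbreviated prefix is usually enough to refer to an object: the first 7 characters are typically sufficient, as long as they are unambiguous within the repository.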
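To make the hashing concrete, here is a minimal Python sketch of how a blob's name can be computed, assuming the default SHA-1 object format. The function name blob_id is made up for this illustration and is not part of Git; the header layout ("blob", the size in bytes, then a NUL byte) is Git's documented object format.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object name Git would give a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


full = blob_id(b"hello world\n")
print(full)      # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(full[:7])  # 3b18e51
```

Running git hash-object on a file containing the same bytes should print the same 40-character name.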

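One advantage of this model is that identical content is stored only once. Two files with the same contents hash to the same blob, so Git keeps a single copy no matter how many paths or commits refer to it. A blob holds only file contents; the file's name is recorded elsewhere, in a tree. Blobs are the foundational units of Git’s data storage.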
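The deduplication falls out of naming objects by their content. The sketch below uses a toy in-memory store to show the effect; ObjectStore and put_blob are hypothetical names for this illustration, and a real repository keeps its objects on disk under .git/objects rather than in a dictionary.

```python
import hashlib


class ObjectStore:
    """Toy in-memory stand-in for Git's object database (hypothetical)."""

    def __init__(self):
        self.objects = {}  # maps object name -> content

    def put_blob(self, content: bytes) -> str:
        header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
        oid = hashlib.sha1(header + content).hexdigest()
        # If this content was stored before, the existing entry is kept.
        self.objects.setdefault(oid, content)
        return oid


store = ObjectStore()
a = store.put_blob(b"same text\n")   # e.g. the contents of README
b = store.put_blob(b"same text\n")   # e.g. an identical copy at another path
c = store.put_blob(b"other text\n")
print(a == b, len(store.objects))    # True 2
```

Three files were added, but only two blobs exist, because two of the files had identical contents.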

Another crucial object in Git’s model is the tree. A tree plays the role of a directory: it lists entries, each giving a file mode, a name, and the hash of either a blob (a file) or another tree (a subdirectory). Nested trees describe the hierarchical structure of a repository.
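As a rough sketch of how a tree refers to blobs and other trees, the Python below serializes a list of entries the way Git's tree format does and hashes the result. The helper names object_id and tree_id, and the file names, are invented for this example; it assumes SHA-1 object names and only covers regular files (mode 100644) and subdirectories (mode 40000).

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> str:
    """Hash an object body with Git's "<kind> <size>\\0" header."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries):
    """entries: list of (mode, name, hex_object_id) tuples."""
    # Git sorts entries by name, comparing directory names as if they ended in "/".
    def sort_key(entry):
        mode, name, _ = entry
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        # Each entry is "<mode> <name>\0" followed by the 20 raw hash bytes.
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(oid)
    return object_id(b"tree", body)


readme = object_id(b"blob", b"hello world\n")
docs = tree_id([("100644", "guide.txt", readme)])        # a subdirectory
root = tree_id([("100644", "README", readme), ("40000", "docs", docs)])
print(tree_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
print(root)
```

Note that README and docs/guide.txt point at the same blob: the names live in the trees, while the content is stored once.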

Finally, commits serve as snapshots in Git. Each commit points to the top-level tree that captures the state of the repository at a specific point in time, and records metadata such as author, date, and a message. Each commit also references its parent commit or commits, so commits form a directed acyclic graph (DAG). With a single branch the history is a simple linear chain; with multiple branches and merges, it forks and rejoins.
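The commit graph can be sketched with a few lines of Python. This is an illustrative model only: the Commit class, the history function, and the short names such as "c1" and "t1" are placeholders standing in for real 40-character hashes, and real commits also carry author and date fields that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str            # stand-in for the commit's hash
    tree: str            # stand-in for the top-level tree's hash
    parents: tuple = ()  # names of parent commits (two for a merge)
    message: str = ""


def history(commits, start):
    """Return every commit name reachable from start by following parents."""
    seen, stack = [], [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited via another path through the graph
        seen.append(name)
        stack.extend(commits[name].parents)
    return seen


commits = {c.name: c for c in [
    Commit("c1", "t1", (), "initial commit"),
    Commit("c2", "t2", ("c1",), "work on main"),
    Commit("c3", "t3", ("c1",), "work on a branch"),
    Commit("c4", "t4", ("c2", "c3"), "merge the branch"),
]}
print(sorted(history(commits, "c4")))  # ['c1', 'c2', 'c3', 'c4']
print(history(commits, "c2"))          # ['c2', 'c1']
```

The merge commit c4 has two parents, so its history includes both lines of work, while c2 on its own sees only a linear chain back to c1.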

Git’s data model is easier to picture with a graphical tool such as GitX, which displays commit objects and their associated data. From a commit, users can navigate through its tree structure.

Understanding these concepts demystifies the terminology that comes up when working with Git commits. For further exploration, Git for Computer Scientists and Scott Chacon’s talk Getting Git offer more detail on Git’s inner workings.