Short and concise stories, for software engineers.


You should not use Git as a database

Photo by Campaign Creators / Unsplash
This story turned out to be a bit longer than 500 words, as I had a hard time deciding which parts to leave out. Nevertheless, I hope you find it insightful.

My team inherited a system that stores its data (not code!) on a file system and continuously commits it to Git, which acts as the single source of truth. Is that better or worse than a database? The short answer is, of course, “it depends on your use case”. Git has its advantages over a traditional database, but they come at a price.

Our system’s data is a set of rules that depend upon each other. For every new rule, we use an abstraction called sessions - these are basically Git worktrees, which let users work on different trees and create PRs from within the system.
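To make the session idea concrete, here is a minimal sketch of how a session could map onto a Git worktree. It is not our actual implementation: the `session/<id>` branch naming, the `sessions_root` directory, the `main` base branch and the function names are all assumptions made for illustration. The only real Git feature it relies on is `git worktree add -b <branch> <path> <commit-ish>`.

```python
import subprocess
from pathlib import Path


def session_branch(session_id: str) -> str:
    # Hypothetical naming scheme: one branch per session.
    return f"session/{session_id}"


def worktree_add_command(
    repo: Path, sessions_root: Path, session_id: str, base: str = "main"
) -> list[str]:
    """Build the git command that creates a worktree for a session.

    Kept separate from execution so it can be inspected and tested
    without touching a real repository.
    """
    return [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", session_branch(session_id),   # create a new branch for the session
        str(sessions_root / session_id),    # directory the session works in
        base,                               # commit-ish the session starts from
    ]


def create_session(
    repo: Path, sessions_root: Path, session_id: str, base: str = "main"
) -> Path:
    """Create the worktree on disk and return its path (requires git)."""
    subprocess.run(
        worktree_add_command(repo, sessions_root, session_id, base),
        check=True,
    )
    return sessions_root / session_id
```

Each session then has its own directory and branch, so two users editing different rules never touch the same working copy until their branches are merged.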

Over the past few months, we have been trying to understand the pros and cons and to decide whether replacing Git with another storage solution, such as a traditional relational database, is worth the effort. I believe that a database is better suited to most use cases (including ours), but we have yet to reach a decision. I will try to summarize what we have learned in the following points:

  1. Multistate: Git allows us to have multiple states, meaning that anyone can work separately on their own branch, even offline and without using the system. This makes users completely independent of one another, which could lead to faster deployment cycles.
  2. History: We get history for free with Git. We can use git blame and git log to find out what changed and when, which helps us fix bugs faster, ask specific committers for guidance and deal with customer support. That said, it's important to note that this history can be rewritten (for example by a rebase followed by a force push), so blindly trusting it as a record of the data is probably not a good idea.
  3. Scalability: Using Git means using a file system as storage. That means we need to implement the file handling functionality (for each session) ourselves. As of today, we store the directory tree in memory. That means that as the data grows, our storage needs to grow accordingly. This is not a scalable solution, and even if the storage limit is not a concern, the time it takes to load massive trees will cause issues. For more information about Git storage requirements, refer here.
  4. Conflicts: As a direct consequence of the multiple states, if we don’t model and plan our work correctly, merge conflicts will inevitably happen, which will slow the team down. On the other hand, resolving conflicts manually might be better than implementing conflict management logic within the system.
  5. Metadata: Storing metadata is hard. It is possible to store it inside the data files and mark it as such, or in an extra layer (e.g. separate Markdown files). Both solutions are complex and hard to maintain.
  6. Performance: Using a file system is usually slower than using a database, since a database comes with optimizations built in, whereas with a file system you have to build them yourself (especially on a network file system such as EFS, which we use).
  7. Reliability: Git is the single source of truth. What happens if the file system crashes before the changes are pushed to Git? All the work is lost. Databases usually implement fail-safe mechanisms that prevent this from happening.
  8. Testing: Put simply, testing code that runs Git commands is harder (and slower) than testing against a mocked database.
  9. Querying: Databases allow us to query their data with query languages that are optimized for performance. If we decide we want querying capabilities, we would have to implement them in the system ourselves, increasing complexity. As of today, the data is highly interconnected, but there is no way to query these relationships.
  10. Product: When the employees manipulating the data are not experienced with Git, you have two options: educate them, or create abstractions over Git within the system. Both are costly, and the latter is reinventing the wheel, product-wise: you start off by building a minimal UI for committing and pushing, and the next thing you know, you find yourself wondering whether your users could benefit from a PR UI or even a history UI, “replacing” GitHub.
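To illustrate the querying point, here is a small sketch of what asking "which rules does this rule depend on, directly or indirectly?" could look like in a relational database. It uses Python's built-in `sqlite3` module with an in-memory database. The schema (`rules`, `rule_dependencies`) and the rule names are invented for the example and are not taken from our system.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical rules schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE rules (name TEXT PRIMARY KEY);
        CREATE TABLE rule_dependencies (
            rule       TEXT NOT NULL REFERENCES rules(name),
            depends_on TEXT NOT NULL REFERENCES rules(name),
            PRIMARY KEY (rule, depends_on)
        );
        """
    )
    conn.executemany(
        "INSERT INTO rules VALUES (?)",
        [("checkout",), ("pricing",), ("tax",), ("invoice",)],
    )
    conn.executemany(
        "INSERT INTO rule_dependencies VALUES (?, ?)",
        [("checkout", "pricing"), ("pricing", "tax"), ("invoice", "tax")],
    )
    return conn


def transitive_dependencies(conn: sqlite3.Connection, rule: str) -> list[str]:
    """Return every rule that `rule` depends on, directly or indirectly."""
    rows = conn.execute(
        """
        WITH RECURSIVE deps(name) AS (
            SELECT depends_on FROM rule_dependencies WHERE rule = ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT rd.depends_on
            FROM rule_dependencies AS rd
            JOIN deps ON rd.rule = deps.name
        )
        SELECT name FROM deps ORDER BY name
        """,
        (rule,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(transitive_dependencies(conn, "checkout"))  # ['pricing', 'tax']
```

With files in a Git repository, answering the same question means parsing every rule file and walking the graph in application code. Here the database does the traversal in a single query.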