Show HN: VersionDB – A key, value store inspired by Git (github.com)
9 points by josephsweeney 3108 days ago
3 comments

Creator here. This project is still at an early stage, but if anybody has any questions, I'd be happy to answer.
Why choose SHA1 and not something that is collision-resistant like SHA256 or SHA3?
Mainly because SHA1 was convenient, but also Git uses SHA1. See this Linus rant:

https://marc.info/?l=git&m=148787047422954

Most of that argument applies, but if it ever becomes a problem, we should be able to move to something like SHA256 fairly easily.

Git's creators refuse to migrate because they selected SHA1 at the start, and backwards compatibility makes it harder to just change it. Also, with Git it's harder to get a maintainer to push your binary blob. In a database, it's more likely that a user will include malicious data. The hash isn't so easy to change unless you are willing to make the change backwards-incompatible (breaking existing DBs).
You're definitely correct. This project is still in its early stages and no one is really using it yet, so it's easy in the sense that I just have to change the hashing algorithm. No need to worry about backwards compatibility.
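
To illustrate why the swap is simple while there is no existing data, here is a minimal sketch in Python (not the project's C code). The names `HASH_ALGORITHM` and `object_name` are hypothetical, chosen for the example only.

```python
import hashlib

# Hypothetical setting: the one place where the hash function is chosen.
HASH_ALGORITHM = "sha1"

def object_name(data: bytes, algorithm: str = HASH_ALGORITHM) -> str:
    """Return the hex digest that would be used as the object's filename."""
    return hashlib.new(algorithm, data).hexdigest()

sha1_name = object_name(b"hello")              # 40 hex characters
sha256_name = object_name(b"hello", "sha256")  # 64 hex characters
```

The same data gets a different name under each algorithm, which is why switching later would break any database already written with the old one.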
This sounds very interesting. Which database is it: MongoDB, MySQL, ...? It is not very clear.
It actually doesn't use another database. It just uses plain old files for storage.

It takes after Git in that it stores each piece of data in a file whose name is the hash of the data.
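
A minimal sketch of that idea, in Python rather than the project's C. The function names `store_object` and `load_object` are hypothetical, and the sketch hashes the raw bytes directly, which may differ from what VersionDB (or Git, which adds a header and compresses) actually does.

```python
import hashlib
import os
import tempfile

def store_object(objects_dir: str, data: bytes) -> str:
    """Write data to a file named after its SHA-1 hex digest and return the digest."""
    digest = hashlib.sha1(data).hexdigest()
    os.makedirs(objects_dir, exist_ok=True)
    path = os.path.join(objects_dir, digest)
    if not os.path.exists(path):  # identical content is only stored once
        with open(path, "wb") as f:
            f.write(data)
    return digest

def load_object(objects_dir: str, digest: str) -> bytes:
    """Read back the data stored under the given digest."""
    with open(os.path.join(objects_dir, digest), "rb") as f:
        return f.read()

demo_dir = tempfile.mkdtemp()
key = store_object(demo_dir, b"hello")
same_key = store_object(demo_dir, b"hello")  # storing again yields the same name
```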

So does it export the database to a file and then version-control that file?
There actually isn't any database outside of a directory of files. The version control is done the same way that Git works under the hood but written from scratch in C.

Essentially, we have a database directory with two sub-directories, refs and objects. In refs we have a file for the id of each piece of data stored. The id file contains the hash of the latest commit for this id. A commit is just another file that contains a time, a hash of the previous commit, and the hash of the data.

The objects directory stores all the data and the commits. Each entry's filename is the hash of its contents, and the file itself holds the stored data.
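
The layout described above can be sketched roughly as follows, in Python rather than the project's C. The function names and the text format of the commit (`time`, `parent`, `data` lines) are assumptions for illustration; the real on-disk format may differ.

```python
import hashlib
import os
import tempfile
import time

def write_object(db: str, data: bytes) -> str:
    """Store bytes in <db>/objects under their SHA-1 hex digest."""
    digest = hashlib.sha1(data).hexdigest()
    objects = os.path.join(db, "objects")
    os.makedirs(objects, exist_ok=True)
    with open(os.path.join(objects, digest), "wb") as f:
        f.write(data)
    return digest

def put(db: str, key_id: str, data: bytes) -> str:
    """Store a new version of key_id and point its ref at the new commit."""
    refs = os.path.join(db, "refs")
    os.makedirs(refs, exist_ok=True)
    ref_path = os.path.join(refs, key_id)
    parent = ""  # empty parent marks the first commit for this id
    if os.path.exists(ref_path):
        with open(ref_path) as f:
            parent = f.read().strip()
    data_hash = write_object(db, data)
    # Assumed commit format: a time, the previous commit's hash, the data's hash.
    commit = f"time {time.time()}\nparent {parent}\ndata {data_hash}\n"
    commit_hash = write_object(db, commit.encode())
    with open(ref_path, "w") as f:
        f.write(commit_hash)
    return commit_hash

db = tempfile.mkdtemp()
first = put(db, "greeting", b"hello")
second = put(db, "greeting", b"hello, world")
```

After the two calls, `refs/greeting` holds the hash of the second commit, and `objects` holds four files: two data blobs and two commits.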

So all we're doing is making a linked list where each entry points to a different version of that data. No external database or version control needed.
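
As a rough sketch of that linked list, here is a Python version that walks the history of one id from newest to oldest. Two dictionaries stand in for the `objects` and `refs` directories, and the commit text format and function names are assumptions for the example, not the project's actual C implementation.

```python
import hashlib

objects = {}  # stand-in for the objects directory: hash -> bytes
refs = {}     # stand-in for the refs directory: id -> latest commit hash

def write_object(data: bytes) -> str:
    digest = hashlib.sha1(data).hexdigest()
    objects[digest] = data
    return digest

def put(key_id: str, data: bytes, timestamp: int) -> str:
    """Add a commit for key_id whose parent is the current latest commit."""
    parent = refs.get(key_id, "")  # empty string means no previous commit
    commit = f"time {timestamp}\nparent {parent}\ndata {write_object(data)}\n"
    refs[key_id] = write_object(commit.encode())
    return refs[key_id]

def history(key_id: str):
    """Yield (timestamp, data) pairs, newest first, by following parent hashes."""
    commit_hash = refs.get(key_id, "")
    while commit_hash:
        lines = objects[commit_hash].decode().splitlines()
        fields = dict(line.split(" ", 1) for line in lines)
        yield int(fields["time"]), objects[fields["data"]]
        commit_hash = fields["parent"]

put("greeting", b"v1", 1)
put("greeting", b"v2", 2)
put("greeting", b"v3", 3)
versions = list(history("greeting"))
```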