Hacker News
Ask HN: What are some good resources on building a relational database?
19 points by ashwin110 702 days ago
I was hoping to build a simple relational database as a side project, focussing mainly on learning how the internals and the algorithms used work, as I never plan to make this a published product of any sort.

So far I have the CMU Advanced DB course (https://15721.courses.cs.cmu.edu/spring2024/) and the Database Internals book.

While I'm learning a lot about how databases work, I have no clue how to start writing my own. Are there any resources for building a relational database? I've only found some for KV stores. Hopefully something less intimidating to get started with than having to read the SQLite code.

9 comments

Comes up a fair bit in Ask HN - https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

You can widen the search a bit by taking out 'implementation' and trying some other terms like 'book', 'internals', etc.

This is perfect, I should've known this would be a fairly discussed topic here. Thanks!
You may check https://cstack.github.io/db_tutorial which teaches writing an SQLite compatible database from scratch in C.

I know you asked about an RDBMS, but may I introduce you to a structured path for building a KV store, which can be a foundation for an RDBMS? My project follows a TDD approach: you start with simple functions, pass the tests, and the difficulty level goes up. When all the tests pass, you will have written a persistent key-value store.

https://github.com/avinassh/py-caskdb
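
To give a rough idea of what "persistent key-value store" means here, below is a minimal sketch of an append-only log with an in-memory index. It is not taken from py-caskdb: the class name `TinyKV`, the method names, and the record layout are all assumptions made up for illustration, and it skips checksums, deletes, and crash recovery.

```python
import os
import struct
import tempfile

# Hypothetical record layout (an assumption, not py-caskdb's actual format):
# 4-byte key length, 4-byte value length, then the key bytes and value bytes.
HEADER = struct.Struct("<II")


class TinyKV:
    """Append-only log on disk plus an in-memory index of key -> record offset."""

    def __init__(self, path):
        self.path = path
        self.index = {}
        # "a+b" creates the file if it is missing; writes always go to the end.
        self.file = open(path, "a+b")
        self._rebuild_index()

    def _rebuild_index(self):
        # Scan the log once; a later record for a key replaces the earlier one.
        self.file.seek(0)
        offset = 0
        while True:
            header = self.file.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            key_len, val_len = HEADER.unpack(header)
            key = self.file.read(key_len).decode("utf-8")
            self.file.seek(val_len, os.SEEK_CUR)  # skip the value bytes
            self.index[key] = offset
            offset += HEADER.size + key_len + val_len

    def set(self, key, value):
        k, v = key.encode("utf-8"), value.encode("utf-8")
        self.file.seek(0, os.SEEK_END)
        offset = self.file.tell()
        self.file.write(HEADER.pack(len(k), len(v)) + k + v)
        self.file.flush()
        self.index[key] = offset

    def get(self, key, default=None):
        offset = self.index.get(key)
        if offset is None:
            return default
        self.file.seek(offset)
        key_len, val_len = HEADER.unpack(self.file.read(HEADER.size))
        self.file.seek(key_len, os.SEEK_CUR)  # skip the key bytes
        return self.file.read(val_len).decode("utf-8")

    def close(self):
        self.file.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "tiny.db")
    db = TinyKV(path)
    db.set("name", "ada")
    db.set("name", "grace")  # the old record stays in the log, the index moves on
    db.close()

    reopened = TinyKV(path)  # the index is rebuilt by scanning the file
    print(reopened.get("name"))
    reopened.close()
```

Old values are never overwritten in place, so the file only grows. A real store would also need compaction and some way to detect a half-written record at the tail.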

This is amazing, kudos on all the great work on this project! This definitely seems like an interesting place to look for ideas and learn!
https://www.youtube.com/@CMUDatabaseGroup

They publish their latest course videos every year, during the year. Andy Pavlo is highly knowledgeable about the field.

"This course is a comprehensive study of the internals of modern database management systems. It will cover the core concepts and fundamentals of the components that are used in large-scale analytical systems (OLAP). The class will stress both efficiency and correctness of the implementation of these ideas. The course is appropriate for graduate students in software systems and for advanced undergraduates with dirty systems programming skills. "

That class drops a few buzz words in its advert: OLAP and dirty!

What sort of "simple" RDBMS are you envisioning that is different from the current lot?

Nothing different, if I'm being honest. I'm just looking to learn and was having trouble getting started, so I was looking for guidance on that. Realistically speaking, if I can build out a storage engine and execution planner that isn't terrible, I'll call it a success, as I would've learnt what I'm looking for.
That's not the only thing the class drops (the course has its own DJ).
I'd have a look at DuckDB as well. They seem to be doing a great job, with useful, practical innovation and a lot of interesting, differentiating design decisions. I hear it's built on top of SQLite, is that right? They must have a fair amount of code of their own regardless.

Then there are also some projects that have tried to port or re-create SQLite in Rust.

You want our Intro DB Systems course, not the Advanced one:

https://15445.courses.cs.cmu.edu

Lectures start next month. Or you can watch previous years. Learn to walk before you run.

I was going to suggest the SQLite source code.

One could probably go quite a ways in bare Python with lists of dataclasses and pickles, never mind the performance.

That's your backend.
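
Something like the following is roughly what that could look like. It is only a sketch: the `User` row type, the `Table` class, and its method names are made up for illustration, and the whole table is rewritten on every commit.

```python
import os
import pickle
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class User:  # hypothetical row type, just for illustration
    id: int
    name: str
    age: int


class Table:
    """A table is a list of dataclass rows, pickled to one file on commit."""

    def __init__(self, path, row_type):
        self.path = path
        self.row_type = row_type
        self.rows = []
        if os.path.exists(path):
            with open(path, "rb") as f:
                # Rows are stored as plain dicts and rebuilt into dataclasses.
                self.rows = [row_type(**fields) for fields in pickle.load(f)]

    def insert(self, row):
        self.rows.append(row)

    def select(self, where=lambda row: True):
        # Full scan: no indexes, no planner, just a filter over the list.
        return [row for row in self.rows if where(row)]

    def commit(self):
        with open(self.path, "wb") as f:
            pickle.dump([asdict(row) for row in self.rows], f)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "users.pkl")
    users = Table(path, User)
    users.insert(User(1, "ada", 36))
    users.insert(User(2, "grace", 45))
    users.commit()

    reloaded = Table(path, User)
    print(reloaded.select(lambda row: row.age > 40))
```

Pickling plain dicts rather than the dataclass instances themselves is a design choice here, so the file does not depend on where the row class is defined. Only unpickle files you wrote yourself, since `pickle.load` can execute arbitrary code from untrusted data.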

Then you might find some prior art in the way of a SQL parser for a front end.

Back in the day the hardcore passionate lovers used to recommend CJ Date.
This looks great, but 90% of the actual content is behind a $40 subscription. Maybe an option if I don't find anything else, thank you!