by titovodka 1713 days ago
I'm 2 years into my career. Is there a benefit in reading a database book cover to cover? The book in question is Database Systems by Garcia-Molina.

I’ve found the material so far interesting. But I’m not sure how much value I’ll get from it or how much of it I can realistically retain.

10 comments

I often see people on HN recommend the book, Designing Data-Intensive Applications (2017). I've personally been chewing on the material for a while now, gaining new insights.

Here's the table of contents: https://www.oreilly.com/library/view/designing-data-intensiv...

It seems to cover roughly the same areas and range as the book you mentioned, Database Systems: The Complete Book (2008). http://infolab.stanford.edu/~ullman/dscb.html

Yes, yes, yes, yes, yes! DDIA is amazing. It is _the_ premier reference manual, as far as my brain and I are concerned. It's the only "computer book" I keep next to my computer. (Not that I reference it frequently enough to justify its proximity; it's mostly so I can tell people "DDIA is the only 'computer book' I keep next to my computer." But still... it's great.)
Turns out I am a fool who let his enthusiasm get in the way of being right. I have DDIA on my Kindle. Next to my computer is a book that obviously makes much more sense to be next to my computer, Systems Performance: Enterprise and the Cloud (SPEC) by Brendan Gregg.

Still, DDIA is an incredible resource. But the Kindle is actually better for my usage pattern. I wish I had SPEC on Kindle.

My advice: Yes. Get a good book and an interactive SQL editor. Learn the concepts and try them out. Play, ask questions and try to answer them. Skim over stuff that doesn't interest you at the moment. Write stuff down.

Then after a while you'll encounter new concepts (a new book, an online comment, an interesting tech talk etc.), have new ideas or forget/get rusty on old ones, then you repeat the above.

Continuous learning is part of the job IMO. I have done the above several times and will do it several times again in some fashion or another.

For me it has repeatedly had real value, value I didn't even know existed until I stumbled into a situation where I could apply some concept I had learned this way.

Sure, the first time you apply something "in anger", you will have to get into the nitty-gritty details, which is work. But you cannot even recognize an opportunity if you haven't prepared your mind for it and played around with ideas and concepts beforehand, unless you have a significant time budget for this kind of thing on the project, or a colleague who can introduce it. My recommendation is not to rely on that.

You don't need to master it, but it serves as a map in your head. In the worst case, it will give you an intuition for how things work. Retaining everything isn't essential as long as you keep at least the hints. Even if they are few, they are far more helpful than not having a clue at all.

I still recall two fundamental programming books and one DB design book I read years ago. Even today, my mind comes back to them, and it makes me faster when looking for documentation or just googling things more precisely :)

Knowing a complex subject, even not deeply, is always a benefit.

you and i see i to i on that <g>
I know two DB bibles which explain how a DB engine actually works. It's much more interesting reading than another SQL manual. But it's not awfully practical if you have no particular interest in DB technology. And both are pretty dense.

* https://www.amazon.com/Database-Systems-Complete-Book-2nd/dp
* https://www.amazon.com/Database-System-Concepts-Abraham-Silb...

There's a very cool not-quite-alternative: https://leanpub.com/how-query-engines-work. It covers a fair chunk of DB technology but not storage. Definitely check out their repository at https://github.com/andygrove/how-query-engines-work/tree/mai... .

A companion to DDIA would be https://www.amazon.com/Database-Internals-Deep-Distributed-S... (especially its treatment of LSM trees which is harder to come by).

I would suggest something more directly applicable. Textbooks are useful, but if you're looking for something to get the most out of today:

SQL Performance Explained

Art of SQL

Database Design for Mere Mortals

Huge +1 to Database Design for Mere Mortals. This was the second database book I read (the first was SQL Antipatterns, not a bad book but also not a good first book) and far and away the most helpful. It took me 3 days to read, I had a miserable flu after flying home from a vacation (pre-COVID), and I was still able to follow the author without any issue. I highly suggest a physical copy & writing in the margins, underlining etc, though that's something I do in general so YMMV if you don't get anything out of that.

As far as "technical" books go, it's very nontechnical, and if you're already very experienced & good at normalizing it maybe won't help you THAT much, but for someone who's at all junior, I can't recommend it enough. The biggest takeaway (beyond, "what's a normalized database look like") is to keep questioning the real-life system you're working in & to ask questions of stakeholders who understand it better than you, but me saying that isn't at all doing it enough justice compared to how well the author elaborates on the point. (btw, Data And Reality is also good for this, though a lot more theoretical in approach)

If you're really new to databases you'll want to supplement this with something that does normalization more formally, I don't have a great single suggestion for that. I read two of C.J. Date's books, honestly I found them mind-numbingly tedious to get through but I can't say I didn't learn from them. But...I also can't really recommend them because I was so bored by them that it made me stop reading altogether for like 4 months because I was just putting off finishing these books (LPT: always be in the middle of 2 books at the same time, unfortunately for me, both of my books were these lol). I did also read Art of SQL, it was both enjoyable & interesting, but maybe not what you're looking for if you want something super structured that will take you from point A to point B.

If you weren't aware, Database Design for Mere Mortals 25th Anniversary Edition was released last December!
I had seen that! Do you think it's worth buying & reading the updated version? As much as I loved the original, I'm not sure how much I'd get out of reading a book that's 90% the same content at this point. Though tbf I guess I wouldn't mind just supporting the author so maybe I'll pick up a copy and skim it at least.
The first thing I try to do upon encountering any technology/stack people recommend everywhere is try to figure out what’s wrong with it.

I then skim through anything I can find that’s relevant, and dig into the parts I didn’t fully understand (or skimming didn’t fully answer), until I can provide a comprehensive answer as to where it shouldn’t be used.

In general, implementation details from this skimming are only relevant as far as the black-box abstraction goes: I only care about which workloads and inputs the stack falls over on. The implementation itself I only keep around so that:

1. I trust the looks of this implementation, so if I need it I know where to start

2. I can give a very superficial explanation of the implementation, just enough to defend my answer as to why it falls over in the workloads it falls over on.

So for databases, as an example, I care about the fact that

1. it stores by row (because this is the key differentiator from column-stores, and is the key as to why OLAP and OLTP have different preferred workloads — and why they can make different optimizations)

2. the indexes and roughly why they work/excel/fail (again, their existence changes preferred workloads)

3. That the SQL compiler produces a “plan”, and this plan is based on heuristics and estimates — So it shouldn’t be blindly trusted. You can view the plan and review it.

4. ACID, its guarantees, and implementation only as far as why can’t MongoDB and friends have it too (and why you might want to drop it — where does it only add overhead)

5. Probably other stuff not on the top of my head
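
To make points 2 and 3 concrete, here is a minimal sketch of viewing a query plan before and after adding an index. It uses Python's built-in sqlite3 module purely for illustration (the comment above doesn't name a database), and the `orders` table, the `idx_orders_customer` index and the `plan_for` helper are all made-up names. The exact wording of the plan text varies between SQLite versions, so treat the printed output as something to read, not something to rely on.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the last one is the plan description.
    return [row[3] for row in rows]


# Hypothetical schema and data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

query = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index on customer_id the planner has to scan the whole table.
before = plan_for(conn, query, (42,))

# With an index available, the planner can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan_for(conn, query, (42,))

print("without index:", before)
print("with index:   ", after)
```

The same habit carries over to other engines, which expose their plans through their own EXPLAIN variants. The point is only that the plan is something you can inspect and question instead of trusting blindly.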

Everything is a trade-off. The goal is to know what trade-offs you’ve actually made.

Yes.

I started programming long before the internet and read cover to cover every computer book I could get my hands on. Subsequent years had me prepared with a huge background of ideas I could apply to problems I encountered.

However the depth of understanding I sought differed depending upon the book.

Some books I dug into intensely. When SICP came out I wrote my own Scheme interpreter (using the information in SICP) just so I could play with the examples.

Other books I just skimmed and indexed (in my memory) for possible future reference. The 80s 12 volume Systems Programming series by IBM is an extreme example.

Before the web I had accumulated 50 bankers boxes of computer books that I had "read" cover to cover. When they were on my shelves it often happened that when I had a problem to solve I could remember which book(s) had relevant information. I was my own search engine!

So, yes. If you are interested in databases, start by skimming the book. You might decide to read for details or you might not. Regardless, the more you know the more you can synthesize solutions when needed.

While that would probably be very valuable, it will probably come with lots of implementation details that won't really matter to you unless you plan to write a database yourself.

I'd suggest this one: https://dataintensive.net/ It's intended for _users_ of databases, that is, developers incorporating databases (and other kinds of data systems) into their applications. It gives just enough explanation on how those systems work and what they can (and can't) provide, and also gives a more general overview of what's available beyond relational SQL databases.

I would also recommend "Database Internals: A Deep Dive into How Distributed Data Systems Work" which gives a good overview of the implementation of a database. I felt it was closer to an engineer's perspective.
Not really, I've learned a lot from the PostgreSQL documentation and various Stack Overflow answers.

Like with everything, getting experience and your hands dirty will make you far more knowledgeable. There is no silver bullet.