Hacker News new | ask | show | jobs
by throw345hn 2102 days ago
Could you suggest resources (books, articles, videos, moocs or others) to learn good database design. I am picking up skills about sql but want to better understand and learn about databases. As someone who doesnt have that background, a lot of the times I am just googling for stuff and just trying out bits and pieces.
5 comments

Apparently there's a 4th edition coming, but if you like the form-factor of a book-length text I'd throw a recommendation to "Database Design for Mere Mortals"[1]. I read the 1st edition from the late 90's, but I'd imagine it's still just as good. It approaches database modeling from a practical non-technical perspective, and I found it helped me learn data modeling in a software-agnostic manner, and later to influence how I talked to non-technical audiences about data modeling. I'm really glad to have found it early in my career.

I will echo what others have said. Data modeling is a force-multiplier type of skill. Combine it with a reasonable understanding of SQL and you can return a lot of value very quickly.

[1] https://www.amazon.com/Database-Design-Mere-Mortals-Hands/dp...

Not required, but knowing a little relational algebra helps queries make a lot more sense. A lot of the features like constraints and indices are very thin veneer over the underlying concepts.

At the other end of the spectrum, once you can write a few basic queries check out something like SQL Murder Mystery https://mystery.knightlab.com/

This one is super-fun and lets you practice some basic to low-intermediate skills.

Fun fact: You can do relational algebra on plain text data (e.g., csv, as well as output of commands like ls, etc.) using only shell + piping + well-known shell tools.
The Art of Postgresql. As the title suggests, it's targeted for Postgresql. However it has a full section on data modeling, another section on data types. It's packed with lots of examples and reasoning. I feel it gives you a good mix of theoretical background and hands on immediate experience. You'll find your way around what is (in my worthless opinion) the best database engine, and the pgsql specific examples will mostly translate to other relational databases.

https://theartofpostgresql.com/

I'd like to know this as well. I think you'll just have to build things (potentially horribly) and fail.

I took three semesters of database (granted, baby database classes) and I still have no idea how you can do something pretty straightforward like creating a room reservation system.

If there is a reservation beginning at 10:15 AM and ending at 12:30 PM and someone tries to book a reservation from 10:00 AM to 10:30 AM, the transaction should fail.

and before someone screams db2! yes, db2 can. but then you'd have to use db2 https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/co...

Why is this so difficult...

The documentation for Postgresql range types describes how to do exactly this.

https://www.postgresql.org/docs/11/rangetypes.html#RANGETYPE...

Edit: and if you didn't want to use postgres, you could have "starttime" and "endtime" columns and reject any bad bookings with a before insert / before update trigger.

This approach is the best and works really well if you don't need to do a join on a related table to look up information. If you need to use data outside of the current table for exclusions/check constraints, you have to write a trigger function (as far as I know).

I had to solve this recently, where the actual start/end times were stored on a related table. I'm no SQL wizard, but I'd love to share my solution in case it helps others (it might be terrible).

Note: I changed the actual tables/domain to be generic, this is a poor example and it made more sense for my usecase, but this shows general approach.

  -- Let's pretend we have these tables (awful design, but for sake of example):
  -- room <-> reservation <-> reservation_info
  -- Where "reservation_info" has "start_time" and "end_time"
  CREATE FUNCTION check_for_overlapping_reservations()
    RETURNS trigger
    LANGUAGE plpgsql AS
  $$ BEGIN
    IF (
        -- Find the newly created reservation and join it with the info record to grab "start_time" and "end_time" for check below
        with this_reservation as (
              select * from reservation
              inner join reservation_info on reservation_info.id = reservation.reservation_info_id
              where reservation.id = NEW.id
          ), bookings_for_timerange as (
              -- Select every other reservation, where the reservation is happening in the same room
              select * from reservation as other_reservation, this_reservation
              inner join reservation_info as other_reservation_info
                on other_reservation_info.id = other_reservation.reservation_info_id
              where other_reservation.room_id = this_reservation.room_id AND
              -- And the timerange from start to end overlaps the newly created record
                  tstzrange(this_reservation.start_time, this_reservation.end_time) &&
                  tstzrange(other_reservation_info.start_time, other_reservation_info.end_time)
          -- Get a count of all the records, it should only be 1. If it's greater than one, there's overlap.
          select count(*) from bookings_for_timerange
        ) > 1
    THEN
        RAISE EXCEPTION 'Room is already reserved during this time period';
    END IF;
    RETURN NEW;
  END;$$;
TBH it's a bit more verbose and less performant than need be, but hey, if it works, rock on! But don't use 'select *' in production code.

Consider next time something like

    if exists (select from sometable t1
      join sometable t2 on 
        t1.resource_id = t2.resource_id 
        and t1.res_id <> t2.res_id
        and tstzrange(t1.start, t1.end) && tstzrange(t2.start, t2.end)
      where t1.res_id = new.res_id )
then ... raise exception
Ahh, that looks much cleaner. Thanks for the knowledge =)
hold start & end time in normal DB columns, use business logic for your rule enforcement, have a query to identify rule violation. Maybe add constraints on your warehouse data but I probably wouldn't on the operational DB. I could be convinced to do so for the PoC because it might make sense, but what if you're managing thousands or millions of scheduled "resources" like a chain of libraries, or hotel rooms? What if you need to coordinate "leases"? DB-level is an OK place to start, but also one of the hardest to tease apart when you need to scale.

This is kind of the point of data modeling; you start with an idea, make it all pretty 3rd-normal form, define your projected indices and checks and constraints, then you mess it up a bit where it makes sense or you've had experience in the past.

YMMV

Exclude constraint on a GiST index?

https://stackoverflow.com/a/51247705

Just about anything written by C. J. Date would be worth your while in this regard. It will give you a solid grounding in the relational model, which you can then apply to any RDBMS. Here's a good list to start with:

- Database in Depth, O'Reily (2005)

- Relational Theory for Computer Professionals, O'Reilly (2013)

- SQL and Relational Theory, 3rd Ed, O'Reily (2015)

- Database Design and Relational Theory, 2nd Ed, Apress (2019)

If you can find a copy of this, I'd recommend it "Conceptual Schema and Relational Database Design: A Fact Oriented Approach" by Nijssen and Halpin. They presented a richer model (NIAM) for designing databases than the ER model by Codd/Date, though CJ Dates book is useful to. Both of these are out of print now, but anything by Halpin or Date would be a useful addition. To design databases relational theory and normal forms are the language you'd need to understand - they're not very hard though seems they aren't as widely taught any more.