Hacker News new | ask | show | jobs
by SamuelAdams 2102 days ago
I am forever grateful that I took a full semester of database design in my undergrad. This single skill has stood with me for my entire career so far and has enabled me to figure out the root cause of many production issues. Plus people really like it when you can answer ad-hoc questions like "what inspections are still open and when were they first opened".

If y'all can understand Angular / React / Vue there's no reason to not learn databases.

5 comments

Could you suggest resources (books, articles, videos, moocs or others) to learn good database design. I am picking up skills about sql but want to better understand and learn about databases. As someone who doesnt have that background, a lot of the times I am just googling for stuff and just trying out bits and pieces.
Apparently there's a 4th edition coming, but if you like the form-factor of a book-length text I'd throw a recommendation to "Database Design for Mere Mortals"[1]. I read the 1st edition from the late 90's, but I'd imagine it's still just as good. It approaches database modeling from a practical non-technical perspective, and I found it helped me learn data modeling in a software-agnostic manner, and later to influence how I talked to non-technical audiences about data modeling. I'm really glad to have found it early in my career.

I will echo what others have said. Data modeling is a force-multiplier type of skill. Combine it with a reasonable understanding of SQL and you can return a lot of value very quickly.

[1] https://www.amazon.com/Database-Design-Mere-Mortals-Hands/dp...

Not required, but knowing a little relational algebra helps queries make a lot more sense. A lot of the features like constraints and indices are very thin veneer over the underlying concepts.

At the other end of the spectrum, once you can write a few basic queries check out something like SQL Murder Mystery https://mystery.knightlab.com/

This one is super-fun and lets you practice some basic to low-intermediate skills.

Fun fact: You can do relational algebra on plain text data (e.g., csv, as well as output of commands like ls, etc.) using only shell + piping + well-known shell tools.
The Art of Postgresql. As the title suggests, it's targeted for Postgresql. However it has a full section on data modeling, another section on data types. It's packed with lots of examples and reasoning. I feel it gives you a good mix of theoretical background and hands on immediate experience. You'll find your way around what is (in my worthless opinion) the best database engine, and the pgsql specific examples will mostly translate to other relational databases.

https://theartofpostgresql.com/

I'd like to know this as well. I think you'll just have to build things (potentially horribly) and fail.

I took three semesters of database (granted, baby database classes) and I still have no idea how you can do something pretty straightforward like creating a room reservation system.

If there is a reservation beginning at 10:15 AM and ending at 12:30 PM and someone tries to book a reservation from 10:00 AM to 10:30 AM, the transaction should fail.

and before someone screams db2! yes, db2 can. but then you'd have to use db2 https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/co...

Why is this so difficult...

The documentation for Postgresql range types describes how to do exactly this.

https://www.postgresql.org/docs/11/rangetypes.html#RANGETYPE...

Edit: and if you didn't want to use postgres, you could have "starttime" and "endtime" columns and reject any bad bookings with a before insert / before update trigger.

This approach is the best and works really well if you don't need to do a join on a related table to look up information. If you need to use data outside of the current table for exclusions/check constraints, you have to write a trigger function (as far as I know).

I had to solve this recently, where the actual start/end times were stored on a related table. I'm no SQL wizard, but I'd love to share my solution in case it helps others (it might be terrible).

Note: I changed the actual tables/domain to be generic, this is a poor example and it made more sense for my usecase, but this shows general approach.

  -- Let's pretend we have these tables (awful design, but for sake of example):
  -- room <-> reservation <-> reservation_info
  -- Where "reservation_info" has "start_time" and "end_time"
  CREATE FUNCTION check_for_overlapping_reservations()
    RETURNS trigger
    LANGUAGE plpgsql AS
  $$ BEGIN
    IF (
        -- Find the newly created reservation and join it with the info record to grab "start_time" and "end_time" for check below
        with this_reservation as (
              select * from reservation
              inner join reservation_info on reservation_info.id = reservation.reservation_info_id
              where reservation.id = NEW.id
          ), bookings_for_timerange as (
              -- Select every other reservation, where the reservation is happening in the same room
              select * from reservation as other_reservation, this_reservation
              inner join reservation_info as other_reservation_info
                on other_reservation_info.id = other_reservation.reservation_info_id
              where other_reservation.room_id = this_reservation.room_id AND
              -- And the timerange from start to end overlaps the newly created record
                  tstzrange(this_reservation.start_time, this_reservation.end_time) &&
                  tstzrange(other_reservation_info.start_time, other_reservation_info.end_time)
          -- Get a count of all the records, it should only be 1. If it's greater than one, there's overlap.
          select count(*) from bookings_for_timerange
        ) > 1
    THEN
        RAISE EXCEPTION 'Room is already reserved during this time period';
    END IF;
    RETURN NEW;
  END;$$;
TBH it's a bit more verbose and less performant than need be, but hey, if it works, rock on! But don't use 'select *' in production code.

Consider next time something like

    if exists (select from sometable t1
      join sometable t2 on 
        t1.resource_id = t2.resource_id 
        and t1.res_id <> t2.res_id
        and tstzrange(t1.start, t1.end) && tstzrange(t2.start, t2.end)
      where t1.res_id = new.res_id )
then ... raise exception
Ahh, that looks much cleaner. Thanks for the knowledge =)
hold start & end time in normal DB columns, use business logic for your rule enforcement, have a query to identify rule violation. Maybe add constraints on your warehouse data but I probably wouldn't on the operational DB. I could be convinced to do so for the PoC because it might make sense, but what if you're managing thousands or millions of scheduled "resources" like a chain of libraries, or hotel rooms? What if you need to coordinate "leases"? DB-level is an OK place to start, but also one of the hardest to tease apart when you need to scale.

This is kind of the point of data modeling; you start with an idea, make it all pretty 3rd-normal form, define your projected indices and checks and constraints, then you mess it up a bit where it makes sense or you've had experience in the past.

YMMV

Exclude constraint on a GiST index?

https://stackoverflow.com/a/51247705

Just about anything written by C. J. Date would be worth your while in this regard. It will give you a solid grounding in the relational model, which you can then apply to any RDBMS. Here's a good list to start with:

- Database in Depth, O'Reily (2005)

- Relational Theory for Computer Professionals, O'Reilly (2013)

- SQL and Relational Theory, 3rd Ed, O'Reily (2015)

- Database Design and Relational Theory, 2nd Ed, Apress (2019)

If you can find a copy of this, I'd recommend it "Conceptual Schema and Relational Database Design: A Fact Oriented Approach" by Nijssen and Halpin. They presented a richer model (NIAM) for designing databases than the ER model by Codd/Date, though CJ Dates book is useful to. Both of these are out of print now, but anything by Halpin or Date would be a useful addition. To design databases relational theory and normal forms are the language you'd need to understand - they're not very hard though seems they aren't as widely taught any more.
Pardon my ignorance--is inspections some concept that relates to database management, or are you referring to a query like "select * from inspections where status = 'open';". Honestly asking.
I believe OP is just referring to the ability to run adhoc queries. Typically nosql solutions are built to be performt for common access patterns, at the expense of being difficult or impossible to query in unplanned ways. SQL DBs are very good in that regard.
Thanks! And yes, I agree. The ability to write ad hoc SQL is a great skill.
Totally. I’ve always admired SQL wizards and their ability to (quickly) produce the most optimal queries given the relatively limited syntax (compared to what you can do with frameworks like, for example, LINQ today).
Ah yes, inspections in this context means Safety Inspections, aka a business rule or thing. Industry is Retail / Grocery. Inspectors inspected food at stores on day one and Stores had 5 business days to “complete” the inspection. Internal audit wanted a list of “defaulted” inspections so they could go harass the store people to get their things done.
Absolutely agree. Learning data normalisation was up there with Typing 9 (touch-typing class in grade 9, not Hindley-Milner whatever) in terms of long-term usefulness.
I feel the same way. It’s been nearly 30 years since my undergrad database class and I used what I learned in that class yesterday mocking out a schema for a personal project. We didn’t touch any real dbms in the class but rather did most of the class work using relational algebra.
Is "inspections" a typo for "transactions" here?