Hacker News new | ask | show | jobs
by A_Person 2922 days ago
Gosh, I must say there seems to be some misunderstanding of RDBMS concepts in some posts in this thread!

I was writing database systems professionally, back in the days before the RDBMS concept was even a thing. So here's my (enormously long and convoluted) 2 cents worth. Please be sure to pack a sandwich and get hydrated before you continue.

Say you were dealing with doctors and patients, and needed to store that information in a database. Back in the day, you'd typically use a so-called hierarchical database. To design one of those, you need to decide, what is the most common access method expected to be: getting the patients for a given doctor, or the doctors for a given patient? You'd design the schema accordingly. Then, the preferred access method was easy to code, and efficient to run. The "other" access method was still possible, but harder to code, and slower to run. The database schema depended on how you thought the users would access the data.

But that is absolutely what NOT what to do with an RDBMS. Certainly you look at the users' screens, reports, and so on - but that's just to determine what unique entities the system must handle - in this case, doctors and patients. Then you ignore how the users will access the data, and work out what are the inherent logical relationships between all the entities.

Your initial answer might be this. A doctor can have many patients, and a patient can have many doctors. As any competent relational designer will instantly know, this means you need a resolving table whose primary key is a composite key comprising the primary keys of the other two tables. So if Mary was a patient of Tom, you'd add Mary to the patients table (if not already there), Tom to the doctors table (ditto), then add a Mary/Tom record to the resolving table. By that means, a doctor could have any number of patients, a patient could have any number of doctors, and it's trivially easy to write simple, performant SQL to access that data however you want.

But then you'd have a ghastly thought: patients can also be doctors, and doctors can also be patients! Say Tom was also a patient of Mary! Now you need a Tom record in the patient's table, but that would inappropriately duplicate all his data from the doctors table! Something's clearly wrong. You'd soon see that from a data modelling viewpoint, you don't want doctors and patients as separate entities - you want a single entity Person, and a resolving table to relate arbitrary people in specified ways.

So what?!!

So this. IMHO, many developers using relational databases have absolutely no idea about any of that. They design hopelessly unnormalised schemas, which then need reams of ghastly SQL to get anything out. The query planner can barely parse all that crap, let along optimise it. The database has to stop every five minutes to wet its brow and take a drink.

So here's my advice to any inexperienced relational database designers who have actually managed to get this far!! If you can answer the following questions off the top of your head, you're probably on the right track. If you can't, you're lacking basic knowledge that you need in order to use an RDBMS properly:

- what is a primary key? - what is a foreign key? - what is an important difference between primary keys and foreign keys? - what is a composite key? When would you use one? - what are the three main relations? - what is normalisation? - what is denormalization? - what is a normal form? and so on.

Just my 2c! :-)

3 comments

I'm guessing A_Person is making some up of what he/she said (to be entertaining) (I don't mean the DB facts - which are right, of course, and the questions at the end), but it was an amusing post anyway :)

Well done.

Thanks :-)
I imagine you've written more than your fair share of PROGRESS 4GL code in the past .. your qualifications questions are pretty much straight out of the PROGRESS 4GL user guide .. ;)
Nope! I've never used postgres at all. Most of my RDBMS work was done on HN NSFW ALERT - oracle :-) But your comment supports my general point, which is, that data modelling and relational schema design skills are product agnostic; normalization is normalization, as it were.
Just to point out, "PROGRESS 4GL" is a different thing, not PostgreSQL based.

It seems to be called OpenEdge Advanced Business Language these days:

https://en.wikipedia.org/wiki/OpenEdge_Advanced_Business_Lan...

And yeah, the terms you mentioned are foundational RDBMS terms, common to all reasonable implementations of RDBMS's. :)

Ok. Oracle had/has a presumeably similar thing (PL/SQL). I misread his comment to mean, postgres SQL.
Yes indeed, I meant PROGRESS 4GL, which was sort of around before all the "RDBMS" hoopla congealed all this wonderful info into a single art. Its just that your last paragraph really sounded like a quote, near-verbatim, from the very guides published by PROGRESS just for the purposes of educating people on how to make their data relatable .. but of course that is because these are natural laws for databases.
I remember when 4GLs were the hot new thing. Let's get into those! Oops, nope, forget that, now it's Data Dictionaries, let's get into those instead! Rinse & repeat ...
So, were you writing DB software before Codd's research at IBM was available?
Further to my other reply, I've just checked the intertubes, and found that Codd's paper "A Relational Model of Data for Large Shared Data Banks" was published in 1970, and Dijkstra's "Go To Statement Considered Harmful" in 1968. So I think my fading memories of all this are accurate (for once!).
Yes indeed. I haven't checked his research dates, but I was writing database software in the late (uh) 60s and early 70s. I think that was even before "goto harmful". I still remember our standards officer saying, let's try some of this structured programming stuff!