Hacker News new | ask | show | jobs
by Akronymus 1775 days ago
And even if you don't want to normalize, at least don't make the DB basically incomprehensible.

Speaking from experience, a "abnormalized" db sucks. https://news.ycombinator.com/item?id=27842820

2 comments

Yeah one should know why it’s important to normalize, and then when they go a different path, it is done consciously and not because they are inexperienced and didn’t really know any better.

Folks out there who don’t want to learn the ins and outs of database theory - please don’t; read up on it, it’s a very important skill & knowledge to have and doesn’t take long to master. It does take a few days, perhaps weeks, for all the ideas to “click” in your head.

Whilst we having many fads in IT, there are certain sound principles of computer science developed over 40-50 years which one should learn :-) Know the rules before you break them!

For Databases, it takes me a concious effort to not go to at least 3rd NF.

Except for addresses/names/telephone numbers, I have been burned by ANY assumption I made. Those, I tend to just store as something like char(255) nowadays by default, unless business requirements force me to split it up, but even then, after voicing my concerns.

Edit: reading up on the addresses falsehoods again, 255 may not be enough.

https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

https://github.com/google/libphonenumber/blob/master/FALSEHO...

https://www.mjt.me.uk/posts/falsehoods-programmers-believe-a...

If you use Postgres or SQLite, TEXT is preferable for strings.
I just wanted to give a easily understood example. Specifically to point out that even limiting the length can make problems.
There's reasons for one wide table.

The traditional SQL database uses row based data structures.

When you shift to columnar data formats for the table, parquet or any columnar DB, the normalization rules which were developed for row based become extremely different.

The brave new world now is in-memory DBs.

All those 1970s rules, which make sense for row based tables stored on disk, don't apply to data in RAM.

At all.

So it's going to get very interesting.

Shouldn't the normalization rules be exactly the same whether the data is stored row-oriented or column-oriented and on disk or memory?
This is a common misconception.

Columnar stores do indeed enable wide tables but they don’t replace the need for normalization or the benefits. Wide tables and normalization (or denormalization) solve different problems.

From the perspective of a database the difference between RAM and disk is latency. Normalization is still a factor in query performance.

> There's reasons for one wide table.

There are indeed. It should still be a conscious decision. For the DB I work with, it wasn't.