Hacker News new | ask | show | jobs
by marginalia_nu 1663 days ago
Integer columns are significantly faster and smaller than strings in a SQL database. It adds up quickly if you have a sufficiently large database.

I use the following scheme:

   1 - exhausted
   0 - alive
  -1 - blocked (by my rules)
  -2 - redirected
  -3 - error
1 comments

The author is scraping fewer than 1,000 records per day, or roughly 365,000 records per year.

On my own little SaaS project, the difference between querying an integer and a varchar like “active” is imperceptible, and that’s in a table with 7,000,000 rows.

It would take the author 19 years to run into the scale that I’m running at, where this optimisation is meaningless. And that’s assuming they don’t periodically clean their database of stale data, which they should.

So this looks like a premature optimisation to me, which is why it stood out as odd to me in the article.

I'd put it closer to the category of best practices than premature optimization. It's pretty much always a good idea. It's not that not doing this will break things, the alternative is slower and uses more resources in a way that affects all queries since larger datatypes elongate the records, and locality is tremendously important all aspects of software performance.
I disagree. I think a better "best practice" is to make the meaning behind the code as clear as possible. In this case, the code/data is less clear, and there is zero performance benefit.
There is absolutely a performance benefit to reducing your row sizes. It both reduces the amount of disk I/O and the amount of CPU cache misses and in many cases also increases the amount of data that can be kept in RAM.

You can map meaning onto the column in your code, as most languages have enums that are free in terms of performance. It does not make sense to burden the storage layer with this, as it lacks this feature.

You can map meaning onto the column in your code, as most languages have enums that are free in terms of performance. It does not make sense to burden the storage layer with this, as it lacks this feature.

Was just looking at how to do this with an enum today! Read my mind. :)

The performance benefit is negligible at the scale the author of the article is operating at. You already alluded to this point being context-dependent earlier when you said:

> if you have a sufficiently large database

Roughly 360,000 rows per year is not sufficiently large. It's tiny.