Hacker News new | ask | show | jobs
by no-s 2101 days ago
>>That’s a bad idea: that pushes the burden of data validation entirely on your client or application code. Textual column lengths should be used to enforce sanity checks on data.

It just doesn't work out that way in practice. For example SQLite, the most popular RDBMS of all time, pisses on strict column type and gives you value type instead with hints and storage classes. For several large systems I architected in SQL Server and Oracle, I gave developers heuristics to follow for column type selection and in some cases strictly re

1 comments

I compare it to "defence-in-depth" (i.e. data validation at every level).

Ever since RDMBS moved-away from XBase-style table-files database textual/string column length limits are conceptually the same thing as having a CHECK CONSTRAINT on the length of a string (something that SQLite does support!).

e.g. https://stackoverflow.com/questions/8252875/how-to-restrict-...

I think the CHECK CONSTRAINT is explicit and intentional WRT data validation. However it does not make a different type out of the value. I don't really think there is a one-size-fits-all for this problem anyway. The real problem expressed is accepting too much data - a check constraint requires the entire value to be present before validation. So if too much data shows up and there is a weak link in the data processing system, T.U. as they say...Many libraries provide for controlling how much physical data can be sent without resorting to a large data API.

Most character values in a database are not really text entered by a person, it's typically a symbol for some kind of enumeration or key referring to external semantics, e.g. compass direction "N", "NNE", etc. Or street names or postal codes or country names and so on. You can either model this directly or if it makes no difference, accept whatever you get. Actually examining the data can be interesting; I noticed once in a database with 50 billion stored dollar amounts there were only about 50,000 unique values. This led me to assert the dollar amounts were actually symbols in the system.

Sorry, my comment was truncated in my original post and I didn't notice it in time to edit it.