Hacker News new | ask | show | jobs
by downvoteme1 2101 days ago
Why don’t you use varchar(max) as the range always. The varchar data type specified that the length of this attribute is variable in each record and the memory allocated depends only on the number of actual characters stored in the column.
2 comments

That’s a bad idea: that pushes the burden of data validation entirely on your client or application code. Textual column lengths should be used to enforce sanity checks on data.

I’ve worked on more projects than I care for which had nvarchar(max) columns for storing the contents of a small 3-4 line HTML textarea: most users were expected to type in less than 100 words or copy-paste the output of another program.

One day, that other program had a bug that made it generate about a gigabyte of textual output. That program had a “Copy output” button so the user didn’t realise how much data they were copying. I don’t know how it didn’t timeout when it was inserted, but that user brought the system down for everyone because that gigabyte-sized text value was used in lots of places.

>>That’s a bad idea: that pushes the burden of data validation entirely on your client or application code. Textual column lengths should be used to enforce sanity checks on data.

It just doesn't work out that way in practice. For example SQLite, the most popular RDBMS of all time, pisses on strict column type and gives you value type instead with hints and storage classes. For several large systems I architected in SQL Server and Oracle, I gave developers heuristics to follow for column type selection and in some cases strictly re

I compare it to "defence-in-depth" (i.e. data validation at every level).

Ever since RDMBS moved-away from XBase-style table-files database textual/string column length limits are conceptually the same thing as having a CHECK CONSTRAINT on the length of a string (something that SQLite does support!).

e.g. https://stackoverflow.com/questions/8252875/how-to-restrict-...

I think the CHECK CONSTRAINT is explicit and intentional WRT data validation. However it does not make a different type out of the value. I don't really think there is a one-size-fits-all for this problem anyway. The real problem expressed is accepting too much data - a check constraint requires the entire value to be present before validation. So if too much data shows up and there is a weak link in the data processing system, T.U. as they say...Many libraries provide for controlling how much physical data can be sent without resorting to a large data API.

Most character values in a database are not really text entered by a person, it's typically a symbol for some kind of enumeration or key referring to external semantics, e.g. compass direction "N", "NNE", etc. Or street names or postal codes or country names and so on. You can either model this directly or if it makes no difference, accept whatever you get. Actually examining the data can be interesting; I noticed once in a database with 50 billion stored dollar amounts there were only about 50,000 unique values. This led me to assert the dollar amounts were actually symbols in the system.

Sorry, my comment was truncated in my original post and I didn't notice it in time to edit it.

Varchar(max) is fine for some cases but you can't use it for everything because there's some limitations compared to varchar(1000) because the latter is stored in the actual table structure while the former is not.