| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by simonw 2103 days ago

My least favorite part of database design is the bit where you have to pick lengths for your char columns.

Twenty years in and I'm still picking these pretty much by guessing. And when I guess wrong it causes really annoying problems further down the line.

I love how SQLite doesn't make me do this - it just has a TEXT type which is always unlimited in length.

10 comments

pizza234 2103 days ago

I suppose that in every RDBMs that makes the distinction, it's an optimization matter - VARCHAR being stored on-page, while TEXT off-page (although there can be optimizations for short TEXT values); the latter will cause an extra page seek on access.

Some database [versions] may also be unable to apply certain optimizations in certain cases, when TEXT is used (eg. temp tables on MySQL <= 5.7).

That doesn't prevent one from always using TEXT, and possibly, for most of the use cases (surely, if one uses SQLite, that's the case), the performance impact is not meaningful.

nemothekid 2103 days ago

For what I understand in the docs, in postgres at least, VARCHAR and TEXT are the same thing, and CHAR actually has the performance hit.

colanderman 2103 days ago

They are implemented the same, but they aren't the same type. Notably, each parameterization of varchar (or char) is its own type. This can cause issues when trying to change the length parameter. For this reason I prefer to use TEXT with a CHECK constraint.

simnim 2103 days ago

What's wrong with TEXT type for postgres, mysql, etc? In Postgres you don't need to declare a length for varchar either.

dkersten 2102 days ago

Yeah, in fact, Postgres encourages the use of TEXT over VARCHAR. The documentation even states:

There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column.

https://www.postgresql.org/docs/current/datatype-character.h...

brobinson 2103 days ago

Postgres also has a TEXT type like this.

Edit: I see mpolun left a similar comment here, but it looks like he has been mostly shadowbanned for about 8 years.

codetrotter 2103 days ago

> Edit: I see mpolun left a similar comment here, but it looks like he has been mostly shadowbanned for about 8 years.

Yeah same. I don't have enough karma to vouch for their comment but I see no big reason that said comment should be dead. Their account is from 2012 and the vast majority of their few comments seem to be contributing to the conversation. Their first dead comment is also from 2012 but at a glance said comment is the only one that stands out as not contributing much to the conversation. And that's being harsh even – I've probably made less substantial comments in the past myself. Yet like 30% of their 3 pages of comments are dead. And looking at their submissions they have ever only submitted 3 stories, 2 of which appear to be from a domain that they themselves control. Hardly enough to be subjected to having so many of their comments killed I think. Though of course there might be other factors at play, but from what I see on their profile page I see nothing bad enough to warrant this.

brobinson 2103 days ago

I have 1500 karma, but I didn't see an option to vouch for it... I have seen that option on other comments, though. Maybe dang can take a look at their account.

alextheparrot 2103 days ago

Just for general interest, how can you see their comment or assert they have been shadowbanned?

codetrotter 2103 days ago

> how can you see their comment

Go to your profile and ensure you have showdead set to yes.

> assert they have been shadowbanned

See https://news.ycombinator.com/threads?id=mpolun and check out how many of their comments are showing as dead in combination with the contents of said comments. (The step above about setting showdead to yes might be required before you follow said link in order to actually see the dead comments). Almost every single one of the dead comments is contributing to the conversation. This is indicative of a shadow ban. HN users would not be downvoting the vast majority of these comments I think.

Zippogriff 2103 days ago

There's always a limit. You either define and manage it yourself or it'll be done for you when some part of your system breaks. In the end if you actually need performance and reliability everything will have a bound (if not fixed) size and larger data will be processed as a stream, anyway.

colanderman 2103 days ago

Postgres tip: define columns as TEXT, but with a CHECK constraint that the length is what you expect. This avoids the problem with varchar(...) that views inherit the underlying column types (including the length specifier), and prevent you from changing the maximum length in the table unless you drop and recreate the view. (There are ways around this through system tables but they are wholly unwieldy.)

simonw 2103 days ago

I really like that - especially since changing CHECK constraints can be done on large tables without having to rebuild the entire table.

hantusk 2103 days ago

Just use TEXT in other databases as well. It really shouldn't matter much in modern dbs

downvoteme1 2103 days ago

Why don’t you use varchar(max) as the range always. The varchar data type specified that the length of this attribute is variable in each record and the memory allocated depends only on the number of actual characters stored in the column.

DaiPlusPlus 2103 days ago

That’s a bad idea: that pushes the burden of data validation entirely on your client or application code. Textual column lengths should be used to enforce sanity checks on data.

I’ve worked on more projects than I care for which had nvarchar(max) columns for storing the contents of a small 3-4 line HTML textarea: most users were expected to type in less than 100 words or copy-paste the output of another program.

One day, that other program had a bug that made it generate about a gigabyte of textual output. That program had a “Copy output” button so the user didn’t realise how much data they were copying. I don’t know how it didn’t timeout when it was inserted, but that user brought the system down for everyone because that gigabyte-sized text value was used in lots of places.

no-s 2103 days ago

>>That’s a bad idea: that pushes the burden of data validation entirely on your client or application code. Textual column lengths should be used to enforce sanity checks on data.

It just doesn't work out that way in practice. For example SQLite, the most popular RDBMS of all time, pisses on strict column type and gives you value type instead with hints and storage classes. For several large systems I architected in SQL Server and Oracle, I gave developers heuristics to follow for column type selection and in some cases strictly re

DaiPlusPlus 2100 days ago

I compare it to "defence-in-depth" (i.e. data validation at every level).

Ever since RDMBS moved-away from XBase-style table-files database textual/string column length limits are conceptually the same thing as having a CHECK CONSTRAINT on the length of a string (something that SQLite does support!).

e.g. https://stackoverflow.com/questions/8252875/how-to-restrict-...

no-s 2100 days ago

I think the CHECK CONSTRAINT is explicit and intentional WRT data validation. However it does not make a different type out of the value. I don't really think there is a one-size-fits-all for this problem anyway. The real problem expressed is accepting too much data - a check constraint requires the entire value to be present before validation. So if too much data shows up and there is a weak link in the data processing system, T.U. as they say...Many libraries provide for controlling how much physical data can be sent without resorting to a large data API.

Most character values in a database are not really text entered by a person, it's typically a symbol for some kind of enumeration or key referring to external semantics, e.g. compass direction "N", "NNE", etc. Or street names or postal codes or country names and so on. You can either model this directly or if it makes no difference, accept whatever you get. Actually examining the data can be interesting; I noticed once in a database with 50 billion stored dollar amounts there were only about 50,000 unique values. This led me to assert the dollar amounts were actually symbols in the system.

Sorry, my comment was truncated in my original post and I didn't notice it in time to edit it.

6nf 2103 days ago

Varchar(max) is fine for some cases but you can't use it for everything because there's some limitations compared to varchar(1000) because the latter is stored in the actual table structure while the former is not.

true_religion 2103 days ago

With postgress it doesn't matter. Chars are just varcars under the hood, so you should always use the latter.

nickpeterson 2103 days ago

Until some user inserts 100MB in a single column of a single row.

simonw 2103 days ago

I'd rather guard against that in my input validation application logic than pick the wrong value when I'm designing my table and have to deal with fixing that later on.

ragnese 2103 days ago

Don't the other ones have TEXT, too?

mpolun 2103 days ago

Same with postgres. I never use specific lengths for text on postgres.