Hacker News new | ask | show | jobs
by SigmundA 98 days ago
>Sure they do, because now your row / index is variable length rather than fixed length. Way more complicated.

Come on its literally a 2 byte per column header in the row so it just sums the column lengths to get the offset, it does the same thing for fixed length except it gets the col length from the schema.

It's not much more complicated than a fixed length column only the column length is stored in row vs schema. I am not sure where you are getting this idea it way more complicated, nor the 3 vs 4 byte thing, the whole row is a variable length structure and designed as such, null values change the row length fixed or variable data type and have to be accounted for since a null takes up no space in the column data its only in the null bitmap.

> what's the point of user-configuration of someone hard-codes 'NY' in a query or in the code

Because it doesn't matter, 'NY' isn't changing just like 11 the int wouldn't change, but 'NY' is way easier to understand and catch mistakes with and search for code without hitting a bunch of nonsense and distinguish when 10 columns are all coded next to each other in a result set.

I prefer my rows to be a little more readable than 1234, 1, 11, 2, 15, 1 ,3 and the users do too.

I have had my fill of transposition bugs where someone accidentally uses the wrong int on a pk id from a different table and still gets a valid but random result that passes a foreign key check almost enough for me to want to use guid's for pk's almost. At least with the coded values it is easier to spot because even with single character code people tend to pick things that make sense for the column values you know 'P' for pending, 'C' for complete etc, vs 1 2 3 4 used over and over across every different column with an auto increment.

1 comments

> Come on its literally...

You're the one saying a 2 character string is somehow a space savings. If we're going to split hairs that finely then you have to know that any row with a variable length string makes the entire row/index variable length and that is a net storage and performance loss. It's worse in every way than a simple integer. I will admit that it ultimately doesn't matter. But I'd also argue using an nvarchar in place of varchar for this also doesn't matter. It's not just premature optimization it's practically useless optimization.

> Because it doesn't matter, 'NY' isn't changing just like 11 the int wouldn't change, but 'NY

That's not what happens but what happens is that somebody renames New York to New Eburacum and now your code doesn't match the value and it just adds more confusion.

But I'll grant you that it's totally fine. It's even more fine if you don't use varchar and instead use char(x).

>You're the one saying a 2 character string is somehow a space savings. If we're going to split hairs that finely then you have to know that any row with a variable length string makes the entire row/index variable length and that is a net storage and performance loss.

The row is always variable lengths as a structure it has flags noting how many columns there are with values and if there is a variable length section or not, only rows with no variable length fields at all has no variable length section and that is a bit flag check in the header.

You are making a non argument, variable length fields can be a space savings over an int with single char codes which is very common, and do not impact performance in any meaningful way. Besides that one could use fixed length chars and still get the other benefits I mentioned while having the same exact space usage and processing as a fixed length ints.

>That's not what happens but what happens is that somebody renames New York to New Eburacum

Changing the descriptive meaning of an entry causes all sorts of problems and even more so if it is a int because it's completely opaque its much harder to see an issue in the system because everything is a bunch of ints that do not correlate in any way to their meaning.

Changing the description to something that has the same meaning worded differently is usually not an issue and still gives good debug visibility to the value. If you and your users consider New Eburacum synonymous with New York, then having the code stay 'NY' should not be an issue and still be obvious when querying the data.

Unless someone is using the short code in a user visible way and it has to be updated. State is a common one that does this and nobody is changing state names or codes because it is a common settled natural key.

In the rare situation this actually needed to be done then one can update existing data, this is a not an issue in practice. You have the be extremely cautious updating the description of a code because much data was entered under the previous description and the meaning that it carries, having the code have some human meaning makes it more obvious to maintainers this should be done with care, many times it would involve deprecating the old one and making a new one with a different code because they have different meanings, having a table instead of a enum allows other columns to have this metadata.

This is not the same issue as say using a SSN for a person ID.