Hacker News new | ask | show | jobs
by davidverhasselt 1880 days ago
There is a downside which I've experienced: if you want a triple-state boolean (null, false, true) then having a boolean column allows for that while a timestamp-as-boolean column does not (you lose the "null" value because that equals `false` in timestamp-as-boolean).

Having a distinction between `null` and `false` can be handy for values that are optional or have a dynamic default. If it's `null` you know it is not explicitly set and could use a fallback value. If it's false you know it's explicitly set as false.

A simple use-case for this is when a user can leave the field blank. This is impossible to model with only a timestamp-as-boolean.

Another use-case is dynamic defaults or fallbacks, e.g. `hidden` of a folder where if `hidden` is nil, you fall back to the parent folder's value.

TL;DR a boolean column actually has 3 states, a timestamp only has 2. Article makes a big deal about there not being any nuance about the fact that a timestamp is superior. I disagree, because you go from 3 states to 2 states, there are cases where you'd want a boolean instead of a timestamp. Ironically OP missed this nuance (or they'll pull a no-true-scotsman).

16 comments

What you describe calls for an Enum, not a nullable boolean which is just another way of passing a hardcoded magic value carrying an implicit meaning.

> a boolean column actually has 3 states, a timestamp only has 2.

Going with your logic a timestamp has billions of states, you just have to arbitrarily assign special meanings to certain dates that won't ever be used. Just like using null as another state I wouldn't call it a good idea, though.

Your statement is false. Many use cases call for optional bools. The great thing is there is no implicit meaning - it's True, False or Unset. It is true that nulls can be abused but in this case, far more elegant than an enum.
There is no way for you to make sure that `null` was "set" intentionally and not due to a bug or other kind of failure.

Meanwhile if you see a `None` value you know for a fact that it was set by your software and if you actually encounter a `null` you know that something went horribly wrong.

Strong type systems and a few overheads in favor of better bug detection/prevention are popular for a reason.

There are many application states in which `null` could mean "not got the data yet" and/or "there is no data".

Just use an enum, it's much more expressive.

Could you supply an example that an enum wouldn't do as well? How is a bool null, false, true different than an enum with null, false, true?

Edit: grammar

Logically they're the same, they're isomorphic. It's just about ergonomics, readability, etc.
If you go with nullable Boolean in the database schema, do you give it a type of nullable boolean in the code?

In which case null would either mean the value on the recordin the database is null, or that that for whatever reasons that parameter was not read from the database.

You will need two types of null.

All of this goes away with an enum or similar solution.

You can decompose the nullable boolean in the database when you read it , but again your task would be easier to treat it like an enum.

I use nulls or undefined in javascript as the initial state for most values for this reason, it's explicit in that it's unset

Ditto for database design too

Tri-state booleans are also very ugly. What happens if you suddenly need a 4th state?

Use enums (or any equivalent) for states that are non-boolean.

Null is a signifier that a value isn't know. Yes, if you're using null to mean something else you should use another type, but nullable-booleans (and nullable fields in general) can be extremely useful. For instance, if you're collecting information, but wont know everything at creation time, the unknown values are null.
> For instance, if you're collecting information, but wont know everything at creation time, the unknown values are null.

I find the idea of 'creating unknown values' to be self-contradictory. I find it much more logical to define a PartialFoo containing only the parts we know up-front, and a fill in the rest later using a function PartialFoo -> Foo.

If we need multiple steps, we can put some Optional fields in the PartialFoo, to avoid lots of intermediate types. This is still annoying, but it keeps all of the 'intialisation headaches' separate from the Foo type itself; so code dealing with Foo doesn't have to care about null checks, missing fields, etc.

> If we need multiple steps, we can put some Optional fields in the PartialFoo, to avoid lots of intermediate types.

You juat described the purpose of SQL's NULL.

If you are using a nullable bool, you do not need a fourth state. True, False, Unset. Very useful in a lot of cases. It's actually not ugly but very elegant.
I don't think it's elegant at all. Especially in SQL dialects where you need to check null state separately from false. So much bug potential...
With an enum you’d have to check the unset state separate as well.
He meant that in some SQL Dialects you cannot check FIELD = NULL, you have to use FIELD IS (NOT) NULL (like Oracle). Therefore you cannot use FIELD inside an IN clause either.
But with an (NOT NULL) enum/integer/string or however you are representing it, every state is a regular case. Not a special case. Null nearly always needs some special treatment.
The obvious solution is to encode it like this: `null` is `null, `yes` is the timestamp when it was set and `false` is a timestamp somewhere in the future. Now t < now() maps nicely to the value of your boolean.

You just have to pick your `false` timestamp somewhere far into the future, let's say something arbitrary like 03:14:07 on Tuesday, 19 January 2038. The software won't be around for that long anyway, so it will never be a problem...

Hehe, that's creative. But I'd say - if you're making it that complex, just use a boolean column and a timestamp column separately instead, if you really need the timestamp, and make it explicit and simple to understand.
And this is why legacy code is such a nightmare. Compound unnecessary optimizations like this over a decade and it’s impossible to understand the data in the DB or how to safely refactor the code without breaking some weird corner case that was handled by using magic values instead of structured data.
Better use something so far in the past that the software certainly didn't exist yet, like 3 January 1970. In fact, you could use all of, say, the '70s to encode lots of flags... gets excited
Store your timestamps in a long int or string.

Positive = timestamp

-1 = false

0 = null

I can’t tell if you’re being ironic or not
Yeah it's mostly a reference to Y2K and Y2038 [1], using representations that seem clever and work now, but will lead to bugs in the future because nobody thinks their software will be around for that long.

[1] https://en.wikipedia.org/wiki/Year_2038_problem

> The latest time since 1 January 1970 that can be stored using a signed 32-bit integer is 03:14:07 on Tuesday, 19 January 2038

> MySQL database's built-in functions like UNIX_TIMESTAMP() will return 0 after 03:14:07 UTC on 19 January 2038

OP missed that nuance, and so will a lot of other developers you work with. Relying on a difference between null and false is one of those things that seems clever when you first do it, but then three years later your project is scattered with comments like "// be very careful checking this, it must be false here null means something else" after you or some other dev you're working with confuses falsy for false
I've been in software development for 27 years, and I've never seen a case where a boolean column or variable being null was anything but a terrible, confusing, fragile, hack.

Remember that in some languages the behavior of null is weird, and can be false, or can be treated as 0.

    $ node
    Welcome to Node.js v14.16.0.
    Type ".help" for more information.
    > null + null
    0
    >
    > 0 == false
    true
    > 0 === false
    false
    >
What has the ridiculousness of Javascript got to do with a database schema?
What good is a database schema if it doesn't have data and isn't used by any applications?
There are languages with nice optional types, which maps nicely to SQL null. Not Javascript though and I wouldn't want to use it server side.
Long time readers of The Daily WTF know that the canonical tri-state boolean is { True, False, FileNotFound } [1]

[1] https://thedailywtf.com/articles/What_Is_Truth_0x3f_

"triple state boolean", what's next, a double precision fp32?
Also known as *bool, "Maybe Bool", "Boolean?", "Optional<Boolean>" etc.
Well, in browsers there's `HTMLMediaElement.canPlayType()` which returns one of the following strings:

- "probably"

- "maybe"

- ""

Reference: https://developer.mozilla.org/en-US/docs/Web/API/HTMLMediaEl...

My goodness. I remember having to explain this to my manager. They really wanted recording + playback in our platform. For as many browsers as possible. Even mobile (iOS, I'm looking daggers at you). In 2015.

AFAIR I just said fuck it and made playback as permissive as possible (i.e., only prevent media playback if canPlayType returned ""). I don't know how advisable that is but the bug got off my back anyway.

I dunno what makes this so difficult, why we can't get at least a definite "yes" even in 2021.

That's like my washing machine which has a Normal setting, a Standard setting, a Wash setting...
Do you mean as in "If the washing machine has a Wash setting, what does it do if it is not set to Wash?"
And what's the difference between normal and standard?
Wat
yeah, but if you really need to store 3 states, why not make 1 first-class concept instead of awkwardly combining two?
Because a nullable bool has literally the same semantics as Maybe<Bool>. Why invent it again but in your app this time?
Long Boolean! Yes, no, don’t know, weekdays only.
If boolean comes from George Boole, then maybe we should call triple state values "Łukasiewiczan", it would make for a funny typing exercise :P https://en.wikipedia.org/wiki/Jan_%C5%81ukasiewicz
The use cases the blog post is discussing aren't tri-state bools though. Either the user did or did not sign in. Either the version was or was not published. Null and false are equivalent here.
If the timestamp is a creation date after your system went on, which it is here, you can always store 0 for false.
-1 can be useful, too, especially if you've already assigned a certain meaning to 0.

Javascript developers are used to certain functions returning -1 if there's no match, so -1 shouldn't feel strange as long as it's well documented.

I wonder why they decided to use -1. Why not undefined or NaN or even null? The -1 produces bugs which go unnoticed for a long time.
On the other hand, there's strcmp()/strncmp(). I can never wrap my head around the fact that strcmp(s1, s2) == 0 (or even better, !strcmp(s1, s2)) means s1 is equal to s2.
If you look at strcmp like "the difference when you compare the strings for sorting", it makes complete sense. If the first mismatch of strcmp(a, b) is smaller than 0, you put the a before b, otherwise you put b before a.

The fact that an equality check can be made by running a comparison function is useful, but that's not all the method does.

For other methods in much C code, a common mindset is that a method returning a value will return the error code, with error code details in errno. The error code for success is 0, which is fitting of course.

Had C implemented booleans, this problem would never have been a problem, because if(int) wouldn't have been a legal expression, but sadly booleans are implemented as integers in the language instead. I strongly dislike languages that do allow implicit casting from integers and such to a boolean, if(var!=0) is much more readable because of the explicit boolean expression.

But then it's empty string vs null value all over again.

Besides it's about an improvement to boolean, not about adding one more optional value (which likely lead to optional values and definitely out of the boolean field).

I'm struggling to think of an example following the OP's example - discrete events that may have not occured yet - where you'd need to differentiate between a certain `false` and an uncertain `null`.
A question like "Do you have any allergies?" probably requires ternary logic: yes/no/unset

Discrete events are usually more binary by nature: a thing either happened or it didn't.

That said, if it's possible for an event to un-happen, you're back in ternary-land: there's now a distinction between un-set and false which may be important to capture.

There's a reason why relational databases use ternary logic when most of the rest of the computing world uses binary logic.

You might argue that you could just create a brand new event, but now you've almost assuredly changed the grain of your table and goofed up the primary key. Your nice normalized table is now a dumb, non-performant endless event log: good luck with indexing that table and tuning those SELECT queries.

The OP does not talk about events, OP talks about state and when it was set. There's no where in the article that OP restricts themselves to discrete events that may or may not have occured.
I'm not understanding the distinction. The only thing you get out of this schema is knowing what happened (implicitly defined by the field), whether it's happened yet, and when it happened. That feels like an event to me.

Fields where there isn't an discrete event don't work. E.g. is_dog_owner can become adopted_dog_at, but is_dog_lover can't become loved_dogs_at.

I'd actually even argue that this is not storing state directly. You derive state from knowing an event has occured in the past: deleted_at (event) => is_deleted (state).

Why not store "loved_dogs_at"? A potential use case could be to send some coupons to "new dog lover" within the last 2 weeks.
Well the article uses the example of swapping `is_published` for `published_at` — with a boolean you could have:

* NULL — never published (e.g. draft) * true — live now * false — previously live but explicitly unpublished

Which you miss if just a date. Similarly he talks about `is_signed_in` — NULL/true/false let's you model the case where a user has never signed in (e.g. an admin created your account but you've never used it) but NULL/timestamp missed this

In my case the null state was "not yet processed" i.e. a decision pending
Perl got this right decades ago with its 'undefined' status for unset variables, so you can tell the difference between false and undef
That make me think more of Javascript, which distinguishes between undefined values (usually means “not set”) and null values.

You’re correct that a Perl scalar can always be set to undef, which is the Perl name for null. But that’s not really unique to Perl. For instance, while a Java boolean can’t be null, a Java Boolean can be.

That depends on the language. In Clojure and ClojureScript, for example, distinguishing between nil and false is not a problem at all.
I'd assume most languages don't have a problem with distinguishing between nil and false. The article explicitly maps nil to false, thus I don't see the relevance of your comment.
the initial state is going to be false before any user input. if the user leaves a field blank, then that still counts as input, just run a check such as 'if blank then null else timestamp' on the insert
0 for never set, timestamp for set time, negate the timestamp for unset time
dates between October 4th and 15th 1582 for special values
You can use negative values for your alternative states.