Hacker News new | ask | show | jobs
by clnhlzmn 1771 days ago
> I mean, you can still use it, but you have to do your own type checking/coercion in code

Can you explain why you wouldn't have to check what you're putting into the database? If you're just stuffing values of unknown type into a statically typed DB you're going to get errors up front and if you stuff values of unknown type into SQLite you might get errors down the road. Either way you probably should know what you're putting into the DB in the first place.

3 comments

It's not so much that one is stuffing random/unknown types into the database without type-checking. Programming language types do not map exactly to database types even in statically typed DBs. ORMs manage this translation for you (imperfectly at times), but for those of those of us work with directly with SQL, we do typecheck, otherwise the INSERTs will fail.

It's more that static database types provide a standard contractual interface that is enforced (agnostic of application and programming language) and has reproducible behavior upon retrieval.

The advantage of static types is the guarantee that if a data point is successfully INSERTed, it can be successfully retrieved in the future.

In a dynamically typed DB you have a problem of standardization across codebases -- every new program/microservice that is written will have to use the exact same typechecking code, or else risk future retrieval issues. Those that do type coercion on the other end are essentially guessing and hoping that the original type was successfully reproduced upon retrieval. Plus if you work with different programming languages, that same code has to be ported to all the different languages. You also find yourselves having to reinvent the wheel a lot -- for instance the SQL DECIMAL type. In SQLite you can either store it as an INTEGER or REAL, and then either store metadata in another field or create a specific function to retrieve the INTEGER and recreate the specific DECIMAL type with the right number of digits (say DECIMAL(18,0)).

On the other hand you can rely on SQL types and get all this for free and have the assurance that it will work impeccably.

SQLite doesn't fit perfectly cleanly into this, but, for me, the big distinction is between schema-on-read and schema-on-write. This captures the fact that you always need to have some sort of schema constraint when you're working with data. The question is, do you apply those constraints when you're writing the data, or when you're reading it?

Schema-on-write is useful when you know exactly what the schema should be ahead of time, and it's static. That happens quite often in OLTP and business intelligence applications, but that's not always what you're doing.

Schema-on-read is very useful in those situations where the schema constraints you need can't be known ahead of time, or might vary from situation to situation. At that point you're kind of stuck delaying the schema bits (including making sure everything is an appropriate type) until the last minute. This comes up in, for example, big data applications such as data lakes.

Schema-on-read vs Schema-on-write are definitely relevant concepts.

I think SQLite sort of straddles the two -- it's sort of a loose schema-on-write (with type affinity to its base types), and schema-on-read (with respect to more complex types which is inferred from what is stored in base types)

Agreed. Which kind of makes it even more complicated to talk about. Whether its approach offers the best of both worlds, or the worst of both words, depends on your application, and the dividing line doesn't cleanly follow any of the distinctions we normally make among types of application.
> Either way you probably should know what you're putting into the DB in the first place.

Yeah, well, one good way of finding that out in practice is that reasonable RDBMS systems tell you that when you try to stuff the stuff in.