Hacker News new | ask | show | jobs
by Alex3917 2142 days ago
> If you find yourself implementing your own validation and schema code for JSON fields, I'd say it's a sign that you should probably stop and migrate the data to Django models instead.

Don't agree. Where JSON fields with validators are useful is when you would otherwise have inherited models for different variants of the same type of thing. It's a lot easier to understand and query one model with 50 Cerberus (or whatever) schema validators than it is to understand and query 50 Django models. With the JSON approach it leaves one part of the codebase that's complicated to understand, but you mostly shouldn't need to understand it. Whereas when you have tons of different models and each one needs to be used in different places and is queried in different ways and has its own serializers, the mess tends to spread throughout the entire codebase.

2 comments

I am skeptical (with an open mind) of the JSONField pattern. We’ve used it in a few places where you have something like inheritance, so you want to handle all Event objects the same at the top level, and then dispatch differently based on Type. This works fine.

My concern is that this is a pattern which experts advocate, but less precise/experienced engineers are likely to break when it comes time for a data migration.

With normal SQL fields, if you change the schema you need to create a migration, and that gives an easy point to check for datamigrations too. If any commit can silently change the write-schema, it’s a lot harder to police.

So I suspect if your team skews towards very experienced engineers the JSONField pattern probably gains in value. If that’s right it’s probably best to caveat the recommendation.

Interested to know your experiences with schema changes though.

I’m also interested to note that you are pointing out a general issue with Django I’ve experienced - the Models spread through the whole system. You can solve that in other ways, say by having a repository layer that used normal (non-JSONField) models and maps them through to a POPO. I’m not sure you _need_ the JSONField pattern to get the separation you’re looking for.

> If any commit can silently change the write-schema, it’s a lot harder to police.

In addition to schema validators, you really also need a data integrity async task that runs once a day to catch these kinds of issues. They're not that hard to fix if you catch them early, it's if you realize there's a data integrity issue years later and the original team isn't even there anymore that you get bigger problems. If you have a lot of data, a reasonable solution is just having it run on everything from the last 24 hours plus a random sampling of data older than that.

> Interested to know your experiences with schema changes though.

Haven't done a ton, so admittedly there could be complex cases I'm not seeing. I think it helps though, in cases like the example I gave of using it for form-like things, to set it up like:

  QuizModel
  QuizQuestionModel (FK to QuizModel)

  QuizResponseModel (FKs to QuizModel, UserModel)
  QuizQuestionAnswerModel (FKs to QuizResponseModel, QuizQuestionModel)
This way you have four models rather than an indefinite number. But because it's not just one big model with lists of JSON objects for questions and responses, this makes running migrations and data integrity checks much simpler to write and easier to reason about.

The idea being also that you have JSON schema validators for the different types of JSON blobs that get stored in QuizQuestionModel and QuizQuestionAnswerModel.

Like I said, there are some cases where using JSON fields makes sense. Good examples would be cases where you wish to retain the structure of the data, but don't have control over it. Sounds like you're describing a case like that.
I’ve used it more for generating forms, e.g. for when there are different types of questions or modules can ask or show the user —- multiple choice, free response, upload an image, watch a video, etc. So I’ve used JSON here both for storing the questions and responses, and then have used validators for each question type and response type. The downside is that it’s hard to do migrations. But in most cases you can just save the rendered HTML each user is shown and that covers you for having an audit trail, rather than maintaining backwards compatibility indefinitely. And if you do need to maintain compatibility forever, the schema validators enable that.

Having done or seen almost the exact same use for JSON at multiple startups, for whatever downsides there are I think it’s 10x better than needing to try to understand a bunch of models, each with their own queries and logic.