Hacker News new | ask | show | jobs
by pointyhatuk 4815 days ago
Ugh another thing I really don't want in the database. Seriously, stuff like this will knacker your scalability over time.

I only say this because I've been there, with SQL Server's XML processing stuff, then spent nearly 2 years getting rid of it.

4 comments

Can you elaborate on the scaling issues? Being able to query into JSON fields (including functional indexes) is a great helper for denormalization which is very good for actually increasing scalability.

For me, the native JSON support is a very handy tool to have in the toolbelt for parts of our application that have a very loose schema.

Well your database is a black box technically speaking. It's very hard to scale it horizontally and it is very expensive to scale it vertically as time goes on.

Logic suggests that you should keep as much processing functionality outside something which can't be scaled cheaply or easily and push it to cheaper front end servers.

On this basis, anything which implies more work than collecting and shifting the data over the wire shouldn't really be in the database. Parsing / processing JSON is one of those things that's going to eat CPU/memory.

Fundamentally there's nothing wrong with storing JSON inside the database and processing it externally, but processing it inside the database is a big risk.

I've seen the same thing over the years with XML in the database and more recently people adding CLR code to SQL Server stored procedures.

Hmm, I'd have to disagree with you on that. It is true that JSON processing could reduce raw scalalability in the sense that a query that uses JSON may be slower than one without. However, having JSON processing in the database simplifies quite a bit. For instance, imagine an app that processes and emits JSON and also uses a normalized database. This functionality now makes it possible to move some of the JSON processing closer to the database. In some cases, this may not be the best idea, and in others, it's a win. As with many things, having the choice is not the problem; it's how one chooses, given the choice. I can see the benefits of XML in the database, but I can also see how it could be misused. The key, as always, is to apply judicious thought to the problem.
Actually, these functions seem to support designs where the database is normalised, but front ends are emitting JSON.
Actually people USE JSON blob fields to increase scalability (as far as back as FriendFeed).

Sure, doing this hampers normalization and single server speed, but if the queries are parallelizable with shards et al, what would hamper scalability?