
by hendzen 3425 days ago

I have some actual experience with KDB and MongoDB so I'm going to have to call bullshit.

How does KDB handle replication and failover? Or even high insert/update rates to datasets that exceed the size of memory? How do you shard KDB?

KDB doesn't support unicode text. Do you plan to only have English speaking users?

Yes, KDB excels at its relatively well defined niche of transforming and aggregating "smallish" (say 10 TB or less) numerical time series data. It would be a horrible choice for the backing store of a high throughput CRUD application...

What is it with KDB zealots thinking that KDB is the best database for every task? I swear, KDB is the Scientology of databases.

2 comments

geocar 3425 days ago

> How does KDB handle replication and failover?

With -r and (in my case) SO_REUSEPORT. Most people use a dedicated gateway (have seen custom tomcat stuff and haproxy).

Meanwhile, MongoDB doesn't actually replicate reliably (acking then losing anyway) and failover can crash cascade in the naïve configuration.

> Or even high insert/update rates to datasets that exceed the size of memory?

This is literally the KDB tickerplant model. Have an RDB that flushes out regularly (daily) to an HDB.

You can also just write to a log `:log upsert ...

> How do you shard KDB?

Same way you shard anything else? By picking a key and directing the query to the appropriate server. h[(first md5 k) mod count h] "query..."
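As a rough illustration of the routing expression above (not q, and not anyone's production code), here is a minimal Python sketch. The names `shard_index` and `route` are hypothetical, and the handles are stand-ins for whatever connection objects you actually hold. It mirrors the idea of taking the first byte of the MD5 of the key modulo the number of servers.

```python
import hashlib


def shard_index(key: str, shard_count: int) -> int:
    """Pick a shard from the first byte of the key's MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # digest[0] is an int in 0..255, so this spreads keys over
    # at most 256 shards; use more digest bytes if you need more.
    return digest[0] % shard_count


def route(key: str, handles: list):
    """Return the handle (hypothetical connection object) that owns this key."""
    return handles[shard_index(key, len(handles))]
```

The same key always lands on the same handle as long as the handle list does not change; resizing the list remaps keys, which is the usual trade-off of plain modulo sharding.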

> KDB doesn't support unicode text.

UTF8 is fine.

The number of times I've needed the first 5 code points (and not the first 5 bytes or the first 5 characters) in my life is zero. All that half-baked Unicode support in various languages (like MongoDB) just makes people think that they've solved a problem that they really haven't.

> Yes, KDB excels at its relatively well defined niche of transforming and aggregating "smallish" (say 10 TB or less) numerical time series data. It would be a horrible choice for the backing store of a high throughput CRUD application...

I use it in one of those big CRUD databases (digital marketing and tele-lead tracking).

> What is it with KDB zealots thinking that KDB is the best database for every task? I swear, KDB is the Scientology of databases.

Because it solves problems they have.

Even when I don't use KDB I use a similar architecture because it's the correct architecture, because I've had these problems for a lot longer than I've had KDB.

If it doesn't solve every problem I have, that's because I have work to do, not because it isn't great at the problems it does solve, and I don't shout at my hammer because it isn't a spoon.

However, MongoDB doesn't solve any problem I've ever had: I've never needed a bag of objects/filesystem that loses data, or a binary blob that I cannot query. It's so famously "web scalable" it has made a joke of the very idea of being scalable.


hendzen 3424 days ago

> With -r and (in my case) SO_REUSEPORT. Most people use a dedicated gateway (have seen custom tomcat stuff and haproxy).

So as a KDB user you need to implement your own HA solution. That is strictly worse than MongoDB replication, even with its now-fixed bugs. Do you really think your homemade multi-master KDB system would pass Jepsen?

> This is literally the KDB tickerplant model. Have an RDB that flushes out regularly (daily) to an HDB.

Wat? That only works if data is immutable once written. Tweets are liked/deleted/etc. You could store an immutable log of user actions, but then you would have to reconstruct the current snapshot every time someone loads a timeline. It's entirely possible for someone to like/delete/RT an old tweet. Financial data is naturally partitioned because the order book clears at the end of every trading day - this doesn't apply to CRUD apps.

> UTF8 is fine

I think you misunderstand what I mean by unicode support. Does KDB support locale specific collations? Does it support normalization/canonicalization? Being able to index by code point is about 1% of the needed solution to build an i18n-proof product. Obviously that doesn't matter when you are dealing with normal KDB datasets like market data where e.g. asian names are represented with numbers.

> I use it in one of those big CRUD databases (digital marketing and tele-lead tracking).

Were you using it to store clickstream data? Or some other kind of immutable stream of events? That isn't really applicable to general CRUD applications.

Like I said - KDB is great for analyzing immutable streams of events. It's not a general purpose database for building CRUD applications. MongoDB tries to be a reasonable enough solution for many use cases, while KDB focuses on excelling at a small number. Both are valid approaches to building a database...


geocar 3424 days ago

> So as a KDB user you need to implement your own HA solution. That is strictly worse than MongoDB replication, even with its now-fixed bugs.

You simply cannot take MongoDB in its (near) default configuration, put it on AWS, and handle Twitter volumes.

I think "this broken tool is better than your working tool" represents a certain kind of madness that I can't argue with.

> Wat? That only works if data is immutable once written. Tweets are liked/deleted/etc. You could store an immutable log of user actions, but then you would have to reconstruct the current snapshot every time someone loads a timeline. It's entirely possible for someone to like/delete/RT an old tweet. Financial data is naturally partitioned because the order book clears at the end of every trading day - this doesn't apply to CRUD apps.

I don't know what your experience level is, but a financial tickerplant typically has many subscribers, and they build up indexes and views representing the queries that consumers are actually going to be interested in. This is covered in the most basic of KDB tutorials[1].

When a user loads a timeline, ideally you want a single query to hit a single machine near the viewing user. Processes representing tweet consumers subscribe to the tickerplant and build up the indexes of the information they're going to need to publish. You're also going to need a fast index-by-publisher, so that when a subscriber wants to follow someone, we don't need a replay -- again, more indexes, but at least these can be "centrally" located.

This isn't even a remotely difficult problem to solve with the right tools.
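To make the subscriber model above concrete, here is a small Python sketch under stated assumptions. It is not kdb+tick, and the class names (`Tickerplant`, `TimelineIndexer`) and event fields (`kind`, `author`, `reader`, `id`) are invented for illustration. An append-only log forwards every event to subscribers, and one subscriber maintains a per-reader timeline index plus an index-by-publisher so that a new follow is backfilled without replaying the log.

```python
from collections import defaultdict


class Tickerplant:
    """Append-only event log that forwards each event to its subscribers."""

    def __init__(self):
        self.log = []
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        self.log.append(event)  # the log itself is never mutated
        for callback in self.subscribers:
            callback(event)


class TimelineIndexer:
    """Subscriber that keeps query-shaped indexes up to date."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> readers
        self.by_author = defaultdict(list)  # author -> tweet ids
        self.timelines = defaultdict(list)  # reader -> tweet ids
        self.deleted = set()                # tweet ids marked deleted

    def on_event(self, event):
        kind = event["kind"]
        if kind == "tweet":
            self.by_author[event["author"]].append(event["id"])
            for reader in self.followers[event["author"]]:
                self.timelines[reader].append(event["id"])
        elif kind == "follow":
            reader, author = event["reader"], event["author"]
            if reader not in self.followers[author]:
                self.followers[author].add(reader)
                # Backfill from the by-author index instead of replaying the log.
                self.timelines[reader].extend(self.by_author[author])
        elif kind == "delete":
            self.deleted.add(event["id"])

    def timeline(self, reader):
        # Assumes tweet ids increase over time, so sorting restores order.
        return sorted(t for t in self.timelines[reader] if t not in self.deleted)
```

A delete here is just another event that updates the index, so reading a timeline is a lookup rather than a reconstruction from the log.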

[1]: http://code.kx.com/wiki/Startingkdbplus/tick

> I think you misunderstand what I mean by unicode support. Does KDB support locale specific collations? Does it support normalization/canonicalization? Being able to index by code point is about 1% of the needed solution to build an i18n-proof product. Obviously that doesn't matter when you are dealing with normal KDB datasets like market data where e.g. asian names are represented with numbers.

If I misunderstand you, it is because you are unclear.

JavaScript, C and C++ don't actually support "locale specific collations" even though there are well-maintained and well-distributed collation and localisation libraries that people can use.

That "iasc" doesn't know the difference between Chinese and American spellings for a word is irrelevant. I can solve the problems I have with my tools, and building sort keys on my symbol tables for each locale means that the user-visible aspects of sorting remain instantaneous, instead of being tricked into doing stupid shit like x.toLocaleString(user.getLocale()) which is slow at Twitter scale.
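The "build sort keys per locale" idea can be sketched in Python as follows. This is only an illustration: `build_sort_ranks` is a hypothetical helper, and the two toy key functions stand in for a real collation library (they are not correct collators for any language). The point is that the expensive comparison runs once per distinct symbol, and user-facing sorts then compare precomputed integers.

```python
import unicodedata


def build_sort_ranks(symbols, collation_key):
    """Precompute an integer rank for every symbol under one collation."""
    ordered = sorted(set(symbols), key=collation_key)
    return {symbol: rank for rank, symbol in enumerate(ordered)}


def toy_english_key(text):
    """Toy stand-in: ignore case and strip diacritics before comparing."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_swedish_key(text):
    """Toy stand-in: Swedish places the letters å, ä, ö after z."""
    alphabet = "abcdefghijklmnopqrstuvwxyzåäö"
    return [alphabet.index(ch) if ch in alphabet else len(alphabet)
            for ch in text.casefold()]


def sort_rows(rows, column, ranks):
    """Sort rows by a precomputed rank lookup, not by re-collating strings."""
    return sorted(rows, key=lambda row: ranks[row[column]])
```

With the symbols `zebra`, `öl` and `apple`, the two toy collations disagree about where `öl` belongs, yet sorting rows under either one is the same cheap integer lookup.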

This is part of what I mean by "engineered correctly": The tools that are available to us can trick us into thinking certain problems are solved when they aren't.

> Were you using it to store clickstream data? Or some other kind of immutable stream of events? That isn't really applicable to general CRUD applications.

Tele-lead means (outbound) phone calls for the purpose of lead-generation, so I have phone calls and the results of those calls in KDB. It's not "big data" by any stretch of the imagination.

> Like I said - KDB is great for analyzing immutable streams of events. It's not a general purpose database for building CRUD applications. MongoDB tries to be a reasonable enough solution for many use cases, while KDB focuses on excelling at a small number. Both are valid approaches to building a database...

MongoDB is not a valid approach, full stop: building dogshit and then trying to paper over the bad press with "the new version isn't dogshit anymore" every few years is negligent at best, and it pays dividends only in that it makes it easy to identify inexperienced engineers.

That KDB is not as accessible as MongoDB is Kx's problem, and not KDB's problem.


gd1 3425 days ago

> KDB doesn't support unicode text.

Unicode (from 2011):

http://code.kx.com/wiki/Cookbook/Unicode


hendzen 3424 days ago

I replied to a sibling with more details. Indexing by code point is only the smallest (and easiest to solve) part of the problem of dealing with non ASCII text.
