| HN Mirror

> So as a KDB user you need to implement your own HA solution. That is strictly worse than MongoDB replication, even with its now-fixed bugs.

You simply cannot take MongoDB in its (near) default configuration, put it on AWS, and handle Twitter volumes.

I think "this broken tool is better than your working tool" represents a certain kind of madness that I can't argue with.

> Wat? That only works if data is immutable once written. Tweets are liked/deleted/etc. You could store an immutable log of user actions, but then you would have to reconstruct the current snapshot every time someone loads a timeline. It's entirely possible for someone to like/delete/RT an old tweet. Financial data is naturally partitioned because the order book clears at the end of every trading day - this doesn't apply to CRUD apps.

I don't know what your experience level is, but a financial data feed typically has many subscribers to the tickerplant, and those subscribers build up indexes and views representing the queries that consumers are actually going to be interested in. This is covered in the most basic of KDB tutorials[1].

When a user loads a timeline, ideally you want to serve it with a single query to a single machine near the viewing user. Processes representing tweet consumers subscribe to the tickerplant and build up the indexes of the information they're going to need to publish. You're also going to need a fast index-by-publisher, so that when a subscriber wants to follow someone, we don't need a replay -- again, more indexes, but at least these can be "centrally" located.
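To make the shape of that concrete, here is a minimal sketch in Python rather than q. Everything in it is hypothetical and for illustration only: `TickerPlant`, `TimelineIndexer`, the event dictionaries, and the assumption that tweet ids are integers that increase over time. It is not how kdb+tick is implemented; it only shows a subscriber maintaining a per-follower timeline index and an index-by-publisher, so that a new follow is backfilled from the index instead of replaying the log.

```python
from collections import defaultdict


class TickerPlant:
    """Toy stand-in for a tickerplant: fans each event out to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        for callback in self.subscribers:
            callback(event)


class TimelineIndexer:
    """Subscriber that keeps ready-to-serve indexes up to date as events arrive."""

    def __init__(self, follows):
        # follows: follower -> iterable of authors (assumed input shape)
        self.followers_of = defaultdict(set)   # author -> followers
        for follower, authors in follows.items():
            for author in authors:
                self.followers_of[author].add(follower)
        self.by_author = defaultdict(list)     # index-by-publisher
        self.timelines = defaultdict(list)     # follower -> tweet ids
        self.deleted = set()

    def on_event(self, event):
        if event["kind"] == "tweet":
            self.by_author[event["author"]].append(event["id"])
            for follower in self.followers_of[event["author"]]:
                self.timelines[follower].append(event["id"])
        elif event["kind"] == "delete":
            # Mutations are just more events; the index absorbs them.
            self.deleted.add(event["id"])

    def follow(self, follower, author):
        self.followers_of[author].add(follower)
        # Backfill from the index-by-publisher: no replay of the event log.
        merged = set(self.timelines[follower]) | set(self.by_author[author])
        self.timelines[follower] = sorted(merged)  # ids assumed time-ordered

    def timeline(self, user):
        # One lookup on one process, filtered against deletions.
        return [t for t in self.timelines[user] if t not in self.deleted]
```

Wiring it up is `tp.subscribe(indexer.on_event)` followed by `tp.publish(...)` for each event; a timeline load is then a single call to `indexer.timeline(user)`.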

This isn't even a remotely difficult problem to solve with the right tools.

[1]: http://code.kx.com/wiki/Startingkdbplus/tick

> I think you misunderstand what I mean by unicode support. Does KDB support locale specific collations? Does it support normalization/canonicalization? Being able to index by code point is about 1% of the needed solution to build an i18n-proof product. Obviously that doesn't matter when you are dealing with normal KDB datasets like market data where e.g. asian names are represented with numbers.

If I misunderstand you, it is because you are unclear.

JavaScript, C and C++ don't actually support "locale specific collations" even though there are well-maintained and well-distributed collation and localisation libraries that people can use.

That "iasc" doesn't know the difference between Chinese and American spellings for a word is irrelevant. I can solve the problems I have with my tools, and building sort keys on my symbol tables for each locale means that the user-visible aspects of sorting remain instantaneous, instead of being tricked into doing stupid shit like x.toLocaleString(user.getLocale()) which is slow at Twitter scale.
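A rough sketch of the precomputed-sort-key idea, in Python rather than q. The names `toy_sort_key`, `build_rank_table` and `sort_for_locale` are hypothetical, and the collation rules are deliberately toy ones (Swedish å/ä/ö after z, everything else accent-stripped); a real system would take its keys from a proper collation library. The point is only that the expensive, locale-aware work happens once per symbol, and sorting at request time is an integer lookup.

```python
import unicodedata


def toy_sort_key(symbol, loc):
    """HYPOTHETICAL stand-in for a real collation library's sort key."""
    text = unicodedata.normalize("NFC", symbol).lower()
    if loc == "sv":
        # Toy Swedish rule: å, ä, ö sort after z ('{', '|', '}' follow 'z').
        tail = {"å": "{", "ä": "|", "ö": "}"}
        return "".join(tail.get(ch, ch) for ch in text)
    # Toy default rule: ignore accents entirely.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_rank_table(symbols, locales):
    """Do the locale-aware sort once per locale and store each symbol's rank."""
    table = {}
    for loc in locales:
        # Tie-break on the symbol itself so the order is deterministic.
        ordered = sorted(set(symbols), key=lambda s: (toy_sort_key(s, loc), s))
        table[loc] = {symbol: rank for rank, symbol in enumerate(ordered)}
    return table


def sort_for_locale(symbols, table, loc):
    """Request-time sort: a plain integer comparison, no collation calls."""
    ranks = table[loc]
    return sorted(symbols, key=ranks.__getitem__)
```

The rank table has to be rebuilt or extended when new symbols appear, which is the trade this design makes: more work at write time in exchange for cheap reads.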

This is part of what I mean by "engineered correctly": the tools that are available to us can trick us into thinking certain problems are solved when they aren't.

> Were you using it to store clickstream data? Or some other kind of immutable stream of events? That isn't really applicable to general CRUD applications.

Tele-lead means (outbound) phone calls for the purpose of lead-generation, so I have phone calls and the results of those calls in KDB. It's not "big data" by any stretch of the imagination.

> Like I said - KDB is great for analyzing immutable streams of events. It's not a general purpose database for building CRUD applications. MongoDB tries to be a reasonable enough solution for many use cases, while KDB focuses on excelling at a small number. Both are valid approaches to building a database...

MongoDB is not a valid approach, full stop. Building dogshit and then trying to paper over the bad press with "the new version isn't dogshit anymore" every few years is negligent at best, and pays dividends only in that it makes it easy to identify inexperienced engineers.

That KDB is not as accessible as MongoDB is Kx's problem, and not KDB's problem.