by kiyoto 3874 days ago
While I'm happy to hear about a great success story of a great piece of open source software, Elasticsearch has done a great disservice by making application developers lazy about learning the ins and outs of various analytical/transactional/storage backend systems.

Echoing other commenters, Elasticsearch is hardly the best tool for many kinds of analytics. In fact, it is strictly not a good tool for several use cases. For starters:

1. It's not good at joining two or more data sources

2. It's not good at complex analytical processing like window functions (for example, calculating session length based on the deltas of consecutive timestamps, partitioned by user_id and ordered by time).
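
To make the session-length example concrete, here is a rough sketch in plain Python of the partition-by-user, order-by-time, diff-consecutive-timestamps logic that a window function would express. The function name `session_lengths`, the `(user_id, unix_ts)` event shape, and the 30-minute inactivity gap are all assumptions for illustration, not anything from a particular database.

```python
from collections import defaultdict


def session_lengths(events, gap=1800):
    """Return {user_id: [session_length_seconds, ...]}.

    events: iterable of (user_id, unix_ts) pairs, in any order.
    gap: assumed inactivity threshold in seconds that starts a new session.
    """
    # "PARTITION BY user_id"
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    result = {}
    for user_id, stamps in by_user.items():
        stamps.sort()  # "ORDER BY time"
        lengths = []
        start = prev = stamps[0]
        for ts in stamps[1:]:
            # Delta between consecutive timestamps (what LAG() would give).
            if ts - prev > gap:
                lengths.append(prev - start)  # close the current session
                start = ts
            prev = ts
        lengths.append(prev - start)  # close the last session
        result[user_id] = lengths
    return result


events = [("a", 0), ("a", 600), ("b", 100), ("a", 5000), ("a", 5300), ("b", 160)]
sessions = session_lengths(events)
```

In a SQL database with window functions, the inner loop is roughly what a `LAG(ts) OVER (PARTITION BY user_id ORDER BY ts)` expression followed by a grouping step would do.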

Of course, it's also good at many things like simple filtering and aggregation against "real-time" data. Being in-memory really helps with performance, and with the right tools, it's horizontally scalable. Elastic's commercial support is also not to be discounted.

However, as an old OLAP fart who spent years optimizing KDB+ queries, I am deeply concerned about the willful ignorance of data processing systems that I see among Elasticsearch fans. Just take my word for it and study Postgres (with the cstore_fdw extension) and other real databases, in-memory or otherwise, open-source or proprietary, so that you won't shoot yourself (or future co-workers) in the foot by trying to shoehorn Elasticsearch and its ilk into suboptimal workloads. (To be fair, I see a similar tendency among Splunk zealots.)

1 comment

> Of course, it's also good at many things like simple filtering and aggregation against "real-time" data.

And also fulltext search at scale, which is basically its primary use case.

PostgreSQL's fulltext search isn't quite at the same level. The last time I looked into its capabilities, it didn't fully support TF-IDF. (I don't think it keeps track of corpus frequencies for terms.) Interestingly, I think SQLite's fulltext support does include TF-IDF, but I could be misremembering.
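
For anyone wondering why corpus frequencies matter here, a minimal sketch of textbook TF-IDF in plain Python follows. It is not how Elasticsearch, PostgreSQL, or SQLite actually score documents; the function name `tf_idf` and the whitespace tokenizer are assumptions for illustration. The point is only that the IDF term needs a document-frequency count across the whole corpus.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document, using textbook TF-IDF."""
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Corpus-level statistic: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "elasticsearch search engine",
    "postgres search database",
    "sqlite embedded database",
]
scores = tf_idf(docs)
```

A term that appears in only one document ("elasticsearch") ends up with a higher score than one that appears in several ("search"), which is impossible to compute without tracking `doc_freq` over the corpus.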

I mean, the Elasticsearch docs are pretty clear that joining doesn't work well (or really, at all). I'm not sure how being clear about the trade-offs of your software is "doing a disservice." Sometimes you don't need to store relational data. Sometimes you do need to store relational data, but the other benefits of Elasticsearch outweigh the cost of shoehorning relational data into what is effectively a document database.

If your only complaint is that people misuse software... Well... Yeah. It's been happening for a while now. We should help educate others. I'm not sure your approach is the most constructive.