by gt565k 2612 days ago
WOW. Hahahaha.

This is a massive misconfiguration of an Elasticsearch cluster. 50k indices? 500 documents per index?

500 records per index at 5 shards per index is 100 records per shard.

Yeah, let's shard our data so much that we introduce tremendous amounts of disk i/o overhead!!!

Author should learn how to properly configure an ES cluster before posting ridiculous benchmarks like this.

What an utter pile of garbage benchmark this is.


Oh fuck me, I didn't even realize they used a single instance (node).

To expand a little bit, the whole point of using multiple shards per index in an ES cluster is so that the shards spread across multiple nodes (servers), which distributes the load (disk I/O) and provides redundancy. ES automatically rebalances its shards across the nodes in the cluster for fault tolerance as well. If one or more nodes go down, the cluster still has all of the data through replica shards.
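
To make the single-node point concrete, here is a toy sketch of shard placement. It is not Elasticsearch's real allocator; the `allocate` function and the node names are made up for illustration. The only rule it borrows from ES is that a replica is never placed on the same node as another copy of the same shard.

```python
def allocate(num_shards, num_replicas, nodes):
    """Toy round-robin placement. NOT Elasticsearch's actual allocator."""
    placement = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_shards):
        # Primaries are spread round-robin over the available nodes.
        primary_node = nodes[shard % len(nodes)]
        placement[primary_node].append(f"p{shard}")
        used = {primary_node}
        for replica in range(num_replicas):
            # A replica may not share a node with another copy of its shard.
            candidates = [n for n in nodes if n not in used]
            if not candidates:
                unassigned.append(f"r{shard}.{replica}")
                continue
            # Put the replica on the least-loaded eligible node.
            target = min(candidates, key=lambda n: len(placement[n]))
            placement[target].append(f"r{shard}.{replica}")
            used.add(target)
    return placement, unassigned


# One node: all 5 primaries pile onto it and no replica can be placed.
single, single_unassigned = allocate(5, 1, ["node-1"])
print(single, single_unassigned)

# Three nodes: the same 5 shards plus replicas spread out.
multi, multi_unassigned = allocate(5, 1, ["node-1", "node-2", "node-3"])
print(multi, multi_unassigned)
```

With one node, every shard sits on the same disk, so the extra shards buy neither load distribution nor redundancy.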

Either way, in this particular case, the data is so small that having 5 shards per index across 50k indices results in 250k shards for 5 GB of data.

5 GB / 250k shards = 20 KB per shard.

You have shards of size ~20 KB... total cluster misconfiguration.
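
The arithmetic is easy to check. A quick sketch, assuming decimal units (1 GB = 10^9 bytes) and counting primary shards only; `shard_stats` is just a throwaway helper name, not anything from ES:

```python
def shard_stats(num_indices, shards_per_index, docs_per_index, total_bytes):
    """Back-of-the-envelope shard sizing (primary shards only)."""
    total_shards = num_indices * shards_per_index
    return {
        "total_shards": total_shards,
        "docs_per_shard": docs_per_index / shards_per_index,
        "bytes_per_shard": total_bytes / total_shards,
    }


# Numbers from the benchmark as discussed in this thread.
stats = shard_stats(
    num_indices=50_000,
    shards_per_index=5,
    docs_per_index=500,
    total_bytes=5 * 10**9,  # 5 GB, decimal units assumed
)
print(stats)
# {'total_shards': 250000, 'docs_per_shard': 100.0, 'bytes_per_shard': 20000.0}
```

Dropping to 1 shard per index would still leave 50k shards of about 100 KB each, so the index-per-tenant layout is the bigger problem.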

Isn't that exactly what they're trying to demonstrate though? That all this arcana you have to invoke to get a stable ES cluster barely breaks a sweat on RediSearch?

The specific test deployment was multitenant anyway-- you can't account or optimize for what tenants are going to index.

I'm not familiar with RediSearch, but I'm just trying to point out that you can't misconfigure ES and then benchmark against the misconfigured cluster. This is comparing apples to oranges. Not to mention, I'm not sure of the feature difference between the 2 search engines, but I'd bet ES is much more feature rich, so its use cases are vastly different. If you are just comparing text search, sure, maybe Redis is faster. But at that point, so is a simple SQL database, when compared to a misconfigured ES cluster.

I'm not familiar with RediSearch either, but I agree, there's definitely a bent to this in that they made something that isn't ES, then compared it to ES via a benchmark that shows how poorly ES performs at... being something other than ES.

The impression I got was that they were trying to demonstrate for two specific workloads, how much more a single node of RS can do than a single node of ES, and that we should extrapolate the savings and performance if scaled out from there.

A properly deployed ES cluster versus a single RS node isn't a fair comparison either. It's a strained comparison in any case.

"The specific test deployment was multitenant anyway-- you can't account or optimize for what tenants are going to index."

So in other words:

"If your specific use case is supporting 50,000 customers, each having around 500 documents and needing only basic text search queries, and relevance is not a major concern, RedisLabs Search might give you better performance than Elasticsearch!"

(This is assuming there isn't a different way to configure Elasticsearch for this scenario that gives similar performance.)

I wouldn't call that "arcana", that's just ES and Lucene basics.

This is how comparison benchmarks are done when you need to reach certain results. I've even had it done to me at my job!

When you point out the flawed methodology, you come across as a Luddite, or as having sour grapes, or whatever else.

You just say "artificial tests produce artificial results, bye".