Hacker News
by FridgeSeal 2646 days ago
Talk about perfect timing!

I was looking for something just like this for a project in my team. We had been using this setup where a huge chunk of the data was being stored in triplicate: some of it in ES, some more of it in another database and finally the whole dataset in our data warehouse.

Hopefully I can use this to provide only the index + full-text capability and use the warehouse itself as the main db, since the query performance is similar enough and the warehouse is criminally underused for what we pay for it.
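
A rough sketch of that split, purely as an illustration: the search side only maps terms to document IDs, and the full rows come from the main store. Everything here is an assumption. The toy inverted index stands in for the search backend (it is not Sonic's API), an in-memory SQLite table stands in for the warehouse, and the `docs` schema and `search` helper are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Stand-in for the warehouse: an in-memory SQLite table (hypothetical schema).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
warehouse.executemany(
    "INSERT INTO docs (id, body) VALUES (?, ?)",
    [
        (1, "quarterly sales report"),
        (2, "sales forecast draft"),
        (3, "hiring plan"),
    ],
)

# Stand-in for the search backend: a toy inverted index mapping term -> doc IDs.
# It stores no document bodies, only identifiers.
index = defaultdict(set)
for doc_id, body in warehouse.execute("SELECT id, body FROM docs").fetchall():
    for term in body.lower().split():
        index[term].add(doc_id)


def search(query):
    """Return full rows for documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Step 1: ask the index for matching IDs only.
    ids = set.intersection(*(index.get(t, set()) for t in terms))
    if not ids:
        return []
    # Step 2: fetch the actual rows from the main store by ID.
    placeholders = ",".join("?" * len(ids))
    return warehouse.execute(
        f"SELECT id, body FROM docs WHERE id IN ({placeholders}) ORDER BY id",
        sorted(ids),
    ).fetchall()
```

The point of the design is that the index never holds a second copy of the data, so the warehouse stays the single source of truth and the search layer can be rebuilt from it at any time.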

2 comments

What's preventing you from using ES for everything? Slow writes?
Write speed is fine; it's more that the dataset is reasonably large, and running an instance with enough capacity and nodes (even with spill to disk) is silly expensive.
Being in a similar situation, what do you consider "reasonably large"?
250-300GB.

Not large by absolute standards, sure, but large enough to cause issues.

I’m sure there’s some kind of solution that involves re-architecting the ES cluster and indices and re-architecting the data flows and stuff. But if our options are go through all that, or seriously slim down our architecture and costs by just running Sonic + our data warehouse, I’m definitely going to give it a go. After all, worst comes to worst we can go down the re-architecting ES route if Sonic doesn’t work out.

¯\_(ツ)_/¯

I’d be curious what your expectations and constraints are, but from my experience of running clusters in the double-digit TB range, my ballpark figure for that amount of data would be 2 medium-size data nodes and a small tiebreaker. Alternatively, if you can live with the reduced resilience and availability, even a single node might just do. It depends on the expected churn, though: ES really does not like document updates.
That does not sound like a good idea. You can't even maintain a quorum of 2 replicas with n=3 on a cluster like that. Losing one data node would be disastrous.
From what I've learned, running any cluster on fewer than four nodes is not really recommended.
Is that double-digit TB on Elasticsearch?
Curious to know if you considered https://www.algolia.com/ as well?
Algolia is a beautiful product but it’s expensive. At just 1 million items you’re already paying 500 a month.