by PedroBatista 150 days ago
I really never understood how people could store very important information in ES like it was a database.

Even if they don't understand what ES is and what a "normal" database is, I'm sure some of those people ran into issues where their "db" either got corrupted or lost data, even while testing and building their system around it. This is and was general knowledge at the time; it was no secret that from time to time things got corrupted and indexes needed to be rebuilt.

It doesn't happen all the time, but it happens far more than zero times, and that's understandable: Lucene is not a DB engine or a "DB grade" storage engine, and they had other, more important things to solve in their domain.

So when I read stories of data loss and things going south, I don't have sympathy for anyone involved other than the unsuspecting final clients. These people knew, or more or less knew, and chose to ignore it and be lazy.

6 comments

> I really never understood how people could store very important information in ES like it was a database.

I agree.

It's been a while since I touched it, but as far as I can remember ES has never pretended to be your primary store of information. It was mostly juniors who reached for it for transaction processing, and I had to disabuse them of the notion that it was fit for purpose there.

ES is for building a searchable replica of your data. Every ES deployment I made or consulted on sourced its data from some other durable store, and the only things that wrote to it were replication processes or backfills.
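
To make the "searchable replica" idea concrete, here is a minimal sketch in plain Python. It is not ES code: the `PRIMARY` dict stands in for whatever durable store you have, and `build_index` and `search` are made-up names for a backfill and a query. The point is only that the index is derived, so losing it costs a rebuild and nothing else.

```python
from collections import defaultdict

# Hypothetical stand-in for the durable primary store (e.g. a relational table).
PRIMARY = {
    1: "red running shoes",
    2: "blue running jacket",
    3: "red rain jacket",
}

def build_index(rows):
    """Backfill: derive an inverted index (term -> set of row ids) from the primary rows."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for term in text.lower().split():
            index[term].add(row_id)
    return index

def search(index, query):
    """Return the sorted ids of rows that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

index = build_index(PRIMARY)
print(search(index, "red jacket"))  # [3]

# If the index is lost or corrupted, the source of truth is untouched: rebuild.
index = build_index(PRIMARY)
```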

I've managed a 100+ node cluster for years without seeing any corruption. Where are you getting this from?
I'm actually struggling to imagine exactly what warrants a 100+ node cluster of ES?
We had something like this to scale out for higher throughput. Just tens of thousands of requests per second required 100+ nodes, simply because each query involved an expensive scatter and gather.
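
For anyone unfamiliar with scatter and gather, a toy sketch in plain Python. The shard contents, scores, and function names are all invented for illustration, and this is not how ES implements it internally. It only shows why a single query turns into one request per shard plus a merge step.

```python
import heapq

# Hypothetical shards: each holds (score, doc_id) pairs for its slice of the data.
SHARDS = [
    [(0.9, "a1"), (0.4, "a2"), (0.7, "a3")],
    [(0.8, "b1"), (0.2, "b2")],
    [(0.95, "c1"), (0.6, "c2"), (0.1, "c3")],
]

def shard_top_k(shard, k):
    """Scatter phase: each shard computes its own local top-k."""
    return heapq.nlargest(k, shard)

def scatter_gather(shards, k):
    """Gather phase: merge every shard's partial result into a global top-k."""
    partials = [shard_top_k(s, k) for s in shards]  # one request per shard
    merged = heapq.nlargest(k, (hit for p in partials for hit in p))
    return merged, len(partials)

top, fan_out = scatter_gather(SHARDS, k=2)
print(top)      # [(0.95, 'c1'), (0.9, 'a1')]
print(fan_out)  # 3
```

The fan-out equals the number of shards, so the work per query grows with the shard count even when only a handful of results come back.
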
They market it as a general-purpose store. Successfully, too: even though hc CS wizards wouldn't ever touch it, the C-suite likes it.

The best example is the IoT marketing, as if it could handle the load without a bazillion shards. And since when does a text engine want telemetry?

Neither the blog post (besides consistency, which most people don't care much about) nor your post describes any actual issue.

> things got corrupted and indexes needed to be rebuilt.

How are Postgres and Elastic any different here?

I've no experience with Elastic, but what they're getting at, I think, is that indexes in Elastic are actually your data, because that's all it does given the purpose it was built for, whereas in Postgres indexes are, well, indexes: derived data, not the source of truth.
But if the data is corrupt, how does rebuilding an index fix anything? What kind of corruption are we talking about?
We only used it on top of the primary databases, just like many other components for scaling or auxiliary functionality. Not sure how others use it.
Usually, companies have a main durable store of information that is then streamed to other databases, which store a transformation of this data with some augmentation.

These downstream data stores don't usually require that level of durability or reliability.