by AndyNemmity 1549 days ago
Okay, but why? I am using Thanos today. It works. It's complex, and when it breaks it's a bit of a challenge to fix, but that happens. It doesn't break often.

It does the job. Mimir is based on Cortex, so whether I use Mimir or Cortex, what benefit am I getting?

I get asked every few months about moving off of Thanos to Cortex, and now Mimir, and I don't have any substantial reason to do so. It feels like moving for the sake of moving.

I need to see some real reasoning for how moving everything to Mimir is going to add value.


Sounds like Thanos is working well for you, so in your position I wouldn't change anything.

There are a bunch of other reasons why people might choose Mimir: perhaps they have outgrown some of Thanos's scalability limits, or they want faster high-cardinality queries, or they prefer a different take on multi-tenancy.

Do remember that Cortex (on which Mimir is based) predates Thanos as a project; Thanos was started to pursue a different architecture and storage concept. Thanos storage was clearly the way forward, so we adopted it. The architectures are still different: Thanos is "edge"-style IMO, while Mimir is more centralised. Some people prefer one over the other.

That's fair, thanks for the input. The only reason we implemented Thanos in the first place was a particular feature we needed at the time. We now use it in an extremely large environment and I haven't hit any scalability limits. Query speed isn't driving anything for us.

Multi-tenancy certainly is, but we have our own custom multi-tenancy solution that we built on top of it ourselves. I'd like to get rid of that eventually, but at the moment we're not using whatever multi-tenant features exist. Perhaps that will be a driver.

Appreciate your thoughts.

We were struggling with Cortex a couple of years ago, then we tried VictoriaMetrics and haven't looked back. It runs pretty much unattended; we just monitor disk space to make sure we still have room to keep pouring in metrics. When a component crashes (not often), it recovers without anyone really noticing.