by Florin_Andrei 2263 days ago
I'm worried about this statement:

> Local storage is explicitly not production ready at this time.

https://cortexmetrics.io/docs/getting-started/getting-starte...

But I want a scale-out, multitenant implementation of Prometheus with local storage that's ready for prod. What are my options then? VictoriaMetrics?

4 comments

I suggest checking out M3DB[1]. My team and I use it to serve metrics for all of Uber; we have ~1500 hosts across various clusters. It's serving us quite well.

[1]: https://github.com/m3db/m3

The only one I know with "non-experimental" local-storage is VictoriaMetrics. But the big thing there is that data in VM is not replicated, so when you lose a disk/node, you lose that data.

Having said that, both Thanos and Cortex have experimental local-storage modes that are pretty good. You could also try them for now while they get production ready.

M3 provides local storage that is not experimental, along with cluster replication, which VictoriaMetrics does not provide. It also has a Kubernetes operator to help scale out a cluster.

Disclosure: I work on the TSDB underlying M3 (M3DB) at Uber. Still worth checking out though!

> data in VM is not replicated, so when you lose a disk/node, you lose that data

The vmstorage component in VictoriaMetrics Server - is it RAID0-like (striping) or RAID1-like (mirroring)?

https://github.com/VictoriaMetrics/VictoriaMetrics/tree/clus...

It is easy to implement RAID1-like replication in VictoriaMetrics: just set up independent VictoriaMetrics instances (single-node or clusters) and replicate all the incoming data simultaneously to these instances. This can be done either via providing multiple `remote_write->url` values in Prometheus configs or via providing multiple `-remoteWrite.url` command-line flags in vmagent [1]. Then query multiple VictoriaMetrics replicas via Promxy [2].

[1] https://github.com/VictoriaMetrics/VictoriaMetrics/blob/mast...

[2] https://github.com/jacksontj/promxy
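
For illustration, here is a rough Python sketch of the RAID1-like idea described above: every sample is written to all replicas, and reads merge whatever the replicas return. It is a toy in-memory model, not how VictoriaMetrics, vmagent, or Promxy are implemented. The `Replica` class and the `fan_out_write` / `merged_read` helpers are hypothetical names made up for this example, and a real agent may buffer and retry writes instead of dropping them as this sketch does.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp in ms, value)


class Replica:
    """Hypothetical stand-in for one independent storage instance."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.series: Dict[str, List[Sample]] = {}

    def write(self, series: str, sample: Sample) -> bool:
        if not self.up:
            return False  # in this toy model the sample is simply lost here
        self.series.setdefault(series, []).append(sample)
        return True

    def read(self, series: str) -> List[Sample]:
        return list(self.series.get(series, [])) if self.up else []


def fan_out_write(replicas: List[Replica], series: str, sample: Sample) -> int:
    """Send the same sample to every replica; return how many accepted it."""
    return sum(r.write(series, sample) for r in replicas)


def merged_read(replicas: List[Replica], series: str) -> List[Sample]:
    """Union the samples from all replicas, deduplicating by timestamp."""
    merged: Dict[int, float] = {}
    for r in replicas:
        for ts, value in r.read(series):
            merged.setdefault(ts, value)
    return sorted(merged.items())


a, b = Replica("a"), Replica("b")
fan_out_write([a, b], "up", (1000, 1.0))
b.up = False                              # replica b misses the next sample
fan_out_write([a, b], "up", (2000, 1.0))
b.up = True
fan_out_write([a, b], "up", (3000, 0.0))
print(merged_read([a, b], "up"))          # the gap in b is filled from a
```

The merged read is what hides the hole in replica `b`; neither replica repairs itself, which is the part the reply below calls "healing".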

It is more like RAID0. VictoriaMetrics shards time series among the available vmstorage nodes, i.e. each vmstorage node contains a part of the data stored in the cluster. This is usually called a shared-nothing architecture [1].
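
A minimal Python sketch of the sharding idea, in case it helps: each series is hashed to exactly one node, so every node owns a disjoint slice of the data. The hash-modulo scheme, the `shard_for` helper, and the node names are assumptions for this example only; this is not a description of the actual placement algorithm in VictoriaMetrics.

```python
import hashlib
from collections import Counter
from typing import List


def shard_for(series_key: str, nodes: List[str]) -> str:
    """Pick the single storage node that owns a series (illustrative only)."""
    digest = hashlib.sha256(series_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical node names and series keys.
nodes = ["vmstorage-0", "vmstorage-1", "vmstorage-2"]
series = [f'http_requests_total{{instance="host-{i}"}}' for i in range(9000)]

# Each series lands on exactly one node, so losing a node loses its share.
placement = Counter(shard_for(s, nodes) for s in series)
print(placement)
```

Because no series is stored twice, capacity scales with the number of nodes, but a lost node takes its slice of the series with it unless the underlying disk is replicated.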

As for data replication, VictoriaMetrics offloads this task to the underlying storage, since replication is hard to get right [2]. In addition to copying the data to multiple nodes, proper replication must be able to perform the following tasks:

* To heal the data (i.e. to restore the replication factor) after a node becomes permanently unavailable. The healing process mustn't degrade cluster performance, and it must properly handle the other cases mentioned below.

* To gracefully handle temporary unavailability of nodes.

* To survive network partitioning when nodes are temporarily split into multiple isolated subnetworks.

* To handle data corruption.

* To continue accepting new data at the normal rate when some of the nodes are unavailable.

* To continue serving incoming requests with acceptable latency when some of the nodes are unavailable.

* To replicate data among multiple availability zones (AZs), so that the cluster continues accepting new data and serving requests if a single AZ becomes unavailable.
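
To make the first item in the list above a bit more concrete, here is a rough Python sketch of a "healing" planner: given which nodes hold each shard and which nodes are still alive, it decides where new copies should go to get back to the desired replication factor. The `heal` function, the shard and node names, and the least-loaded placement rule are all assumptions for illustration. It deliberately ignores the other items in the list (throttling the copy traffic, partitions, corruption, AZ awareness), which is where most of the real difficulty lies.

```python
from typing import Dict, List, Set


def heal(placement: Dict[str, Set[str]], live: Set[str], rf: int) -> Dict[str, List[str]]:
    """Return, per under-replicated shard, the live nodes that should get a new copy."""
    # Count how many shards each live node already holds.
    load = {node: 0 for node in live}
    for holders in placement.values():
        for node in holders & live:
            load[node] += 1

    plan: Dict[str, List[str]] = {}
    for shard in sorted(placement):
        survivors = placement[shard] & live
        missing = rf - len(survivors)
        if missing <= 0:
            continue  # shard is already at (or above) the replication factor
        if not survivors:
            raise RuntimeError(f"shard {shard} has no surviving copy to heal from")
        # Prefer the least-loaded live nodes that do not hold the shard yet.
        # If there are fewer candidates than missing copies, the plan is partial.
        candidates = sorted(live - survivors, key=lambda n: (load[n], n))
        targets = candidates[:missing]
        for node in targets:
            load[node] += 1
        plan[shard] = targets
    return plan


# Hypothetical cluster: node "b" died permanently and an empty node "d" joined.
placement = {"s0": {"a", "b"}, "s1": {"b", "c"}, "s2": {"c", "a"}}
print(heal(placement, live={"a", "c", "d"}, rf=2))
# → {'s0': ['d'], 's1': ['d']}
```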

I'm unsure whether popular systems that claim replication support can handle all the cases mentioned above. The only system that seems to handle these cases properly is GCP persistent disks, which are based on Colossus storage [3]. That's why storing VictoriaMetrics data on GCP persistent disks is recommended.

[1] https://en.wikipedia.org/wiki/Shared-nothing_architecture

[2] https://github.com/VictoriaMetrics/VictoriaMetrics/tree/clus...

[3] https://medium.com/google-cloud/persistent-disks-and-replica...

There are a bunch of different solutions out there: Thanos, Influx, federated Prometheus, etc.

The local Cortex storage works pretty well, but we have a very high bar for production worthiness. Right now I'd recommend using Bigtable or DynamoDB, or Cassandra if you're on-premises. In the future the block storage will allow you to run minio.

Thanos is probably one of the other popular choices. It's being heavily used in production by a number of companies, but I don't think they've branded it as "Prod ready" with a 1.0 release.
Thanos doesn't have production support for local storage either. The only stable storage providers for it are the Google, Amazon, and Azure object stores.

https://thanos.io/storage.md/

Interestingly, it looks like Cortex's support for local storage and object stores comes from using Thanos's storage engine. So once it's production ready in Thanos it will probably be production-ready in Cortex shortly thereafter.

https://cortexmetrics.io/docs/operations/blocks-storage/

I think for Cortex your safest storage options now are Bigtable, DynamoDB, or Cassandra.

I may have misinterpreted what they meant by local storage! I was reading that as having a local copy of the TSDB available to Prometheus (e.g. how Thanos works), versus Cortex, which doesn't store metrics locally (IIRC).

What you said is correct and makes sense. Though I would suspect either choice works with any S3-compatible API that can run on local storage, I know that isn't necessarily what's meant by "local storage".

"local storage" = I don't want to install yet another gizmo just to store data, nor do I want to use an external service for that

Batteries included.