by aprdm
2160 days ago
> Datadog has incredible integration with clouds (AWS and other), databases (postgresql, mysql, cassandra) and middleware (haproxy, kafka). It can capture all the metrics from all these out of the box with minimal effort, whereas you have to crawl hundreds of broken plugins for prometheus to get one third of that.

Interesting, I haven't had this experience. I monitor the DBs and middleware you mention, and the OSS plugins plus OSS Grafana dashboards worked pretty much out of the box. For what it's worth, we have around 20 different technologies across DBs and middleware. We aren't using the cloud since we have our own datacenters, so there could be a big difference in usage.

As for Prometheus not scaling, I don't know that I agree. We have more than 5k hosts on it currently and it is working fine. We do use some strategies like recording rules and federation, which are well documented.
The cloud does make a difference. Just seeing the daily S3 usage per bucket was life changing. Immediately found that backups were not expiring after a while as they should, costing more and more money. ^^
Do you know how many metrics you are ingesting in Prometheus, what the storage size is, and how many tags you have per host? We were reaching 1 TB of memory usage (mmap) on our server with 1500 hosts. Prometheus was literally grinding to a halt or crashing, and we were forced to cut down some metrics and stick to the absolute minimum of tags.
Try the prometheus_tsdb_head_series and prometheus_tsdb_storage_blocks_bytes metrics, or run the du command on the data directory.
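
If it helps, here is a rough sketch of how you could pull those two numbers without clicking around the UI. It assumes the server answers on the standard HTTP API path /api/v1/query. The base URL, the data directory path, and all function names below are made up for illustration, so adjust them to your setup. The directory walk only approximates du, since it sums apparent file sizes rather than allocated disk blocks.

```python
import json
import os
import urllib.parse
import urllib.request

# The two TSDB metrics mentioned above.
TSDB_METRICS = (
    "prometheus_tsdb_head_series",
    "prometheus_tsdb_storage_blocks_bytes",
)


def instant_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def first_sample_value(payload):
    """Return the first sample of an instant-query response as a float.

    Returns None when the result vector is empty.
    """
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    result = payload["data"]["result"]
    if not result:
        return None
    # Each sample is [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def fetch_metric(base_url, expr, timeout=10.0):
    """Run one instant query against a live server (needs network access)."""
    url = instant_query_url(base_url, expr)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return first_sample_value(json.load(resp))


def dir_size_bytes(path):
    """Sum apparent file sizes under path, skipping symlinks.

    Similar in spirit to du, but du reports allocated blocks by default,
    so the two numbers can differ.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # The file may vanish mid-walk (e.g. during compaction).
                pass
    return total


if __name__ == "__main__":
    base = "http://localhost:9090"   # assumption: default local address
    data_dir = "/var/lib/prometheus"  # assumption: hypothetical data path
    for metric in TSDB_METRICS:
        print(metric, fetch_metric(base, metric))
    print("on-disk bytes", dir_size_bytes(data_dir))
```

The head series count is the number to watch for memory pressure, and the block bytes plus the directory size tell you what is actually on disk.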