|
No worries, no hard feelings. I was just surprised that what I thought was a specific response was assumed to be a generic-ish bot response. (Then again, I didn't spell out that the game I contribute to is Beyond All Reason.) I do totally sympathize with the feeling of being overwhelmed by AI slop. After digging into the Traceway documentation, it looks like you were looking to primarily use OTEL for ingestion? Or would you say that's a misreading of the documentation and you actually support metrics, logging, etc. easily? It looks easy to set up via Docker; I might try the SQLite version just to get a taste for how it works and how easily data can be ingested. For myself, I was initially interested in the Loki/Prometheus/Grafana stack, but it wasn't going to fit in the 4 GB of RAM I had available on a Raspberry Pi that was already hosting two services that consumed a GB of RAM each. So when I found that VictoriaMetrics (a) happily ran in 200 MB of RAM, (b) was used by CERN, (c) had excellent, comprehensive documentation with plenty of examples, (d) supported so many different ingestion and export/reporting APIs that I would be able to set up everything I wanted for my homelab without any shim scripts or one-off API converters, and (e) offered a basic reporting UI with sane defaults (auto-detecting rate vs sum for a graph) even without having to set up Grafana, I was blown away and grateful that such a useful thing existed. Same for VictoriaLogs: it was just easy to set up once I put my mind to it, because the documentation for everything was very clear, and they clearly had "sane defaults + configurable options" once you needed something slightly different. Having sane support for backfills and tolerating duplicates was also nice. "Throw us your data in one of these shapes, we'll sort it out" was just nice to finally see, rather than digging through pages of Prometheus documentation for what the edge cases could be if I sent duplicates or the data was from a month ago.
I just have a homelab of random Docker containers across a few nodes, thrown together with underpowered hardware, but VictoriaMetrics met me where I was and made it trivial to experiment using the nodes I had rather than having to migrate to bigger nodes, and it was very well behaved at idle, steady-state, and "I want to trickle-feed a million data points via HTTP calls" loads. I don't yet need OTEL. I don't have cattle; I have homelab pets and very little time to play with them. I just want to either scrape metrics or fire metrics at some sort of endpoint that can figure out what I meant if I get close enough. VictoriaMetrics was so easy to get working because the documentation was laid out as "here's the starter command line options, here's how you ingest data in a variety of input URLs, here's how you retrieve your data via a variety of output URLs, if you want specialty stuff that's described farther down the page..." It was about as hard as falling off a log. It just became the obvious place to base anything else around because it had so many connectors and sane defaults. So when the Beyond All Reason infrastructure team asked "is there an infrastructure and application metrics solution for a handful of nodes that is self-hosted, easy to set up and won't break the bank or require babysitting?" I had one recommendation: VictoriaMetrics (+ Grafana). Admittedly, I do sort of wish for unified metrics and logs and traces, but that's merely a platonic ideal dream state for me. In reality I can see that both I and organizations generally set up metrics, or logs, or traces, in a piecemeal fashion. An organization (in my limited experience) generally doesn't think about all three at once, and so "do one thing and do it well" becomes a nice simplification of scope rather than a mark against VictoriaMetrics or VictoriaLogs for not having the whole enchilada under one common roof.
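To make the "fire metrics at an endpoint" part concrete, here is a rough Python sketch of pushing a sample over HTTP in Prometheus text exposition format, which VictoriaMetrics accepts at `/api/v1/import/prometheus`. The base URL (`http://localhost:8428`, the usual single-node default), the metric name, and the helper names `format_sample` and `push` are all my own placeholders, not anything from a real setup. An explicit millisecond timestamp is how I'd expect to send backfilled data.

```python
import urllib.request


def format_sample(name, labels, value, ts_ms=None):
    """Render one sample as a Prometheus text exposition line.

    ts_ms is an optional Unix timestamp in milliseconds; leave it out
    to let the server stamp the sample on arrival.
    """
    def esc(v):
        # Escape the characters that are special inside label values.
        return str(v).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    label_str = ",".join(f'{k}="{esc(v)}"' for k, v in sorted(labels.items()))
    line = f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}"
    if ts_ms is not None:
        line += f" {ts_ms}"
    return line


def push(lines, base_url="http://localhost:8428"):
    """POST pre-formatted lines to the import endpoint (base_url is an assumption)."""
    body = ("\n".join(lines) + "\n").encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/v1/import/prometheus", data=body, method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical metric and label, with an explicit (backfilled) timestamp.
    sample = format_sample("temp_celsius", {"host": "pi"}, 21.5, 1700000000000)
    print(sample)
    # push([sample])  # uncomment once an instance is actually listening
```

Batching many lines into one `push` call is probably kinder to a small node than one request per data point, but I haven't measured that.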
I have not personally worked on scaling it horizontally yet, and I didn't set it up myself, but (a) I observe that the Beyond All Reason VictoriaMetrics server has 8 GB of RAM and 3 vCPUs and appears to serve 75k active time series (14.5 billion data points, ingesting about 5 thousand data points per second) without complaint; the resource usage graphs are flat, humming quietly, and (b) I did appreciate that vmagent and vlagent send to multiple targets easily (I tested this with vlagent), making "active -> standby fail-over" easy to set up -- all ingestion agents would multiplex to all sinks and you were done; any sink "should" have the same data. |