by dijit 1762 days ago
If I have a regret in my observability stack, I think it’s got to be InfluxDB.

I bought into the TICK stack and planned on using an enterprise support contract when going to production, but every interaction with InfluxData, the company, has felt a bit sleazy: pushing very hard toward the cloud offering, for example.

That’s bad enough, but the documentation and observability of the database are quite poor, and it’s trivially easy to “vanish” all your data and lock your instance up for hours or days by changing the retention policy of a database (even when the new policy is not much different).

Now of course it’s not TICK at all. More like “TI”, as Kapacitor and Chronograf (alerting and dashboarding respectively) are deprecated products and rolled into the main offering.

Added to that they completely changed the query language.

I have to say: pick something better if you can. TimescaleDB or Prometheus (which ships its own built-in TSDB) are promising.


I looked into TimescaleDB, but didn't find a lot of support for monitoring agents that push data into Postgres. TimescaleDB is built on Postgres and uses the same mechanisms for ingesting data.

There's a plugin for Telegraf that looks promising, but it hasn't been merged yet.

Is anyone else using TimescaleDB? If so, what do you use to push monitoring data to it?

I recently built a janky system which runs nmon (plus a few custom rpi stats) every two minutes or so and pushes the output file to a watch folder. Another service uploads to server if network available. The server then ingests into timescale. Have been running it on two rpis and a few aws servers for the past few months.

Edit: I’m using grafana but was considering checking out apache superset.

How is the data sent into timescale? Do you run psql to load the data?
python
The solution I came up with but haven’t implemented yet is to use the collectd mqtt output plugin to get the data onto my broker (I use mqtt for other purposes, many of which should also end up in timescale) and then an mqtt to Postgres/Timescale bridge.
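A minimal sketch of the parsing half of such a bridge, for illustration only. It assumes collectd publishes topics shaped like `collectd/<host>/<plugin>/<type>` with a payload of `<epoch>:<value>[:<value>...]`; check the actual messages on your broker before relying on that. The `metrics` table and its columns are hypothetical, and the MQTT subscription and database connection are left out.

```python
# Assumed (not verified here): topic "collectd/<host>/<plugin>/<type>",
# payload "<epoch>:<value>[:<value>...]", possibly NUL-terminated.

# Hypothetical target table: metrics(time timestamptz, host text, metric text, value double precision).
# The %s placeholders follow the style used by common Postgres drivers for Python.
INSERT_SQL = (
    "INSERT INTO metrics (time, host, metric, value) "
    "VALUES (to_timestamp(%s), %s, %s, %s)"
)


def parse_collectd_message(topic, payload):
    """Turn one MQTT message into a list of (epoch, host, metric, value) rows."""
    _prefix, host, *rest = topic.split("/")
    metric = "/".join(rest)
    ts, *values = payload.decode("ascii").rstrip("\x00").split(":")
    rows = []
    for i, raw in enumerate(values):
        # Multi-value types (for example rx/tx pairs) get an index suffix.
        name = f"{metric}[{i}]" if len(values) > 1 else metric
        rows.append((float(ts), host, name, float(raw)))
    return rows
```

Each returned row can be passed as the parameters for `INSERT_SQL` by whatever Postgres client the bridge uses.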
Yes, I can see how that could work. I may do something similar where I send my data to one system that can then forward it into Postgres/TimescaleDB.
Good luck waiting for the telegraf merge! I was watching a PR for another tsdb for 2 years before I switched jobs and stopped caring. I believe the PR is still open.
I used 1.x for my push-monitoring stack at my last job. (For cases where "pull" is practical, I would always use Prometheus. Prometheus also has "push" now, by the way.) They went into 2.0 mode and kind of neglected 1.x, and I kind of forgot about it. At the time, I was most familiar with an internal monitoring system at Google, and I found I couldn't do queries that I expected to be able to do. I even mentioned it on HN and some influx folks told me that what I wanted to do was too weird to support. (It's not. I was collecting byte counters from fiber CPEs, and wanted to have bandwidth charts based on topology tags I stored with the data -- imagine a SQL table like (serial_number text not null, time timestamp not null, locality text not null, bytes_sent int64 not null, bytes_received int64 not null). The problem was that timestamps would not be aligned between records in the same locality group -- I sampled these occasionally throughout the day and not all at the same instant. And, they were counters, not deltas, so the query would have to do the delta across each serial number, and then aggregate across all devices in a locality. Very possible to do, I literally had that chart with the other monitoring system. But not possible with the influx v1 querying, as far as I could tell.)
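To make the delta-then-aggregate query concrete, here is a rough sketch in plain Python rather than in any query language. The function name, the tuple layout, and the choice to spread each delta evenly over the time it covers are all assumptions made for illustration; it is not how either monitoring system does it.

```python
from collections import defaultdict


def locality_bandwidth(samples, bucket_seconds=3600):
    """samples: iterable of (serial_number, unix_time, locality, bytes_sent_counter).

    Returns {(locality, bucket_start): bytes sent during that bucket}.
    """
    # Step 1: group the raw counter samples per device.
    by_device = defaultdict(list)
    for serial, ts, locality, counter in samples:
        by_device[serial].append((ts, locality, counter))

    totals = defaultdict(float)
    for points in by_device.values():
        points.sort()
        # Step 2: delta between consecutive samples of the same device.
        for (t0, _, c0), (t1, locality, c1) in zip(points, points[1:]):
            if t1 == t0 or c1 < c0:
                continue  # duplicate timestamp or counter reset: skip the pair
            rate = (c1 - c0) / (t1 - t0)
            # Step 3: align by spreading the delta over the buckets it overlaps,
            # so devices sampled at different instants still line up.
            start = t0
            while start < t1:
                bucket = start - start % bucket_seconds
                end = min(t1, bucket + bucket_seconds)
                # Step 4: aggregate across all devices in the locality.
                totals[(locality, bucket)] += rate * (end - start)
                start = end
    return dict(totals)
```

The same shape (per-series delta, alignment to a common time grid, then a group-by on the topology tag) is what the query needs to express, whatever language it is written in.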

I set up 2.x for myself recently, and they have really done a lot of work. The OSS offering has most of the features that cloud/enterprise would. It was easy to set up -- they don't have any instructions for installing it in Kubernetes, and haven't updated their Helm charts for 2.x, but it was like 3 minutes to write a manifest (https://github.com/jrockway/jrock.us/tree/master/production/...) myself, which I prefer 99.9% of the time anyway. The new query language is incredibly verbose, but I see the steps that I remember having with Google's internal system, align, delta, aggregate... all possible. (I had to scratch my head a lot, though, to make it work. And I really am not able to reason about what operations it's doing, what's indexed or not indexed, why I ingest my data as rows but process it as columns, etc.) The performance is good, and it worked well for my use case of pushing data from my Intranet of Stuff. Generally I like it and I don't think they are being shady in any way. It's on my list of something to set up at work to collect various pieces of time series data outside of the Prometheus ecosystem (CI runtimes, etc.).

The reason I picked InfluxDB over TimescaleDB for my personal stuff is because InfluxDB has an HTTP API with built-in authentication. I already have a ton of HTTP services exposed to the Internet, and I understand them well. (Yup, I have SSO and rate limiting and all that stuff for my personal projects ;) I can give each of my devices an API key from their web interface, and I make an HTTP request to write data. Very simple. (They have a client library, but honestly my main target is a Beaglebone, and it doesn't have enough memory to compile their client library. I've never seen "go build" run out of memory, but their client makes that happen. I shouldn't develop on my IoT device, of course, but it's just easier because it has Emacs and gopls, and all the sensors connected to the right bus. Was easier to just manually make the API calls than to cross-compile on my workstation and push the release build to the actual device.) TimescaleDB doesn't have that, because it's just Postgres. So I'd basically have to expose port 5432 to the world, create Postgres users for every device, generate a password, store that somewhere, etc. Then to ingest data, I'd connect to the database, tune my connection pool, retry failed requests manually, etc. Using HTTP gets me all that for free; I can just configure retries in Envoy.
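For a sense of how small "manually make the API calls" can be, here is a hedged sketch of a write against the InfluxDB 2.x `/api/v2/write` endpoint using only the Python standard library (Python is used purely for illustration; the setup described above is Go). The host, org, bucket, token, and measurement names are placeholders, and only numeric fields are handled.

```python
import urllib.parse
import urllib.request


def _esc(value):
    # Line protocol: commas, spaces and equals signs in keys and tag values
    # are backslash-escaped.
    return str(value).replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")


def to_line(measurement, tags, fields, unix_seconds):
    """Build one line-protocol record; ints get the 'i' suffix, other numbers are floats."""
    tag_part = "".join(f",{_esc(k)}={_esc(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(
        f"{_esc(k)}={v}i" if isinstance(v, int) else f"{_esc(k)}={float(v)}"
        for k, v in fields.items()
    )
    return f"{measurement}{tag_part} {field_part} {unix_seconds}"


def build_write_request(base_url, org, bucket, token, lines):
    """Prepare (but do not send) the authenticated POST for a batch of lines."""
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "s"})
    return urllib.request.Request(
        f"{base_url}/api/v2/write?{query}",
        data="\n".join(lines).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would send it; retries and rate limiting can then live in the proxy in front of the server rather than in the device code.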

But... SQL queries are a lot easier to figure out than Flux queries, and I already have good tools for manipulating raw data in Postgres (DataGrip is my preferred method), so I think I will likely be revisiting TimescaleDB. Honestly, I'd pay for a managed offering right now if they had a button in Google Cloud Console that was "Create Instance and by the way this just gets added to your GCP bill for 10% more than a normal Cloud SQL instance".