	by cheald 3491 days ago
	Influx + Telegraf + Grafana is such a simple, sweet stack. No work to maintain, trivial to set up, I can ship just about anything I want into it, and reporting is fast. With alerting in place now, I'm even happier than ever. A huge thank you to the Grafana team for solving a huge pain point!

2 comments

mattkrea 3491 days ago

What kind of volume are you sending into Influx? It crashed on me probably 5 times a day with only 100 requests per second.

cheald 3491 days ago

Right now it looks like it's around 50/sec. A lot of data points get rolled up by Telegraf on individual machines, and then it's shipped in via the UDP line protocol. I've written much larger volumes, though, and never had an issue with stability.

user5994461 3491 days ago

If I may ask, how is UDP doing for you?

I checked my graphite setup once. We had 27% of metrics lost over UDP. That was bad.

pro-tip: "netstat -anus" and look at the error counters.

cheald 3491 days ago

About a 4% err-to-received ratio. That's probably due to untuned UDP buffer sizes, though; despite the dropped packets, we're getting enough data to provide the information we need.
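
For anyone who wants to compute that ratio without eyeballing netstat output, here is a rough Python sketch. It assumes Linux and the usual two-line "Udp:" header/value layout of /proc/net/snmp, and it assumes "err-to-received" means InErrors divided by InDatagrams (as far as I know, those are the counters netstat reports as "packet receive errors" and "packets received"). The function names are made up for illustration.

```python
def udp_counters(snmp_text):
    """Parse the Udp counters out of /proc/net/snmp-style text into a dict of ints."""
    # The file holds a header row and a value row, both starting with "Udp:".
    # "UdpLite:" rows do not match this prefix, so they are skipped.
    rows = [line.split() for line in snmp_text.splitlines() if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no Udp header/value pair found")
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, map(int, values)))


def error_ratio(counters):
    """Receive errors as a fraction of datagrams received (assumed definition)."""
    received = counters["InDatagrams"]
    return counters["InErrors"] / received if received else 0.0


def read_local_ratio(path="/proc/net/snmp"):
    """Convenience wrapper for a Linux host; the default path is an assumption."""
    with open(path) as handle:
        return error_ratio(udp_counters(handle.read()))
```

On a Linux box, read_local_ratio() should give the figure for the whole host since boot. A non-zero RcvbufErrors entry in the same dict would point at the receive buffer being too small, which matches the "untuned buffer sizes" guess.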

hoov 3491 days ago

Was this an older build? We had serious issues at first, but our setup is pretty stable these days.

mattkrea 3491 days ago

Last I tried was 1.0.

I love everything about using Influx, but it would die and never restart, and every time it would crash on semacquire. I'll have to try it again since I need to check out this Grafana update anyway.

scrollaway 3491 days ago

There's some setup involved if you're sending a decent amount of traffic to it.

The two game changers are using the UDP line protocol instead of HTTP, and making sure you are batch-processing inputs. Fixing these settings is the difference between an instance that crashes all the time and a purring one.
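
To make "UDP line protocol plus batching" concrete, here is a minimal Python sketch of what a client could do. It is an illustration, not how Telegraf does it. The assumptions: float-only fields, a UDP listener on port 8089 (the port used in the stock Influx 1.x config examples, as far as I remember), and a 1400-byte payload cap picked to stay under a typical Ethernet MTU. All function names are hypothetical.

```python
import socket


def _escape(value):
    # Tag keys, tag values, and field keys need commas, spaces, and '=' escaped.
    return str(value).replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")


def format_point(measurement, tags, fields, ts_ns):
    """Render one point as 'measurement,tag=v field=v timestamp' (float fields only)."""
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={float(v)}" for k, v in sorted(fields.items()))
    return f"{name}{tag_part} {field_part} {ts_ns}"


def batch_lines(lines, max_bytes=1400):
    """Group lines into newline-joined payloads of at most max_bytes each.

    A single line longer than max_bytes is still emitted, alone, as its own payload.
    """
    batch, size = [], 0
    for line in lines:
        encoded = len(line.encode("utf-8"))
        # +1 accounts for the newline that would join this line to the batch.
        if batch and size + 1 + encoded > max_bytes:
            yield "\n".join(batch)
            batch, size = [], 0
        size += encoded + 1 if batch else encoded
        batch.append(line)
    if batch:
        yield "\n".join(batch)


def send_udp(payloads, host="127.0.0.1", port=8089):
    """Fire-and-forget: one datagram per payload, no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for payload in payloads:
            sock.sendto(payload.encode("utf-8"), (host, port))
```

Usage would look like send_udp(batch_lines(format_point(...) for each sample)). Fewer, fuller datagrams mean fewer syscalls on the sender and fewer tiny writes on the server, which is the point of batching.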

agnivade 3490 days ago

Sending data in batches gives serious performance improvements. Don't send metrics directly to Influx from your app. Send them to an intermediary like StatsD, which will aggregate them and send them on.

Shameless plug - I recently published a log router in Golang. It sends data to Influx too! (github.com/agnivade/funnel)

mattkrea 3491 days ago

Thank you. I'll check this out.

piranha 3491 days ago

I use Riemann in front of Influx, which collects data and forwards it once a second. Works nicely, especially given that I aggregate some of the higher-volume metrics before sending them to Influx.
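
The "collect, then forward once a second" idea can be sketched in a few lines of Python. This is not how Riemann (or StatsD) is implemented; it is just a toy in-process counter aggregator under the assumption that summing per metric name is the aggregation you want. The class name, the flush callback, and the injectable clock are all hypothetical.

```python
import time
from collections import defaultdict


class Aggregator:
    """Sum counter increments in memory and hand them to `flush` at most once per interval."""

    def __init__(self, flush, interval=1.0, clock=time.monotonic):
        self._flush = flush          # callable that receives a {name: total} dict
        self._interval = interval    # seconds between forwards
        self._clock = clock          # injectable so the behaviour can be tested
        self._counts = defaultdict(float)
        self._last = clock()

    def incr(self, name, value=1.0):
        self._counts[name] += value
        # Forward lazily: only checked when a new sample arrives.
        if self._clock() - self._last >= self._interval:
            self.flush()

    def flush(self):
        if self._counts:
            self._flush(dict(self._counts))
            self._counts.clear()
        self._last = self._clock()
```

With this in front of the database, a thousand increments per second for the same metric become one write per second. A real deployment would also flush on a timer so a quiet period doesn't leave samples stuck in memory.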

RRRA 3491 days ago

What transport are you using to secure telegraf into influxdb?

(Haven't tried Telegraf yet; I'm setting up Prometheus at the moment.)

alfalfasprout 3491 days ago

Not sure what you mean by "secure telegraf into influxdb", but we've had great success with this stack for monitoring by just embedding an HTTP server into each application that needs to be monitored. We keep that HTTP server separate from any others used by the application (i.e. it runs on a separate thread) so performance isn't impacted.

RRRA 3491 days ago

My use case is one where I have servers in different datacenters and would want to have a simple, but secure, way to fetch metrics for graphing and alerts.

So, I meant encryption in transport, authentication, etc. as many solutions work well if you're monitoring "in the clear" from the backend, but not so much over the internet.

cheald 3491 days ago

We're deployed on AWS in multiple regions with VPNs set up between VPCs. No particular attention paid to securing the transport between Telegraf and Influx at the moment since a) it's either in an internal VPC or secured via ipsec, and b) our monitoring data is low-value enough that it doesn't warrant its own secure transport.

agnivade 3490 days ago

IIRC, Influx supports HTTPS too. So you just have to set up some certs and switch to HTTPS in the client.
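
On the client side, "switch to HTTPS" could look roughly like the Python sketch below. It assumes the Influx 1.x-style /write endpoint with a db query parameter and HTTP basic auth; the host name, database, and credentials are placeholders, and build_write_request is a made-up helper. The function only builds the request, so nothing is sent until you call urlopen yourself.

```python
import base64
import urllib.parse
import urllib.request


def build_write_request(base_url, database, lines, username=None, password=None):
    """Build (but do not send) a POST of line-protocol text to an HTTPS write endpoint."""
    if not base_url.startswith("https://"):
        raise ValueError("refusing to send metrics over a non-HTTPS URL")
    query = urllib.parse.urlencode({"db": database})
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/write?{query}",
        data="\n".join(lines).encode("utf-8"),
        method="POST",
    )
    if username is not None:
        # Basic auth is only reasonable here because the transport is encrypted.
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


# Sending, with certificate verification against your own CA (path is a placeholder):
#   import ssl
#   context = ssl.create_default_context(cafile="/path/to/ca.pem")
#   urllib.request.urlopen(build_write_request(...), context=context)
```

Passing an explicit CA file matters if the server uses a self-signed or internal certificate; the default context verifies against the system trust store.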
