Hacker News
by mattkrea 3490 days ago
What kind of volume are you sending into Influx? It crashed on me probably 5 times a day with only 100 requests per second.

Right now it looks like it's around 50/sec. A lot of data points get rolled up by Telegraf on individual machines, and then it's shipped in via the UDP line protocol. I've written much larger volumes, though, and never had an issue with stability.
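The comment doesn't show the client side of "shipped in via the UDP line protocol", so here is a minimal Python sketch of what one sender could look like. It is not Telegraf's code. The helper names (to_line, send_udp) are hypothetical, and the 127.0.0.1:8089 target is an assumed address for a local InfluxDB UDP listener. Each point becomes one text line: measurement, comma-separated tags, a space, comma-separated fields, and an optional nanosecond timestamp.

```python
import socket


def _escape_measurement(s):
    # Measurement names escape commas and spaces.
    return s.replace(",", r"\,").replace(" ", r"\ ")


def _escape_key(s):
    # Tag keys, tag values and field keys escape commas, equals signs and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field_value(v):
    if isinstance(v, bool):  # check bool first: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"  # integers carry an 'i' suffix
    if isinstance(v, float):
        return repr(v)  # assumes finite floats (no NaN/inf)
    # Strings are double-quoted with backslashes and quotes escaped.
    s = str(v).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{s}"'


def to_line(measurement, tags, fields, ts_ns=None):
    """Format one point as a line-protocol line (tags and fields sorted by key)."""
    head = ",".join(
        [_escape_measurement(measurement)]
        + [f"{_escape_key(k)}={_escape_key(str(tags[k]))}" for k in sorted(tags)]
    )
    body = ",".join(
        f"{_escape_key(k)}={_format_field_value(v)}" for k, v in sorted(fields.items())
    )
    line = f"{head} {body}"
    if ts_ns is not None:
        line += f" {ts_ns}"
    return line


def send_udp(lines, host="127.0.0.1", port=8089):
    """Send newline-separated lines in a single datagram (fire-and-forget, no ack)."""
    payload = "\n".join(lines).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Because UDP is fire-and-forget, send_udp never learns whether the listener received anything, which is exactly why the loss question comes up next in the thread.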
If I may ask, how is UDP working out for you?

I checked my Graphite setup once: we were losing 27% of our metrics over UDP. That was bad.

Pro tip: run "netstat -anus" and look at the error counters.

About a 4% error-to-received ratio. That's probably due to untuned UDP buffer sizes, though; despite the dropped packets, we're still getting enough data to provide the information we need.
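On Linux, netstat's UDP statistics come from /proc/net/snmp, which holds a "Udp:" header line followed by a "Udp:" value line. The comment doesn't say exactly which counters went into the 4% figure, so this minimal Python sketch assumes it is InErrors divided by InDatagrams (netstat's "packet receive errors" over "packets received"). The function names are hypothetical. A growing RcvbufErrors count specifically suggests the receiving socket's buffer is too small, which fits the untuned-buffer explanation.

```python
def parse_udp_counters(snmp_text):
    """Map the 'Udp:' counter names in /proc/net/snmp text to integer values."""
    # 'UdpLite:' lines are skipped because they do not start with 'Udp:'.
    rows = [line.split()[1:] for line in snmp_text.splitlines() if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("expected a 'Udp:' header line followed by a 'Udp:' value line")
    header, values = rows[0], rows[1]
    return dict(zip(header, (int(v) for v in values)))


def error_ratio(counters):
    """InErrors / InDatagrams, or 0.0 if nothing has been received yet."""
    received = counters.get("InDatagrams", 0)
    return counters.get("InErrors", 0) / received if received else 0.0


def read_udp_counters(path="/proc/net/snmp"):
    """Read the live counters (Linux only)."""
    with open(path) as f:
        return parse_udp_counters(f.read())
```

These counters are cumulative since boot, so to measure loss for a specific window, take two readings with read_udp_counters() and compute the ratio of the differences.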
Was this an older build? We had serious issues at first, but our setup is pretty stable these days.
The last version I tried was 1.0.

I love everything about using Influx, but it would die and never restart, and every time it crashed on semacquire. I'll have to try it again, since I need to check out this Grafana update anyway.

There's some setup involved if you're sending a decent amount of traffic to it.

The two game changers are using the UDP line protocol instead of HTTP, and making sure you are batch-processing inputs. Fixing these settings is the difference between an instance that crashes all the time and one that purrs.
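On the sending side, one simple form of batching is to pack many newline-separated line-protocol lines into each UDP datagram instead of sending one point per packet. This is a minimal Python sketch, not InfluxDB's own server-side batching. The 1400-byte cap (chosen to stay under a typical 1500-byte Ethernet MTU) and the 127.0.0.1:8089 address are assumptions, and pack_datagrams and send_batched are hypothetical names.

```python
import socket


def pack_datagrams(lines, max_bytes=1400):
    """Group lines into newline-joined payloads of at most max_bytes each.

    A single line longer than max_bytes is emitted on its own, since it cannot be split.
    """
    batches, current, size = [], [], 0
    for line in lines:
        encoded = line.encode("utf-8")
        extra = len(encoded) + (1 if current else 0)  # +1 for the joining newline
        if current and size + extra > max_bytes:
            batches.append(b"\n".join(current))  # flush the full batch
            current, size = [], 0
            extra = len(encoded)  # first line of a new batch needs no newline
        current.append(encoded)
        size += extra
    if current:
        batches.append(b"\n".join(current))
    return batches


def send_batched(lines, addr=("127.0.0.1", 8089), max_bytes=1400):
    """Send each packed payload as one datagram over a single socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for payload in pack_datagrams(lines, max_bytes):
            sock.sendto(payload, addr)
```

Fewer, fuller packets mean fewer receive-side operations per point, which is the same reasoning behind the batching advice in the next comment.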

Sending data in batches gives serious performance improvements. Don't send metrics directly to Influx from your app; send them to an intermediary like StatsD, which will aggregate them and forward them.
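To illustrate the intermediary pattern, here is a tiny StatsD-style aggregator sketched in Python. It is not StatsD itself: the Aggregator class and run_flush_loop helper are hypothetical, counters are summed and reset on every flush, gauges keep their last value, and metric names are assumed to need no line-protocol escaping. The app calls incr() and gauge() as often as it likes, but Influx only sees one line per metric per flush.

```python
import threading
import time
from collections import defaultdict


class Aggregator:
    """Hypothetical StatsD-like aggregator: sums counters, keeps last gauge value."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = defaultdict(int)
        self._gauges = {}

    def incr(self, name, value=1):
        with self._lock:
            self._counters[name] += value

    def gauge(self, name, value):
        with self._lock:
            self._gauges[name] = value

    def flush(self, ts_ns):
        """Return one line-protocol line per metric; counters reset, gauges persist."""
        with self._lock:
            counters, self._counters = self._counters, defaultdict(int)
            gauges = dict(self._gauges)
        lines = [f"{name} count={total}i {ts_ns}" for name, total in sorted(counters.items())]
        lines += [f"{name} value={float(v)!r} {ts_ns}" for name, v in sorted(gauges.items())]
        return lines


def run_flush_loop(agg, send, interval=1.0, stop=None):
    """Call send(lines) every `interval` seconds until `stop` is set."""
    if stop is None:
        stop = threading.Event()
    while not stop.wait(interval):  # wait() returns False on timeout
        lines = agg.flush(time.time_ns())
        if lines:
            send(lines)
```

Running run_flush_loop in a background thread with a one-second interval gives roughly the once-a-second forwarding described later in the thread for Riemann; send could be any function that ships the lines to Influx.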

Shameless plug: I recently published a log router in Golang. It sends data to Influx too! (github.com/agnivade/funnel)

Thank you. I'll check this out.
I use Riemann in front of Influx; it collects data and forwards it once a second. It works nicely, especially given that I aggregate some of the higher-volume metrics before sending them to Influx.