
	by mattkrea 3490 days ago
	What kind of volume are you sending into Influx? It crashed on me probably 5 times a day with only 100 requests per second.

3 comments

cheald 3490 days ago

Right now it looks like it's around 50/sec. A lot of data points get rolled up by Telegraf on individual machines, and then it's shipped in via the UDP line protocol. I've written much larger volumes, though, and never had an issue with stability.

user5994461 3490 days ago

If I may ask, how is UDP doing for you?

I checked my graphite setup once. We had 27% of metrics lost over UDP. That was bad.

pro-tip: "netstat -anus" and look at the error counters.
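
On Linux the same UDP counters are also exposed in /proc/net/snmp, so the check can be scripted. A minimal Python sketch, assuming that file's usual layout of a "Udp:" header row followed by a "Udp:" value row; the function names are made up for illustration:

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict."""
    # Keep only the two "Udp:" rows: the first holds names, the second values.
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no Udp header/value pair found")
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def error_ratio(counters):
    """Ratio of receive errors to successfully received datagrams."""
    received = counters["InDatagrams"]
    errors = counters["InErrors"]
    return errors / received if received else 0.0
```

On a Linux host this could be called as error_ratio(parse_udp_counters(open("/proc/net/snmp").read())). The counters are cumulative since boot, so comparing two readings taken some time apart gives a better picture than a single one.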

cheald 3490 days ago

About 4% err-to-received ratio. That's probably due to untuned UDP buffer sizes, though; despite the dropped packets, we're getting enough data to provide the information we need.
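
One way to see what receive buffer a UDP socket actually ends up with is to request a size and read it back. A rough Python sketch, assuming a host where the kernel may cap or adjust the request (on Linux the cap is the net.core.rmem_max sysctl); open_udp_listener is a hypothetical helper, not part of Influx or Telegraf:

```python
import socket


def open_udp_listener(requested_rcvbuf, host="127.0.0.1", port=0):
    """Bind a UDP socket and report the receive buffer the kernel granted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a larger receive buffer; the kernel is free to cap or adjust it.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_rcvbuf)
    sock.bind((host, port))
    # Read the value back, since it can differ from what was requested.
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted
```

If the granted size is far below the request, the system-wide limit is probably the thing to raise first.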

hoov 3490 days ago

Was this an older build? We had serious issues at first, but our setup is pretty stable these days.

mattkrea 3490 days ago

Last I tried was 1.0.

I love everything about using Influx, but it would die and never restart, and every time it was a crash on semacquire. I'll have to try it again since I need to check out this Grafana update anyway.

scrollaway 3490 days ago

There's some setup involved if you're sending a decent amount of traffic to it.

The two game changers are using the UDP line protocol instead of HTTP, and making sure you are batch-processing inputs. Fixing these settings is the difference between an instance that crashes all the time and a purring one.
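
As an illustration of those two points together, here is a minimal Python sketch of a client-side batcher that joins several line-protocol points with newlines and ships them in one UDP datagram. The class name, the 50-point limit and the 1400-byte payload cap are assumptions for the example, not Influx defaults:

```python
import socket


class UdpBatcher:
    """Buffer line-protocol points and send them as one UDP datagram."""

    def __init__(self, host, port, max_points=50, max_bytes=1400):
        self.addr = (host, port)
        self.max_points = max_points  # flush after this many points
        self.max_bytes = max_bytes    # keep each datagram under this size
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.lines = []
        self.size = 0

    def add(self, line):
        encoded = line.encode("utf-8")
        # One extra byte for the newline separator if the batch is non-empty.
        extra = len(encoded) + (1 if self.lines else 0)
        if self.lines and self.size + extra > self.max_bytes:
            self.flush()
            extra = len(encoded)
        self.lines.append(encoded)
        self.size += extra
        if len(self.lines) >= self.max_points:
            self.flush()

    def flush(self):
        """Send the buffered points; return how many were sent."""
        if not self.lines:
            return 0
        self.sock.sendto(b"\n".join(self.lines), self.addr)
        count = len(self.lines)
        self.lines = []
        self.size = 0
        return count
```

A caller would add() points such as "cpu,host=a value=0.5" as they are produced and call flush() on a timer so a quiet period does not leave points sitting in the buffer.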

agnivade 3489 days ago

Sending data in batches gives serious performance improvements. Don't send metrics directly to Influx from your app. Send them to an intermediary like StatsD, which will aggregate them and forward the results.
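
To show what that aggregation step amounts to, here is a toy Python sketch of a counter aggregator. It is not how StatsD itself is implemented; the class and the "count" field name are invented for the example. It sums increments in memory and emits one line-protocol point per metric when drained:

```python
from collections import defaultdict


class CounterAggregator:
    """Sum counter increments and emit one point per metric on drain."""

    def __init__(self):
        self.counts = defaultdict(int)

    def incr(self, name, amount=1):
        self.counts[name] += amount

    def drain(self):
        # One line per metric; the trailing "i" marks an integer field
        # in line protocol.
        lines = ["%s count=%di" % (name, total)
                 for name, total in sorted(self.counts.items())]
        self.counts.clear()
        return lines
```

Calling drain() once per interval turns thousands of individual increments into a handful of writes.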

Shameless plug - I recently published a log router in Golang. It sends data to Influx too! (github.com/agnivade/funnel)

mattkrea 3490 days ago

Thank you. I'll check this out.

piranha 3490 days ago

I use Riemann in front of Influx; it collects data and forwards it once a second. Works nicely, especially given that I aggregate some of the higher-volume metrics before sending them to Influx.
