NGINX syslog-ing without breaking the bank or patching the code (syshero.org)
78 points by cassianoaquino 4595 days ago
10 comments

Relevant comment: https://news.ycombinator.com/item?id=6799328

"CloudFlare generates 50gb/s of logs globally and has handled collecting this volume in two ways. Historically the logs are sent to a local syslog-ng through the use of a PIPE and then forwarded to a central logger. This can be done with nginx with no patches by just treating the PIPE as a file. Just make sure you do a little buffering inside nginx.

access_log /dev/nginx_access log_format_name buffer=64k flush=10s;

Since this is a pipe there is still some blocking IO, but no worse off than writing to a local file."

If the pipe consumer (i.e. your syslog server) stalls, then it will be much worse than writing to a local file, as it could cause a denial of service.

I would still configure nginx to write to a file and have the consumer tail it to avoid this situation. Properly-configured log rotation can keep the file size within reasonable bounds.

Related question - how do you configure logrotate to honour size immediately? In my (limited) experience, logrotate runs once overnight. If you have a log file that expands by 1gb per hour, and you want to rotate it every 500mb, how do you instruct logrotate to handle it?

Do you just crontab logrotate for short intervals, and will that trigger "daily" rotations each time for other configurations?

There's no reason why you can't run logrotate more than once a day. You can run a separate instance every 15 minutes if you like, with a separate configuration file that only rotates the short-lived logs.
Ok that's the (easy) thing I was missing. Instead of using the main logrotate.conf (that includes logrotate.d/*), use a fully separate logrotate.conf. Thanks.
>If the pipe consumer (i.e. your syslog server) stalls, then it will be much worse than writing to a local file, as it could cause a denial of service.

But how likely is that to happen in any circumstance that doesn't also bring down nginx anyways? Obviously in the case of a large setup like that they don't care, it just gets removed from the pool regardless of why it is failing.

Extremely likely. I've seen the rsyslog daemon die randomly many times, for example. Whereas in 6+ years of using Nginx I've never seen it fall over a single time. Web serving is usually a primary purpose for a server. You do not want your web site to be down just because the machine's syslog daemon died.
Isn't this a reason to replace the syslog daemon rather than the webserver?
It's a reason to keep things as loosely coupled as possible. And when I say I've seen rsyslog crash many times, I'm talking about over the course of years and thousands of servers.
Our notions of "extremely likely" differ then. I have never had syslogd crash. If you have, you should be switching to a different syslogd.
Writing to a pipe is different from writing to a local file, only because you have to be aware of what's going on behind the scenes.

If the other side of the pipe closes (e.g. syslog-ng), your process will hang once the I/O buffer fills up, which is generally 4KB.

You especially need to keep this in mind, when syslog-ng restarts (pipe closed, pipe reopened). You'll probably glance quickly to see that syslog-ng is running, and the pipe is present, but wonder why nginx is not serving traffic.

Or worse, be fooled into thinking nginx is fine because it was serving traffic up until it served 50 requests (or however many reqs it takes to generate 4,000 bytes of access logs), then stopped.
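
For anyone who wants to check the buffer size on their own box rather than trust a remembered figure, here is a minimal Python sketch. It uses an anonymous pipe from os.pipe() as a stand-in for the FIFO (an assumption; a named FIFO shared with syslog-ng may differ in details), and the function names are made up for illustration. The first function counts how many bytes a writer gets accepted while the reader never reads. The second shows what a writer sees once the read end is actually closed.

```python
import os


def bytes_accepted_before_block(chunk_size=4096, max_bytes=16 * 1024 * 1024):
    """Write to a pipe whose reader never reads.

    Returns the number of bytes the kernel accepted before a write
    would have blocked (or max_bytes if that safety cap is hit first).
    """
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking mode turns "hang" into an observable exception.
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        written = 0
        while written < max_bytes:
            try:
                written += os.write(write_fd, chunk)
            except BlockingIOError:
                break  # buffer is full; a blocking writer would stall here
        return written
    finally:
        os.close(read_fd)
        os.close(write_fd)


def write_after_reader_closed():
    """Write to a pipe whose read end has been closed."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    try:
        os.write(write_fd, b"log line\n")
    except BrokenPipeError:
        # CPython ignores SIGPIPE by default, so the failure is an exception.
        return "EPIPE"
    finally:
        os.close(write_fd)
    return "ok"


if __name__ == "__main__":
    print("bytes accepted with a stalled reader:", bytes_accepted_before_block())
    print("write with reader closed:", write_after_reader_closed())
```

The number printed by the first call depends on the kernel and its settings, so treat it as a measurement for that machine only.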

This seems dangerous -- if rsyslog crashes then the pipe is left without a consumer and all writes to it block indefinitely, exactly the opposite of how you would expect syslog to work.
If you were using systemd, I'd imagine you'd tell it to restart your syslogger automagically, and use ExecStartPost or ExecReload to tickle nginx.
I'm a little concerned that these days the go-to answer for a crashing daemon is to have another daemon that watches it and restarts it.

Who watches the watchers? What happened to writing things that don't break?

Whenever I hear this complaint the first thing I think about is this Douglas Adams quote:

  The major difference between a thing that might go wrong 
  and a thing that cannot possibly go wrong is that when a 
  thing that cannot possibly go wrong goes wrong it usually 
  turns out to be impossible to get at or repair.
Personally when doing operations related stuff I like to assume that everything will eventually break - though, hopefully not all at once.
That quote is entirely inapplicable though. It is in fact much easier to fix the simple, do one thing and do it well unix style applications. They are not hard to get at or repair. This is one area where I think the linux philosophy winning out over the unix philosophy has really cost us a lot.
If your PID 1 is screwed, the ordering of syslogd/nginx restarts is the least of your worries.
Perhaps, but there are plenty of scenarios where restarting won't help (e.g. daemon misconfiguration).
Yes, and it will still hang in the amount of time it takes to restart rsyslog.
You can just send a USR1 signal to nginx when you restart rsyslog (hopefully with some kind of supervisor). With Upstart, you could use a post-start script in the rsyslog service configuration.
"Devops" can be considered a philosophy, a methodology, or a movement. It doesn't mean "ops monkey smart enough to code."

Referring to a human as a "devop" is equivalent to calling someone an "agile".

It seems like a reasonable shortening of "devops engineer." If we can call people "quants," I don't suppose "devop" is all that much sillier.
Operations Engineer/Developer
Actually it usually means a dev who knows ops stuff as well.

Realistically it's just a job title that means paying technical architects less money...

s/philosophy/affliction
I suggested moving syslog support into open source nginx, and it's been accepted. Maybe show some support for it here http://trac.nginx.org/nginx/ticket/409
Nice work!

We write the logs to disk and then use rsyslog's imfile feature to read from there. Your approach has the advantage of not requiring disk writes.

BTW, we're not in Toronto, but we're hiring and happy to accept remote workers from that timezone :)

http://www.bashton.com/jobs/

put your logs in /dev/shm

oh and don't forget to rotate them!

If you need guaranteed log delivery, I wouldn't do that. Unread logs won't survive a power failure or system crash.
Aren't you also not ensuring delivery when using remote syslogging in default mode? I believe this is all logged via UDP, so if the network or syslog host is overloaded your syslog messages will be silently dropped.
In default mode, sure. But modern syslog daemons also support TCP transports. And both rsyslog and syslog-ng have commercial versions that buffer logs to disk (though double-buffering isn't necessary here where the source is a log file already on disk).
That's correct. rsyslog also supports TCP though.
Writing to /var/log doesn't get you guaranteed delivery either, unless you're calling fsync() after every write.
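
To make the fsync() point concrete, a minimal Python sketch of what "durable after every write" looks like. The helper name append_durably is made up for illustration, and this is not how nginx writes its logs; it only shows the extra step a writer would need.

```python
import os


def append_durably(path, line):
    """Append one line and ask the OS to flush it to storage before returning."""
    with open(path, "ab") as f:
        f.write(line.encode("utf-8") + b"\n")
        f.flush()             # move data from Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to push the file data to the device
```

Without the last two calls, a line can sit in a userspace or page-cache buffer and be lost on power failure, which is the same exposure as unread logs in RAM. The cost is a synchronous flush per line, which is why most loggers don't do it.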
You can also write a pretty simple program to do essentially `tail -F /var/log/nginx/access.log | logger --server some.host`. I'm doing something like that. It works, but I still don't like it. The idea of nginx hanging while trying to write to FIFOs I like even less.
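
A rough Python sketch of that kind of shipper, for illustration only. The function names, the polling interval, the "nginx:" tag and the priority value 134 (facility local0, severity info) are all assumptions, and the message is a bare BSD-style syslog line over UDP, so delivery is best effort. It polls the file, picks up complete lines appended since the last offset, and starts over from the beginning if the file shrinks (truncation or rotation).

```python
import os
import socket
import time


def read_new_lines(path, offset):
    """Return (complete_lines, new_offset) for data appended since offset."""
    if os.path.getsize(path) < offset:
        offset = 0  # file shrank: assume it was truncated or rotated
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only consume up to the last newline; a partial line waits for the next poll.
    end = data.rfind(b"\n") + 1
    return data[:end].splitlines(), offset + end


def ship(lines, sock, address, pri=134):
    """Send each line as one UDP datagram with a syslog priority prefix."""
    for line in lines:
        sock.sendto(b"<%d>nginx: " % pri + line, address)


def follow_and_ship(path, address, interval=1.0):
    """Poll path forever and forward new lines to address (host, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    offset = os.path.getsize(path)  # start at the end, like tail -F
    while True:
        lines, offset = read_new_lines(path, offset)
        ship(lines, sock, address)
        time.sleep(interval)
```

Something like follow_and_ship("/var/log/nginx/access.log", ("some.host", 514)) would be the entry point. Since the shipper only reads a file, a stalled or dead log server leaves nginx unaffected; the trade-off is that the file still has to be written and rotated locally.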
The issue of the article the OP was a response to is that the traffic they are expecting is too big for the disk nginx is running on to handle the log files.

Your solution works for centralising the log, but it doesn't solve the local i/o issue as the file is still being written (which was the problem of the original article).

Easily solved by writing the logs to a filesystem in RAM (e.g. tmpfs). Have your little program truncate them as often as you need.
Working on a similar problem in our stack and didn't know about the limitations on NGINX. I used syslog-ng years ago but now we're on rsyslog. I feel like this is squarely a devops problem and one where the software producing logs is deficient.
Use Logstash; it's 10 times better than a 'quick' hack for those problems.
How are you using Logstash? I have NGINX write the log, then the log-shipper suck that log file up. If you have a better work flow, please share. :)