by megous 1954 days ago
Hmm, it occurred to me that you can probably get a nice list of robot user agents by querying for all UAs that accessed the robots.txt file. I don't think normal browsers touch that file.
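
A rough sketch of that idea in Python, assuming the access log uses the common "combined" format (the sample lines, IPs, and bot name are made up for illustration):

```python
import re
from collections import Counter

# Matches the request line and user-agent fields of a "combined" format entry.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '  # e.g. "GET /robots.txt HTTP/1.1"
    r'\d{3} \S+ '                                  # status code and body size
    r'"[^"]*" '                                    # referer
    r'"(?P<ua>[^"]*)"'                             # user agent
)

def robot_user_agents(lines):
    """Count the user agents that requested /robots.txt."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        # Ignore any query string when comparing the path.
        if m and m.group("path").split("?", 1)[0] == "/robots.txt":
            counts[m.group("ua")] += 1
    return counts

# Hypothetical sample entries.
sample = [
    '203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "ExampleBot/1.0"',
    '203.0.113.9 - - [10/Oct/2020:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(robot_user_agents(sample))
```

Anything that shows up here is only a candidate, since a curious human can fetch robots.txt too, and badly behaved bots may never request it.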

Another cheap thing to do, if you want more usable logs, is JSON logging[1] (one object per line). That is trivial to import into PostgreSQL and also trivial to query as is with tools like jq.

[1] Example: https://stackoverflow.com/questions/25049667/how-to-generate...
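
To show how little code a one-object-per-line log needs, here is a minimal Python sketch. The field names (`status`, `uri`, `user_agent`) and the sample entries are assumptions; the real ones depend on whatever log format you define.

```python
import json

def parse_json_log(lines):
    """Yield one dict per non-empty log line (one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Hypothetical sample entries; real field names depend on your log format.
sample = [
    '{"status": 200, "uri": "/index.html", "user_agent": "Mozilla/5.0"}',
    '{"status": 502, "uri": "/api/items", "user_agent": "curl/7.68.0"}',
    '{"status": 404, "uri": "/missing", "user_agent": "ExampleBot/1.0"}',
]

# Collect the URIs of all requests that ended in a server error.
server_errors = [e["uri"] for e in parse_json_log(sample) if e["status"] >= 500]
print(server_errors)
```

With jq, the same filter would be something like `jq -c 'select(.status >= 500)'` run over the log file.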

1 comment

Logging JSON directly from nginx is what I currently do, and the log output is then ingested straight into Elasticsearch. One neat thing you can do is also log response headers from an upstream HTTP server, such as a username or any other application-specific piece of data. That way you can interleave your HTTP access logs with application data and have everything available for querying in one index.
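
As a sketch of what that enables: nginx exposes upstream response headers as `$upstream_http_<name>` variables, so a header like `X-Username` (a hypothetical name) could be written into each JSON log entry. Assuming it lands in a field called `user` (also an assumption), counting requests per user in Python could look like this:

```python
import json
from collections import Counter

def requests_per_user(lines, field="user"):
    """Count log entries per value of an application-supplied field."""
    counts = Counter()
    for line in lines:
        entry = json.loads(line)
        # Requests without an upstream username are grouped under "-".
        counts[entry.get(field) or "-"] += 1
    return counts

# Hypothetical sample entries; "user" stands in for the logged upstream header.
sample = [
    '{"uri": "/inbox", "status": 200, "user": "alice"}',
    '{"uri": "/inbox", "status": 200, "user": "bob"}',
    '{"uri": "/settings", "status": 200, "user": "alice"}',
    '{"uri": "/login", "status": 200, "user": ""}',
]
print(requests_per_user(sample).most_common())
```

In Elasticsearch the equivalent would be an aggregation on that field; the Python version is only meant to show the shape of the data.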