Parsing and centralizing logs from Ruby/Rails and nginx - Our Recipe
15 Dec 2014

Following my recent post Measure, Monitor, Observe and Supervise I got a ton of emails, Facebook messages, Twitter DMs and more, asking me to talk about HOW we do it: the chef cookbooks, the pipeline, how everything is wired up, etc.
This post has been a long time coming, but better late than never, I guess. In this post I will be open sourcing the entire pipeline: how it's done, the wiring, the cookbooks, everything.
While this post's title says "Rails applications", you will likely find it useful for any sort of web application running on multiple servers, in the cloud or on premise.
First, I'd like to describe how this looks from a bird's-eye view:
- User hits the application servers and the servers generate logs. Typically we look at two logs: nginx access logs and Rails server logs
- Rsyslog watches those two files and ships new lines to logstash over UDP as the files change
- Logstash checks the input and directs it to the right place (a filter in logstash)
- Logstash outputs everything into elasticsearch
- Kibana is used to look at the graphs and dashboards
This is one example of a dashboard for our rails servers:
This has a lot of moving parts, so let's dive into them one by one.
Logstasher gem
We use the logstasher gem in order to generate JSON logs from the Rails application.
We use the gem as-is, with a few custom fields added to it, nothing out of the ordinary.
The Logstasher gem simply saves a JSON version of the log file on the application server; it WILL NOT send the logs directly to logstash.
So, for example, you will have /var/log/your-app-name/logstash_production.log, which is a JSON version of the /var/log/your-app-name/production.log you are used to.
I encourage you to read more of the Logstasher documentation; it's pretty good and straightforward.
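To make this concrete, here is roughly what wiring it up looks like. This is a minimal sketch, and the user_id field is just a hypothetical example of a custom field; check the Logstasher README for the exact options your version supports:

# config/environments/production.rb
# Writes the JSON log alongside the regular Rails log
config.logstasher.enabled = true

# config/initializers/logstasher.rb
# A hypothetical custom field; anything added here becomes
# a top-level key in each JSON log line
if LogStasher.enabled?
  LogStasher.add_custom_fields do |fields|
    fields[:user_id] = current_user.try(:id) if respond_to?(:current_user)
  end
end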
To give you an example of how the logs are separated, look at this image:
Rsyslog shipper
The rsyslog shipper is a configuration file under /etc/rsyslog.d/PRIORITY-CONFIG_FILE_NAME on your app servers.
For example: /etc/rsyslog.d/22-nginx-access.
We use chef to manage this file; here's the base template:
NOTE: Do not forget to open up the ports in the firewall, and only to the servers that need them (or to the security group on Amazon).
$ModLoad imfile # Load the imfile input module
$InputFileName /var/log/nginx/access.log
$InputFileTag nginx-access:
$InputFileStateFile nginx-access-statefile
$InputRunFileMonitor
:syslogtag , isequal , "nginx-access:" @your-logstash-server:5544
& ~
Explaining the file a bit more:
- This is a working example of sending nginx logs to logstash over UDP
- `/var/log/nginx/access.log` is where nginx stores access logs by default; if yours doesn't store them there, change it
- `nginx-access` is the tag given to that file input by rsyslog; you can identify the input by the tag and handle it accordingly
- `your-logstash-server` needs to be a domain, IP or EC2 internal address; the port (5544 here) matches the nginx/syslog input in the logstash config below
- If you use `@` when sending the logs, they will be sent over UDP; using `@@` will send the logs over TCP
- `& ~` tells rsyslog to stop handling this file further down the chain
Note:
1. This works with rsyslog version 5.x; if you are using 7.x or any other version it will likely be different.
2. You only need `$ModLoad imfile` once per rsyslog config; if you use 2 configuration files, only the first one needs to include this line.
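For the Rails JSON log you would add a second shipper along the same lines. Here is a minimal sketch (the file name, tag and state file are placeholders, and per note 2 the $ModLoad line is already handled by the nginx config; the 5545 port matches the Rails logstash input below):

$InputFileName /var/log/your-app-name/logstash_production.log
$InputFileTag rails:
$InputFileStateFile rails-statefile
$InputRunFileMonitor
:syslogtag , isequal , "rails:" @your-logstash-server:5545
& ~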
One gotcha that I wasn't aware of at all until sharing this shipper as open source is that if you use TCP from rsyslog and for some reason logstash is down, or something happens to the channel (firewall etc.), it might grind all your servers to a halt.
This is due to the fact that rsyslog will keep a queue of TCP retries and eventually choke under the load.
Following that tip, we moved all of our shippers to UDP; if you have that option, you should do the same, IMHO.
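If you are stuck with TCP, rsyslog's action queue directives can at least keep a dead target from stalling everything else. This is a rough sketch for rsyslog 5.x, not something we run, so check the rsyslog docs for your version:

$ActionQueueType LinkedList # give this action its own async queue
$ActionQueueFileName rails-fwd # spill to disk when the in-memory queue fills
$ActionQueueMaxDiskSpace 100m # cap the on-disk spool
$ActionResumeRetryCount -1 # retry the connection forever instead of dropping the action
$ActionQueueSaveOnShutdown on # keep queued messages across restarts
:syslogtag , isequal , "rails:" @@your-logstash-server:5545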
Chef cookbook
The main part of this is really our logger cookbook, which installs the main server and generates the inputs.
As part of this blog post, I am open sourcing it as well; you can check it out on Github.
The key part of the cookbook is the configuration that filters the input based on where it's coming from:
input {
  udp {
    port => "5544"
    type => "syslog"
  }
}

output {
  elasticsearch {
    host => "127.0.0.1"
  }
}

filter {
  if [type] == 'syslog' {
    grok {
      match => {
        "message" => "%{COMBINEDAPACHELOG}"
        "program" => "nginx-access"
      }
    }
  }
}
You can see here that the grok filter uses COMBINEDAPACHELOG, which is extracted from the line streamed into logstash.
This is a predefined pattern that ships with logstash.
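To illustrate, here is a made-up combined-format line:

127.0.0.1 - - [15/Dec/2014:10:00:00 +0200] "GET /posts HTTP/1.1" 200 1234 "http://example.com/" "Mozilla/5.0"

COMBINEDAPACHELOG breaks it out into fields such as clientip, timestamp, verb, request, response, bytes, referrer and agent, ready to be graphed in Kibana.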
For Rails, it’s a bit more tricky as you can see here:
input {
  udp {
    port => "5545"
    type => "rails"
    codec => json {
      charset => "UTF-8"
    }
  }
}

filter {
  if [type] == 'rails' {
    grok {
      match => {
        "message" => "<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{GREEDYDATA:clean_message}"
      }
    }
    mutate {
      remove_field => [ "message" ]
    }
    json {
      source => "clean_message"
    }
  }
}
You can see this is a bit more tricky. When rsyslog ships the logs to logstash, it prepends its own syslog header fields to each line, making it invalid JSON; so the first part cleans up the message and then parses it with the json filter.
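For example (again, a made-up line), what actually arrives over UDP looks roughly like this:

<134>Dec 15 10:00:00 app-01 rails: {"method":"GET","path":"/posts","status":200}

The grok pulls the JSON payload out into clean_message, the mutate drops the original message field, and the json filter expands clean_message into proper fields.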
Note: logstash has an input called syslog; this might be a better fit here than using plain TCP/UDP and cleaning up.
I haven't tested it on our setup yet, but once I do, I will update the cookbook and the blog here.
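If you want to try it, a minimal sketch of that input would look like this (untested on our setup, as noted):

input {
  syslog {
    port => 5544
    type => "syslog"
  }
}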
Thank you’s
I want to thank @kesor6 for helping me a ton with this, fixing bugs and pair programming into the wee hours of the night.
Thanks also to the Operations Israel Facebook group: amazing people who help out with problems and give tips when needed.