Parsing and centralizing logs from Ruby/Rails and nginx - Our Recipe

Following my recent post Measure, Monitor, Observe and Supervise I got a ton of emails, Facebook messages, Twitter DMs and more, asking me to talk about HOW we do it: the Chef cookbooks, the pipeline, how everything is wired up, and so on.

This post has been a long time coming, but better late than never, I guess. In this post I will be open sourcing the entire pipeline: how it's done, the wiring, the cookbooks, everything.

While this post's title mentions Rails applications, you will likely find it useful for any sort of web application running on multiple servers, in the cloud or on premises.

First, I'd like to describe how this looks from a bird's-eye view.

This is one example of a dashboard for our Rails servers:

This has a lot of moving parts, so let's dive into them one by one.

Logstasher gem

We use the logstasher gem in order to generate JSON logs from the Rails application.

We use the gem as is, with a few custom fields added to it; nothing out of the ordinary.
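As an illustration, the hook for that looks roughly like this. This is a minimal sketch based on Logstasher's add_custom_fields API; the fields themselves are illustrative, not our actual ones:

# config/initializers/logstasher.rb
if LogStasher.enabled?
  # The block runs in the controller's context, so controller
  # methods like current_user and request are available.
  LogStasher.add_custom_fields do |fields|
    fields[:user_id] = current_user && current_user.id  # illustrative field
    fields[:host]    = request.host                     # illustrative field
  end
end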

The Logstasher gem will simply save a JSON version of the log file on the application server; it WILL NOT send the logs directly to Logstash.

So, for example, you will have a /var/log/your-app-name/logstash_production.log, which is a JSON version of the /var/log/your-app-name/production.log you are used to.
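To make that concrete, a single request in the JSON log looks roughly like this (an abbreviated, illustrative event; the exact fields depend on your Logstasher version and custom fields):

{"method":"GET","path":"/users","format":"html","controller":"users","action":"index","status":200,"duration":18.94,"view":12.01,"db":3.22,"@timestamp":"2014-02-01T12:00:00.000Z","message":"[200] GET /users (users#index)"}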

I encourage you to read through the Logstasher documentation; it's pretty good and straightforward.

To give you an example of how the logs are separated, look at this image:

Application server log separation

Rsyslog shipper

The rsyslog shipper is a configuration file under /etc/rsyslog.d/PRIORITY-CONFIG_FILE_NAME on your app servers.

For example: /etc/rsyslog.d/22-nginx-access.

We use Chef to manage the template; here's the base template:

NOTE: Do not forget to open up the ports in the firewall only to the servers you need (or to the relevant security group on Amazon).

$ModLoad imfile   # Load the imfile input module

# Tail the nginx access log and tag every line
$InputFileName /var/log/nginx/access.log
$InputFileTag nginx-access:
$InputFileStateFile nginx-access-statefile
$InputRunFileMonitor

# Forward tagged lines to logstash over UDP (a single @ means UDP, @@ would be TCP)
:syslogtag, isequal, "nginx-access:" @your-logstash-server:5544
# Discard the lines so they do not also end up in the local syslog files
& ~

Explaining the file a bit more: the imfile module tails /var/log/nginx/access.log and tags every line with nginx-access:, while the state file lets rsyslog remember its position across restarts. The property filter on the last two lines forwards the tagged lines to the Logstash server over UDP and then discards them (& ~) so they do not also end up in the local syslog files.

Note:
1. This works with rsyslog version 5.x; if you are using 7.x or any other version, the syntax will likely be different (see the sketch right after these notes).
2. You only need $ModLoad imfile once per rsyslog instance; if you use two configuration files, only the first one needs to include this directive.
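For reference, here is roughly what the same shipper looks like in the RainerScript syntax of rsyslog 7.x. This is a hedged sketch based on rsyslog's newer configuration format, not something taken from our setup:

module(load="imfile")

input(type="imfile"
      File="/var/log/nginx/access.log"
      Tag="nginx-access:"
      StateFile="nginx-access-statefile")

if $syslogtag == 'nginx-access:' then {
  action(type="omfwd" target="your-logstash-server" port="5544" protocol="udp")
  stop
}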

One gotcha that I wasn't aware of at all until sharing this shipper as open source: if you ship from rsyslog over TCP and for some reason Logstash is down, or something happens to the channel (a firewall, etc.), it can grind all your servers to a halt.

This is because rsyslog keeps a queue of the unsent TCP messages and eventually chokes under the load.

Following that tip, we moved all of our shippers to UDP; if you have that option, you should do the same, IMHO.
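For reference, in rsyslog's forwarding syntax the difference between the two is a single character (the hostname is a placeholder):

# TCP - reliable, but unsent messages queue up while the receiver is unreachable:
:syslogtag, isequal, "nginx-access:" @@your-logstash-server:5544

# UDP - fire and forget, the shipper never blocks:
:syslogtag, isequal, "nginx-access:" @your-logstash-server:5544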

Chef cookbook

The main part of this is really our logger cookbook; it installs the central Logstash server and generates the inputs.

As part of this blog post, I am open sourcing it as well, you can check it out on Github.
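To give you an idea of what generating a shipper config looks like, here is a simplified Chef sketch. The template name, attribute keys and values are illustrative, not the cookbook's exact code:

# Keep rsyslog enabled and running so the shipper config can notify it
service 'rsyslog' do
  action [:enable, :start]
end

# Render the rsyslog shipper config from an ERB template
template '/etc/rsyslog.d/22-nginx-access.conf' do
  source 'rsyslog-shipper.erb'
  variables(
    log_file: '/var/log/nginx/access.log',
    file_tag: 'nginx-access',
    logstash_host: node['logger']['logstash_host'],
    logstash_port: 5544
  )
  notifies :restart, 'service[rsyslog]'
end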

The main part of the cookbook is the configuration that filters the input based on where it's coming from:

input {
  udp {
    port => "5544"
    type => "syslog"
  }
}

filter {
  # Everything arriving on this port was tagged nginx-access by the shipper
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}

output {
  elasticsearch {
    host => "127.0.0.1"
  }
}

You can see here that the grok filter uses COMBINEDAPACHELOG; the fields are extracted from each line streamed into Logstash.

This is a predefined pattern that ships with Logstash.
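For example, given a made-up access log line like the one below, COMBINEDAPACHELOG extracts fields such as clientip, verb, request, response, bytes, referrer and agent:

10.0.0.1 - - [01/Jan/2014:12:00:00 +0000] "GET /users HTTP/1.1" 200 1432 "-" "Mozilla/5.0"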

For Rails it's a bit more tricky, as you can see here:

input {
  udp {
    port => "5545"
    type => "rails"
    codec => json {
      charset => "UTF-8"
    }
  }
}

filter {
  if [type] == "rails" {
    # Strip the syslog header rsyslog adds, keeping the JSON payload
    grok {
      match => { "message" => "<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{GREEDYDATA:clean_message}" }
    }
    mutate {
      remove_field => [ "message" ]
    }
    # Parse the cleaned-up payload as JSON
    json {
      source => "clean_message"
    }
  }
}

You can see here, this is a bit more tricky. Since rsyslog ships the logs into Logstash, it prepends a syslog header (priority, timestamp, host, program) to each line, making it invalid JSON. So the first part strips the header with grok, and the json filter then parses the clean payload.
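To illustrate with a made-up line, this is roughly what a Rails event looks like as it arrives, and what the grok leaves in clean_message:

# As received from rsyslog - the syslog header makes it invalid JSON:
<134>Jan  1 12:00:00 app-01 rails-production: {"method":"GET","path":"/users","status":200}

# clean_message after the grok filter - valid JSON again:
{"method":"GET","path":"/users","status":200}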

Note: Logstash has an input called syslog, which might be a better fit here than a raw TCP/UDP input plus the cleanup.
I haven't tested it on our setup yet, but once I do, I will update the cookbook and this blog post.
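If you want to try it, a minimal sketch of that input could look like this (untested on our setup, as noted above):

input {
  # The syslog input listens on both TCP and UDP and parses the
  # syslog header itself, so the grok cleanup would not be needed
  syslog {
    port => 5545
    type => "rails"
  }
}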

Thank yous

I want to thank @kesor6 for helping me a ton with this: fixing bugs and pair programming into the wee hours of the night.

I also want to thank the Operations Israel Facebook group, amazing people who help out with problems and give tips when needed.