Collect, parse and visualize your logs with LumberMill, Elasticsearch and Kibana on CentOS, Part II

In the first part of this howto, I described how to install and configure the different components to build a simple log analyzer.
Today I'd like to show how to set up rsyslog or syslog-ng to ship Apache logs to LumberMill, and how to load-balance multiple LumberMill instances with haproxy.


By default, CentOS uses rsyslog as its syslog daemon. If you want to stick with it, add the following to /etc/rsyslog.conf to send log events to LumberMill:

#Filewatcher module. Only needs to be loaded once.
$ModLoad imfile

# Filewatcher for nginx access log
$InputFileName /var/log/nginx/access.log
$InputFileTag httpd-access:
$InputFileStateFile /tmp/state-httpd-access
$InputFileSeverity info
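$InputFilePollInterval 10
# Activate the file monitor defined above
$InputRunFileMonitor
# Forward matching events to LumberMill via TCP (@@ = TCP, @ = UDP)
if $syslogtag contains 'httpd-access' then @@your.server.ip:5151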

Restart rsyslog:

service rsyslog restart
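What imfile does here is conceptually simple: every $InputFilePollInterval seconds it checks the file for new data, and it remembers how far it has read in the state file so that a restart doesn't resend old lines. The Python sketch below illustrates that poll-and-remember idea. It is a simplified model under my own assumptions (the function names, the plain-text offset file and the truncation check are made up for illustration), not rsyslog's implementation.

```python
import os
import time


def poll_new_lines(path, state_path):
    """Return the complete lines appended to `path` since the last call.

    The byte offset already read is kept in `state_path`, a simplified
    stand-in for rsyslog's state file (its real format is different).
    """
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    # Simple heuristic: if the file got smaller, it was truncated or rotated.
    if os.path.getsize(path) < offset:
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only hand out complete lines; a trailing partial line waits for the next poll.
    last_newline = data.rfind(b"\n")
    if last_newline == -1:
        return []
    complete = data[: last_newline + 1]
    with open(state_path, "w") as f:
        f.write(str(offset + len(complete)))
    return complete.decode("utf-8", "replace").splitlines()


def follow(path, state_path, tag="httpd-access:", interval=10):
    """Poll every `interval` seconds and print tagged lines (runs forever)."""
    while True:
        for line in poll_new_lines(path, state_path):
            print(tag, line)
        time.sleep(interval)
```

Keeping the offset outside the process is what makes the follower restart-safe: after a crash it resumes at the last complete line instead of re-reading the whole file.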


Somehow I like syslog-ng better. If you want to replace rsyslogd with it:

# Install syslog-ng (the package is provided by the EPEL repository, see the haproxy section)
yum install syslog-ng
# Stop rsyslogd and start syslog-ng
service rsyslog stop; service syslog-ng start
# Deactivate rsyslog and activate syslog-ng at startup
chkconfig rsyslog off; chkconfig syslog-ng on

syslog-ng configuration

Open up /etc/syslog-ng/syslog-ng.conf with your editor of choice and add the following:

# Filewatcher for apache logs
source s_apache {
    file("/var/log/httpd/*" follow_freq(1) flags(no-parse));
};
# LumberMill destination
destination d_lumbermill { tcp( "your.server.ip" port(5151) ); };
# Send apache logs to LumberMill
log { source(s_apache); destination(d_lumbermill); flags(final);};

Restart syslog-ng:

service syslog-ng restart

If you get an error message complaining about a missing afsql module, you can safely ignore it. This seems to be a bug in the CentOS syslog-ng RPM.
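Both syslog-ng's tcp() destination and rsyslog's @@ forwarding send plain text over TCP, one message per line. If nothing shows up on the LumberMill side, it can help to look at the raw stream. Here's a small Python 3.8+ debugging receiver written under that newline-framing assumption; the function names are hypothetical, and you would run it in place of LumberMill, since it binds the same port 5151.

```python
import socket


def split_messages(buffer):
    """Split a TCP byte stream into newline-terminated syslog messages.

    Returns (messages, leftover). The leftover is an incomplete message
    that has to be prepended to the next chunk read from the socket.
    """
    *complete, leftover = buffer.split(b"\n")
    return [m.decode("utf-8", "replace") for m in complete if m], leftover


def debug_receiver(port=5151):
    """Accept one connection and print every message until the sender disconnects."""
    with socket.create_server(("", port)) as server:
        conn, addr = server.accept()
        with conn:
            pending = b""
            while chunk := conn.recv(4096):
                messages, pending = split_messages(pending + chunk)
                for message in messages:
                    print(addr[0], message)
```

TCP delivers a byte stream, not messages, so a single recv() can end in the middle of a line; carrying the leftover over to the next read is what keeps messages intact.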


Now configure LumberMill to listen for syslog messages.
Create the config file /opt/LumberMill/conf/syslog.conf with these contents:

# A simple TCP Server.
- TcpServer:
    port: 5151

# Print some event statistics
- SimpleStats:
    interval: 5
    receivers:
      - StdOutSink
      - ElasticSearchSink

# Send received events to stdout for debugging
- StdOutSink

# Send received events to es
- ElasticSearchSink:
    replication: async
    nodes: ["your.elasticsearch.server"]
    index_name: perftest
    batch_size: 500
    store_interval_in_secs: 10
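As I understand them, batch_size and store_interval_in_secs describe the usual bulk-indexing policy: buffer events and send them to Elasticsearch once 500 have piled up or 10 seconds have passed, whichever comes first. Here's that flush-on-size-or-time idea as a Python sketch. The class and its names are hypothetical and this is not LumberMill's ElasticSearchSink code.

```python
import time


class BatchBuffer:
    """Buffer events and flush them when the size or time limit is reached."""

    def __init__(self, flush, batch_size=500, interval_secs=10, clock=time.monotonic):
        self.flush = flush                  # callable that receives a list of events
        self.batch_size = batch_size
        self.interval_secs = interval_secs
        self.clock = clock                  # injectable for testing
        self.events = []
        self.last_flush = clock()

    def add(self, event):
        self.events.append(event)
        too_many = len(self.events) >= self.batch_size
        too_old = self.clock() - self.last_flush >= self.interval_secs
        if too_many or too_old:
            self.flush_now()

    def flush_now(self):
        if self.events:
            self.flush(self.events)
            self.events = []                # new list, so the flushed batch stays intact
        self.last_flush = self.clock()
```

This sketch only checks the timer when a new event arrives; a real sink would also need a background timer so that a quiet stream still gets flushed.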

Start LumberMill with the above configuration:

pypy /opt/LumberMill/lumbermill/ -c /opt/LumberMill/conf/syslog.conf
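If you'd rather check the listener before generating real web traffic, you can hand-craft a syslog line and send it yourself. This Python 3.8+ sketch imitates the format of the sample event LumberMill prints; the hostname, tag and message are placeholders I chose, and the `<134>` prefix stands for local0.info.

```python
import socket
from datetime import datetime


def build_syslog_line(tag, message, when=None, facility=16, severity=6, hostname="centos6"):
    """Build an RFC 3164-style line similar to what rsyslog forwards."""
    when = when or datetime.now()
    pri = facility * 8 + severity  # local0 (16) * 8 + info (6) = 134
    # RFC 3164 pads single-digit days with a space, e.g. "May  9"
    stamp = f"{when:%b} {when.day:2d} {when:%H:%M:%S}"
    return f"<{pri}>{stamp} {hostname} {tag}: {message}\n"


def send_lines(lines, host="127.0.0.1", port=5151):
    """Open one TCP connection and send the already newline-terminated lines."""
    with socket.create_connection((host, port), timeout=5) as sock:
        for line in lines:
            sock.sendall(line.encode("utf-8"))


# Example (needs LumberMill listening on port 5151):
# send_lines([build_syslog_line("httpd-access", '- - "GET /test HTTP/1.0" 200 0')])
```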

Now load some pages from your webserver and LumberMill should print messages like this:

{   'data': '<134>May  9 15:03:55 centos6 httpd-access: - - [09/May/2014:15:03:46 +0200] "GET /img/someimage.png HTTP/1.0" 200 7359 "-" "ApacheBench/2.3" "-"',
    'event_type': 'Unknown',
    'lumbermill': {   'event_id': 'fdb6fbd70beac623f2a63d2935ac798e',
                       'event_type': 'Unknown',
                       'received_by': '',
                       'received_from': '',
                       'source_module': 'TcpServerTornado'}}
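The `data` field still carries the raw syslog line. Its `<134>` prefix is the PRI field, which encodes facility * 8 + severity: 134 = 16 * 8 + 6, i.e. facility local0 and severity info. The severity matches `$InputFileSeverity info` from the rsyslog config, and local0 is, to my knowledge, imfile's default facility. A small Python helper to decode it (the labels for facilities 12 to 15 vary between implementations, so I picked short names):

```python
FACILITIES = ["kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
              "uucp", "cron", "authpriv", "ftp", "ntp", "audit", "alert", "clock",
              "local0", "local1", "local2", "local3", "local4", "local5", "local6", "local7"]
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_pri(line):
    """Split the leading <PRI> of a syslog line into (facility, severity, rest)."""
    if not line.startswith("<") or ">" not in line:
        raise ValueError("line has no <PRI> field")
    end = line.index(">")
    pri = int(line[1:end])
    if not 0 <= pri <= 191:  # 24 facilities * 8 severities
        raise ValueError(f"PRI {pri} out of range")
    facility, severity = divmod(pri, 8)
    return FACILITIES[facility], SEVERITIES[severity], line[end + 1:]
```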


Using haproxy to load-balance between two or more LumberMill instances adds some stability to your logging backend.
The only major drawback is that LumberMill will always see the load balancer's IP address as the origin of the connection, not the address of the server that actually sent the logs.

To install haproxy via yum, you need the EPEL repository:
for i586:

rpm -Uvh

for x86_64:

rpm -Uvh

Install haproxy:

yum install -y haproxy

Edit /etc/haproxy/haproxy.cfg and add the following lines:

frontend  main
    bind *:5151
    # syslog over TCP is not HTTP, so balance plain TCP connections
    mode tcp
    default_backend             LumberMills
backend LumberMills
    mode tcp
    # round robin balancing between the backends
    balance     roundrobin
    server  first check
    server  second check

Restart haproxy:

service haproxy restart
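`balance roundrobin` hands each new connection to the next server in the list, and `check` lets haproxy skip servers that fail their health check. Keep in mind that rsyslog and syslog-ng hold a long-lived TCP connection, so balancing happens per connection, not per log line: a single sender sticks with one LumberMill instance until it reconnects. The idea in a few lines of Python (an illustration with hypothetical names only; haproxy's real scheduler also handles weights and more):

```python
from itertools import cycle


class RoundRobin:
    """Hand out backends in turn, skipping those whose health check failed."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark(self, server, up):
        """Record the result of a health check."""
        if up:
            self.healthy.add(server)
        else:
            self.healthy.discard(server)

    def next_server(self):
        # Try each server at most once per call before giving up.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy backend")
```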

Since haproxy now listens on port 5151, change the port your LumberMill instances listen on to 5152, and point rsyslog/syslog-ng at the haproxy host.
Edit /opt/LumberMill/conf/syslog.conf on each instance:

# A simple TCP Server.
- TcpServer:
    port: 5152
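Once haproxy listens on 5151 and LumberMill on 5152, a quick way to confirm that both ports accept connections is a tiny Python check like this one (host and ports are the ones from this setup; adjust them to your machines). It only proves that something is listening, not that LumberMill is actually processing events.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example:
# for port in (5151, 5152):
#     print(port, port_open("127.0.0.1", port))
```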

This entry was posted in /dev/administration.
