Collect, parse and visualize your logs with LumberMill, Elasticsearch and Kibana on CentOS, Part II

In the first part of this howto, I described how to install and configure the different components to build a simple log analyzer.
Today I'd like to show how rsyslog or syslog-ng can be set up to ship Apache logs to LumberMill, and how to load-balance multiple LumberMill instances with haproxy.


By default, CentOS uses rsyslog as its syslog daemon. If you want to stick with it, add the following to /etc/rsyslog.conf to send log events to LumberMill:

[bash]# Filewatcher module. Only needs to be loaded once.
$ModLoad imfile

# Filewatcher for the Apache access log
$InputFileName /var/log/httpd/access_log
$InputFileTag httpd-access:
$InputFileStateFile /tmp/state-httpd-access
$InputFileSeverity info
$InputFilePollInterval 10
# Activate the file monitor with the settings above
$InputRunFileMonitor

# Forward tagged events via TCP (@@) to LumberMill
if $syslogtag contains 'httpd-access' then @@your.server.ip:5151[/bash]

restart rsyslog:
[bash]service rsyslog restart[/bash]
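Before LumberMill is in place, a throwaway stand-in listener is handy for checking what rsyslog actually ships over the wire. This is just a plain-Python sketch, not part of LumberMill; it accepts a single TCP connection on the forwarding port and returns the raw payload:

```python
import socket

def receive_one(host="0.0.0.0", port=5151):
    """Accept one TCP connection on the port rsyslog forwards to
    and return whatever raw data arrives (stand-in for LumberMill)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    conn, addr = srv.accept()
    data = conn.recv(4096).decode("utf-8", errors="replace")
    conn.close()
    srv.close()
    return data
```

Run it, append a line to the watched access log, and within the poll interval (10 seconds above) the tagged event should show up.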


Somehow I like syslog-ng better. If you want to replace rsyslogd with it:
[bash]# Install syslog-ng
yum install syslog-ng
# Stop rsyslogd and start syslog-ng
service rsyslog stop; service syslog-ng start
# Deactivate rsyslogd at startup, activate syslog-ng
chkconfig rsyslog off; chkconfig syslog-ng on[/bash]

syslog-ng configuration

Open up /etc/syslog-ng/syslog-ng.conf with your editor of choice and add the following:
[bash]# Filewatcher for the Apache logs
source s_apache {
    file("/var/log/httpd/*" follow_freq(1) flags(no-parse));
};
# LumberMill destination
destination d_lumbermill { tcp("your.server.ip" port(5151)); };
# Send Apache logs to LumberMill
log { source(s_apache); destination(d_lumbermill); flags(final); };[/bash]

restart syslog-ng:
[bash]service syslog-ng restart[/bash]

If you get an error message complaining about a missing afsql module, you can safely ignore it; this seems to be a packaging bug in the CentOS syslog-ng RPM.


Now configure LumberMill to listen for syslog messages.
Create a config file in /opt/LumberMill/conf/syslog.conf with these contents:
[bash]# A simple TCP server.
- TcpServer:
    port: 5151

# Print some event statistics
- SimpleStats:
    interval: 5

# Send received events to stdout for debugging
- StdOutSink

# Send received events to Elasticsearch
- ElasticSearchSink:
    replication: async
    nodes: ["your.elasticsearch.server"]
    index_name: perftest
    batch_size: 500
    store_interval_in_secs: 10[/bash]
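The batch_size and store_interval_in_secs settings determine when the sink flushes events to Elasticsearch: whichever limit is hit first triggers a write. A simplified sketch of that flush logic (a hypothetical class for illustration, not LumberMill's actual implementation):

```python
import time

class BatchBuffer:
    """Collect events and flush when batch_size is reached or
    store_interval seconds have passed since the last flush --
    a sketch of the two ElasticSearchSink settings above."""

    def __init__(self, batch_size=500, store_interval=10, flush_func=print):
        self.batch_size = batch_size
        self.store_interval = store_interval
        self.flush_func = flush_func
        self.events = []
        self.last_flush = time.time()

    def add(self, event):
        self.events.append(event)
        if (len(self.events) >= self.batch_size
                or time.time() - self.last_flush >= self.store_interval):
            self.flush()

    def flush(self):
        if self.events:
            self.flush_func(self.events)  # stands in for the bulk ES write
            self.events = []
        self.last_flush = time.time()
```

Larger batches mean fewer round trips to Elasticsearch, at the cost of losing up to one interval's worth of events if the process dies between flushes.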

Start LumberMill with the above configuration:
[bash]pypy /opt/LumberMill/lumbermill/ -c /opt/LumberMill/conf/syslog.conf[/bash]

Now load some pages from your webserver and LumberMill should print messages like this:
[bash]{ 'data': '<134>May 9 15:03:55 centos6 httpd-access: - - [09/May/2014:15:03:46 +0200] "GET /img/someimage.png HTTP/1.0" 200 7359 "-" "ApacheBench/2.3" "-"',
  'event_type': 'Unknown',
  'lumbermill': { 'event_id': 'fdb6fbd70beac623f2a63d2935ac798e',
                  'event_type': 'Unknown',
                  'received_by': '',
                  'received_from': '',
                  'source_module': 'TcpServerTornado'}}[/bash]
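The leading <134> in the data field is the syslog PRI value, computed as facility × 8 + severity. Decoding it takes a few lines of plain Python (independent of LumberMill; facility names follow the usual Linux convention):

```python
# Standard syslog facility and severity names, indexed by number.
FACILITIES = ["kern", "user", "mail", "daemon", "auth", "syslog", "lpr",
              "news", "uucp", "cron", "authpriv", "ftp", "ntp", "audit",
              "alert", "clock", "local0", "local1", "local2", "local3",
              "local4", "local5", "local6", "local7"]
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice",
              "info", "debug"]

def decode_pri(pri):
    """Split a syslog PRI value into (facility, severity) names."""
    return FACILITIES[pri // 8], SEVERITIES[pri % 8]
```

So `decode_pri(134)` yields `("local0", "info")` for the event shown above.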


Using haproxy to load-balance between two or more LumberMill instances adds some resilience to your logging backend.
The only major drawback is that the source IP address the LumberMill process sees will always be that of the load balancer, not the originating server.

To install haproxy via yum, the epel repositories are needed:
for i586:
[bash]rpm -Uvh[/bash]
for x86_64:
[bash]rpm -Uvh[/bash]

Install haproxy:
[bash]yum install -y haproxy[/bash]

Edit /etc/haproxy/haproxy.cfg and add the following lines:
[bash]frontend main
    bind *:5151
    # raw TCP log traffic, not HTTP
    mode tcp
    default_backend LumberMills

backend LumberMills
    mode tcp
    # round robin balancing between the backends
    balance roundrobin
    server first check
    server second check[/bash]

Restart haproxy:
[bash]service haproxy restart[/bash]
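With balance roundrobin, each new connection is simply handed to the next backend in the rotation. Conceptually (a toy Python model, not haproxy code):

```python
from itertools import cycle

# Rotate through the configured backends, one per new connection,
# mirroring haproxy's roundrobin algorithm in its simplest form.
backends = cycle(["first", "second"])

def next_backend():
    """Return the backend that should receive the next connection."""
    return next(backends)
```

Real haproxy additionally skips backends whose health check is failing, which is what the `check` keyword on the server lines enables.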

Now you only need to change the port each LumberMill instance is listening on to 5152.
Edit /opt/LumberMill/conf/syslog.conf:
[bash]# A simple TCP server.
- TcpServer:
    port: 5152[/bash]

