scapy with pypy [FIX]

While writing a sniffer plugin for my GambolPutty project, I tried using the well-known scapy library to do the sniffing.
As it is listed in the compatibility list for pypy (see here), it seemed like a good choice.
Still, I stumbled over some problems…

Setting an interface for sniffing

sniff(prn=self.parsePacket, iface='eth0')

resulted in the following error (see here):

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/pypy-2.2.1/lib-python/2.7/", line 551, in __bootstrap_inner
  File "/opt/src/gambolputty/input/", line 345, in run
    sniff(prn=self.parsePacket, iface='eth2', filter="tcp and port 80")
  File "/usr/lib64/pypy-2.2.1/site-packages/scapy/", line 561, in sniff
    s = L2socket(type=ETH_P_ALL, *arg, **karg)
  File "/usr/lib64/pypy-2.2.1/site-packages/scapy/arch/", line 455, in __init__
    self.ins.bind((iface, type))
  File "<string>", line 1, in bind
error: unknown address family

Setting a filter for sniffing

sniff(prn=self.parsePacket, filter='tcp and port 80')

resulted in the following error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/pypy-2.2.1/lib-python/2.7/", line 551, in __bootstrap_inner
  File "/opt/src/gambolputty/input/", line 345, in run
    sniff(prn=self.parsePacket, filter="tcp and port 80") # iface='eth2',
  File "/usr/lib64/pypy-2.2.1/site-packages/scapy/", line 561, in sniff
    s = L2socket(type=ETH_P_ALL, *arg, **karg)
  File "/usr/lib64/pypy-2.2.1/site-packages/scapy/arch/", line 463, in __init__
    attach_filter(self.ins, filter)
  File "/usr/lib64/pypy-2.2.1/site-packages/scapy/arch/", line 135, in attach_filter
    s.setsockopt(SOL_SOCKET, SO_ATTACH_FILTER, bpfh)
  File "<string>", line 1, in setsockopt
error: [Errno 22] Invalid argument

These errors popped up on a 64-bit vagrant machine with CentOS 6.5, using both pypy-2.2 and pypy-2.4.

After some testing I found a solution which at least worked for me.

Here is the patch:

---	2014-10-09 15:03:46.401539030 +0000
+++	2014-10-09 14:51:42.639351837 +0000
@@ -18,8 +18,13 @@
 from scapy.supersocket import SuperSocket
 import scapy.arch
 from scapy.error import warning
+import ctypes
+try:
+    import __pypy__
+    is_pypy = True
+except ImportError:
+    is_pypy = False
 # From bits/ioctls.h
 SIOCGIFHWADDR  = 0x8927          # Get hardware address    
@@ -128,11 +133,9 @@
     # XXX. Argl! We need to give the kernel a pointer on the BPF,
     # python object header seems to be 20 bytes. 36 bytes for x86 64bits arch.
-    if scapy.arch.X86_64:
-        bpfh = struct.pack("HL", nb, id(bpf)+36)
-    else:
-        bpfh = struct.pack("HI", nb, id(bpf)+20)  
-    s.setsockopt(SOL_SOCKET, SO_ATTACH_FILTER, bpfh)
+    str_buffer = ctypes.create_string_buffer(bpf)
+    fprog_addr = struct.pack('HL', nb, ctypes.addressof(str_buffer))  
+    s.setsockopt(SOL_SOCKET, SO_ATTACH_FILTER, fprog_addr)
 def set_promisc(s,iff,val=1):
     mreq = struct.pack("IHH8s", get_if_index(iff), PACKET_MR_PROMISC, 0, "")
@@ -451,7 +454,7 @@
         self.ins = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(type))
         self.ins.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 0)
-        if iface is not None:
+        if iface is not None and not is_pypy:
             self.ins.bind((iface, type))
         if not nofilter:
             if conf.except_filter:

To apply it, save it as scapy_pypy.patch in the directory that contains the scapy source file to be patched (e.g. /usr/lib64/pypy-2.2.1/site-packages/scapy/arch/).
Then just execute:

patch < scapy_pypy.patch
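The reason the original code worked at all on CPython is that id() there returns the object's memory address, so id(bpf) plus a hard-coded string-object header offset pointed at the raw bytes; pypy gives no such guarantee. The ctypes approach in the patch sidesteps object layout entirely. A minimal, self-contained sketch of the idea (the single BPF "accept" instruction is made up for illustration):

```python
import ctypes
import struct

# One made-up BPF instruction ("return 0xffff", i.e. accept packet):
# struct sock_filter is u16 code, u8 jt, u8 jf, u32 k -> 8 bytes each.
bpf = struct.pack('HBBI', 0x06, 0, 0, 0x0000ffff)
nb = len(bpf) // 8  # number of instructions

# Copy the bytecode into a ctypes buffer; addressof() yields a real
# pointer that does not depend on the interpreter's object layout.
buf = ctypes.create_string_buffer(bpf, len(bpf))
fprog = struct.pack('HL', nb, ctypes.addressof(buf))  # struct sock_fprog

# fprog is what gets handed to setsockopt(SOL_SOCKET, SO_ATTACH_FILTER, ...)
```

Note that the buffer must stay alive (referenced) for as long as the filter is attached, otherwise the kernel ends up with a dangling pointer.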

Hope this is helpful for someone else ;)

Posted in /dev/administration | Leave a comment

Seems like an interesting read:

Published by donnerdiebel | Leave a comment

Collect, parse and visualize your logs with LumberMill, Elasticsearch and Kibana on CentOS, Part II

In the first part of this howto, I described how to install and configure the different components to build a simple log analyzer.
Today I’d like to show how rsyslog/syslog-ng can be set up to ship apache logs to LumberMill and how to load-balance multiple LumberMill instances via haproxy.


By default, CentOS uses rsyslog as its syslog daemon. If you want to stick with this, follow this to set up rsyslog to send log events to LumberMill:

#Filewatcher module. Only needs to be loaded once.
$ModLoad imfile

# Filewatcher for nginx access log
$InputFileName /var/log/nginx/access.log
$InputFileTag httpd-access:
$InputFileStateFile /tmp/state-httpd-access
$InputFileSeverity info
$InputFilePollInterval 10
$InputRunFileMonitor
if $syslogtag contains 'httpd-access' then @@your.server.ip:5151;

restart rsyslog:

service rsyslog restart
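To verify that rsyslog really ships events before LumberMill is in place, a throwaway TCP listener is enough. The sketch below stands in for LumberMill's TcpServer and, to stay self-contained, plays its own client on an ephemeral port; on a real box you would bind port 5151 and let rsyslog do the sending:

```python
import socket
import threading

# Stand-in for LumberMill's TcpServer input. Port 0 picks a free
# ephemeral port so the demo is self-contained; use 5151 on a real box.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(('127.0.0.1', 0))
srv.listen(1)
port = srv.getsockname()[1]

received = []

def accept_once():
    # Accept a single connection and read one newline-terminated event.
    conn, _ = srv.accept()
    received.append(conn.makefile('rb').readline().decode().rstrip('\n'))
    conn.close()

t = threading.Thread(target=accept_once)
t.start()

# Simulate what rsyslog's @@your.server.ip:5151 action would send.
cli = socket.create_connection(('127.0.0.1', port))
cli.sendall(b"<134>May  9 15:03:55 centos6 httpd-access: GET / 200\n")
cli.close()
t.join()
srv.close()
print(received[0])
```

If nothing arrives on the real port, check iptables first; it is the usual culprit.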


Somehow I like syslog-ng better. If you want to replace rsyslogd with it:

# Install syslog-ng
yum install syslog-ng
# Stop rsyslogd and start syslog-ng
service rsyslog stop; service syslog-ng start
# Deactivate rsyslogd at startup
chkconfig rsyslog off; chkconfig syslog-ng on

syslog-ng configuration

Open up /etc/syslog-ng/syslog-ng.conf with your editor of choice and add the following:

# Filewatcher for apache logs
source s_apache {
    file("/var/log/httpd/*" follow_freq(1) flags(no-parse));
};
# LumberMill destination
destination d_lumbermill { tcp( "your.server.ip" port(5151) ); };
# Send apache logs to LumberMill
log { source(s_apache); destination(d_lumbermill); flags(final);};

restart syslog-ng:

service syslog-ng restart

If you get an error message complaining about a missing afsql module, you can safely ignore it. This seems to be a bug in the CentOS syslog-ng rpm.


Now configure LumberMill to listen for syslog messages.
Create a config file in /opt/LumberMill/conf/syslog.conf with these contents:

# A simple TCP Server.
- TcpServer:
    port: 5151

# Print some event statistics
- SimpleStats:
    interval: 5
    receivers:
      - StdOutSink
      - ElasticSearchSink

# Send received events to stdout for debugging
- StdOutSink

# Send received events to es
- ElasticSearchSink:
    replication: async
    nodes: ["your.elasticsearch.server"]
    index_name: perftest
    batch_size: 500
    store_interval_in_secs: 10

Start LumberMill with the above configuration:

pypy /opt/LumberMill/lumbermill/ -c /opt/LumberMill/conf/syslog.conf

Now load some pages from your webserver and LumberMill should print messages like this:

{   'data': '<134>May  9 15:03:55 centos6 httpd-access: - - [09/May/2014:15:03:46 +0200] "GET /img/someimage.png HTTP/1.0" 200 7359 "-" "ApacheBench/2.3" "-"',
    'event_type': 'Unknown',
    'lumbermill': {   'event_id': 'fdb6fbd70beac623f2a63d2935ac798e',
                       'event_type': 'Unknown',
                       'received_by': '',
                       'received_from': '',
                       'source_module': 'TcpServerTornado'}}


Using haproxy to load-balance between two or more LumberMill instances adds some stability to your logging backends.
The only major drawback is that the source IP address seen by the LumberMill process will always be that of the loadbalancer.

To install haproxy via yum, the epel repositories are needed:
for i586:

rpm -Uvh

for x86_64:

rpm -Uvh

Install haproxy:

yum install -y haproxy

Edit /etc/haproxy/haproxy.cfg and add the following lines:

frontend  main
    bind *:5151
    default_backend             LumberMills
backend LumberMills
    # round robin balancing between the backends
    balance     roundrobin
    server  first check
    server  second check
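The roundrobin setting simply hands each new connection to the next server in the backend list, wrapping around. Conceptually (server names taken from the config above):

```python
from itertools import cycle

# Roundrobin as haproxy applies it: each new connection goes to the
# next backend in the list, wrapping around at the end.
backends = cycle(['first', 'second'])
assignments = [next(backends) for _ in range(5)]
print(assignments)  # first, second, first, second, first
```

The `check` keyword makes haproxy health-check each backend, so a dead LumberMill instance is taken out of the rotation automatically.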

Restart haproxy:

service haproxy restart

Now you only need to change the ports LumberMill is listening on to 5152.
Edit /opt/LumberMill/conf/syslog.conf:

# A simple TCP Server.
- TcpServer:
    port: 5152

Install pypy-2.4 on CentOS 6 (including easy_install and pip)

To install pypy-2.4 on CentOS:

For i586:

tar -xvjf pypy-2.4-linux_i686-portable.tar.bz2
mv pypy-2.4-linux_i686-portable /usr/lib/pypy-2.4
ln -s /usr/lib/pypy-2.4/bin/pypy /usr/bin/pypy
export PYTHONPATH=/usr/lib/pypy-2.4/site-packages
unzip ./
cd distribute-0.7.3 
pypy ./ install
/usr/lib/pypy-2.4/bin/easy_install pip

For x86_64:

tar -xvjf pypy-2.4-linux_x86_64-portable.tar.bz2
mv pypy-2.4-linux_x86_64-portable /usr/lib64/pypy-2.4
ln -s /usr/lib64/pypy-2.4/bin/pypy /usr/bin/pypy
export PYTHONPATH=/usr/lib64/pypy-2.4/site-packages
unzip ./
cd distribute-0.7.3 
pypy ./ install
/usr/lib64/pypy-2.4/bin/easy_install pip

To use pip with pypy:

pypy -m pip install <module>

Python murmur hash package mmh3 and „undefined symbol: __gxx_personality_v0“ [FIX]

Got this error while trying to use this package with pypy-2.2.1.
Solved it by downloading the tar from here and building/installing it with:

PYTHONPATH=/usr/lib64/pypy-2.2.1/site-packages CFLAGS="-lstdc++" /usr/lib64/pypy-2.2.1/bin/pypy install

Logstash webhdfs plugin

While working on my GambolPutty project, I also did some testing with the awesome logstash tool.
One of the requirements here at dbap was that events should be analyzed at one single point, then stored in elasticsearch for real time analysis and in hdfs for analysis via hive and mahout.
So for some testing, I wrote a webhdfs output plugin for logstash. Find it here if you think it useful for you ;)



Collect, parse and visualize your logs with LumberMill, Elasticsearch and Kibana on CentOS, Part I

With the arrival of lucene as a search platform for all kinds of data, admins around the world started to put their log data into these datastores.
Log shippers were created that received raw log data, parsed it and stored it in a lucene-driven search platform. Solr was quite popular for this kind of job, but had a major drawback: real-time indexing was not as easy as one could wish for. Setting up a cluster of redundant nodes for replication was also not for the faint-hearted, although this got easier with subsequent releases of solr. But then elasticsearch came to the rescue: fast real-time indexing, easy clustering and many more nice features. Together with the incredibly powerful log manager logstash, indexing large amounts of log events in near-real time became a reality. And with kibana as visualization frontend, analysing log data nearly became a fun task.

In this how-to I use LumberMill as an alternative to logstash, mostly because I am the one developing LumberMill ;)
Since I'm more fluent in Python than in Ruby and we already had a simple shipper to a solr backend written in Python, I just added some functionality to it. Still, logstash is way more powerful. But if the features LumberMill provides suffice for your needs, feel free to read on ;)

Installing elasticsearch

The box you are running elasticsearch on should have at least 1GB ram.
For compatibility reasons, install the oracle jre for i586:

wget -O jre-7u45-linux-i586.rpm --no-cookies --no-check-certificate --header "Cookie:" ""
rpm -i jre-7u45-linux-i586.rpm

or for x86_64:

wget -O jre-7u45-linux-x64.rpm --no-cookies --no-check-certificate --header "Cookie:" ""
rpm -i jre-7u45-linux-x64.rpm

Activate the jre via alternatives:

alternatives --install /usr/bin/java java /usr/java/latest/bin/java 2000
alternatives --set java /usr/java/latest/bin/java

Now install elasticsearch. Friendly as those people are, they provide an rpm for our convenience ;)

rpm --import
cat > /etc/yum.repos.d/elasticsearch.repo << EOF
[elasticsearch-1.3]
name=Elasticsearch repository for 1.3.x packages
EOF
yum -y install elasticsearch

Next, we install the es head plugin.
Elasticsearch-head is a web front end for browsing and interacting with an Elastic Search cluster.

/usr/share/elasticsearch/bin/plugin -install mobz/elasticsearch-head

We need to restart elasticsearch to load the new plugin:

/etc/init.d/elasticsearch restart

If elasticsearch complains that it „Can’t start up: not enough memory“, edit /etc/sysconfig/elasticsearch and adjust ES_HEAP_SIZE. Default is 256m, increase this to a value that will let you start es successfully.

Open up the head plugin in your favorite browser by visiting http://your_server:9200/_plugin/head/

If you have trouble connecting, check your iptables rulebase.

Installing pypy (optional)

I heartily recommend running LumberMill with pypy. The performance boost is more than worth the little effort involved in installing pypy.
For CentOS you can follow the simple steps described here.

Installing LumberMill

via pip

pip install LumberMill


If easy_install is not present on your system, install it via:

yum install python-setuptools

Install the city version of the maxmind geolocation databases:

mkdir /usr/share/GeoIP
cd /usr/share/GeoIP
wget ""
gunzip GeoLiteCity.dat.gz

Clone the github repository to /opt/LumberMill (or any other location that fits you better :):

git clone /opt/LumberMill

Install the dependencies with pip:

cd /opt/LumberMill
python install

and for pypy:

pypy install

Now you can give LumberMill a testdrive with:

python /opt/LumberMill/lumbermill/ -c /opt/LumberMill/conf/example-tcp.conf

or for pypy:

pypy /opt/LumberMill/lumbermill/ -c /opt/LumberMill/conf/example-tcp.conf

Again, if you have connection problems, check your iptables rulebase. By default LumberMill will listen on port 5151.

To check if indexing works without problems, send some log data to LumberMill.
Just open up another shell and execute:

python /opt/LumberMill/scripts/ -c 100 localhost 5151

LumberMill should now show the incoming events in its statistics output.
If you open up the elasticsearch-head plugin, you should see that a new
lumbermill index was created.

Installing Kibana

For this how-to I chose a very simple setup for kibana.

First we need a webserver. I chose nginx, since it is lightweight and fast.
For i586:

rpm -i

For x86_64:

rpm -i

Get kibana:

mkdir -p /var/www/html/kibana
mkdir -p /var/www/log/
touch /var/www/log/kibana-error.log
chown nginx:nginx -R /var/www/
git clone /var/www/html/kibana

Configure nginx:

rm -f /etc/nginx/conf.d/default.conf
echo "server {
   listen 80;
   root /var/www/html/kibana/src;
   index index.html index.htm;
   error_log  /var/www/log/ error;
}" > /etc/nginx/conf.d/kibana.conf

Restart nginx:

/etc/init.d/nginx restart

Now just open a browser with your server's IP address as URL. You should see the kibana welcome page.
Clicking on the „Sample Dashboard" link will take you to a preconfigured dashboard, showing the sample data you sent during the spam_tcp test.

Well, that’s about it for today. In the next howto, I’d like to show how syslog-ng can be configured to send data to LumberMill and how to load-balance LumberMill instances via haproxy.



Make sure rpms from master are installed on clone (rpm based systems)

Get list of installed rpms on master (without version info) and rsync it to clone:

rpm -qa --queryformat "%{NAME}\n"|sort -n > master_rpms.txt
rsync -av ./master_rpms.txt $clone:/tmp/

Get list of installed rpms on clone (without version info):

rpm -qa --queryformat "%{NAME}\n"|sort -n > /tmp/clone_rpms.txt

To just list the missing rpms:

grep -Fxvf /tmp/clone_rpms.txt /tmp/master_rpms.txt
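grep -Fxvf works as a whole-line set difference: -F treats the lines of the pattern file as fixed strings, -x matches complete lines only, and -v keeps the master lines that match none of the clone lines. The same operation sketched in Python (package names are made up):

```python
# Whole-line set difference, as `grep -Fxvf clone master` computes it:
# keep every master line that does not appear verbatim in the clone list.
master = ['bash', 'httpd', 'rsync', 'vim-minimal']   # master_rpms.txt
clone = ['bash', 'rsync']                            # clone_rpms.txt

clone_set = set(clone)
missing = [pkg for pkg in master if pkg not in clone_set]
print(missing)  # the rpms to install on the clone
```

Like grep, this preserves the (sorted) order of the master list, so the yum command below installs the packages in a stable order.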

To yum install the missing rpms on clone:

for P in $(grep -Fxvf /tmp/clone_rpms.txt /tmp/master_rpms.txt); do PA="$PA $P"; done; yum install -y $PA