Pipestat

ay| 2009/04/28 17:51

Some days it feels like reading and analysing logs is the only thing I do at work.
And when I analyse logs it’s usually something like


zcat -f *log* | grep -v something | awk '/something else/ { print $3 }' | sed 's/something/else/g' | sort | uniq -c | sort -gr > output

And when this does not fully comply to what i need to do i fall back to perl and write a ugly oneliner of the type


tail -f foo | perl -lane 's/foo(.*)/bar$1bar/g'

So when my wife and daughter got a visit from Mr Noro I had some spare time and wrote tailstat. It’s a small perlprogram that does what I manualy do every day

It has a few different modes

Plain

[root@wildbill logs]# for i in foo foo bar bar foo foobar;do echo $i;done | pipestat
foo : (50.0%) (107546.3 hits/s) (3 hits / 6)
bar : (33.3%) (71697.5 hits/s) (2 hits / 6)
foobar : (16.7%) (35848.8 hits/s) (1 hits / 6)

Parsed 6 lines in 0.00 secs (46863.7 lines/s)

a single regexp


[root@wildbill logs]# for i in foo foo bar bar foo foobar;do echo $i;done | pipestat foo
foo : (66.7%) (100462.4 hits/s) (4 hits / 6)
: (33.3%) (50231.2 hits/s) (2 hits / 6)

Parsed 6 lines in 0.00 secs (25317.7 lines/s)

More real examples,


[root@wildbill logs]# cat www-access_log | pipestat 'css' 'jpe?g' 'gif' 'png'
jpe?g : (30.3%) (24590.5 hits/s) (122851 hits / 405830)
css : (11.7%) (9494.6 hits/s) (47434 hits / 405830)
gif : (6.6%) (5355.4 hits/s) (26755 hits / 405830)
png : (0.4%) (341.3 hits/s) (1705 hits / 405830)
<rest> : (51.0%) (41451.2 hits/s) (207085 hits / 405830)

Parsed 405830 lines in 5.00 secs (81230.8 lines/s)

Or if you just want field number 5…


[root@wildbill logs]# cat www-access_log | pipestat --field=9 --limit=5
200 : (88.2%) (77506.3 hits/s) (350721 hits / 397549)
HTTP/1.1" : (3.3%) (2858.5 hits/s) (12935 hits / 397549)
404 : (2.9%) (2530.8 hits/s) (11452 hits / 397549)
301 : (2.4%) (2092.8 hits/s) (9470 hits / 397549)
403 : (1.5%) (1321.3 hits/s) (5979 hits / 397549)
<limited> : (1.8%) (1545.2 hits/s) (6992(1311) hits(uniq) / 397549)

Parsed 397549 lines in 4.53 secs (87795.5 lines/s)

This takes a apache accesslogg and print out the 9th whitespace seperated field and writes out the 5 most common hits, the rest of the hits are sumed up in

But as you can see there is something in the logs that confuses the fieldseperator

So then we go into matchmodus.


[root@wildbill logs]# cat www-access_log | pipestat -m 'HTTP/1.1" (d+)'
200 : (92.3%) (218082.1 hits/s) (375934 hits / 407426)
404 : (3.1%) (7229.9 hits/s) (12463 hits / 407426)
301 : (2.5%) (5910.1 hits/s) (10188 hits / 407426)
403 : (1.4%) (3339.1 hits/s) (5756 hits / 407426)
302 : (0.4%) (873.1 hits/s) (1505 hits / 407426)
405 : (0.0%) (37.1 hits/s) (64 hits / 407426)
206 : (0.0%) (19.1 hits/s) (33 hits / 407426)
304 : (0.0%) (4.6 hits/s) (8 hits / 407426)
<rest> : (0.4%) (855.7 hits/s) (1475 hits / 407426)

Parsed 407426 lines in 1.72 secs (236328.4 lines/s)

Or we could run it on the live log for say 60 seconds and what article is the most popular

[ay@joanie ~]$ varnishlog -i RxURL -I artid | pipestat --runtime=60 --limit 5 -m 'artid=(d+)'
542689 : (7.9%) (1.5 hits/s) (91 hits / 1154)
542739 : (7.5%) (1.4 hits/s) (86 hits / 1154)
542568 : (4.4%) (0.8 hits/s) (51 hits / 1154)
542723 : (4.1%) (0.8 hits/s) (47 hits / 1154)
542718 : (3.7%) (0.7 hits/s) (43 hits / 1154)
<limited> : (72.4%) (13.5 hits/s) (835(244) hits(uniq) / 1154)
<rest> : (0.1%) (0.0 hits/s) (1 hits / 1154)

Parsed 1154 lines in 61.80 secs (18.7 lines/s)

Have fun. The code is here

Tshark to the rescue

ay| 2009/03/29 00:30

In my line of work as a sysadmin at one of the largest sites in Norway, it happens once in a while that I have to inspect HTTP traffic for some more or less urgent reason.

One of the tools I really love working with is tshark.
Tshark is the console version of wireshark and enables you to sniff and dissect just about any protocol in realtime.

One of the problems I had recently was to identify webtraffic originating from our webserver.
Over the years code has accumulated server initiated fetches. Stuff like file_get_content("http://somesite/someurl) in the presentation code. This is bad since it creates external dependencies to deliver a page and keeps apache/nginx/lighttd threads/processes busy


tshark -i eth0 -n -aduration:60 -zhttp,tree -zhttp_srv,tree -T fields -e http.host -e http.request.uri -e http.request.method -R http -tad 'src host 10.0.0.144 and (dst port 80 or dst port 443)'

This roughly says:

  • Listen on the eth0 interface for 60 seconds.
  • Write out two different sets of statistics about the traffic.
  • Write out the “Host:” header, the URL and the request method. (GET/POST).
  • Try to interpret the traffic as a HTTP.
  • write timestamps in a readable format (not used)
  • Only look at traffic from my IP to port 80 (HTTP) and port 443 (HTTPS)

This little trick helped me identify loads of external dependencies and pinpointed some ugly code that needed some care.

And while I was at it. I figured out I could do something similar with mysql queries. Instead of turning on full Query-logging in mysql (which probably means a restart of a running production mysql) I could just sniff it


tshark -i eth0 -aduration:60 -d tcp.port==3306,mysql -T fields -e mysql.query 'port 3306'

Which roughly says:

  • Listen on eth0 for 60 seconds
  • Interpret port 3306 as mysql
  • write out queries
  • Only look at traffic on port 3306

Have fun.

Other usefull options to -T fields -e
http.response.code
http.server
http.content_type
ip.src
ip.dst
tcp.port
http.user_agent

10 Years of Blogging

ay| 2008/10/18 21:41

I’ve been blogging since 27 august 1998. Time to celebrate with.. yet another blogpost. During these 10 years on the internet I’ve written 47 blogposts. That averages to 4.7 posts a year.

I’m pretty good.

Since my last post, I’ve changed jobs (now working at VG Multimedia http://www.vg.no as head of IT Operations) , got married (with Elisabeth) and got a daughter (Aurora, now 10 months).

The joy of the information society

I’ll keep you posted :-)

Ok.We’re back

ay| 2006/12/02 20:19

We are back home. And have been for a while. This is posted wirh my beloved E70

Moving Towards Norway… Slowly…

ay| 2006/07/19 07:36

We have surfed Hawaii and went to a Katie Melua concert in Chicago and are now on our way to New York in a SUV. Right now in a Motel close to Woodstock. 24th july we’ll be landing in Oslo, Gardermoen.

Mr Lava Lava

ay| 2006/07/08 10:31

We have arrived Hawaii, and enjoyed a couple days in Honolulu and Waikiki Beach. The last two days we’ve spent on one of the other islands, the bigges one, the one that’s acctually called Hawai’i (or just”The Big Island”) and hiking to one of the worlds largest and most active volcanoes and walking a meter or two from flowing lava.. Exiting. Pictures will be up soon.

Leaving Australia

ay| 2006/07/04 10:00

We have completed all exams, we have celebrated, we have cried and we are now regretibly leaving Australia to return to our jobs back in Norway.

We will miss our friends and aparment here. Hope you all come visit us in the future.

Fortunatly we are making a detour through Hawaii, Chicago, New York and London. Flying through USA gives us two pieces of 32kg luggage instead of one piece of 20kg, so “aloha”, here we go.

This is written in the airport in Sydney, waiting for our plane out of the country.

Music I Need To Have a Closer look at

ay| 2006/07/02 09:47

Japanese Visual Kai

Japan X

Dir en Grey

Moi dix Mois

Frensh Goth

Eths

One Exam Left

ay| 2006/06/24 17:43

4 down, 1 to go. It’s the end of this semester and by the looks of it, I’ll be Master of Information Technology by tuesday.

Since New Zeeland, which was this summer, autumn has passed and it’s now winter. We’ve did a short trip to Fiji which was pretty cool. Pictures are comming up.

I also modified the site slightly, now using the vslide theme for wordpress. It’s got this neat slideshow for a header and cool buttons to press to collapse postings.

In New Zealand

ay| 2006/02/17 08:04

Elisabeth and me are currently spending our “summer” holydays in New Zealand.
We started in Auckland, been working our way down to the south island and are now in Queenstown, the “action capital”.

We have been skydiving in Taupo, diving in the Fjords and tomorrow we are going to do three bungy jump from 43m, 110m and 134m meters.

Later.