Extracting log file contents with sed and awk

Posted by Ryan on October 28, 2008 in Linux, Solaris

Here is how to extract a single field from a file that contains a list of entries. Let's say I have a log file, /tmp/logfile, that contains hostnames in the format shown below. All I want is the hostname, nothing else. Running grep with a hostname returns every whole line that contains it. However, by combining sed, awk, sort and uniq, we can narrow the output down to a single instance of each hostname, in alphabetical order. See below.

cat /tmp/logfile

list 1 get-host1-now
list 2 get-host3-now
list 3 get-host4-now
list 4 get-host1-now
list 5 get-host5-now
list 6 get-host6-now
list 7 get-host1-now
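To try the commands yourself, the sample log above can be recreated with a here-document. This is only a convenience for following along; note that it overwrites any existing /tmp/logfile.

```shell
# Recreate the sample log shown above (overwrites /tmp/logfile if it exists).
# The quoted 'EOF' stops the shell from expanding anything inside the body.
cat > /tmp/logfile <<'EOF'
list 1 get-host1-now
list 2 get-host3-now
list 3 get-host4-now
list 4 get-host1-now
list 5 get-host5-now
list 6 get-host6-now
list 7 get-host1-now
EOF
```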

awk '{ print $3 }' /tmp/logfile | sed 's/get-/get- /' | sed 's/-now/ -now/' | awk '{ print $2 }' | sort | uniq
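A shorter variant of the same idea, sketched here as a hypothetical shell function named extract_hosts, removes the prefix and suffix with one sed call and lets sort -u sort the lines and drop duplicates in a single step. It assumes every hostname is wrapped exactly as get-<hostname>-now; the ^ and $ anchors make sed remove only a leading "get-" and a trailing "-now".

```shell
# Hypothetical helper: print each unique hostname from the 3rd column of the
# given file, assuming entries look like get-<hostname>-now.
extract_hosts() {
    awk '{ print $3 }' "$1" |
        sed -e 's/^get-//' -e 's/-now$//' |  # strip prefix, then suffix
        sort -u                              # sort and drop duplicate lines
}

# Usage: extract_hosts /tmp/logfile
```

On the sample log above, extract_hosts /tmp/logfile prints host1, host3, host4, host5 and host6, one per line.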

Explanation:

awk '{ print $3 }' /tmp/logfile (prints the 3rd column of each line, e.g. "get-host1-now"; by default awk separates columns on whitespace)

sed 's/get-/get- /' (adds a space after "get-", e.g. "get- host1-now")

sed 's/-now/ -now/' (adds a space before "-now", e.g. "get- host1 -now")

awk '{ print $2 }' (prints the 2nd column, which, thanks to the previous two steps, is "host1")

| sort (puts the lines in alphabetical order; this is required for the "uniq" step)

| uniq (collapses each run of identical adjacent lines into a single line; without the "sort", this wouldn't work, since host1 does not recur immediately after its 1st occurrence)
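If you would rather avoid the extra sed and awk passes, the same extraction can be sketched as a single awk program. This is an alternative technique, not the pipeline used above; the function name extract_hosts_awk is hypothetical, and it assumes every hostname is wrapped as get-<hostname>-now. Instead of relying on sort | uniq, it remembers hostnames it has already printed in an associative array, so duplicates do not need to be adjacent; the trailing sort only restores alphabetical order.

```shell
# Hypothetical helper: one awk pass that strips "get-" and "-now" from the 3rd
# column and prints each hostname only the first time it is seen.
# Assumes entries look like get-<hostname>-now.
extract_hosts_awk() {
    awk '{
        h = $3
        sub(/^get-/, "", h)   # drop the prefix
        sub(/-now$/, "", h)   # drop the suffix
        if (!seen[h]++)       # true only on the first occurrence of h
            print h
    }' "$1" | sort            # sort only for alphabetical output
}

# Usage: extract_hosts_awk /tmp/logfile
```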

Hope this helps!

Additional info:

http://student.northpark.edu/pemente/sed/sed1line.txt  (Sed info)

http://analyser.oli.tudelft.nl/regex/index.html.en (RegEx info)

http://www.linuxfocus.org/English/September1999/article103.html (Awk info)
