Name: Very simple awk samples

If you have a data file inputfile.txt arranged in columns delimited by whitespace:


 1 2 3 4 5
 1000 2000 3000 4000 5000
 a b c d e


and you want to extract the earth-shaking results in the 4th column to another file, you can print field 4 using awk:

 awk '{print $4}' inputfile.txt > outputfile.txt

or, if you want to extract both columns 2 and 4, try:

 awk '{print $2,$4}' inputfile.txt > outputfile.txt
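
The comma between $2 and $4 tells awk to put its output field separator (the built-in variable OFS, a single space by default) between the two values. As a small sketch, assuming you would rather have comma-separated output, you can set OFS in a BEGIN block. The scratch directory below is only there so the example runs on its own; the file names mirror the ones used above.

```shell
# Scratch directory and sample data, made up for this sketch.
tmpdir=$(mktemp -d)
printf '1 2 3 4 5\n1000 2000 3000 4000 5000\na b c d e\n' > "$tmpdir/inputfile.txt"

# The comma in print is replaced by OFS, which is set to "," here.
awk 'BEGIN{OFS=","} {print $2,$4}' "$tmpdir/inputfile.txt" > "$tmpdir/outputfile.txt"
cat "$tmpdir/outputfile.txt"
```

With the sample data this should give 2,4 then 2000,4000 then b,d, one pair per line.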

If you want to print the lines verbatim, the unsplit line of text counts as field 0:

 awk '{print $0}' inputfile.txt > outputfile.txt

The fields count as variables, so you can, for example, do arithmetic on the numbers:

 awk '{print $2-$4}' inputfile.txt > outputfile.txt
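
One thing to watch for: a field that does not look like a number counts as 0 in arithmetic, so the "a b c d e" line quietly produces 0 rather than an error. The sketch below shows that, along with one possible way (my choice for illustration, not the only one) of keeping only lines whose fields are plain unsigned integers. The scratch directory and the second output file name are made up for the example.

```shell
# Scratch directory and sample data, made up for this sketch.
tmpdir=$(mktemp -d)
printf '1 2 3 4 5\n1000 2000 3000 4000 5000\na b c d e\n' > "$tmpdir/inputfile.txt"

# Non-numeric fields count as 0, so the last line yields 0.
awk '{print $2-$4}' "$tmpdir/inputfile.txt" > "$tmpdir/outputfile.txt"

# Only do the subtraction when both fields consist entirely of digits.
awk '$2 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {print $2-$4}' "$tmpdir/inputfile.txt" > "$tmpdir/numeric_only.txt"

cat "$tmpdir/outputfile.txt" "$tmpdir/numeric_only.txt"
```

The first file should contain -2, -2000 and 0; the second only -2 and -2000.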

Or, if you're going to read in other files, you may want to save your fields in variables first of all:

 awk '{test1=$2; test2=$4; print test1-test2;}' inputfile.txt > outputfile.txt

(Note that awk has no type declarations: variables are created on the fly, and a field that looks like a number is treated as a number. If you need such a field handled as a string, for example in a comparison, concatenate an empty string onto it, as in $2 ""; adding 0, as in $2+0, forces a number.)
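
To make the number-versus-string point concrete, here is a small sketch comparing two fields that are the same number but different text. The input line "10 10.0" and the shell variable names are made up for the example.

```shell
# Fields that look like numbers are compared as numbers, so 10 equals 10.0.
numeric=$(printf '10 10.0\n' | awk '{ if ($1 == $2) print "same"; else print "different" }')

# Concatenating an empty string forces a string comparison instead.
string=$(printf '10 10.0\n' | awk '{ if (($1 "") == ($2 "")) print "same"; else print "different" }')

echo "$numeric $string"
```

This should print "same different": the numeric comparison treats the two fields as equal, while the string comparison does not.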

Awk accepts several file arguments on the command line, but it reads them one after another.  If you want to read a second file alongside the first, extract fields from both at the same time, and print everything out together, you'll need to use the getline command:

 awk '{test1=$2; test2=$4; if ((getline < "inputfile2.txt") > 0) print test1,test2,$2,$4;}' inputfile.txt > outputfile.txt
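
Here is a self-contained sketch of the same idea on two tiny files, so you can see what comes out. The file contents, the scratch directory, and the awk variable name "second" are all made up for the example; the path is passed in with -v only because the scratch directory name is not known in advance.

```shell
# Scratch directory and two small sample files, made up for this sketch.
tmpdir=$(mktemp -d)
printf '1 2 3 4 5\na b c d e\n' > "$tmpdir/inputfile.txt"
printf '6 7 8 9 10\nf g h i j\n' > "$tmpdir/inputfile2.txt"

# Save fields from the main file, then read the matching line of the second
# file with getline, which replaces $0 and the fields with that line.
awk -v second="$tmpdir/inputfile2.txt" '{
    test1 = $2; test2 = $4
    if ((getline < second) > 0)      # > 0 means a line was actually read
        print test1, test2, $2, $4
}' "$tmpdir/inputfile.txt" > "$tmpdir/outputfile.txt"

cat "$tmpdir/outputfile.txt"
```

This should print "2 4 7 9" and then "b d g i". Checking that getline returned something greater than 0 keeps awk from printing stale fields if the second file is shorter or missing.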

Besides getline, I find I use gsub with regular expressions quite a bit to remove unwanted extensions on filenames.  You could use sed for this too.  In this example, I'm going to replace all instances of .imh in my lines of input text with .fits (the backslash makes the dot a literal dot rather than "any character"):

 awk '{gsub(/\.imh/,".fits",$0); print $0;}' listoffilenames.txt > correctedlistoffilenames.txt
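
One thing worth knowing about patterns like this: in a regular expression a bare dot matches any character, so /.imh/ also matches things like "simh". The sketch below compares the bare and the escaped dot on a made-up filename, simha.imh, chosen because it trips up the looser pattern; the shell variable names are also just for illustration.

```shell
# A bare dot matches any character, so "simh" gets replaced as well.
loose=$(printf 'simha.imh\n' | awk '{gsub(/.imh/,".fits"); print}')

# An escaped dot only matches a literal ".", so only the extension changes.
strict=$(printf 'simha.imh\n' | awk '{gsub(/\.imh/,".fits"); print}')

echo "$loose"
echo "$strict"
```

The first result should be .fitsa.fits and the second simha.fits. If you only ever want to touch the extension at the end of the line, anchoring the pattern as /\.imh$/ is probably safer still.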

This post is getting long, so just one more basic thing for now...  Often I want to skip the header line in a file.  To avoid processing some lines, you can put a condition on the built-in variable NR (the number of lines read so far) in front of the braces that define the action:

 awk 'NR>1{print $0;}' inputfile.txt > inputfile_without_the_header.txt
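
A related sketch, in case you hand awk more than one file: NR keeps counting across files, so NR>1 only skips the very first header. The built-in variable FNR restarts at 1 for each file, which lets you drop every header. The file names and contents below are made up for the example.

```shell
# Scratch directory and two small files with identical headers, made up for this sketch.
tmpdir=$(mktemp -d)
printf 'name value\na 1\n' > "$tmpdir/first.txt"
printf 'name value\nb 2\n' > "$tmpdir/second.txt"

# FNR is the line number within the current file, so this skips both headers.
awk 'FNR>1{print $0;}' "$tmpdir/first.txt" "$tmpdir/second.txt" > "$tmpdir/combined.txt"

cat "$tmpdir/combined.txt"
```

This should leave just "a 1" and "b 2" in combined.txt.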

Note that awk is a quick-and-dirty language, so don't be surprised when minor things crop up in complicated tasks where awk doesn't quite do what you expect.  Try to stick to simple scripting, but don't be surprised if the ease of use tempts you occasionally to write some rather grand schemes!