Pranav Kulkarni

awk Programming Basics

Data-driven scripting language for data extraction and reporting. Line-by-line or record-by-record processing of text files.

Blocks

Working with a single file

BEGIN {
    print("runs once, before the first record is read")
}
{
    print("runs for each input record")
}
END {
    print("runs once, after the last record is read")
}
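
As a rough illustration of how the three blocks cooperate, the sketch below counts the lines on standard input. The sample input and the variable name count are made up for the example.

```shell
# Feed three sample lines to awk and count them.
out=$(printf 'alpha\nbeta\ngamma\n' | awk '
BEGIN { count = 0 }             # runs once, before any input is read
      { count += 1 }            # runs for every input record
END   { print count " lines" }  # runs once, after the last record
')
printf '%s\n' "$out"
```

With the sample input this prints "3 lines".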

Working with multiple files

BEGINFILE { print "-: " FILENAME " :-" }    # gawk extension: runs before each file
ENDFILE { }                                 # gawk extension: runs after each file
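
Since BEGINFILE and ENDFILE are specific to gawk, a per-file header can also be approximated in other awks by testing FNR, the record number within the current file. The sketch below is one possible substitute, not an exact equivalent: a completely empty file never produces a record, so it gets no header. The file names one.txt and two.txt are placeholders.

```shell
# Create two small throwaway files (hypothetical sample data).
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n'    > "$dir/two.txt"

# FNR restarts at 1 for each file, so FNR == 1 marks the first record of a file.
out=$(cd "$dir" && awk '
FNR == 1 { print "-: " FILENAME " :-" }
         { print }
' one.txt two.txt)
printf '%s\n' "$out"
rm -rf "$dir"
```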

Find matching lines in a file

awk '$1 ~ /pattern/ { ... }' file.txt

Find matching lines conditionally in a file

awk '{ if ($1 ~ /pattern/) { ... } }' file.txt
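
A small runnable sketch of the pattern form, with made-up log-like input: it prints the second field of every record whose first field starts with "err".

```shell
# Only records whose first field matches /^err/ reach the action.
out=$(printf 'error disk\ninfo boot\nerr net\n' | awk '$1 ~ /^err/ { print $2 }')
printf '%s\n' "$out"
```

The if form inside a bare action block selects the same records; the pattern form is usually shorter.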

Control Statements

if-else branch statement

if ( condition ) {
    print("do something")
} else {
    print("do something else")
}

Ternary-operator statement

print(a == b ? "true" : "false")

Loop statement

for(i = 0 ; i < 100 ; i += 1) {
    print(i)
}
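
The loop and ternary forms above can be combined in a BEGIN block, which needs no input file. This is only an illustrative sketch; the loop bounds are arbitrary.

```shell
out=$(awk 'BEGIN {
    for (i = 1; i <= 4; i += 1) {
        # The ternary picks a label; the parentheses keep the concatenation unambiguous.
        print i " is " (i % 2 == 0 ? "even" : "odd")
    }
}')
printf '%s\n' "$out"
```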

Continue statement

next          # continue to the next record
nextfile      # continue to the next file
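
As a sketch of next in practice, the rule below drops comment lines so that later rules never see them. The sample input is invented for the example.

```shell
out=$(printf '# header\nx 1\n# note\ny 2\n' | awk '
/^#/ { next }       # abandon this record and move on to the following one
     { print $1 }   # only reached for records that were not skipped
')
printf '%s\n' "$out"
```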

Exit statement

exit          # terminate
exit(1)       # terminate with exit code
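
The exit code becomes the exit status of the awk process, which the calling shell can inspect. In this sketch the code 3 and the marker word "bad" are arbitrary choices.

```shell
# Stop at the first record whose first field is "bad" and report failure.
status=0
printf 'ok\nbad\nok\n' | awk '$1 == "bad" { exit 3 }' || status=$?
printf '%s\n' "$status"
```

An exit inside a main rule still runs any END block before awk terminates.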

Control variables

These are usually set in a BEGIN block or with awk’s -v command-line option.

FS - Input Field Separator

A string or regex that defines how each input record is split into fields. The default splits on runs of whitespace.

awk -v FS=, '{ print $3 }' file.csv
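
A self-contained sketch of the same idea, using inline sample data instead of a file. The column names are invented, and this naive split does not handle commas inside quoted CSV fields.

```shell
# Split each record on commas and print the third field.
out=$(printf 'id,name,city\n1,Asha,Pune\n' | awk -v FS=, '{ print $3 }')
printf '%s\n' "$out"
```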

RS - Input Record Separator

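A string or regex that defines how the input is split into records. The default is a newline. Regex and NUL separators are extensions (gawk supports both) and are not portable to every awk.

awk 'BEGIN { RS = "\0" } { print }'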
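
For a portable illustration, a single-character record separator behaves the same in every awk. The sketch below splits semicolon-separated input into records; NR is awk's running record number, and the input is made up.

```shell
# Each ";" ends a record, so the input yields three records.
out=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$out"
```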

OFS - Output Field Separator

A string placed between output fields. The default is a single space.

awk 'BEGIN { OFS="," } { print $1,$2 }'
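
OFS is applied between comma-separated print arguments, and also when awk rebuilds $0 after a field assignment. The sketch below relies on the second behavior; the assignment $1 = $1 changes nothing except forcing that rebuild.

```shell
# Assigning to any field makes awk rejoin the fields with OFS.
out=$(printf 'a b c\n' | awk 'BEGIN { OFS = "," } { $1 = $1; print }')
printf '%s\n' "$out"
```

A bare print without the assignment would output the record unchanged, spaces included.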

ORS - Output Record Separator

A string appended after each print output. The default is a newline character.

awk -v ORS='\r\n' '{ print }'
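
Because carriage returns are invisible in a terminal, this sketch uses a visible separator, " | ", purely for demonstration. It shows that ORS follows every print, including the last one.

```shell
# Every print is terminated by ORS, so the output ends with a trailing separator.
out=$(printf 'a\nb\n' | awk -v ORS=' | ' '{ print }')
printf '%s\n' "$out"
```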

Supporting variables

NF - Number of fields

The number of fields in the current record. $NF refers to the last field.
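
A brief sketch with invented input: NF is recomputed for every record, and $NF picks out the last field whatever the record's length.

```shell
# Print the field count and the last field of each record.
out=$(printf 'a b c\nd e\n' | awk '{ print NF ": last=" $NF }')
printf '%s\n' "$out"
```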

ARGV - Arguments vector (same as C)

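The command-line arguments that awk does not consume itself; options and the program text are excluded. ARGV[0] holds the command name, and the remaining arguments follow from ARGV[1].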

ARGC - Arguments count (same as C)

The number of elements in ARGV, counting ARGV[0].
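
The sketch below lists the arguments from inside a BEGIN block. The arguments first and second are placeholders; since the program has only a BEGIN block, awk never tries to open them as files.

```shell
out=$(awk 'BEGIN {
    print "ARGC=" ARGC
    # ARGV[0] is the command name, so the real arguments start at index 1.
    for (i = 1; i < ARGC; i += 1) print i ": " ARGV[i]
}' first second)
printf '%s\n' "$out"
```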

Consuming shell environment variables

Consume by explicitly defining a variable at CLI

Pass the value of an environment variable with the -v option, in the form -v <key>=<value>:

awk -v USER="$USER" 'BEGIN { print(USER) }'

Consume directly via ENVIRON, indexed by the variable's name:

awk 'BEGIN { print ENVIRON["PATH"] }'
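
One practical difference between the two approaches is that awk processes backslash escapes in -v assignments but leaves ENVIRON values untouched. The sketch below shows this with a made-up value; the names GREETING, raw, and p are hypothetical.

```shell
# A variable set only for this awk invocation is visible through ENVIRON.
out=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')
printf '%s\n' "$out"

raw='C:\temp'
# With -v, the \t in the value is interpreted as a tab character.
via_v=$(awk -v p="$raw" 'BEGIN { print p }')
# Through ENVIRON, the value arrives exactly as the shell exported it.
via_env=$(p="$raw" awk 'BEGIN { print ENVIRON["p"] }')
printf '%s\n' "$via_v" "$via_env"
```

For values that may contain backslashes, such as Windows-style paths or regex fragments, ENVIRON is therefore the safer route.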