awk Programming Basics
Data-driven scripting language for data extraction and reporting. Line-by-line or record-by-record processing of text files.
Blocks
Working with a single file
BEGIN {
    print("runs at the start of all records")
}
{
    print("run for each input line")
}
END {
    print("runs at the end of all records")
}
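As a minimal sketch of how the three blocks cooperate, the following sums the second column of some made-up input. The sample data and the names `total` and `out` are assumptions for illustration only.

```shell
# Hypothetical sample data: a name and an amount per line.
out=$(printf 'apples 3\npears 4\nplums 5\n' | awk '
BEGIN { total = 0 }               # runs once, before any input is read
      { total += $2 }             # runs for every input record
END   { print "total:", total }   # runs once, after the last record
')
echo "$out"
```

The BEGIN initialisation is optional here, since an unset awk variable already behaves as 0 in arithmetic; it is spelled out only to show where setup code goes.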
Working with multiple files (BEGINFILE and ENDFILE are gawk extensions)
BEGINFILE { print "-: " FILENAME " :-" }
ENDFILE { }
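Where gawk's BEGINFILE is unavailable, a portable approximation is to test `FNR == 1`, since FNR restarts at 1 for every input file. This is a sketch of that swapped-in idiom, not the BEGINFILE mechanism itself; the file names `one.txt` and `two.txt` are hypothetical.

```shell
# Create two small throwaway files (hypothetical names).
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n'    > "$dir/two.txt"

# FNR == 1 is true on the first record of each file, so it marks a file boundary.
out=$(cd "$dir" && awk '
FNR == 1 { print "-: " FILENAME " :-" }
         { print }
' one.txt two.txt)
echo "$out"
rm -rf "$dir"
```

Unlike BEGINFILE, the `FNR == 1` rule never fires for an empty file, because such a file has no first record.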
Find matching lines in a file
awk '$1 ~ /pattern/ { ... }' file.txt
Find matching lines conditionally in a file
awk '{ if ($1 ~ /pattern/) { ... } }' file.txt
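To make the two forms concrete, here is a small sketch that runs both against the same made-up log lines; the input and the regex `^ERR` are assumptions chosen for illustration.

```shell
# Hypothetical log lines: a level followed by a message.
input='ERROR disk full
INFO started
ERROR timeout'

# Pattern form: the action runs only for records whose first field matches.
out=$(printf '%s\n' "$input" | awk '$1 ~ /^ERR/ { print $2 }')

# Conditional form: the same test written as an if statement inside the action.
out2=$(printf '%s\n' "$input" | awk '{ if ($1 ~ /^ERR/) { print $2 } }')

echo "$out"
```

Both commands select the same records; the pattern form is usually shorter, while the if form is handy when the test is only one step of a longer action.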
Control Statements
if-else branch statement
if ( condition ) {
    print("do something")
} else {
    print("do something else")
}
Ternary-operator statement
print(a == b ? "true" : "false")
Loop statement
for(i = 0 ; i < 100 ; i += 1) {
print(i)
}
Continue statement
next # continue to the next record
nextfile # continue to the next file
Exit statement
exit # terminate
exit(1) # terminate with exit code
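The statements above can be combined in one short program. The following is only a sketch on made-up input: it skips comment lines with next, stops at a sentinel line with exit, sums every field with a for loop, and labels the result with the ternary operator. The names `sum` and `label` and the `stop` sentinel are hypothetical.

```shell
out=$(printf '5\n# skipped\n7\nstop\n9\n' | awk '
/^#/     { next }     # skip comment lines and move on to the next record
/^stop$/ { exit }     # stop reading input; the END block still runs
         { for (i = 1; i <= NF; i += 1) sum += $i }
END      {
    label = (sum > 10) ? "big" : "small"
    print label, sum
}
')
echo "$out"
```

The trailing 9 is never read because exit ends input processing before that record is reached.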
Control variables
Usually set in a BEGIN block or with awk’s -v command-line option.
FS - Input Field Separator
String or regex that determines how each record is split into fields.
cat file.csv | awk -v FS=, '{ print $3 }'
RS - Input Record Separator
String (or, in some implementations such as gawk, a regex) that determines how input is split into records.
awk 'BEGIN { RS="\0" } { print }'
OFS - Output Field Separator
String used to join output fields. It is not interpreted as a regex.
awk 'BEGIN { OFS="," } { print $1,$2 }'
ORS - Output Record Separator
String appended after each print output. The default is a newline character.
awk -v ORS='\r\n' '{ print }'
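A minimal sketch of the four separators in action, assuming made-up input. The first command reads comma-separated fields and rejoins them with a different output separator; the second splits records on `;` and terminates each printed record with a comma.

```shell
# FS splits on commas; OFS is used when awk rebuilds the record.
out=$(printf 'id,name,city\n1,Ada,London\n' | awk -v FS=, -v OFS=' | ' '
{ $1 = $1; print }    # assigning to any field forces $0 to be rebuilt with OFS
')
echo "$out"

# RS splits the input on semicolons; ORS is appended after every print.
out2=$(printf 'a;b;c' | awk -v RS=';' -v ORS=',' '{ print }')
echo "$out2"
```

Without the `$1 = $1` assignment, a plain `print` would emit the record unchanged, because OFS only applies when fields are printed individually or the record is rebuilt.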
Supporting variables
NF - Number of fields
The number of fields in the current record.
ARGV - Arguments vector (same as C)
The command-line arguments not consumed by awk itself (options and the program text are excluded). ARGV[0] holds the command name.
ARGC - Arguments count (same as C)
The number of elements in ARGV, including ARGV[0].
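A short sketch of these variables on made-up input: the first command prints the field count and the last field of each record, and the second lists the arguments awk sees. The arguments `x` and `y` are hypothetical and are never opened as files, because the program consists of a BEGIN block only.

```shell
# NF is the field count, so $NF is the last field of the record.
out=$(printf 'one two three\nfour five\n' | awk '{ print NF, $NF }')
echo "$out"

# ARGV[0] is the command name, so the loop starts at index 1.
out2=$(awk 'BEGIN {
    print ARGC
    for (i = 1; i < ARGC; i += 1) print i, ARGV[i]
}' x y)
echo "$out2"
```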
Consuming shell environment variables
Consume by explicitly defining a variable at the CLI
Pass the environment variable's value with the -v option: -v <key>=<value>
awk -v USER="$USER" 'BEGIN { print(USER) }'
Consume directly via the ENVIRON array, indexed by variable name: ENVIRON["<key>"]
awk 'BEGIN { print ENVIRON["PATH"] }'
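Both approaches can be mixed in one command. In this sketch, `GREETING` and `TARGET` are hypothetical variables defined only for the example: one is exported into awk's environment and read through ENVIRON, the other is passed in with -v.

```shell
# TARGET is a plain shell variable, handed to awk via -v.
TARGET='world'

# GREETING is placed in the environment of this one awk invocation.
out=$(GREETING='hello' awk -v target="$TARGET" '
BEGIN { print ENVIRON["GREETING"], target }
')
echo "$out"
```

One practical difference: values assigned with -v have backslash escape sequences processed, whereas ENVIRON delivers the value verbatim, so ENVIRON is the safer route for strings that may contain backslashes.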