Did you know that Unix (and, by extension, all popular Linux distros) comes with a scripting language for text processing?
That’s right.
The Unix/Linux system extensively uses AWK, a powerful text processing and pattern-scanning language. It offers a flexible and effective method for modifying data files, extracting data, and producing reports.
In this Unix awk tutorial, we’ll introduce the AWK language and its syntax, and show how you can use regular expressions to add power to your data extraction and manipulation operations.
Table of Contents
- A Quick Introduction to the AWK Language
- Prerequisites to Using AWK Command in UNIX and Linux
- AWK Command Syntax
- The AWK Command in Unix and Linux: How Does It Operate?
- AWK Conditional and Loop Statements
- Understanding the AWK Pattern
- AWK Actions
- Examples of AWK Commands
- Conclusion
- FAQs
A Quick Introduction to the AWK Language
Before going into the details of the AWK command in Unix, you should know the origins of the name.
The name AWK represents the initials of the three authors, Alfred Aho, Peter Weinberger, and Brian Kernighan. It was created in the 1970s at Bell Labs and has become a crucial component of the Unix toolkit.
AWK processes its input one line (record) at a time and takes actions based on user-defined patterns and rules. A typical AWK run scans the input file or stream, looks for lines that match the patterns, and performs the corresponding actions.
The core building blocks of AWK programs are pattern-action pairs. A pattern specifies which lines or records should match, and an action describes the instructions to carry out when a match occurs. If no action is provided, AWK’s default action is to print the full line; if no pattern is provided, the action runs on every line.
AWK supports text manipulation, arithmetic calculations, and control flow, and it ships with built-in functions and variables that make these tasks easy. Thanks to these built-in features, AWK is a flexible tool for data extraction, processing, and reporting chores.
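Here is a minimal sketch of both defaults. The name/score data and the variable names passed and names are made up for illustration, and the input is piped in with printf instead of being read from a file:

```shell
# Hypothetical sample data: a name and a score on each line.
data='alice 72
bob 45
carol 91'

# Pattern with no action: awk applies the default action and prints the whole line.
passed=$(printf '%s\n' "$data" | awk '$2 >= 50')
echo "$passed"

# Action with no pattern: the action runs on every line.
names=$(printf '%s\n' "$data" | awk '{ print $1 }')
echo "$names"
```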
Common Use Cases of AWK
Since its introduction, AWK has been a popular part of Unix toolkits because of its outstanding capabilities in the following areas:
Text Processing and Filtering
AWK is most commonly used to modify the contents of a file, extract particular fields or columns from a file, and filter lines depending on specific criteria.
Data Analysis and Reporting
AWK enables you to generate statistics, do calculations on data, and create customized reports depending on specific requirements.
File Formatting and Reformatting
AWK is often used to change field separators, restructure a file’s layout, or convert data between formats.
General System Administration
Automating repetitive tasks, analyzing log files, and producing structured output for the next step are common uses for AWK in shell scripts.
To use AWK in Unix or Linux, run the awk command followed by any options and the AWK program (script), and then specify the input file. If no file is given, awk reads from standard input, so you can also pipe data into it. If you carry out an operation regularly, you can save the AWK program in a separate script file.
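As a rough sketch of the script-file approach, the commands below save a one-rule AWK program to a temporary file (mktemp stands in for a script you would normally keep somewhere permanent) and run it with awk -f. The log lines are hypothetical:

```shell
# Write a small AWK program to a temporary script file.
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the second field of every line that contains "error".
/error/ { print $2 }
EOF

# Run the saved program against piped-in sample lines (hypothetical log data).
errors=$(printf 'info disk ok\nerror net down\nerror cpu hot\n' | awk -f "$script")
echo "$errors"

rm -f "$script"
```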
Prerequisites to Using AWK Command in UNIX and Linux
The following are the prerequisites to using AWK command in Unix and Linux systems.
- It is essential to have a fundamental grasp of how to use the Linux command line. Navigating directories, running commands, and handling files all fall under this category.
- Since AWK mostly works with text files, it’s helpful to be familiar with basic file operations such as creating, reading, writing, moving, and deleting text files using commands like cat, echo, touch, mv, and rm.
- Regular expressions are a core component of AWK used to create patterns for text matching. You can maximize AWK’s pattern-matching capabilities if you understand regular expressions.
- AWK is especially helpful for processing structured data, where delimiters like spaces, tabs, or commas separate fields or columns. When executing actions on particular information items, it’s crucial to comprehend how data is arranged in fields and records.
- AWK has a well-defined language syntax. You’ll need a fundamental knowledge of the AWK syntax, including creating patterns, actions, variables, and functions to efficiently use the language.
- Although most AWK programs are short, it helps to be familiar with basic programming concepts when creating advanced AWK scripts. These concepts include variables, conditional statements (if-else), loops (for, while), and functions.
- You’ll need access to a Linux system or a Unix-like system, such as macOS, to practice and apply AWK commands. This might be a virtual machine, a local installation, or an online Linux terminal.
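To make records and fields concrete, here is a small sketch using hypothetical comma-separated host data. Each line is a record, -F, splits it into fields at commas, NR holds the current record number, NF holds the number of fields, and $NF is the last field:

```shell
# Hypothetical CSV data: host name, CPU %, memory %.
csv='web01,35,60
db01,80,75'

summary=$(printf '%s\n' "$csv" |
  awk -F, '{ print NR ": " $1 " has " NF " fields, last = " $NF }')
echo "$summary"
```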
AWK Command Syntax
Let’s start with the AWK command syntax. A typical command has the following format:
awk [options] 'selection_criteria {action}' input-file > output-file
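As an illustration of that template, the sketch below fills in each part: the selection criteria is $2 > 50, the action is { print $1 }, and the output is redirected to a file. Temporary files from mktemp stand in for input-file and output-file, and the sample data is hypothetical:

```shell
# Temporary files stand in for input-file and output-file.
infile=$(mktemp)
outfile=$(mktemp)
printf 'alice 72\nbob 45\n' > "$infile"

# selection_criteria: $2 > 50   action: { print $1 }   output: redirected to $outfile
awk '$2 > 50 { print $1 }' "$infile" > "$outfile"

result=$(cat "$outfile")
echo "$result"
rm -f "$infile" "$outfile"
```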
Options control how awk reads its program and its input. Here are some common options you can include with your AWK commands:
- -F separator sets the input field separator. By default, fields are separated by whitespace (runs of spaces and tabs).
- -f filename reads the awk program source from the given file instead of taking it from the first command-line argument.
- -v var=value assigns a value to an awk variable before the program starts running.
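The sketch below combines -F and -v on a few made-up lines laid out like /etc/passwd entries (name:x:uid:gid). -F ':' splits each line at colons, and -v min=1000 sets the awk variable min (a name chosen for this example) before any input is read:

```shell
# Hypothetical colon-separated lines in the /etc/passwd layout (name:x:uid:gid).
# Print the names of entries whose third field (uid) is at least min.
users=$(printf 'root:x:0:0\nalice:x:1000:1000\nbob:x:1001:1001\n' |
  awk -F ':' -v min=1000 '$3 >= min { print $1 }')
echo "$users"
```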
The AWK Command in Unix and Linux: How Does It Operate?
The primary purpose of an awk command is to simplify text manipulation and information retrieval tasks in Linux. To do this, awk scans its input lines in sequence and looks for lines that match the user-supplied patterns.
Users can define an action for each pattern, which is applied to every line that matches it. This makes AWK very flexible, and a single AWK script can perform multiple operations on a file. For instance, you can use AWK to quickly analyze complex log files and generate easy-to-read reports.
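As a small sketch of that kind of report, the commands below count requests per HTTP status code in a few hypothetical access-log lines. They use an AWK associative array (count), which this tutorial doesn’t otherwise cover, and pipe the result through sort because AWK doesn’t guarantee the order of a for (key in array) loop:

```shell
# Hypothetical access-log lines: client IP, request path, HTTP status code.
log='10.0.0.1 /index.html 200
10.0.0.2 /missing 404
10.0.0.1 /about 200
10.0.0.3 /admin 403'

# count[$3]++ tallies requests per status code in an associative array.
# The END action prints the tally; for-in order is unspecified, so sort it.
report=$(printf '%s\n' "$log" |
  awk '{ count[$3]++ }; END { for (code in count) print code, count[code] }' |
  LC_ALL=C sort)
echo "$report"
```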
Common AWK Operations
Another aspect of using AWK is understanding what its capabilities let you do. Here are some common operations you can perform on text files or input streams:
- Scan a file line by line.
- Split the file or input line into fields.
- Compare the input line or fields to the designated pattern(s).
- Carry out different actions on the matching lines.
- Format the output.
- Perform string and math operations.
- Use loops and flow control.
- Transform the data and files into the desired structure.
- Create formatted reports.
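Several of these operations fit in one short program. This sketch, using hypothetical inventory data, splits each line into fields, formats rows with printf, accumulates a running total, and prints a formatted report with a header from BEGIN and a total from END:

```shell
# Hypothetical inventory: item name and quantity.
table=$(printf 'bolts 120\nnuts 75\n' |
  awk 'BEGIN { printf "%-8s %5s\n", "ITEM", "QTY" }
       { printf "%-8s %5d\n", $1, $2; total += $2 }
       END { printf "%-8s %5d\n", "TOTAL", total }')
echo "$table"
```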
AWK Conditional and Loop Statements
AWK has full-featured conditional and looping statements that add more power to your text manipulation operations.
You get if-else for conditional execution, while and for loops, and the break and continue statements for controlling loops. Multiple statements can be grouped into a block with curly braces {}.
Let’s look at these statements in more detail.
The if-else statement
The if-else statement checks whether the condition in parentheses is true. If it is, the statement that follows the if is carried out; otherwise, the statement after else runs. The else clause is optional.
Here’s an example:
# awk -F ',' '{if($2==$3){print $1","$2","$3} else {print "No Duplicates"}}' redswitches.txt
For every line whose second and third comma-separated fields are identical, awk prints the first three fields; for every other line, it prints No Duplicates.
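If you don’t have redswitches.txt, you can try the same if-else logic on a couple of made-up comma-separated lines piped in with printf:

```shell
# Hypothetical survey answers: respondent, first answer, second answer.
result=$(printf 'r1,yes,yes\nr2,yes,no\n' |
  awk -F ',' '{ if ($2 == $3) { print $1 "," $2 "," $3 } else { print "No Duplicates" } }')
echo "$result"
```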
The while Statement
As long as the given condition remains true, the while statement continuously executes a target statement. The body of the loop continues to run as long as the condition is satisfied.
For example, the following command tells awk to display each field of every input line on a separate line, prefixed with its field number:
# awk '{i=1; while(i<=NF) { print i ":" $i; i++ }}' redswitches.txt
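Here is a variation you can run without redswitches.txt. The sample line is piped in with printf, and the while loop counts down from NF to print the fields in reverse order:

```shell
# Hypothetical one-line input; the loop walks the fields from last to first.
reversed=$(printf 'alpha beta gamma\n' |
  awk '{
    i = NF
    while (i > 0) {
      # Print a space between fields and a newline after the last one.
      printf "%s%s", $i, (i > 1 ? " " : "\n")
      i--
    }
  }')
echo "$reversed"
```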
The for Statement
As in C, the for statement lets you control how many times a loop runs by combining an initialization, a condition, and an increment in one line.
Here’s an AWK for statement:
# awk 'BEGIN{for(i=1; i<=10; i++) print "The square of", i, "is", i*i;}'
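The for statement is also handy for walking over the fields of each line. In this sketch with hypothetical numeric input, the loop runs from 1 to NF and adds up every field on the line:

```shell
# Hypothetical numeric data with a varying number of fields per line.
sums=$(printf '1 2 3\n10 20\n' |
  awk '{
    sum = 0
    for (i = 1; i <= NF; i++) sum += $i
    print "line " NR ": " sum
  }')
echo "$sums"
```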
The break Statement
The break statement immediately exits the innermost while or for loop. The continue statement skips the rest of the current iteration and starts the next one.
Two related statements work at the record level: next tells awk to stop processing the current record, read the next one, and start matching patterns again from the top of the program, while exit tells awk to stop reading input (any END actions still run).
Here is an example of a break statement:
# awk 'BEGIN{x=1; while(1) {print "Example"; if ( x==7 ) break; x++; }}'
The loop prints Example seven times; when x reaches 7, break exits the loop.
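For comparison, here is a sketch of three related statements on hypothetical input: continue jumps to the next iteration of a loop, next skips the rest of the program for the current record and moves on to the next one, and exit stops reading input while still running any END action:

```shell
# continue: skip even numbers and collect the odd ones into a string.
odds=$(awk 'BEGIN {
  for (i = 1; i <= 7; i++) {
    if (i % 2 == 0) continue
    sep = (s == "") ? "" : " "
    s = s sep i
  }
  print s
}')
echo "$odds"

# next: skip comment lines (hypothetical config data), print everything else.
settings=$(printf '# comment\nport=80\n# another\nhost=example\n' |
  awk '/^#/ { next }; { print }')
echo "$settings"

# exit: stop when the second line is reached; the END action still runs.
first=$(printf 'a\nb\nc\n' | awk 'NR == 2 { exit }; { print }; END { print "done" }')
echo "$first"
```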
Understanding the AWK Pattern
To select which lines an action applies to, place a pattern before the action in an AWK statement. Patterns can be built from:
- Regular expressions.
- Arithmetic relational expressions.
- String-valued expressions.
- Arbitrary Boolean combinations of the expressions above.
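Since Boolean combinations don’t get their own subsection, here is a brief sketch. With hypothetical name/score data, && (and), || (or), and ! (not) combine regular expression and relational patterns:

```shell
# Hypothetical name/score data.
data='alice 72
bob 45
anna 30'

# Names starting with "a" that scored at least 50, or anyone named bob.
picked=$(printf '%s\n' "$data" | awk '($1 ~ /^a/ && $2 >= 50) || $1 == "bob"')
echo "$picked"

# Negation: lines that do NOT start with "a".
not_a=$(printf '%s\n' "$data" | awk '!/^a/')
echo "$not_a"
```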
Let’s go into the details of these patterns.
Regular Expression Pattern
A regular expression pattern is the most basic type of pattern: a regular expression enclosed in slashes, such as /error/. It can contain letters, digits, or both, as well as regex operators such as ^ (start of string).
In the following example, the statement prints every line whose first field begins with an uppercase “U”. Because it uses a regular expression, it matches the required string even when it is part of a longer word, such as Ubuntu or Unix.
# awk '$1 ~ /^U/ {print $0}' redswitches.txt
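You can try this without redswitches.txt by piping in sample lines. The sketch also contrasts the anchored pattern with an unanchored one, /dev/, which matches anywhere in the line, including inside a longer word such as devops (all sample data is made up):

```shell
# First field starts with an uppercase U (the ^ anchor ties it to the start).
u_lines=$(printf 'Ubuntu 22.04\nDebian 12\nunix-like os\nUNIX 7\n' | awk '$1 ~ /^U/ { print $0 }')
echo "$u_lines"

# Unanchored pattern: matches "dev" anywhere, including inside longer words.
dev_lines=$(printf 'devops\n/dev/null\nprod\n' | awk '/dev/')
echo "$dev_lines"
```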
Relational Expression Patterns
Relational expression patterns are another type of awk pattern. You can use all the relational operators, <, <=, ==, !=, >=, and >, in relational expression patterns.
Here is an example of an awk relational expression:
# awk 'BEGIN { a = 10; b = 10; if (a == b) print "a == b" }'
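The example above uses a relational expression inside an if statement. Used as a pattern, the same kind of comparison selects lines directly. In this sketch with hypothetical data, one pattern compares numbers and the other compares strings:

```shell
# Hypothetical name/score data.
data='alice 72
bob 45
carol 91'

# Numeric comparison: names with a score of 50 or lower.
low=$(printf '%s\n' "$data" | awk '$2 <= 50 { print $1 }')
echo "$low"

# String comparison: every name except bob.
others=$(printf '%s\n' "$data" | awk '$1 != "bob" { print $1 }')
echo "$others"
```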
Special Expression Patterns
BEGIN and END are two special patterns that mark the start and finish of a program. The BEGIN action runs once before the first input record is read, and the END action runs once after the last record has been processed.
For example, you could tell awk to show a message at the start and end of its output:
# awk 'BEGIN { print "List of debtors:" }; {print $1, $2}; END {print "End of the debtor list"}' redswitches.txt
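BEGIN and END are also the natural place to set up and report totals. This sketch, with a hypothetical debtor list piped in through printf, prints each record and then the sum of the second column in the END action:

```shell
# Hypothetical debtor list: name and amount owed.
ledger=$(printf 'alice 120\nbob 80\n' |
  awk 'BEGIN { print "List of debtors:" }
       { print $1, $2; total += $2 }
       END { print "Total owed:", total }')
echo "$ledger"
```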
AWK Actions
An AWK program is made up of pattern-action pairs that AWK uses to achieve the purpose of the script.
AWK actions contain expressions, control statements, compound statements, input and output statements, and deletion statements, all enclosed in curly braces {}.
Here’s the general form of an AWK action:
# awk '{action}'
Now, let’s use this syntax in a simple awk script:
# awk '{print "RedSwitches is a dedicated hosting provider"}'
Because no input file is given, awk reads from standard input and prints the string once for every line you type. Press Ctrl+D to signal the end of input and exit the program.
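When input is piped in rather than typed, awk reaches the end of input on its own, so no Ctrl+D is needed. In this sketch, two made-up input lines make awk print the string twice:

```shell
# Two hypothetical piped-in lines, so the action runs (and prints) twice.
banner=$(printf 'first\nsecond\n' |
  awk '{ print "RedSwitches is a dedicated hosting provider" }')
echo "$banner"
```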
Examples of AWK Commands
While many users consider it a simple text-processing tool, AWK is, in fact, a capable programming language. As such, its applications go well beyond manipulating data and generating formatted output.
To illustrate, we’ll now discuss some AWK use cases that demonstrate the flexibility and power of the language.
Calculations
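You can easily carry out mathematical computations in an awk command. For example, the following command selects the df output lines that match /dev/loop and prints each device name followed by the sum of its second and third columns: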
# df | awk '/\/dev\/loop/ {print $1"\t"$2 + $3}'
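Because df output differs between systems, here is a more predictable sketch on hypothetical numbers: the main action accumulates a sum, and the END action prints the sum and the average (NR is the number of lines read):

```shell
# Hypothetical single-column list of numbers.
stats=$(printf '10\n20\n30\n' |
  awk '{ sum += $1 }; END { if (NR > 0) print "sum =", sum, "avg =", sum / NR }')
echo "$stats"
```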
Filtering
You can also use awk to filter output by line length. For example:
# awk 'length($0) > 8' /etc/shells
In this example, awk processes the /etc/shells system file and prints only the lines longer than 8 characters. Because no action is given, awk applies its default action and prints each matching line.
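The sketch below runs the same kind of filter on made-up /etc/shells-style content and combines it with a negated regular expression so that comment lines starting with # are skipped as well:

```shell
# Hypothetical /etc/shells-style content, including a comment line.
shells=$(printf '# /etc/shells: valid login shells\n/bin/sh\n/bin/bash\n/usr/bin/zsh\n' |
  awk '!/^#/ && length($0) > 8')
echo "$shells"
```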
Counting
You can use a simple AWK statement to count the characters on each line and print the result along with the line number.
Here’s a sample AWK command:
# awk '{ print "The number of characters in line", NR, "=", length($0) }' redswitches.txt
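Going a step further, this sketch produces line, word, and character totals similar to what wc reports. The + 1 counts the newline that awk strips from each record, which assumes every line ends with a newline and that the text is ASCII (where characters and bytes coincide). The input is hypothetical:

```shell
# Hypothetical two-line input: count lines (NR), words (NF), and characters.
counts=$(printf 'hello world\nawk\n' |
  awk '{ chars += length($0) + 1; words += NF }
       END { print NR, words, chars }')
echo "$counts"
```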
Conclusion
The AWK command in Unix is a very flexible and effective tool for text processing and pattern scanning. It is an invaluable tool for various data manipulation jobs because of its capacity to analyze data line by line, match patterns, and carry out actions according to user-defined rules.
AWK allows users to extract specific information from files, filter data based on criteria, do calculations and statistical analysis, and create custom reports, all thanks to its vast built-in functions and variables. It offers a practical method to format data, automate repetitive activities, and produce structured output for additional processing.
AWK is a standard part of the Unix toolkit and is widely used in Unix/Linux environments. It is a popular option for many system administrators, developers, and data analysts who work with textual data due to its simplicity, power, and versatility.
If you’re interested in using AWK for your text management operations, you need a dependable platform for your Linux OS. RedSwitches offers the bare metal servers you need to run your favorite Linux distributions. Get in touch with our engineers and get free consultations for setting up your bare metal infrastructure.
FAQs
Q1: What is the basic syntax of an AWK command?
A: The basic syntax of an AWK command is awk 'pattern {action}' file. The pattern identifies the lines or records that must match, and the action is the series of instructions carried out when a match occurs.
Q2: How can I specify multiple patterns and actions in an AWK command?
A: Separate the pattern-action pairs with semicolons (;) or newlines, for example: awk 'pattern1 {action1}; pattern2 {action2}' file.
Q3: How can I print specific columns or fields from a file using AWK?
A: Use AWK’s print statement and refer to each field by its number with $n. For instance, awk '{print $1, $3}' file prints the first and third columns.
Q4: Can AWK perform arithmetic calculations?
A: Yes, AWK supports arithmetic calculations. In AWK expressions, you can use mathematical operators such as +, -, *, /, and %. For example, awk '{total = $1 + $2; print total}' file calculates the sum of the first two columns.
Q5: How can I use AWK to filter lines based on a condition?
A: AWK allows you to specify conditions using relational operators (>, <, ==, etc.) and logical operators (&&, ||, !). For example, awk '$1 > 10 {print}' file prints the lines where the first column is greater than 10.
Q6: Can I use AWK with command-line options?
A: Yes, AWK provides several command-line options to customize its behavior. For example, the -F option specifies the field separator, the -v option allows you to assign variable values, and the -f option lets you specify an AWK script file.
Q7: Does AWK support regular expressions?
A: Yes, AWK supports regular expressions for pattern matching. You can use regular expressions to match patterns within fields or lines of the input file.
Q8: Are there any resources available to learn more about AWK?
A: Yes. Good starting points include the GNU Awk User’s Guide, the POSIX specification of the awk utility, and The AWK Programming Language, the book written by AWK’s three authors. Tutorials and online forums can also help you understand its features and usage.