General Introduction
This file documents awk, a program that you can use to select particular records in a file and perform operations upon them.
Copyright © 1989, 1991, 1992, 1993, 1996–2005, 2007, 2009–2023
Free Software Foundation, Inc.
This is Edition 5.3 of GAWK: Effective AWK Programming: A User’s Guide for GNU Awk, for the 5.3.0 (or later) version of the GNU implementation of AWK.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with the Invariant Sections being “GNU General Public License”, with the Front-Cover Texts being “A GNU Manual”, and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled “GNU Free Documentation License”.
a. The FSF’s Back-Cover Text is: “You have the freedom to copy and modify this GNU manual.”
Short Table of Contents
- Foreword to the Third Edition
- Foreword to the Fourth Edition
- Preface
- 1 Getting Started with awk
- 2 Running awk and gawk
- 3 Regular Expressions
- 4 Reading Input Files
- 5 Printing Output
- 6 Expressions
- 7 Patterns, Actions, and Variables
- 8 Arrays in awk
- 9 Functions
- 10 A Library of awk Functions
- 11 Practical awk Programs
- 12 Advanced Features of gawk
- 13 Internationalization with gawk
- 14 Debugging awk Programs
- 15 Namespaces in gawk
- 16 Arithmetic and Arbitrary-Precision Arithmetic with gawk
- 17 Writing Extensions for gawk
- Appendix A The Evolution of the awk Language
- Appendix B Installing gawk
- Appendix C Implementation Notes
- Appendix D Basic Programming Concepts
- Glossary
- GNU General Public License
- GNU Free Documentation License
- Index
Table of Contents
- Foreword to the Third Edition
- Foreword to the Fourth Edition
- Preface
- History of awk and gawk
- A Rose by Any Other Name
- Using This Book
- Typographical Conventions
- 1 Getting Started with awk
- 1.1 How to Run awk Programs
- 1.1.1 One-Shot Throwaway awk Programs
- 1.1.2 Running awk Without Input Files
- 1.1.3 Running Long Programs
- 1.1.4 Executable awk Programs
- 1.1.5 Comments in awk Programs
- 1.1.6 Shell Quoting Issues
- 2.1 Invoking awk
- 2.2 Command-Line Options
- 2.3 Other Command-Line Arguments
- 2.4 Naming Standard Input
- 2.5 The Environment Variables gawk Uses
- 2.5.1 The AWKPATH Environment Variable
- 2.5.2 The AWKLIBPATH Environment Variable
- 2.5.3 Other Environment Variables
- 3.1 How to Use Regular Expressions
- 3.2 Escape Sequences
- 3.3 Regular Expression Operators
- 3.3.1 Regexp Operators in awk
- 3.3.2 Some Notes On Interval Expressions
- 4.1 How Input Is Split into Records
- 4.1.1 Record Splitting with Standard awk
- 4.1.2 Record Splitting with gawk
- 4.5.1 Whitespace Normally Separates Fields
- 4.5.2 Using Regular Expressions to Separate Fields
- 4.5.3 Making Each Character a Separate Field
- 4.5.4 Working With Comma Separated Value Files
- 4.5.5 Setting FS from the Command Line
- 4.5.6 Making the Full Line Be a Single Field
- 4.5.7 Field-Splitting Summary
- 4.6.1 Processing Fixed-Width Data
- 4.6.2 Skipping Intervening Fields
- 4.6.3 Capturing Optional Trailing Data
- 4.6.4 Field Values With Fixed-Width Data
- 4.7.1 More on CSV Files
- 4.7.2 FS Versus FPAT: A Subtle Difference
- 4.10.1 Using getline with No Arguments
- 4.10.2 Using getline into a Variable
- 4.10.3 Using getline from a File
- 4.10.4 Using getline into a Variable from a File
- 4.10.5 Using getline from a Pipe
- 4.10.6 Using getline into a Variable from a Pipe
- 4.10.7 Using getline from a Coprocess
- 4.10.8 Using getline into a Variable from a Coprocess
- 4.10.9 Points to Remember About getline
- 4.10.10 Summary of getline Variants
- 5.1 The print Statement
- 5.2 print Statement Examples
- 5.3 Output Separators
- 5.4 Controlling Numeric Output with print
- 5.5 Using printf Statements for Fancier Printing
- 5.5.1 Introduction to the printf Statement
- 5.5.2 Format-Control Letters
- 5.5.3 Modifiers for printf Formats
- 5.5.4 Examples Using printf
- 5.8.1 Accessing Other Open Files with gawk
- 5.8.2 Special Files for Network Communications
- 5.8.3 Special File name Caveats
- 5.9.1 Using close()’s Return Value
- 6.1 Constants, Variables, and Conversions
- 6.1.1 Constant Expressions
- 6.1.1.1 Numeric and String Constants
- 6.1.1.2 Octal and Hexadecimal Numbers
- 6.1.1.3 Regular Expression Constants
- 6.1.2.1 Standard Regular Expression Constants
- 6.1.2.2 Strongly Typed Regexp Constants
- 6.1.3.1 Using Variables in a Program
- 6.1.3.2 Assigning Variables on the Command Line
- 6.1.4.1 How awk Converts Between Strings and Numbers
- 6.1.4.2 Locales Can Influence Conversion
- 6.2.1 Arithmetic Operators
- 6.2.2 String Concatenation
- 6.2.3 Assignment Expressions
- 6.2.4 Increment and Decrement Operators
- 6.3.1 True and False in awk
- 6.3.2 Variable Typing and Comparison Expressions
- 6.3.2.1 String Type versus Numeric Type
- 6.3.2.2 Comparison Operators
- 6.3.2.3 String Comparison Based on Locale Collating Order
- 7.1 Pattern Elements
- 7.1.1 Regular Expressions as Patterns
- 7.1.2 Expressions as Patterns
- 7.1.3 Specifying Record Ranges with Patterns
- 7.1.4 The BEGIN and END Special Patterns
- 7.1.4.1 Startup and Cleanup Actions
- 7.1.4.2 Input/Output from BEGIN and END Rules
- 7.4.1 The if-else Statement
- 7.4.2 The while Statement
- 7.4.3 The do-while Statement
- 7.4.4 The for Statement
- 7.4.5 The switch Statement
- 7.4.6 The break Statement
- 7.4.7 The continue Statement
- 7.4.8 The next Statement
- 7.4.9 The nextfile Statement
- 7.4.10 The exit Statement
- 7.5.1 Built-in Variables That Control awk
- 7.5.2 Built-in Variables That Convey Information
- 7.5.3 Using ARGC and ARGV
- 8.1 The Basics of Arrays
- 8.1.1 Introduction to Arrays
- 8.1.2 Referring to an Array Element
- 8.1.3 Assigning Array Elements
- 8.1.4 Basic Array Example
- 8.1.5 Scanning All Elements of an Array
- 8.1.6 Using Predefined Array Scanning Orders with gawk
- 8.5.1 Scanning Multidimensional Arrays
- 9.1 Built-in Functions
- 9.1.1 Calling Built-in Functions
- 9.1.2 Generating Boolean Values
- 9.1.3 Numeric Functions
- 9.1.4 String-Manipulation Functions
- 9.1.4.1 More about ‘\’ and ‘&’ with sub(), gsub(), and gensub()
- 9.2.1 Function Definition Syntax
- 9.2.2 Function Definition Examples
- 9.2.3 Calling User-Defined Functions
- 9.2.3.1 Writing a Function Call
- 9.2.3.2 Controlling Variable Scope
- 9.2.3.3 Passing Function Arguments by Value Or by Reference
- 9.2.3.4 Other Points About Calling Functions
- 10 A Library of awk Functions
- 10.1 Naming Library Function Global Variables
- 10.2 General Programming
- 10.2.1 Converting Strings to Numbers
- 10.2.2 Assertions
- 10.2.3 Rounding Numbers
- 10.2.4 The Cliff Random Number Generator
- 10.2.5 Translating Between Characters and Numbers
- 10.2.6 Merging an Array into a String
- 10.2.7 Managing the Time of Day
- 10.2.8 Reading a Whole File at Once
- 10.2.9 Quoting Strings to Pass to the Shell
- 10.2.10 Checking Whether A Value Is Numeric
- 10.2.11 Producing CSV Data
- 10.3.1 Noting Data file Boundaries
- 10.3.2 Rereading the Current File
- 10.3.3 Checking for Readable Data files
- 10.3.4 Checking for Zero-Length Files
- 10.3.5 Treating Assignments as File names
- 11.1 Running the Example Programs
- 11.2 Reinventing Wheels for Fun and Profit
- 11.2.1 Cutting Out Fields and Columns
- 11.2.2 Searching for Regular Expressions in Files
- 11.2.3 Printing Out User Information
- 11.2.4 Splitting a Large File into Pieces
- 11.2.5 Duplicating Output into Multiple Files
- 11.2.6 Printing Nonduplicated Lines of Text
- 11.2.7 Counting Things
- 11.2.7.1 Modern Character Sets
- 11.2.7.2 A Brief Introduction To Extensions
- 11.2.7.3 Code for wc.awk
- 11.3.1 Finding Duplicated Words in a Document
- 11.3.2 An Alarm Clock Program
- 11.3.3 Transliterating Characters
- 11.3.4 Printing Mailing Labels
- 11.3.5 Generating Word-Usage Counts
- 11.3.6 Removing Duplicates from Unsorted Text
- 11.3.7 Extracting Programs from Texinfo Source Files
- 11.3.8 A Simple Stream Editor
- 11.3.9 An Easy Way to Use Library Functions
- 11.3.10 Finding Anagrams from a Dictionary
- 11.3.11 And Now for Something Completely Different
- 12 Advanced Features of gawk
- 12.1 Allowing Nondecimal Input Data
- 12.2 Boolean Typed Values
- 12.3 Controlling Array Traversal and Array Sorting
- 12.3.1 Controlling Array Traversal
- 12.3.2 Sorting Array Values and Indices with gawk
- 13.1 Internationalization and Localization
- 13.2 GNU gettext
- 13.3 Internationalizing awk Programs
- 13.4 Translating awk Programs
- 13.4.1 Extracting Marked Strings
- 13.4.2 Rearranging printf Arguments
- 13.4.3 awk Portability Issues
- 14.1 Introduction to the gawk Debugger
- 14.1.1 Debugging in General
- 14.1.2 Debugging Concepts
- 14.1.3 awk Debugging
- 14.2.1 How to Start the Debugger
- 14.2.2 Finding the Bug
- 14.3.1 Control of Breakpoints
- 14.3.2 Control of Execution
- 14.3.3 Viewing and Changing Data
- 14.3.4 Working with the Stack
- 14.3.5 Obtaining Information About the Program and the Debugger State
- 14.3.6 Miscellaneous Commands
- 15.1 Standard awk’s Single Namespace
- 15.2 Qualified Names
- 15.3 The Default Namespace
- 15.4 Changing The Namespace
- 15.5 Namespace and Component Naming Rules
- 15.6 Internal Name Management
- 15.7 Namespace Example
- 15.8 Namespaces and Other gawk Features
- 15.9 Summary
- 16.1 A General Description of Computer Arithmetic
- 16.2 Other Stuff to Know
- 16.3 Arbitrary-Precision Arithmetic Features in gawk
- 16.3.1 Arbitrary Precision Arithmetic is On Parole!
- 16.3.2 Arbitrary Precision Introduction
- 16.4.1 Floating-Point Arithmetic Is Not Exact
- 16.4.1.1 Many Numbers Cannot Be Represented Exactly
- 16.4.1.2 Be Careful Comparing Values
- 16.4.1.3 Errors Accumulate
- 16.4.1.4 Floating Point Values They Didn’t Talk About In School
- 17.1 Introduction
- 17.2 Extension Licensing
- 17.3 How It Works at a High Level
- 17.4 API Description
- 17.4.1 Introduction
- 17.4.2 General-Purpose Data Types
- 17.4.3 Memory Allocation Functions and Convenience Macros
- 17.4.4 Constructor Functions
- 17.4.5 Managing MPFR and GMP Values
- 17.4.6 Registration Functions
- 17.4.6.1 Registering An Extension Function
- 17.4.6.2 Registering An Exit Callback Function
- 17.4.6.3 Registering An Extension Version String
- 17.4.6.4 Customized Input Parsers
- 17.4.6.5 Customized Output Wrappers
- 17.4.6.6 Customized Two-way Processors
- 17.4.11.1 Variable Access and Update by Name
- 17.4.11.2 Variable Access and Update by Cookie
- 17.4.11.3 Creating and Using Cached Values
- 17.4.12.1 Array Data Types
- 17.4.12.2 Array Functions
- 17.4.12.3 Working With All The Elements of an Array
- 17.4.12.4 How To Create and Populate Arrays
- 17.4.14.1 API Version Constants and Variables
- 17.4.14.2 GMP and MPFR Version Information
- 17.4.14.3 Informational Variables
- 17.6.1 Using chdir() and stat()
- 17.6.2 C Code for chdir() and stat()
- 17.6.3 Integrating the Extensions
- 17.7.1 File-Related Functions
- 17.7.2 Interface to fnmatch()
- 17.7.3 Interface to fork(), wait(), and waitpid()
- 17.7.4 Enabling In-Place File Editing
- 17.7.5 Character and Numeric values: ord() and chr()
- 17.7.6 Reading Directories
- 17.7.7 Reversing Output
- 17.7.8 Two-Way I/O Example
- 17.7.9 Dumping and Restoring an Array
- 17.7.10 Reading an Entire File
- 17.7.11 Extension Time Functions
- 17.7.12 API Tests
- Appendix A The Evolution of the awk Language
- A.1 Major Changes Between V7 and SVR3.1
- A.2 Changes Between SVR3.1 and SVR4
- A.3 Changes Between SVR4 and POSIX awk
- A.4 Extensions in Brian Kernighan’s awk
- A.5 Extensions in gawk Not in POSIX awk
- A.6 History of gawk Features
- A.7 Common Extensions Summary
- A.8 Regexp Ranges and Locales: A Long Sad Story
- A.9 Major Contributors to gawk
- A.10 Summary
- B.1 The gawk Distribution
- B.1.1 Getting the gawk Distribution
- B.1.2 Extracting the Distribution
- B.1.3 Contents of the gawk Distribution
- B.2.1 Compiling gawk for Unix-Like Systems
- B.2.1.1 Building With MPFR
- B.3.1 Installation on MS-Windows
- B.3.1.1 Installing a Prepared Distribution for MS-Windows Systems
- B.3.1.2 Compiling gawk for PC Operating Systems
- B.3.1.3 Using gawk on PC Operating Systems
- B.3.1.4 Using gawk In The Cygwin Environment
- B.3.1.5 Using gawk In The MSYS Environment
- B.3.2.1 Compiling gawk on OpenVMS
- B.3.2.2 Compiling gawk Dynamic Extensions on OpenVMS
- B.3.2.3 Installing gawk on OpenVMS
- B.3.2.4 Running gawk on OpenVMS
- B.3.2.5 The OpenVMS GNV Project
- B.4.1 Defining What Is and What Is Not A Bug
- B.4.2 Submitting Bug Reports
- B.4.3 Please Don’t Post Bug Reports to USENET
- B.4.4 What To Do If You Think There Is A Performance Issue
- B.4.5 Where To Send Non-bug Questions
- B.4.6 Reporting Problems with Non-Unix Ports
- C.1 Downward Compatibility and Debugging
- C.2 Making Additions to gawk
- C.2.1 Accessing The gawk Git Repository
- C.2.2 Adding New Features
- C.2.3 Porting gawk to a New Operating System
- C.2.4 Why Generated Files Are Kept In Git
- C.5.1 Problems With The Old Mechanism
- C.5.2 Goals For A New Mechanism
- C.5.3 Other Design Decisions
- C.5.4 Room For Future Growth
- D.1 What a Program Does
- D.2 Data Values in a Computer
- ADDENDUM: How to use this License for your documents