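The Linux awk command is a program for processing text files, and almost every Linux system ships with it. It processes a file one line at a time and reads each field within the line.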
Syntax
awk <options> 'Program' Input-File1 Input-File2 ...
awk -f PROGRAM-FILE <options> Input-File1 Input-File2 ...
Key
-F FS
--field-separator FS
Use FS for the input field separator (the value of the 'FS'
predefined variable).
-f PROGRAM-FILE
--file PROGRAM-FILE
Read the awk program source from the file PROGRAM-FILE, instead
of from the first command line argument.
-mf NNN
-mr NNN
The 'f' flag sets the maximum number of fields, and the 'r' flag
sets the maximum record size. These options are ignored by
'gawk', since 'gawk' has no predefined limits; they are only for
compatibility with the Bell Labs research version of Unix awk.
-v VAR=VAL
--assign VAR=VAL
Assign the variable VAR the value VAL before program execution
begins.
-W traditional
-W compat
--traditional
--compat
Use compatibility mode, in which 'gawk' extensions are turned off.
-W lint
--lint
Give warnings about dubious or non-portable awk constructs.
-W lint-old
--lint-old
Warn about constructs that are not available in the original
Version 7 Unix version of awk.
-W posix
--posix
Use POSIX compatibility mode, in which 'gawk' extensions are
turned off and additional restrictions apply.
-W re-interval
--re-interval
Allow interval expressions, in regexps.
-W source=PROGRAM-TEXT
--source PROGRAM-TEXT
Use PROGRAM-TEXT as awk program source code. This option allows
mixing command line source code with source code from files, and is
particularly useful for mixing command line programs with library
functions.
--
Signal the end of options. This is useful to allow further
arguments to the awk program itself to start with a '-'. This
is mainly for consistency with POSIX argument parsing conventions.
'Program'
A series of patterns and actions: see below
Input-File
If no Input-File is specified then awk applies the Program to
"standard input" (the piped output of some other command, or the terminal).
Typed input continues until end-of-file (typing 'Control-d').
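As a rough illustration of the two options used most often in practice, the sketch below combines -F and -v on input piped through standard input. The sample records and the variable name prefix are made up for this example.

```shell
# Hypothetical sample data: colon-separated records (name:shell)
data='alice:/bin/bash
bob:/bin/sh'

# -F: sets the input field separator to ":"
# -v assigns the awk variable "prefix" before the program starts
result=$(printf '%s\n' "$data" | awk -F: -v prefix='user=' '{ print prefix $1 }')

printf '%s\n' "$result"
```

Because no Input-File is given, awk reads the records from the pipe.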
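The basic function of awk is to search files for lines (or other units of text) that contain a pattern.
When a line matches, awk performs a specific action on that line.
Suppose we have a file in which each line is a name followed by a phone number. In awk, the first field is called $1, the second field $2, and so on.
So an awk program that retrieves the phone number for Linux is:
> awk '$1 == "Linux" {print $2}' numbers.txt
This means: if the first field equals Linux, print the second field.
In awk, $0 refers to the entire line.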
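To see $1, $2 and $0 side by side, here is a small sketch that builds a throwaway phone list first. The file contents are invented for illustration and stand in for numbers.txt.

```shell
# Create a temporary stand-in for numbers.txt (hypothetical contents)
tmp=$(mktemp)
printf '%s\n' 'Linux 555-0100' 'Unix 555-0199' > "$tmp"

# $2 is the second field of the matching line
phone=$(awk '$1 == "Linux" { print $2 }' "$tmp")

# $0 is the whole matching line
whole=$(awk '$1 == "Linux" { print $0 }' "$tmp")

printf '%s\n' "$phone" "$whole"
rm -f "$tmp"
```

The single quotes around the program matter: without them the shell would expand $1 and $2 itself before awk ever saw them.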
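The Program tells awk what to do; it consists of a series of rules. Each rule specifies a pattern to search for and an action to perform when that pattern is found.
For readability, each rule in an awk program is usually written on its own line, like this: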
pattern { action }
pattern { action }
...
For example, to display the lines of samplefile that contain the string 123, abc, or some text:
awk '/123/ { print $0 }
/abc/ { print $0 }
/some text/ { print $0 }' samplefile
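A regular expression enclosed in slashes (/) is an awk pattern that matches every input record whose text contains a match for that expression.
An awk pattern can be any of the following:
/Regular Expression/           - Match
Expression                     - True when the expression is non-zero or non-empty, e.g. $1 == "Linux"
Pattern && Pattern             - AND
Pattern || Pattern             - OR
! Pattern                      - NOT
Pattern ? Pattern : Pattern    - If, Then, Else
Pattern1, Pattern2             - Range Start - end
BEGIN                          - Perform action BEFORE input file is read
END                            - Perform action AFTER input file is read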
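The special patterns BEGIN and END can be used to take control before the first input line is read and after the last input line has been read. BEGIN and END do not combine with other patterns.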
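The pattern forms listed above can be tried out on a few lines of invented input. This is only a sketch; the words start and stop are arbitrary markers chosen for the example.

```shell
# Hypothetical five-line input: a word followed by a number
input='alpha 1
start 2
beta 3
stop 4
gamma 5'

# Range pattern: from the line matching /start/ through the line matching /stop/
range=$(printf '%s\n' "$input" | awk '/start/, /stop/ { print $1 }')

# AND plus NOT: second field greater than 1, and the line does not contain "stop"
combo=$(printf '%s\n' "$input" | awk '$2 > 1 && !/stop/ { print $1 }')

# BEGIN runs before any input is read, END after all of it
count=$(printf '%s\n' "$input" | awk 'BEGIN { n = 0 } /a/ { n++ } END { print n }')

printf '%s\n' "$range" "$combo" "$count"
```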
Variable names with special meanings:
CONVFMT   conversion format used when converting numbers (default %.6g)
FS        regular expression used to separate fields; also settable by option -Ffs
NF        number of fields in the current record
NR        ordinal number of the current record
FNR       ordinal number of the current record in the current file
FILENAME  the name of the current input file
RS        input record separator (default newline)
OFS       output field separator (default blank)
ORS       output record separator (default newline)
OFMT      output format for numbers (default %.6g)
SUBSEP    separates multiple subscripts (default 034)
ARGC      argument count, assignable
ARGV      argument array, assignable; non-null members are taken as filenames
ENVIRON   array of environment variables; subscripts are names
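The difference between NR and FNR, and the effect of OFS, is easiest to see with two small files. The sketch below uses temporary files with made-up contents.

```shell
# Two hypothetical input files
tmp1=$(mktemp)
tmp2=$(mktemp)
printf 'a b c\nd e\n' > "$tmp1"
printf 'f g h i\n' > "$tmp2"

# NR keeps counting across files, FNR restarts for each file,
# NF is the number of fields on the current line
vars=$(awk '{ print NR, FNR, NF }' "$tmp1" "$tmp2")

# OFS is placed between the comma-separated arguments of print;
# $NF is the last field of the line
joined=$(awk 'BEGIN { OFS = "-" } { print $1, $NF }' "$tmp1")

printf '%s\n' "$vars" "$joined"
rm -f "$tmp1" "$tmp2"
```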
Examples
Print the fifth column ($5) of the ls -l listing:
> $ ls -l | awk '{print $5}'
Print the line number (NR), then a dash and a space ("- "), then the first item ($1) of each line in samplefile.txt:
> $ awk '{print NR "- " $1 }' samplefile.txt
Print the first item ($1) and the third-from-last item $(NF-2) of each line in samplefile.txt:
> $ awk '{print $1, $(NF-2) }' samplefile.txt
Delete blank lines:
> awk 'NF > 0' data.txt
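Comparison with grep
grep Dec matches the text Dec wherever it appears on a line, so on a file listing it also returns lines where Dec occurs in another column, such as the file name. In the listing below, only the README line contains Dec: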
-rw-r--r-- 7 rumenz rumenz 12043 Jan 31 09:36 Linux.pdf
-rw-r--r-- 3 rumenz rumenz 1024 Dec 01 11:59 README
-rw-r--r-- 3 rumenz rumenz 5096 Nov 14 18:22 Linux.txt
Running awk '$6 == "Dec"' against the same file listing compares only column 6 (the month) with Dec:
> $ ls -l /tmp/demo | awk '$6 == "Dec"'
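The difference shows up as soon as a file name happens to contain Dec. The sketch below feeds both tools a hypothetical listing in which the first file is named December.pdf; that name is an assumption added for the demonstration.

```shell
# Hypothetical listing: "Dec" appears in a file name and in the month column
listing='-rw-r--r-- 7 rumenz rumenz 12043 Jan 31 09:36 December.pdf
-rw-r--r-- 3 rumenz rumenz 1024 Dec 01 11:59 README
-rw-r--r-- 3 rumenz rumenz 5096 Nov 14 18:22 Linux.txt'

# grep counts every line that contains "Dec" anywhere
grep_count=$(printf '%s\n' "$listing" | grep -c 'Dec')

# awk compares only the sixth field (the month) and prints the file name
awk_out=$(printf '%s\n' "$listing" | awk '$6 == "Dec" { print $NF }')

printf '%s\n' "$grep_count" "$awk_out"
```

Here grep reports two matching lines, while awk selects only README.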
Print the length of the longest line:
> awk '{ if (length($0) > max) max = length($0) }
END { print max }' data
Print seven random integers from 0 to 100:
> awk 'BEGIN { for (i = 1; i <= 7; i++)
print int(101 * rand()) }'
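In some awk implementations (gawk, for instance) rand() yields the same sequence on every run unless srand() is called first. A possible variant that seeds the generator from the current time, assuming that behavior matters for your use:

```shell
# srand() with no argument seeds the generator from the time of day
nums=$(awk 'BEGIN { srand(); for (i = 1; i <= 7; i++) print int(101 * rand()) }')

printf '%s\n' "$nums"
```

Two runs within the same second may still print the same numbers, since the seed has one-second resolution.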
Print the total number of bytes used by FILES:
> ls -l FILES | awk '{ x += $5 } END { print "total bytes: " x }'
Print the average file size of all .png files in the directory:
> ls -l *.png | gawk '{sum += $5; n++;} END {print sum/n;}'
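If no lines reach awk at all, n stays empty and sum/n is a division by zero. One way to guard against that, sketched here on made-up listing lines rather than real files:

```shell
# Hypothetical ls -l style lines; the size is the fifth field
sizes='-rw-r--r-- 1 u g 100 Jan 01 00:00 a.png
-rw-r--r-- 1 u g 300 Jan 01 00:00 b.png'

# Same averaging program, with a check that at least one line was seen
prog='{ sum += $5; n++ } END { if (n > 0) print sum / n; else print "no files" }'

avg=$(printf '%s\n' "$sizes" | awk "$prog")

# With empty input only the END rule runs
empty=$(printf '' | awk "$prog")

printf '%s\n' "$avg" "$empty"
```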
Print a sorted list of all login names:
> awk -F: '{ print $1 }' /etc/passwd | sort
Count the lines in a file:
> awk 'END { print NR }' data
Print the even-numbered lines of the data file. With NR % 2 == 1 it prints the odd-numbered lines instead.
> awk 'NR % 2 == 0' data