cat [file]...
outputs input files one after another or outputs the
standard input if no files are specified.
The name comes from concatenate.
cat
numbers lines with the -n
switch and outputs non-printable characters
as ^x
, M-x
, … with the -v
switch1).
paste file1 [file2]...
reads round-robin one line from each input
file and outputs them separated by a tab character, then repeats until the end
of the longest file.
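The round-robin behaviour is easiest to see with inputs of different lengths. The following is a minimal sketch; the file names letters.txt and numbers.txt and the scratch directory are made up for illustration.

```shell
# Hypothetical sample files in a scratch directory.
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/letters.txt"
printf '1\n2\n' > "$workdir/numbers.txt"

# One line from each file per output line, joined by a tab.
# numbers.txt runs out first, so the last line has an empty second field.
pasted=$(paste "$workdir/letters.txt" "$workdir/numbers.txt")
printf '%s\n' "$pasted"

rm -r "$workdir"
```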
fold [-w width] [file]...
outputs input files (or standard input)
forcing a line break whenever a line would exceed width (that defaults to 80).
With the -s
switch fold
breaks lines on spaces (or at width if there are no spaces).
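A small sketch of the difference between the two modes, using a made-up sample string and a width of 6:

```shell
text='aaaa bbbb cccc'

# Hard break every 6 columns, even in the middle of a word.
hard=$(printf '%s\n' "$text" | fold -w 6)
printf '%s\n' "$hard"

# With -s the break is moved back to the last space that still fits.
soft=$(printf '%s\n' "$text" | fold -s -w 6)
printf '%s\n' "$soft"
```

Note that with -s the space stays at the end of the broken line; it is not removed.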
column [-x] [file]...
works just like cat
if the longest line in the files (or standard input) would not fit twice within the terminal width.
Otherwise, it prints the input in as many columns as fit the terminal, filling columns first (or, with -x
, rows first).
column -t [file]...
does a completely different thing: it detects columns in input (by a separator that defaults to whitespace) and outputs the input as a table.
od [-t x1]
, hexdump [-C]
, and xxd
show binary files.
Exercise 1 Print any file with cat
. Print two files at once with cat
.
(You may use /etc/SUSE-brand
and /etc/os-release
if you cannot come up with any other file.)
Exercise 2 Run the cat
command, input some text, then press Return followed by Ctrl+d.
Exercise 3 Cat /usr/share/doc/mpich/user.pdf
with and without -v
switch.
Exercise 4 Paste a file with itself.
Exercise 5 Display a binary file /usr/share/themes/Breeze/assets/line-h.png
.
echo text
outputs text followed by a newline (unless -n
is specified).
The -e
switch turns backslash escapes into corresponding characters, e.g., \t
becomes a tab and \n
a newline (cf. manual).
printf format [arguments]...
works roughly the same as the printf
function in C.
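A short sketch of the format handling, with made-up values. One difference from C worth knowing (stated here from general shell knowledge, not from the text above) is that the shell printf reuses the format until all arguments are consumed.

```shell
# LC_ALL=C pins the decimal separator to a dot for this sketch.
line=$(LC_ALL=C printf '%6.2f|%-5s|%03d' 3.14159 pi 7)
printf '%s\n' "$line"

# The format is applied again for every further pair of arguments.
pairs=$(printf '%s=%s\n' a 1 b 2)
printf '%s\n' "$pairs"
```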
figlet [text]
outputs text or the standard input using an ASCII-art font.
cowsay [text]
makes a cow say the text (or the standard input).
Exercise 6 Try echo -e 'foo\n\nbaz'
and echo -e '\n\n one \033[A \033[A two \033[B \033[B \n \033[1;31m red \033[0m'
ANSI escape codes are well summarized here
Exercise 7 Try printf "|%4.2f|%3s|%-20s|\n|%4.2f|%3s|%-20s|\n" 3.1428 pi circumference/radius 9.8 g gravity
Exercise 8 Install cowsay
and figlet
with sudo zypper -q in -y figlet cowsay
.
Try figlet wololo
and cowsay moo
seq [from [step]] to
generates a sequence of numbers that starts at from and is incremented by step for as long as it does not exceed to.
from and step default to 1.
With the -w
switch, seq
makes all numbers of equal width by padding them with leading zeros (e.g., seq -w 8 11
outputs 08, 09, 10 and 11).
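The argument forms can be sketched as follows; the numbers are arbitrary and chosen so that the upper bound is not hit exactly.

```shell
# from=2, step=3, to=12: 12 itself is never reached, so the last number is 11.
stepped=$(seq 2 3 12)
# -w pads with leading zeros to the width of the widest number.
padded=$(seq -w 8 11)
# A negative step counts down.
down=$(seq 3 -1 1)
printf '%s\n' "$stepped" "$padded" "$down"
```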
Exercise 9 Generate a sequence of numbers from 1 to 15.
Exercise 10 Generate a sequence of numbers from 64 to 1024 with step of 64.
K&R C | Python |
printf("Please type your name:\n"); scanf("%s", name); | print("Please type your name:"); name = input() |
Did you ever wonder how a program knows where to read input from and where print-like functions should write their output?
In UNIX world a program expects to have three files already open upon
start – standard input, standard output and standard error.
These are called the standard streams.
The standard input/output library of the C programming language – stdio.h
–
is based on this concept. C was created by one of the authors of UNIX.
Basic I/O functions in most programming languages by default read from the
standard input, and output data to standard output.
The rationale of standard error stream is to convey information on what went
wrong while executing a program. Programming languages usually offer dedicated
functions to output data to standard error.
In Unix-like, as well as POSIX-compatible systems, the operating system is
responsible for abstracting files away – the user should not worry about
details of accessing a file.
When the user wants to open a file, the user provides the file name and gets an
identifier – a file descriptor in return.
(A file descriptor is in fact an index in an array of files maintained for the
process by the OS.)
To do standard operations such as reading or writing data, the user just tells
which operation shall be executed, on which file descriptor, and the user shall
provide the details of the operations (such as where to put data read from
the file and how many bytes shall be read).
A child process inherits all file descriptors from its parent.
The three standard streams are the files represented by the first three file descriptors – 0 is always used for standard input, 1 for standard output and 2 for standard error.
The files do not need to be ordinary files – Unix-like systems abstract almost
everything with a file.
For instance, a terminal device is a file (even if it were a real teletype).
By default, a shell opens the terminal as file 0, 1 and 2.
POSIX-compatible shells can replace standard streams with files specified by the user.
command > filename
command 2> filename
command &> filename
Warning: this is a Bash extension
command >> filename
/dev/null
is a device that discards any data written to it.
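A minimal sketch of the output redirections above. The function both is a hypothetical stand-in for a real command that writes to both output streams, and the file names are made up.

```shell
workdir=$(mktemp -d)

# Hypothetical stand-in for a real command: one line to each output stream.
both() { echo 'to stdout'; echo 'to stderr' >&2; }

both > "$workdir/out" 2> "$workdir/err"   # truncate out, capture errors in err
both >> "$workdir/out" 2> /dev/null       # append to out, discard errors

out_lines=$(wc -l < "$workdir/out")
err_text=$(cat "$workdir/err")
printf 'out has %d lines, err contains: %s\n' "$out_lines" "$err_text"

rm -r "$workdir"
```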
Exercise 11 The date
command outputs the current date. Redirect its output to a file.
Exercise 12 Append a new date to the file from the previous exercise.
Exercise 13 Try the cat /etc/motd /etc/shadow
command. Redirect the standard error to a file.
Exercise 14 Try the find /var/spool/
command (find
will be discussed later on). Redirect the standard error to the /dev/null
.
Exercise 15 Redirect standard output of the find /var/spool/
command to one file and the standard error to another file.
Exercise 16 Redirect standard output and the standard error of the find /var/spool/
command to the same file.
command < filename
command << delimiter
(here documents)
command <<< string
(here string) Warning: this is a Bash extension
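The first two forms can be sketched as follows; the file name numbers and the delimiter END are arbitrary choices for this example.

```shell
workdir=$(mktemp -d)
seq 5 > "$workdir/numbers"

# Standard input comes from the file; wc never sees the file name.
count=$(wc -l < "$workdir/numbers")

# Standard input consists of the lines up to the delimiter END.
shouted=$(tr 'a-z' 'A-Z' << END
first line
second line
END
)

printf '%s\n' "$count" "$shouted"
rm -r "$workdir"
```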
Exercise 17 Create a file containing print("hello " + __file__)
. Run the python
command redirecting input from the file.
Exercise 18 Use hexdump -C
to display a hex dump of arbitrary text passed as a here document.
Include a multi-byte character in the text.
Exercise 19 bc
is a simple calculator. Use it to calculate sqrt(2.0000)
.
Then use it to calculate sqrt(2.0000)
in non-interactive mode.
Exercise 20 Use bc
to calculate sqrt(2.0000)
in non-interactive mode and redirect its output to a file.
POSIX documentation on redirection
Bash documentation on redirection
Every redirection consists of: [file_number]operator word
The file_number defaults to 0 if the operator contains <
, else it defaults to 1.
So command < file
is the same as command 0< file
, and command >> file
is the same as command 1>> file
.
Stream numbers from 0 to 9 are always safe to use. Consult documentation of your shell for other numbers.
The operator may be one of:
< | opens word for reading and replaces file_number with the file |
> | if a file word exists and the noclobber option is set2), then fails; else opens word for writing, truncates it and replaces file_number with the file |
>| | opens word for writing, truncates it and replaces file_number with the file (even if word exists) |
>> | opens word for appending and replaces file_number with the file |
<> | opens word for reading and writing and replaces file_number with the file |
<< | 1) creates a temporary file 2) reads an input line 3) if the line is word, goes to step 6 4) if there are no quotes (a pair of " or ' ) in word, performs the expansion3) on the line 5) writes the line to the temporary file and goes back to step 2 6) opens the temporary file for reading 7) replaces file_number with the file The command is run once this is done |
<<- | same as << , but after step 2 adds a step: 2a) erases all leading tab characters ( \t ) warning: spaces are not erased |
<<< | warning: this is a Bash extension 1) creates a temporary file 2) writes word to it 3) writes a newline to it 4) opens the temporary file for reading 5) replaces file_number with the file The command is run once this is done |
<& | if word is a number: duplicates a readable descriptor number word to the number file_number; if word is - : closes a descriptor number file_number |
>& | if word is a number: duplicates a writeable descriptor number word to the number file_number; if word is - : closes a descriptor number file_number |
&> | warning: this is a Bash extension warning: this does not allow providing file_number (in fact, & is the file number) opens word for writing, truncates it and replaces streams 1 and 2 with the file |
&>> | warning: this is a Bash extension warning: this does not allow providing file_number (in fact, & is the file number) opens word for appending and replaces streams 1 and 2 with the file |
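Two rows of the table are easy to get wrong, so here is a hedged sketch of both: quoting the here-document word, and duplicating descriptors with >&. The function both and all file names are hypothetical. The example relies on the shell processing redirections from left to right, which is standard shell behaviour but is not stated in the table.

```shell
workdir=$(mktemp -d)
name='world'

# << with an unquoted word: expansion is performed on each line.
expanded=$(cat << EOF
hello $name
EOF
)

# << with a quoted word: lines are taken literally.
literal=$(cat << 'EOF'
hello $name
EOF
)

# Hypothetical stand-in for a real command: one line to each output stream.
both() { echo 'out'; echo 'err' >&2; }

# 1 is pointed at the file first, then 2 becomes a copy of 1:
# both lines land in the file.
both > "$workdir/a" 2>&1

# 2 becomes a copy of the old 1 (the pipe) first, and only then is 1
# pointed at the file: the error line travels through the pipe.
both 2>&1 > "$workdir/b" | cat > "$workdir/b_pipe"

a_text=$(cat "$workdir/a")
b_text=$(cat "$workdir/b")
b_pipe=$(cat "$workdir/b_pipe")
printf '%s\n' "$expanded" "$literal" "$a_text" "$b_text" "$b_pipe"
rm -r "$workdir"
```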
Exercise 21 Run cat /etc/motd
with closed standard output descriptor.
Run find /var/spool/
with closed standard error descriptor.
Exercise 22 Copy a large text file (e.g., /etc/services
) to file.
Run hexdump
redirecting input from file using the <>
operator,
and duplicate standard input to standard output. Check what happens then.
Warning: do not use <>
twice with the same file for standard input and
standard output (unless you dare to face the consequences).
Exercise 23 Swap standard error with standard output of the
cat /etc/motd /etc/shadow
command. Test if you did this correctly by adding
|rev
at the end (that will put standard output backwards).
The exec
command can be used to manipulate standard streams of the current shell.
exec 3>&1        # copies the current standard output to descriptor 3
exec 1>myFile    # replaces standard output with myFile
date             # writes a date to standard output (which is now myFile)
fortune          # writes a fortune to standard output (which is now myFile)
exec 1>&3 3>&-   # restores the standard output from 3 and closes 3
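The same technique works for input. The sketch below opens a made-up file as descriptor 3 of the current shell and reads from it line by line; the file name and variable names are assumptions for this example.

```shell
workdir=$(mktemp -d)
seq 3 > "$workdir/numbers"

exec 3< "$workdir/numbers"   # open the file for reading as descriptor 3
read -r first <&3            # each read continues where the previous one stopped
read -r second <&3
exec 3<&-                    # close descriptor 3

printf 'first=%s second=%s\n' "$first" "$second"
rm -r "$workdir"
```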
Bash documentation on process substitution
The syntax command1 <(command2)
and command1 >(command2)
are not redirects.
Bash replaces <(command2)
with the name of a file (a /dev/fd entry or a named pipe, depending on the system), starts command2
in the background and connects its standard output to that file.
Bash replaces >(command2)
with the name of such a file, starts command2
in the background and connects its standard input to that file.
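A minimal sketch of what the replacement looks like. Process substitution needs Bash, so the commands are handed to bash explicitly here; in an interactive Bash session the inner commands can be typed directly. The exact file name printed depends on the system.

```shell
# The substitution is replaced by a file name, which echo simply prints.
name=$(bash -c 'echo <(true)')

# paste receives two such names and reads them like ordinary files.
table=$(bash -c 'paste <(seq 3) <(seq 3 -1 1)')

printf '%s\n' "$name" "$table"
```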
Exercise 24 Execute and understand the results of:
/bin/ls >(echo a) >(echo b) <(echo c)
stat >(echo c)
stat >(sleep 3; fortune)
cat <(echo a)
echo 'abc' > >(rev)
cat <(date) > >(rev)
cat < <(date) > >(rev)
cat <(date) <(date) <(date) > >(rev)
cat <(date) <(date) < <(date) > >(rev)
First, recall: standard I/O functions read from the standard input, and output data to standard output.
Unix-like systems have for years favoured 4) programs that do one job well.
Complex tasks can be done easily by combining such programs.
For example: say you want to learn how many processes are owned by each user in the system.
ps -ef lists all processes. So you can do ps -ef > ps_output.
Then use cut --delimiter ' ' --field 1 < ps_output > cut_output to cut the first space-delimited field in each line.
Before counting, use sort < cut_output > sort_output first, so that repeating usernames are one after another.
Finally, use uniq --count < sort_output to leave only non-repeating (unique) lines together with their count.
Creating files with each intermediate result is usually a bad idea.
UNIX came up with the idea of connecting standard output of one program to standard input of another program.
Instead of the commands above, one can run ps -ef | cut --delimiter ' ' --field 1 | sort | uniq --count
that does the same without creating any files on disk.
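The same pattern can be tried without depending on the processes of a particular machine. In this sketch the function owners is a hypothetical stand-in for the first column of ps -ef, and two more stages are appended to pick the user with the most entries.

```shell
# Hypothetical stand-in for the first column of `ps -ef`.
owners() { printf '%s\n' root alice root bob alice root; }

# sort groups equal names together, uniq -c counts each group,
# sort -rn puts the largest count first, head keeps that one line.
busiest=$(owners | sort | uniq -c | sort -rn | head -n 1)
printf '%s\n' "$busiest"
```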
This is technically done using a special kind of file called pipe – the shell creates a pipe (in the main memory), replaces standard output of one program with the pipe, and replaces standard input of the other program with the same pipe.
To connect standard output of cmd_a with standard input of cmd_b one writes cmd_a | cmd_b
.
This is called piping output of one program into another. The programs are run in parallel.
cmd_a | cmd_b
is sometimes referred to as a pipeline.
A common practice for programs following the Unix philosophy is to read from
files specified by arguments or from standard input when no file is specified.
Moreover, typically whenever -
is encountered where a file name is required,
standard input is used instead.
Using cmd_a | cmd_b
creates what is called an anonymous pipe.
One can create a named pipe using a command mkfifo filename
.
All that is written to a pipe is stored in the main memory (so it never occupies
disk space, regardless of whether it is a named pipe) until some program reads it.
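A named pipe can be tried within a single script by putting the writer in the background. This is a sketch with a made-up pipe name p in a scratch directory.

```shell
workdir=$(mktemp -d)
mkfifo "$workdir/p"

# The writer runs in the background; it blocks until a reader opens the pipe.
seq 3 > "$workdir/p" &

# The reader gets the data and sees end-of-file once the writer is done.
received=$(cat "$workdir/p")
wait

printf '%s\n' "$received"
rm -r "$workdir"
```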
Exercise 25 Run echo '2+2*2'
. Then, pipe it through bc
.
Exercise 26 echo
some text. Then, echo the text and pipe it through xxd
.
Exercise 27 List files in your home directory.
List the files again, piping it through cat
.
List the files yet another time, now piping it through cat -n
.
Exercise 28 Pipe results of ps -eF
through fold
Exercise 29 Create a named pipe p. Redirect input of fold
from p
in one terminal, and redirect output of ps -eF
to p in another terminal.
Then repeat the commands, running the ps
before the fold
.
There are a number of programs (stemming from UNIX) that are collectively called filters.
A filter is a program that processes its input in a useful way and writes the result to its output.
The head
and tail
programs output a specified number of leading / trailing lines (or bytes).
By default head
and tail
output 10 lines.
By providing -n count
/ -c count
, they output count lines / bytes.
When head
is given a number preceded by '-', e.g. head -n -10
, then it outputs all except the last 10 lines.
When tail
is given a number preceded by '+', e.g. tail -n +10
, then it outputs all lines starting from the 10th.
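The variants can be compared on a short numbered input. This is a sketch; the negative count for head is assumed to be available as in GNU coreutils.

```shell
first=$(seq 15 | head -n 3)                 # lines 1 to 3
last=$(seq 15 | tail -n 3)                  # lines 13 to 15
from_13=$(seq 15 | tail -n +13)             # line 13 onwards
middle=$(seq 15 | head -n 10 | tail -n 3)   # lines 8 to 10
all_but=$(seq 15 | head -n -12)             # all but the last 12 lines (GNU head assumed)
printf '%s\n' "$first" "$last" "$from_13" "$middle" "$all_but"
```

Combining head and tail, as in the fourth line, is the usual way to cut a range out of the middle of the input.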
Exercise 30 Run paste <(seq 15) <(seq 15 -1 1)
. Then, pipe its output through head and/or tail to see:
The tail
command accepts a switch -f
/ --follow
.
tail -f …
will first output as usual, and then wait and output any data appended to the file.
Exercise 31 Run seq 25 > file
. Then run tail -f file
in one
terminal and append (with output redirection) some data to file.
The grep regex [file]...
program prints lines of the input
files (or standard input) that match the regex.
When the switch -r
/ -R
is specified, file can be a directory and
grep will match regex recursively for all files within the directory.
The grep
program accepts several regular expression grammars.
POSIX specifies
basic (default for grep
) and extended regular expressions (selectable with
egrep
or grep -E
). See manual for your implementation of grep
for
more on the grammars.
By default regular expressions perform case-sensitive matching. To switch
to case-insensitive mode, one can add -i
switch to grep.
With the switch -v
, grep
outputs non-matching lines.
grep
can output all matching lines (default), first match or only indicate
whether a file matches.
It can also prepend matching lines with line number
and/or file name (the latter being default whenever multiple input files are
given).
Moreover, grep
can output N lines before match (-B N
), after
match (-A N
) or before and after (which is called context, hence
-C N
).
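A few of the switches mentioned above, sketched on generated input. The patterns are arbitrary examples; -B is assumed to be supported as in GNU and BSD grep.

```shell
numbers=$(seq 20)

with_7=$(printf '%s\n' "$numbers" | grep 7)                    # lines containing a 7
one_digit=$(printf '%s\n' "$numbers" | grep -v '[0-9][0-9]')   # lines without two digits in a row
context=$(printf '%s\n' "$numbers" | grep -B 2 '^15$')         # the line 15 and two lines before it
numbered=$(printf '%s\n' "$numbers" | grep -n '^15$')          # match prefixed with its line number
any_case=$(printf 'Foo\nfoo\nbar\n' | grep -i 'foo')           # case-insensitive match

printf '%s\n' "$with_7" "$one_digit" "$context" "$numbered" "$any_case"
```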
Exercise 32 Filter seq 75
with grep to see:
5
5
5
or 0
33
and 3 lines before it
33
and 4 lines around it
Exercise 33
Display all lines containing 10
in files /etc/passwd
and /etc/group
.
List all files containing ecdsa
in ~/.ssh
.
The cut
program outputs only selected characters (-c spec
) /
bytes (-b spec
) / fields (-f spec
) in each line.
A field is any number of characters separated by a single-character
delimiter (-d delim
, defaults to tab).
To specify fields/bytes/… one shall write range[,range]...
where
a range is num
, or start-end
, or start-
, or -end
with intuitive meaning.
For instance, echo 123456789abcdef | cut -c -3,6,9-11,14-
outputs 12369abef
(colors added for clarity).
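Field selection works the same way as character selection. The sketch below uses a made-up record in the style of /etc/passwd.

```shell
# Hypothetical colon-delimited record.
record='alice:x:1000:1000:Alice Example:/home/alice:/bin/sh'

# Fields 1 and 3; the delimiter is kept between the selected fields.
name_uid=$(printf '%s\n' "$record" | cut -d : -f 1,3)
# Field 6 up to the end of the line.
tail_fields=$(printf '%s\n' "$record" | cut -d : -f 6-)
# Characters 1 to 5, regardless of delimiters.
prefix=$(printf '%s\n' "$record" | cut -c -5)

printf '%s\n' "$name_uid" "$tail_fields" "$prefix"
```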
Exercise 34 Filter output of the mount
command (or /etc/mtab
file) to
cut only the fifth (or third in case of /etc/mtab
) space-separated field
(that contains the filesystem type).
Exercise 35 Remove from the output of
egrep '^[Ee]{2}' /usr/share/myspell/en_US.dic
a slash and all that follows
it.
The sort
program by default sorts lines in alphabetical order.
The option -k
defines sort keys.
sort -k4 uses columns 4,5,6,7,8,…
sort -k4,4 uses only column 4
sort -k4,6 uses columns 4, 5 and 6
sort -k5,5 -k4,4 uses columns 5 and 4 (sort -k5,4 does not: its key would end before it starts)
Sort keys can have options, e.g., -n
sorts numerically and -r
reverses sort direction.
The options can be used for all sort keys, or for selected sort keys only:
sort -r -k5,5 -k4,4 sorts both column 5 and column 4 descending
sort -k5,5r -k4,4 sorts column 5 descending and column 4 ascending
sort -k5,5 -k4,4r sorts column 5 ascending and column 4 descending
sort
is not stable unless -s
(--stable
) switch is used.
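Sort keys and per-key options can be sketched on a tiny two-column input. The function data is a hypothetical stand-in for a file, and LC_ALL=C is set as an assumption to make the alphabetical comparisons byte-wise and reproducible.

```shell
# Hypothetical two-column input.
data() { printf '%s\n' 'b 2' 'a 10' 'c 30' 'a 9'; }

# Whole line, alphabetical: "a 10" precedes "a 9" because 1 sorts before 9.
plain=$(data | LC_ALL=C sort)
# Column 1 alphabetical, then column 2 numeric.
mixed=$(data | LC_ALL=C sort -k1,1 -k2,2n)
# Column 2 numeric, descending.
by_number=$(data | LC_ALL=C sort -k2,2nr)

printf '%s\n' "$plain" '--' "$mixed" '--' "$by_number"
```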
The sort
program has much more to offer. See manual for details.
Exercise 36 Create a file with input data for the next exercises by copying and pasting the following command into your shell:
paste \
  <(perl -e 'printf "%d\n", rand(10) for(1..20)') \
  <(perl -e 'print((K,Q,J)[rand(3)]."\n") for(1..20);') \
  <(perl -e 'printf "%d\n", rand(1500) for(1..20)') \
  <(perl -e 'my @a=("a","b","c"); print $a[rand(@a)] . $a[rand(@a)] ."\n" for(1..20);') \
  <(perl -e 'printf "%d\n", rand(1500) for(1..20)') \
  <(perl -e 'my @a=("x","y","z"); print $a[rand(@a)] . $a[rand(@a)] ."\n" for(1..20);') \
  <(seq -w 20) \
  > random_data
Exercise 37 Display the file. Then sort it.
Exercise 38 Sort the file ignoring first two columns. Sort the file numerically ignoring first two columns.
Exercise 39 Sort the file by the column with K/Q/J and the column with xyz characters (in that order).
Exercise 40 Sort the file by the second column without and with the --stable
option.
Exercise 41 Sort the file by the second column (alphabetically) and by the third column (numerically).
The wc
(word count) program counts lines, words and bytes.
When multiple files are provided as arguments, wc
displays information on each file as well as a line with totals.
The options -l
, -w
and -c
select lines, words and bytes.
The option -m
counts all characters (including non-printable characters).
This matters for multi-byte characters: wc -mc <<< "‡∞♣"
counts 10 bytes and 4 characters (the three visible and a newline).
The uniq
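program by default collapses adjacent repeating lines into one (lines that repeat elsewhere in the input are not detected unless the input is sorted first).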
With switches, it can among others:
-c – prefix all lines with repetition count
-d – print only the repeating lines
-u – print only lines that do not repeat
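The switches are easiest to compare on a short input with one value repeated in two places. The function rolls is a hypothetical stand-in for real data.

```shell
# Hypothetical input: 7 appears twice in a row and once more later.
rolls() { printf '%s\n' 7 7 4 7 9; }

adjacent=$(rolls | uniq)             # only the neighbouring 7s collapse
counted=$(rolls | sort | uniq -c)    # after sorting, every value is counted once
repeated=$(rolls | sort | uniq -d)   # values that occur more than once
single=$(rolls | sort | uniq -u)     # values that occur exactly once

printf '%s\n' "$adjacent" '--' "$counted" '--' "$repeated" '--' "$single"
```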
nl
numbers lines. It can also number lines in text files organized into sections and pages.
Exercise 42 Pipe man wc
through cat
. Then pipe man wc
through wc
. How many words are there?
Exercise 43 See the results of wc /etc/motd /etc/SUSE-brand
.
Exercise 44 The perl -e 'printf "%d\n", (int rand(6)+1)+(int rand(6)+1) for(1..100)'
command rolls 100 times 2d6.
Pipe it through uniq
to see rolls with same results in a row.
Then pipe it through sort
and uniq
so that you see how many times each result was hit.
tac
outputs lines in reverse order.
rev
outputs characters in each line in reverse order.
Exercise 45 See the result of echo -e '1 2 3\n4 5 6\n7 8 9'
. Then pipe it through tac
, and finally pipe it through rev
.
The tr
program replaces or deletes characters.
The command tr -d LIST
deletes all characters that are in the LIST.
The command tr FROM TO
translates each n-th character from the list FROM to n-th character in the list TO.
If TO is shorter than FROM, the last character of TO is used instead.
With the switch -s
, whenever consecutive characters translate to the same character x, only a single character x is output.
The switch -c
translates all characters that are not in the FROM list.
The lists may contain character ranges (e.g., 0-9
, a-f
) and character classes (e.g., [:alnum:]
, [:space:]
).
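A sketch of the main modes of tr, using made-up input strings:

```shell
upper=$(printf 'Hello World\n' | tr 'a-z' 'A-Z')        # translate lower case to upper case
squeezed=$(printf 'a   b  c\n' | tr -s ' ')             # squeeze runs of spaces into one
digits=$(printf 'phone: 555-1234\n' | tr -cd '0-9\n')   # delete everything except digits and newline
no_vowels=$(printf 'redirection\n' | tr -d 'aeiou')     # delete the listed characters

printf '%s\n' "$upper" "$squeezed" "$digits" "$no_vowels"
```

Note that tr only reads standard input, so it is always used with a pipe or an input redirection.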
Exercise 46 Pipe ls -l
through tr
to:
rwx
The sed
(stream editor) program reads input (standard input or files) line by line and
executes a user-provided script for transforming the line and by default outputs
each line once the script has been fully executed for this line. sed
is Turing-complete.
sed
is commonly used for regex-based search & replace.
The most basic command for this is: sed 's/regexp/replacement/'
.
sed
is out of scope of this course.
awk is another text processing language.
Roughly, it also reads input line by line and executes a user-provided script.
An awk
script consists of rules, and each rule has a condition (that matches
against line contents or selects start/end of a file or whole execution) and
a set of instructions run when the condition is satisfied.
awk
is out of scope of this course.
To display data that does not fit into terminal one can use one of many programs
that are collectively called pagers.
A pager displays as much text at a time as fits in the terminal and allows
the user to go to the next portion of the text (typically by pressing a key such as space).
Most operating systems (as well as the POSIX standard) include a program called
more
as a rudimentary pager.
Unix-like systems usually come with a pager called less
which is more than more
.
less
passes its input through a program indicated by the $LESSOPEN
environment
variable.
Such a program usually outputs human-readable data upon detecting a known
file format that is not human-readable.
lesspipe
is the leading implementation
of this feature.
less
can be used with the following switches:
-S – don't wrap lines
-R – output escape sequences that encode colors as themselves, so that the colors are rendered instead of the sequences being shown as text
-L – disable processing input by whatever $LESSOPEN indicates
-N – number lines
To display help from within less
, type h
. A choice of other useful key shortcuts:
tail -f
)
The man
command usually uses less
as the pager.
Exercise 47 Type man less
to see manual page for less
in less
. Test the key shortcuts mentioned above.
Exercise 48 Open a PDF file (e.g., /usr/share/doc/packages/apparmor-docs/techdoc.pdf
) with less, with and without -L option.
Open a tar
archive with less (e.g., /usr/share/doc/packages/automake/amhello-1.0.tar.gz
).
View a directory (e.g., /usr/include
) with less.
The tee [-a] file...
command writes every byte read from standard input to standard output and to every file specified.
With the -a
switch tee
appends to the file instead of overwriting it.
tee
is commonly used when one wants both to see and to record an output of a long-running command.
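A minimal sketch with a made-up log file: the data is recorded and at the same time passed on down the pipeline.

```shell
workdir=$(mktemp -d)

# The data goes both to the log file and down the pipeline.
seen=$(seq 3 | tee "$workdir/log" | wc -l)
# -a appends to the log instead of truncating it.
seq 4 5 | tee -a "$workdir/log" > /dev/null

logged=$(cat "$workdir/log")
printf 'seen=%s\n%s\n' "$seen" "$logged"
rm -r "$workdir"
```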
Exercise 49 tee
the output of tree
to a file. View the file with less
.
\t
and \n
is replaced by X
), try: perl -e'for(0..15){printf"\t%x_",$_};print"\n";for$l(0..15){printf"_%x",$l;for$h(0..15){$c=$h<<4|$l;$c=88 if $c==9||$c==10;printf("\t%c",$c)}print"\n"}'|cat -v
set -C
to enable it.