Jan Kończak

Outputting file / text / sequences

Printing file contents

cat [file]... outputs input files one after another or outputs the standard input if no files are specified.
The name comes from concatenate.
cat numbers lines with the -n switch and outputs non-printable characters as ^x, M-x, … with the -v switch¹⁾.

paste file1 [file2]... reads round-robin one line from each input file and outputs them separated by a tab character, then repeats until the end of the longest file.

fold [-w width] [file]... outputs input files (or standard input) forcing a line break whenever a line would exceed width (that defaults to 80).
With the -s switch fold breaks lines on spaces (or at width if there are no spaces).

column [-x] [file]... works just like cat if the longest line in the files (or standard input) would not fit twice within the terminal width.
Otherwise, it prints the input in as many columns as fit the terminal, filling columns first (or, with -x, rows first).
column -t [file]... does a completely different thing: it detects columns in input (by a separator that defaults to whitespace) and outputs the input as a table.

od [-t x1], hexdump [-C], and xxd show binary files.
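
The behaviour of cat, paste and fold described above can be tried on throwaway files. This is a minimal sketch; the scratch directory and the file names letters and numbers are made up for illustration.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/letters"
printf '1\n2\n'    > "$dir/numbers"

# cat concatenates: the three lines of letters, then the two lines of numbers.
cat "$dir/letters" "$dir/numbers"

# paste joins corresponding lines with a tab; once the shorter file runs out
# of lines, it contributes empty fields.
pasted=$(paste "$dir/letters" "$dir/numbers")
printf '%s\n' "$pasted"

# fold breaks a long line after every 5 characters.
folded=$(printf 'abcdefghijkl\n' | fold -w 5)
printf '%s\n' "$folded"
```

The last command prints abcde, fghij and kl on three separate lines.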

Exercise 1 Print any file with cat. Print two files at once with cat.
(You may use /etc/SUSE-brand and /etc/os-release if you cannot come up with any other file.)

Exercise 2 Run the cat command, input some text, then press Return followed by Ctrl+d.

Exercise 3 Cat /usr/share/doc/mpich/user.pdf with and without -v switch.

Exercise 4 Paste a file with itself.

Exercise 5 Display a binary file /usr/share/themes/Breeze/assets/line-h.png.

Printing text

echo text outputs text followed by a newline (unless -n is specified).
The -e switch turns backslash escapes into corresponding characters, e.g., \t becomes a tab and \n a newline (cf. manual).

printf format [arguments]... works roughly the same as the printf function in C.
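
A short sketch of how the format string drives the output of printf. The values are arbitrary, and LC_ALL=C is set only so that the decimal separator is a dot regardless of the locale.

```shell
# %-8s left-aligns a string in 8 columns, %6.2f prints a number with two
# decimals in 6 columns, %03d pads an integer with zeros to 3 digits.
row=$(export LC_ALL=C; printf '|%-8s|%6.2f|%03d|' pi 3.14159 7)
printf '%s\n' "$row"     # prints |pi      |  3.14|007|

# Unlike in C, the shell printf reuses the format while arguments remain.
pairs=$(printf '%s=%d\n' a 1 b 2)
printf '%s\n' "$pairs"   # prints a=1 and b=2 on separate lines
```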

figlet [text] outputs text (or the standard input) using an ASCII-art font.

cowsay [text] makes a cow say the text (or the standard input).

Exercise 6 Try echo -e 'foo\n\nbaz'
and echo -e '\n\n one \033[A \033[A two \033[B \033[B \n \033[1;31m red \033[0m'
ANSI escape codes are well summarized here

Exercise 7 Try printf "|%4.2f|%3s|%-20s|\n|%4.2f|%3s|%-20s|\n" 3.1428 pi circumference/radius 9.8 g gravity

Exercise 8 Install cowsay and figlet with sudo zypper -q in -y figlet cowsay.
Try figlet wololo and cowsay moo

Generating number sequences

seq [from [step]] to generates a sequence of numbers that starts at from and is incremented by step as long as it does not exceed to.
from and step default to 1.
With the -w switch, seq makes all numbers of equal width (e.g., seq -w 8 11 outputs 08, 09, 10 and 11).
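
A few seq invocations as a sketch; the numbers are arbitrary. The unquoted echo is used on purpose to join the output lines with spaces.

```shell
up=$(seq 2 3 11)        # from 2, step 3, not exceeding 11
down=$(seq 10 -5 0)     # a negative step counts down
padded=$(seq -w 8 11)   # equal width through zero padding

echo $up        # 2 5 8 11
echo $down      # 10 5 0
echo $padded    # 08 09 10 11
```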

Exercise 9 Generate a sequence of numbers from 1 to 15.

Exercise 10 Generate a sequence of numbers from 64 to 1024 with step of 64.

Standard streams

K&R C[1] [2]

printf("Please type your name:\n");
scanf("%s", name);

Python

print("Please type your name:")
name = input()

Have you ever wondered how a program knows where to read input from and where the print-like functions should write their output?

In UNIX world a program expects to have three files already open upon start – standard input, standard output and standard error. These are called the standard streams.
The standard input/output library of the C programming language – stdio.h – is based on this concept. C was created by one of the authors of UNIX.

Basic I/O functions in most programming languages by default read from the standard input, and output data to standard output.
The purpose of the standard error stream is to convey information on what went wrong while executing a program. Programming languages usually offer dedicated functions to output data to standard error.

In Unix-like, as well as POSIX-compatible systems, the operating system is responsible for abstracting files away – the user should not worry about details of accessing a file.
When the user wants to open a file, the user provides the file name and gets an identifier – a file descriptor in return.
(A file descriptor is in fact an index in an array of files maintained for the process by the OS.)
To do standard operations such as reading or writing data, the user just tells which operation shall be executed, on which file descriptor, and the user shall provide the details of the operations (such as where to put data read from the file and how many bytes shall be read).

A child process inherits all file descriptors from its parent.

The three standard streams are the files represented by the first three file descriptors – 0 is always used for standard input, 1 for standard output and 2 for standard error.

The files do not need to be ordinary files – Unix-like systems abstract almost everything as a file.
For instance, a terminal device is a file (even if it were a real teletype).

By default, a shell opens the terminal as file 0, 1 and 2.
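
Whether a descriptor currently refers to a terminal can be checked with the standard test -t. The sketch below uses a hypothetical helper named describe_fd, and it relies on the input redirection described in the next section.

```shell
# Report whether the given file descriptor of the shell is a terminal.
describe_fd() {
    if [ -t "$1" ]; then
        echo "descriptor $1 is a terminal"
    else
        echo "descriptor $1 is not a terminal"
    fi
}

describe_fd 0               # in an interactive shell: a terminal
describe_fd 0 < /dev/null   # standard input replaced by a file: not a terminal
```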

Redirections

POSIX-compatible shells can replace standard streams with files specified by the user.

The commonly used redirections

Output redirections

command > filename

opens file filename for writing,
truncates the file,
replaces standard output stream with the file.

command 2> filename

opens file filename for writing,
truncates the file,
replaces standard error stream with the file.

command &> filename Warning: this is a Bash extension

opens file filename for writing,
truncates the file,
replaces standard output stream AND standard error stream with the file.

command >> filename

opens file filename for writing in append mode,
replaces standard output stream with the file.

/dev/null is a device that discards any data written to it.
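
The output redirections above can be sketched as follows. The function report is a made-up stand-in for any command that writes to both output streams.

```shell
dir=$(mktemp -d)

# A stand-in command that writes one line to each output stream.
report() {
    echo "result"
    echo "warning" >&2
}

report >  "$dir/out" 2> "$dir/err"   # each stream goes to its own file
report >> "$dir/out" 2> /dev/null    # append the result, discard the warning

cat "$dir/out"   # prints "result" twice
cat "$dir/err"   # prints "warning" once
```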

Exercise 11 The date command outputs the current date. Redirect its output to a file.

Exercise 12 Append a new date to the file from the previous exercise.

Exercise 13 Try the cat /etc/motd /etc/shadow command. Redirect the standard error to a file.

Exercise 14 Try the find /var/spool/ command (find will be discussed later on). Redirect the standard error to /dev/null.

Exercise 15 Redirect standard output of the find /var/spool/ command to one file and the standard error to another file.

Exercise 16 Redirect standard output and the standard error of the find /var/spool/ command to the same file.

Input redirections

command < filename

opens file filename for reading,
replaces standard input stream with the file.

command << delimiter (here documents)

before starting the command, shell creates a temporary file,
shell reads data line by line from its standard input and writes it to the temporary file,
until an input line contains only delimiter,
then, shell opens the temporary file for reading,
and replaces standard input stream with the temporary file.

command <<< string (here string) Warning: this is a Bash extension

creates a temporary file with string followed by a newline as the contents,
replaces standard input stream with the file.
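
A minimal sketch of the two POSIX input redirections. The file names and the delimiter END are arbitrary choices made for this example.

```shell
dir=$(mktemp -d)
printf 'pear\napple\n' > "$dir/fruit"

# sort gets no file argument; it reads the file through its standard input.
sorted=$(sort < "$dir/fruit")
printf '%s\n' "$sorted"          # apple, then pear

# The lines up to the delimiter END become the standard input of sort -r.
sort -r << END > "$dir/reversed"
alpha
beta
END
cat "$dir/reversed"              # beta, then alpha
```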

Exercise 17 Create a file containing print("hello " + __file__). Run the python command redirecting input from the file.

Exercise 18 Use hexdump -C to display a hex dump of arbitrary text passed as a here document.
Include a multi-byte character in the text.

Exercise 19 bc is a simple calculator. Use it to calculate sqrt(2.0000).
Then use it to calculate sqrt(2.0000) in non-interactive mode.

Exercise 20 Use bc to calculate sqrt(2.0000) in non-interactive mode and redirect its output to a file.

The details

POSIX documentation on redirection
Bash documentation on redirection

Every redirection consists of: [file_number]operator word

The file_number defaults to 0 if the operator contains <, else it defaults to 1.
So command < file is the same as command 0< file, and command >> file is the same as command 1>> file.
Stream numbers from 0 to 9 are always safe to use. Consult documentation of your shell for other numbers.

The operator may be one of:

`<`	opens word for reading and replaces file_number with the file
`>`	if a file word exists and the noclobber option is set²⁾, then fails; else opens word for writing, truncates it and replaces file_number with the file
`>|`	opens word for writing, truncates it and replaces file_number with the file (even if word exists)
`>>`	opens word for appending and replaces file_number with the file
`<>`	opens word for reading and writing and replaces file_number with the file
`<<`	1) creates a temporary file 2) reads an input line 3) if the line is word, goes to step 6 4) if there are no quotes (a pair of `"` or `'`) in word, performs the expansion³⁾ on the line 5) writes the line to the temporary file and goes back to step 2 6) opens the temporary file for reading 7) replaces file_number with the file. The command is run once this is done.
`<<-`	same as `<<`, but after step 2 adds a step: 2a) erase all leading tab characters (`\t`) warning: spaces are not erased
`<<<`	warning: this is a Bash extension 1) creates a temporary file 2) writes word to it 3) writes a newline to it 4) opens the temporary file for reading 5) replaces file_number with the file. The command is run once this is done.
`<&`	if word is a number: duplicates the readable descriptor number word to the number file_number; if word is `-`: closes the descriptor number file_number
`>&`	if word is a number: duplicates the writeable descriptor number word to the number file_number; if word is `-`: closes the descriptor number file_number
`&>`	warning: this is a Bash extension warning: this does not allow providing file_number (in fact, `&` is the file number) opens word for writing, truncates it and replaces streams 1 and 2 with the file
`&>>`	warning: this is a Bash extension warning: this does not allow providing file_number (in fact, `&` is the file number) opens word for appending and replaces streams 1 and 2 with the file
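
Two of the operators from the table, sketched under the assumption of a POSIX shell. The function both is a made-up helper. The example relies on the fact that the redirections of a command are processed from left to right.

```shell
# A helper that writes one line to each output stream.
both() {
    echo "out"
    echo "err" >&2
}

# 2>&1 first copies the current descriptor 1 (the capture pipe) to 2;
# only then is descriptor 1 pointed at /dev/null, so "err" is captured.
only_err=$(both 2>&1 > /dev/null)

# Here descriptor 1 goes to /dev/null first, so 2>&1 copies /dev/null to 2.
nothing=$(both > /dev/null 2>&1)

echo "captured: '$only_err' and '$nothing'"   # captured: 'err' and ''

# With noclobber set, > refuses to overwrite an existing file, >| does not.
f=$(mktemp)
( set -C; echo new > "$f" ) 2> /dev/null || echo "refused to overwrite"
( set -C; echo new >| "$f" )
cat "$f"                                      # prints new
```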

Exercise 21 Run cat /etc/motd with closed standard output descriptor.
Run find /var/spool/ with closed standard error descriptor.

Exercise 22 Copy a large text file (e.g., /etc/services) to file.
Run hexdump redirecting input from file using the <> operator, and duplicate standard input to standard output. Check what happens then.
Warning: do not use <> twice with the same file for standard input and standard output (unless you dare to face the consequences).

Exercise 23 Swap standard error with standard output of the cat /etc/motd /etc/shadow command. Test if you did this correctly by adding |rev at the end (that will put standard output backwards).

Redirections of current shell standard stream

The exec command can be used to manipulate standard streams of the current shell.

exec 3>&1           # copies the current standard output to descriptor 3
exec 1>myFile       # replaces standard output with myFile
date                # writes  a date   to standard output (which is now myFile)
fortune             #   "    a fortune      "
exec 1>&3  3>&-     # restores the standard output from 3 and closes 3
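
The same save-and-restore pattern as a self-contained sketch that can be run as a script. The temporary log file is an assumption of this example.

```shell
log=$(mktemp)

exec 3>&1          # keep a copy of the current standard output as descriptor 3
exec 1> "$log"     # from now on the standard output of the shell is the log
echo "recorded in the log"
exec 1>&3 3>&-     # restore standard output from descriptor 3 and close 3

echo "visible again"
cat "$log"         # prints: recorded in the log
```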

Process substitution (Bash extension)

Bash documentation on process substitution

The syntaxes command1 <(command2) and command1 >(command2) are not redirections.

Bash replaces <(command2) with a name of a temporary file, starts command2 in background and sets its standard output to the file.

Bash replaces >(command2) with a name of a temporary file, starts command2 in background and sets its standard input to the file.

Exercise 24 Execute and understand the results of:

/bin/ls >(echo a) >(echo b) <(echo c)
stat >(echo c)
stat >(sleep 3; fortune)
cat <(echo a)
echo 'abc' > >(rev)
cat <(date) > >(rev)
cat < <(date) > >(rev)
cat <(date) <(date) <(date) > >(rev)
cat <(date) <(date) < <(date) > >(rev)

Pipes

First, recall: standard I/O functions read from the standard input, and output data to standard output.

Unix-like systems favoured for years ⁴⁾ programs that do one job well.
Complex tasks can be done easily by combining such programs.

For example: say you want to learn how many processes are owned by each user in the system.

You know that ps -ef lists all processes. So you can do ps -ef > ps_output.
But you need only the first column of the output – the username.
For this, you can use the command cut --delimiter ' ' --field 1 < ps_output > cut_output to cut the first space-delimited field in each line.
There is a program that counts how many times a line repeats.
So let's sort < cut_output > sort_output first, so that repeating usernames are one after another.
Now you can use the command uniq --count < sort_output to leave only non-repeating (unique) lines together with their count.

Creating files with each intermediate result is usually a bad idea.

UNIX came up with the idea of connecting standard output of one program to standard input of another program.
Instead of the commands above, one can run ps -ef | cut --delimiter ' ' --field 1 | sort | uniq --count that does the same without creating any files on disk.
This is technically done using a special kind of file called pipe – the shell creates a pipe (in the main memory), replaces standard output of one program with the pipe, and replaces standard input of the other program with the same pipe.

To connect standard output of cmd_a with standard input of cmd_b one writes cmd_a | cmd_b.
This is called piping output of one program into another. The programs are run in parallel.
cmd_a | cmd_b is sometimes referred to as a pipeline.
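
The counting pipeline can be tried on fixed data. In this sketch the variable owners is a made-up stand-in for the first column of ps -ef.

```shell
# One user name per process, as cut would extract them from ps -ef.
owners='root
alice
root
bob
root
alice'

# sort brings equal names together, uniq -c counts each run of equal lines,
# and the final sort -rn puts the most frequent name first.
counts=$(printf '%s\n' "$owners" | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"
```

The result lists root with a count of 3, alice with 2 and bob with 1; the exact padding in front of the counts depends on the uniq implementation.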

A common practice for programs following the Unix philosophy is to read from files specified by arguments or from standard input when no file is specified. Moreover, typically whenever - is encountered where a file name is required, standard input is used instead.

Using cmd_a | cmd_b creates what is called an anonymous pipe.
One can create a named pipe using a command mkfifo filename.
All that is written to a pipe is stored in the main memory (so it never occupies disk space, regardless of whether it is a named pipe) until some program reads it.
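
A named pipe can be sketched within a single script by running the writer in the background. The directory and the pipe name p are arbitrary.

```shell
dir=$(mktemp -d)
mkfifo "$dir/p"

# The writer runs in the background: opening a named pipe for writing blocks
# until some process opens it for reading.
seq 3 > "$dir/p" &

# The reader opens the other end and receives the data.
received=$(cat "$dir/p")
wait
printf '%s\n' "$received"    # 1, 2 and 3 on separate lines

rm -r "$dir"
```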

Exercise 25 Run echo '2+2*2' . Then, pipe it through bc.

Exercise 26 echo some text. Then, echo the text and pipe it through xxd.

Exercise 27 List files in your home directory.
List the files again, piping it through cat.
List the files yet another time, now piping it through cat -n.

Exercise 28 Pipe results of ps -eF through fold

Exercise 29 Create a named pipe p. Redirect input of fold from p in one terminal, and redirect output of ps -eF to p in another terminal.
Then repeat the commands, running the ps before the fold.

Filters

There are a number of programs (stemming from UNIX) that are collectively called filters.
A filter is a program that processes input in a useful way and writes it to its output.

head, tail

The head and tail programs output a specified number of leading / trailing lines (or bytes).

By default head and tail output 10 lines.
By providing -n count / -c count, they output count lines / bytes.

When head is given a number preceded by '-', e.g. head -n -10, then it outputs all except last 10 lines.
When tail is given a number preceded by '+', e.g. tail -n +10, then it outputs all lines starting from the 10th.
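
A sketch of the counts on a ten-line input. The negative count for head is left out here, because it is not available in every head implementation.

```shell
first=$(seq 10 | head -n 2)                # the two leading lines
last=$(seq 10 | tail -n 2)                 # the two trailing lines
from_eighth=$(seq 10 | tail -n +8)         # everything from line 8 onwards
middle=$(seq 10 | head -n 5 | tail -n 2)   # combine both to get lines 4 and 5

echo $first          # 1 2
echo $last           # 9 10
echo $from_eighth    # 8 9 10
echo $middle         # 4 5
```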

Exercise 30 Run paste <(seq 15) <(seq 15 -1 1). Then, pipe its output through head and/or tail to see:

first three lines
last three lines
all lines but three last
all lines but three first
lines from 6 up to 9

The tail command accepts a switch -f / --follow.
tail -f … will first output as usual, and then wait and output any data appended to the file.

Exercise 31 Run seq 25 > file. Then run tail -f file in one terminal and append (with output redirection) some data to file.

grep

The grep regex [file]... program prints lines of the input files (or standard input) that match the regex.

When the switch -r / -R is specified, file can be a directory and grep will match regex recursively for all files within the directory.

The grep program accepts several regular expression grammars. POSIX specifies basic (default for grep) and extended regular expressions (selectable with egrep or grep -E ). See manual for your implementation of grep for more on the grammars.

By default regular expressions perform case-sensitive matching. To switch to case-insensitive mode, one can add -i switch to grep.

With the switch -v, grep outputs non-matching lines.

grep can output all matching lines (default), first match or only indicate whether a file matches.
It can also prepend matching lines with line number and/or file name (the latter being default whenever multiple input files are given).

Moreover, grep can output N lines before match (-B N), after match (-A N) or before and after (which is called context, hence -C N).
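
A few of the switches above applied to the output of seq, as a sketch. The patterns are arbitrary examples.

```shell
teens=$(seq 20 | grep '^1.')           # basic regex: 1 followed by any character
ends=$(seq 20 | grep -E '(3|7)$')      # extended regex: lines ending with 3 or 7
odd=$(seq 5 | grep -v '[24]')          # -v keeps the lines that do not match
around=$(seq 20 | grep -C 1 '^15$')    # the match plus one line of context

echo $teens     # 10 11 12 13 14 15 16 17 18 19
echo $ends      # 3 7 13 17
echo $odd       # 1 3 5
echo $around    # 14 15 16
```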

Exercise 32 Filter seq 75 with grep to see:

lines containing 5
lines ending with 5
lines ending with 5 or 0
line containing 33 and 3 lines before it
line containing 33 and 4 lines around it

Exercise 33 Display all lines containing 10 in files /etc/passwd and /etc/group.
List all files containing ecdsa in ~/.ssh.

cut

The cut program outputs only selected characters (-c spec) / bytes (-b spec) / fields (-f spec) in each line.
A field is any number of characters separated by a single-character delimiter (-d delim, defaults to tab).

To specify fields/bytes/… one shall write range[,range]... where a range is num, or start-end, or start-, or -end with intuitive meaning. For instance, echo 123456789abcdef | cut -c -3,6,9-11,14- outputs 12369abef.
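
Field and character selection on a single line, as a sketch. The record is a made-up line in the style of /etc/passwd.

```shell
record='alice:x:1000:1000:Alice Smith:/home/alice:/bin/sh'

name_and_shell=$(echo "$record" | cut -d : -f 1,7)   # fields 1 and 7
from_fifth=$(echo "$record" | cut -d : -f 5-)        # field 5 to the end
prefix=$(echo "$record" | cut -c -5)                 # the first 5 characters

echo "$name_and_shell"   # alice:/bin/sh
echo "$from_fifth"       # Alice Smith:/home/alice:/bin/sh
echo "$prefix"           # alice
```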

Exercise 34 Filter output of the mount command (or /etc/mtab file) to cut only the fifth (or third in case of /etc/mtab) space-separated field (that contains the filesystem type).

Exercise 35 Remove from the output of egrep '^[Ee]{2}' /usr/share/myspell/en_US.dic a slash and all that follows it.

sort

The sort program by default sorts lines in alphabetical order.

The option -k defines sort keys.

sort -k4 uses columns 4,5,6,7,8,…
sort -k4,4 uses only column 4
sort -k4,6 uses columns 4, 5 and 6
sort -k5,4 is invalid
sort -k5,5 -k4,4 uses columns 5 and 4

Sort keys can have options, e.g., -n sorts numerically and -r reverses sort direction.
The options can be used for all sort keys, or for selected sort keys only:

sort -r -k5,5 -k4,4 sorts both column 5 and column 4 descending
sort -k5,5r -k4,4 sorts column 5 descending and column 4 ascending
sort -k5,5 -k4,4r sorts column 5 ascending and column 4 descending

sort is not stable unless -s (--stable) switch is used.
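
The effect of per-key options can be seen on a tiny made-up data set. LC_ALL=C is set only to make the ordering independent of the locale.

```shell
data='b 2 x
a 10 y
b 1 z
a 9 w'

# Column 1 alphabetically, then column 2 as text: "10" sorts before "9".
as_text=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2,2)

# Column 1 alphabetically, then column 2 numerically.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2,2n)

# Column 1 descending, column 2 numerically ascending.
mixed=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1r -k2,2n)

printf '%s\n' "$numeric"   # a 9 w, a 10 y, b 1 z, b 2 x
```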

The sort program has much more to offer. See manual for details.

Exercise 36 Create a file with input data for the next exercises by copying & pasting the following command in your shell:

make_random_data

paste \
  <(perl -e 'printf "%d\n", rand(10) for(1..20)') \
  <(perl -e 'print((K,Q,J)[rand(3)]."\n") for(1..20);') \
  <(perl -e 'printf "%d\n", rand(1500) for(1..20)') \
  <(perl -e 'my @a=("a","b","c"); print $a[rand(@a)] . $a[rand(@a)] ."\n" for(1..20);') \
  <(perl -e 'printf "%d\n", rand(1500) for(1..20)') \
  <(perl -e 'my @a=("x","y","z"); print $a[rand(@a)] . $a[rand(@a)] ."\n" for(1..20);') \
  <(seq -w 20) \
  > random_data

Exercise 37 Display the file. Then sort it.

Exercise 38 Sort the file ignoring first two columns. Sort the file numerically ignoring first two columns.

Exercise 39 Sort the file by the column with K/Q/J and the column with xyz characters (in that order).

Exercise 40 Sort the file by the second column without and with the --stable option.

Exercise 41 Sort the file by the second column (alphabetically) and by the third column (numerically).

wc, uniq, nl

The wc (word count) program counts lines, words and bytes.
When multiple files are provided as arguments, wc displays information on each file as well as a line with totals.
The options -l, -w and -c select lines, words and bytes.
The option -m counts all characters (including non-printable characters).
This matters for multi-byte characters: wc -mc <<< "‡∞♣" counts 10 bytes and 4 characters (the three visible and a newline).

The uniq program by default removes repeating lines. With switches, it can among others:

-c — prefix all lines with repetition count
-d — print only the repeating lines
-u — print only lines that do not repeat

nl numbers lines. It can also number lines in text files organized into sections and pages.
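
A sketch of uniq and wc on made-up data. It also shows that uniq compares only adjacent lines, which is why its input is usually sorted first.

```shell
rolls='7
7
4
7
4
4'

collapsed=$(printf '%s\n' "$rolls" | uniq)      # each run collapses to one line
repeated=$(printf '%s\n' "$rolls" | uniq -d)    # only runs longer than one line
single=$(printf '%s\n' "$rolls" | uniq -u)      # only runs of exactly one line
lines=$(printf '%s\n' "$rolls" | wc -l)         # number of input lines

echo $collapsed   # 7 4 7 4
echo $repeated    # 7 4
echo $single      # 4 7
echo $lines       # 6
```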

Exercise 42 Pipe man wc through cat. Then pipe man wc through wc. How many words are there?

Exercise 43 See the results of wc /etc/motd /etc/SUSE-brand.

Exercise 44 The perl -e 'printf "%d\n", (int rand(6)+1)+(int rand(6)+1) for(1..100)' command rolls 100 times 2d6.
Pipe it through uniq to see rolls with same results in a row.
Then pipe it through sort and uniq so that you see how many times each result was hit.

tac, rev

tac outputs lines in reverse order.

rev outputs characters in each line in reverse order.

Exercise 45 See the result of echo -e '1 2 3\n4 5 6\n7 8 9' . Then pipe it through tac, and finally pipe it through rev.

tr, sed

The tr program replaces or deletes characters.

The command tr -d LIST deletes all characters that are in the LIST.

The command tr FROM TO translates each n-th character from the list FROM to n-th character in the list TO.
If TO is shorter than FROM, the last character from TO is used instead.

With the switch -s, whenever consecutive characters translate to the same character x, only a single character x is output.

The switch -c translates all characters that are not in the FROM list.

The lists may contain character ranges (e.g., [0-9], [a-f]) and character classes (e.g., [:alnum:], [:space:]).
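
The switches of tr, sketched with character classes on arbitrary sample strings.

```shell
upper=$(echo 'Hello World' | tr '[:lower:]' '[:upper:]')    # translate
letters=$(echo 'a1b22c333' | tr -d '[:digit:]')             # delete
squeezed=$(echo 'a   b    c' | tr -s ' ')                   # squeeze repeats
digits=$(echo 'phone: 555-1234' | tr -cd '[:digit:]')       # delete all but digits

echo "$upper"      # HELLO WORLD
echo "$letters"    # abc
echo "$squeezed"   # a b c
echo "$digits"     # 5551234
```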

Exercise 46 Pipe ls -l through tr to:

replace all digits with a dash,
make all ASCII letters uppercase,
squeeze all spaces,
remove all the letters rwx.

The sed (stream editor) program reads input (standard input or files) line by line and executes a user-provided script that transforms each line. By default, it outputs each line once the script has been fully executed for that line. sed is Turing-complete.
sed is commonly used for regex-based search & replace.
The most basic command for this is: sed 's/regexp/replacement/' .
sed is out of scope of this course.
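
For orientation only, the basic substitution command mentioned above can be sketched as follows. The g flag, which is not discussed in the text, makes sed replace every match in a line instead of just the first one.

```shell
first=$(echo 'one two one' | sed 's/one/1/')    # replaces the first match only
all=$(echo 'one two one' | sed 's/one/1/g')     # replaces all matches

echo "$first"   # 1 two one
echo "$all"     # 1 two 1
```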

awk

awk is another text processing language. Roughly, it also reads input line by line and executes a user-provided script. An awk script consists of rules, and each rule has a condition (that matches against line contents or selects start/end of a file or whole execution) and a set of instructions run when the condition is satisfied.
awk is out of scope of this course.
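
For orientation only, a two-rule awk script as a sketch on made-up input. The first rule has a condition on the first field; the END rule runs once after all input has been read.

```shell
summary=$(printf '1 a\n2 b\n3 c\n' |
    awk '$1 >= 2 { print $2 }
         END     { print NR " lines read" }')

printf '%s\n' "$summary"   # b, c, and then: 3 lines read
```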

more, less

To display data that does not fit into terminal one can use one of many programs that are collectively called pagers.
A pager displays as much text at a time as fits in the terminal and allows the user to go to the next portion of the text (typically by pressing a key such as space).

Most operating systems (as well as the POSIX standard) include a program called more as a rudimentary pager.

Unix-like systems usually come with a pager called less which is more than more.

less passes data through a program indicated by the $LESSOPEN environment variable.
Such program usually outputs human-readable data upon detecting a known not human-readable file format.
lesspipe is the leading implementation of this feature.

less can be used with the following switches:

-S – don't wrap lines
-R – output escape sequences that encode colors as themselves rather than erasing them
-L – disable processing input by whatever $LESSOPEN indicates
-N – number lines

To display help from within less, type h. A choice of other useful key shortcuts:

space or PgDown / PgUp – go to the next / previous page
an integer followed by g – jump to this line
any integer followed by % – jump to this part of the document
g / G – jump to the beginning / end
/pattern / ?pattern — searches for pattern forwards/backwards
n / N – repeat search in the same / reverse direction
s – saves data to a file (useful when less reads from a pipe)
v – opens the file in default editor (available whenever less displays a file)
F – waits for more data to be appended to the file (like tail -f)

The man command usually uses less as the pager.

Exercise 47 Type man less to see manual page for less in less. Test the key shortcuts mentioned above.

Exercise 48 Open a PDF file (e.g., /usr/share/doc/packages/apparmor-docs/techdoc.pdf) with less, with and without -L option.
Open a tar archive with less (e.g., /usr/share/doc/packages/automake/amhello-1.0.tar.gz).
View a directory (e.g., /usr/include) with less.

tee

The tee [-a] file... command writes every byte read from standard input to standard output and to every file specified.
With the -a switch tee appends to the file instead of overwriting it.

tee is commonly used when one wants both to see and to record an output of a long-running command.
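
A sketch of tee in the middle of a pipeline. The temporary log file is an assumption of this example.

```shell
log=$(mktemp)

# tee copies its input both to the file and to standard output,
# so the next command in the pipeline still sees every line.
upper=$(printf 'one\ntwo\n' | tee "$log" | tr '[:lower:]' '[:upper:]')

# With -a the new line is added after the existing contents.
printf 'three\n' | tee -a "$log" > /dev/null

echo $upper    # ONE TWO
cat "$log"     # one, two and three on separate lines
```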

Exercise 49 tee the output of the tree command to a file. View the file with less.

¹⁾ This notation corresponds to the keys one would have to press to input the byte. To see all bytes (\t and \n are replaced by X), try:

perl -e'for(0..15){printf"\t%x_",$_};print"\n";for$l(0..15){printf"_%x",$l;for$h(0..15){$c=$h<<4|$l;$c=88 if $c==9||$c==10;printf("\t%c",$c)}print"\n"}'|cat -v

²⁾ The noclobber option is unset by default; use set -C to enable it.

³⁾ E.g., $VAR is substituted with its value, `date` is replaced by output of the date command etc.

⁴⁾ For some time now, "big programs that do all at once and nobody fully comprehends them", such as systemd, have been forced as defaults in distros.
