To create an empty file, you can use touch file
.
To create a file with some contents, you can use echo contents > file
or fortune > file
.
The touch
and echo
commands, as well as the meaning of >
will be explained later on.
The same commands can be used to change the contents of a file.
To display a file, use cat filename
.
mkdir dir
creates a new directory.
mkdir -p dir1/dir2/dir3
creates directory dir1
, and directory dir2
within dir1
, and dir3
within dir1/dir2
.
With the -p
switch mkdir
does not print an error if a directory already exists.
To remove an empty directory one can use rmdir dir
.
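The commands above can be tried out safely in a throwaway directory. The following is a minimal sketch; the names demo_dir, empty.txt, greeting.txt and a/b/c are made up for illustration.

```shell
# Work in a throwaway directory (demo_dir is a hypothetical name).
demo_dir=$(mktemp -d)
cd "$demo_dir"

touch empty.txt                 # create an empty file
echo "hello" > greeting.txt     # create a file with one line of content
cat greeting.txt                # display the file

mkdir -p a/b/c                  # create nested directories in one go
mkdir -p a/b/c                  # repeating it is not an error with -p
rmdir a/b/c                     # remove the (empty) innermost directory
```

After running it, a/b still exists, because rmdir removed only the innermost directory.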
Exercise 1 Create the following directory tree:
.
├── foo
│   ├── baz
│   │   └── bar
│   └── bar
└── foobaz
    └── bar
Exercise 2 Remove the foo/baz
directory.
Multiple directory entries can point to the same file. This is called a hard link.
A directory entry can also point to another directory entry. This is called a soft link or symbolic link (usually abbreviated as symlink).
Native Linux filesystems (e.g., ext4, XFS, Btrfs) support hard links for ordinary files, and soft links to an arbitrary path (e.g., a file or a directory).
To create a hard link, one can use ln source destination
.
To create a symlink, one can use ln -s source destination
.
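The difference between the two kinds of links shows up when the original name is removed. Below is a small sketch under assumed file names (data.txt, hard.txt, soft.txt are hypothetical).

```shell
link_dir=$(mktemp -d)
cd "$link_dir"

echo "original" > data.txt
ln data.txt hard.txt        # second directory entry for the same file
ln -s data.txt soft.txt     # symlink that stores the path "data.txt"

# Removing the original name leaves the hard link usable,
# while the symlink now points at a path that no longer exists.
rm data.txt
cat hard.txt
cat soft.txt 2>/dev/null || echo "soft.txt is dangling"
```

The hard link keeps the data alive because it is a directory entry for the file itself, whereas the symlink merely remembers a path.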
Exercise 3 Create a file file1
with some contents. Create a hard link called file2
of the file file1
. Modify file2
. Display file1
.
Exercise 4 The ls
command can display link count for each listed file. Discover how to do it.
The disk space used by a file is reclaimed once the link count drops to 0 (all directory entries that link to the file are erased) and the file is no longer open in any process.
Exercise 5 Create in your home directory a symlink called TMP
pointing to /tmp
. Change directory to TMP
. What does pwd
output?
Exercise 6 Create in your home directory a symlink called loop
that points to your home directory. Enter it. And enter it again.
Exercise 7 Create a symlink to a non-existent path. List the directory containing it.
The commands readlink -f target
and realpath -e target
resolve all
symlinks and print a canonical path. Without -f, readlink prints only the immediate target of a symlink.
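A short sketch of resolving a chain of symlinks follows. It assumes the GNU versions of readlink and realpath, and the names real, first and second are hypothetical.

```shell
res_dir=$(mktemp -d)
cd "$res_dir"
mkdir real
ln -s real first        # first -> real
ln -s first second      # second -> first -> real

readlink second         # prints only the immediate target: first
readlink -f second      # follows the whole chain, prints an absolute path
realpath -e second      # same canonical path; fails if the path does not exist
```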
The Windows NTFS filesystem supports links as well. They can be created with, e.g., the mklink command.
To copy files, one can use the cp
command. To move (or rename) files, one can use the mv
command.
The basic syntax is cp/mv source… destination
.
Multiple source files can be provided if the destination is a directory.
If the destination is an existing file, it will be overwritten without warning (unless the -i
or -n
switch is used).
By default cp
refuses to copy a directory. Use -r
to copy a directory recursively.
When cp
copies a file, it creates a new file with the current date, default permissions, etc.
To copy recursively, and preserve dates, permissions and more, one can use the -a
switch (that stands for --archive
).
With the -l
switch cp
creates a hard link instead of copying a file. Notice that this can be combined with --recursive
.
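The effect of the cp switches can be observed by comparing file metadata. The sketch below assumes GNU coreutils (touch -d, stat -c); the directory names src, plain, archive and linked are hypothetical.

```shell
cp_dir=$(mktemp -d)
cd "$cp_dir"
mkdir src
echo "payload" > src/file.txt
touch -d "2020-01-01 12:00:00" src/file.txt   # give the file an old timestamp

cp -r src plain      # new files get the current modification time
cp -a src archive    # -a preserves timestamps, permissions and more
cp -al src linked    # hard links instead of copies of the file data

# GNU stat: %Y is the modification time, %h the hard link count.
stat -c '%n %Y %h' src/file.txt plain/file.txt archive/file.txt linked/file.txt
```

In the output, archive/file.txt should carry the old timestamp, and src/file.txt and linked/file.txt should both report a link count of 2.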
Exercise 8 Copy, using a single command, the files /etc/os-release
and /etc/SUSE-brand
to the current working directory.
Exercise 9 Run mkdir someDir && for F in file{1..3}; do echo $RANDOM$RANDOM > someDir/$F; done
to create someDir
directory with three files inside.
Exercise 10 Copy the someDir
directory recursively under a new name.
Exercise 11 Move the newly copied directory into the someDir
directory.
Exercise 12 Rename someDir
to a name of your choice.
Exercise 13 Copy the renamed directory with -al
switches. Modify a selected file in either of the directories. Which files changed contents?
(You can cat filename
or display modification dates with second accuracy using ls -l --time-style=+%H:%M:%S …
or tree -D --timefmt=%H:%M:%S …
)
It is possible to copy files via SSH. Whenever one has SSH access to a remote machine, one can copy files with the scp
command.
scp
accepts file
as a file path on the local machine and user@host:file
as a file path on a remote machine. Remote relative paths are relative to the remote user's home directory.
scp
accepts the -r
switch for copying recursively.
Microsoft Windows now ships with scp
command, but usually various file commanders are more convenient.
Most SSH servers also enable the SFTP protocol, which allows copying files more conveniently.
sftp user@host
launches the sftp command line. You can use ls
and cd
to navigate the remote filesystem, and get
and put
to copy files. Type help
to see all supported commands.
Exercise 14 Create a file in the /tmp
directory on your computer. Copy the file to the home directory of user student
on another computer.
Exercise 15 Copy the file to the /tmp
directory on another computer.
Exercise 16 Copy the file to the /tmp
directory on another computer using sftp
.
The rsync
program is widely used to copy files and directories. It efficiently compares source files with destination files and copies only the differences. It can copy data to/from remote machines, and can compress the data sent via network to increase throughput. rsync
is also commonly used to make backups.
The program that removes files is called rm
.
rm
by default won't remove directories (whether or not they are empty) or write-protected files.
To remove a directory with rm
(recursively, with its contents) one has to add the -r
switch.
To remove write-protected files (and stop printing warnings whenever a file to be removed no longer exists) the switch -f
(--force
) can be used.
A misused rm -rf …
command is a notorious source of data loss. Beware especially of asterisk and what it expands to.
rm
accepts -I
and -i
switches that ask for confirmation. -I
asks once upon an attempt to remove multiple files, while -i
asks upon each file.
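The behaviour of rm with and without the -r and -f switches can be checked on disposable data. This is only a sketch; the names tree, sub and missing are hypothetical.

```shell
rm_dir=$(mktemp -d)
cd "$rm_dir"
mkdir -p tree/sub
touch tree/sub/a tree/b

rm tree 2>/dev/null || echo "plain rm refuses to remove a directory"
rm -r tree                       # removes the directory with its contents

rm missing 2>/dev/null || echo "rm reports an error for a missing file"
rm -f missing && echo "rm -f stays silent and succeeds"
```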
Exercise 17 Create multiple files with touch file_{a..z}
.
Remove all files with a single rm
command with 1) no switches 2) -f
switch 3) -I
switch 4) -i
switch.
Exercise 18 Repeat the remove command from the previous exercise when the files are gone.
Then repeat the command again with -f
switch.
Exercise 19 Create a directory and remove it with rm
.
The sleep time
command sleeps for the given time period.
The time period is by default in seconds (s
), furthermore m
, h
, d
units can specified.
sleep 1d
is equivalent to sleep 24h
, sleep 1440m
and sleep 86400[s]
.
Some sleep
implementations allow decimal fractions, e.g., sleep 0.05m
sleeps 3s.
The time command [argument]...
command runs the provided command with arguments and prints the time it took to execute the command.
The timeout time command [argument]...
command runs the provided command and terminates it once the provided (real) time elapses.
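A brief sketch of timeout in action follows. It assumes GNU timeout, which reports exit status 124 when the time limit was hit; the variable names are made up.

```shell
start=$(date +%s)
status=0
timeout 1 sleep 5 || status=$?   # sleep is terminated after about one second
end=$(date +%s)

echo "exit status: $status"      # GNU timeout uses 124 to signal a timeout
echo "elapsed: $((end - start)) s"
```

The elapsed time should be close to the one-second limit rather than the five seconds requested from sleep.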
Exercise 20 Run a command that sleeps for two seconds.
Exercise 21 Measure how long it took to execute the command in the previous exercise.
Exercise 22 Tell how long it takes to execute the openssl dhparam -text 1536
command.
Exercise 23 Run the openssl dhparam -text 2048
command with 5s run limit.
A computer program is a sequence of instructions. A process is an instance of a program: a particular execution of it.
A process is understood as all that describes its execution — not only the sequence of instructions loaded into the main memory, but among others also the state of the CPU registers and the state of the main memory.
A separate process is created each time a program (the same executable) is started (run) anew.
In Unix-like systems new processes can be started only by existing ones (with the exception of the first one, called init, which always has the identifier 1 and is started by the kernel upon boot).
The term child processes of a process x denotes processes that were started by x.
The parent process of a process x is the process that started x.
A process is identified by its process identifier — pid.
The operating system maintains for each process the parent process identifier — ppid.
When a process x terminates, the operating system remembers x until it
is reaped by its parent, that is, until the parent reads x's exit status.
Until the parent reaps x, it is called a defunct process or, more commonly, a zombie process.
The parent is sent a CHLD signal whenever a child process changes its state, which includes process termination.
When a process a starts a process b, and b starts a process c, then once b terminates, the process c becomes an orphan process and is reparented to init.
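The parent and child relationship, and the reaping of a terminated child, can be sketched in the shell. Here the shell is the parent, $! holds the pid of the most recent background child, and wait collects the child's exit status. The exit status 7 is an arbitrary value chosen for illustration.

```shell
# Start a child process in the background; $! holds its pid.
sh -c 'sleep 1; exit 7' &
child=$!
echo "parent pid: $$, child pid: $child"

# wait blocks until the child terminates and collects (reaps) its exit status.
status=0
wait "$child" || status=$?
echo "child exit status: $status"
```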
The ps
command displays the list of processes and threads.
A choice of useful ps
switches:
-e selects every process in the system
-l|-f|-F chooses long, full or extra full output format
-L|-T includes the threads in results
-H shows process hierarchy (children names are listed below parents and are accordingly indented)
The ps
program has two allowed command syntaxes — one typical for UNIX programs, one with its roots in BSD.
The ubiquitous ps aux
command is an example of ps
run with BSD-style options
that show all processes in a specific format.
Exercise 24 List processes with ps
command (with no arguments).
Exercise 25 List processes selecting an output format with more details (columns).
Exercise 26 List all processes.
Exercise 27 List all processes and threads. Find a process that has at least two threads.
The list of processes can also be presented as a tree (hierarchy) with the pstree
command.
Some useful pstree
switches include -p
, -u
, and -a
that extend results by, respectively, pids, user names, and program arguments.
Exercise 28 Display the process hierarchy.
Exercise 29 Display the process hierarchy so that both pids and executable names are displayed.
Pseudo-filesystem procfs
, mounted by default at /proc
, represents
detailed information on processes as files.
The /proc/self
directory contains information about the calling process.
See man 5 proc
for details.
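As a small illustration of reading process data from procfs, the sketch below looks up the current shell's own entry. It assumes a Linux system with /proc mounted; the variable proc_ppid is a made-up name.

```shell
# $$ is the pid of the current shell; its procfs directory is /proc/$$.
awk '/^(Name|Pid|PPid):/' "/proc/$$/status"

# The PPid field should agree with the shell's own PPID variable.
proc_ppid=$(awk '/^PPid:/ {print $2}' "/proc/$$/status")
echo "parent pid from procfs: $proc_ppid, from shell: $PPID"
```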
Exercise 30 Print out the /proc/self/status
file. Status of which process has been reported?
Exercise 31 Read through the manual and scout the /proc/pid/
directories.
It's hard to tell how much memory a process consumes.
The system can sum up sizes of all virtual memory ranges defined for the process
(virtual set size). But the system often does not assign physical main memory
to a virtual range until the process accesses it.
The system can tell which ranges are present in the main memory (resident set size).
But this excludes swap, and even disregarding swap, if some memory is shared by
several processes (it may, for instance, contain a library for drawing windows),
then the sum of RSS values for the processes will exceed the memory actually used by them.
In Linux, having sufficient permissions, one can tell which memory is private for
a process (unique set size), and for each shared memory region add its size
divided by the number of processes that share it, getting as a result a quite sane
memory consumption metric (proportional set size).
But this still ignores swap and memory compression…
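The difference between the resident and the proportional set size can be worked through with simple arithmetic. All numbers below are hypothetical and chosen only for illustration; the sketch assumes that all of the memory is resident.

```shell
# Hypothetical numbers, in KiB, chosen only for illustration.
private_kib=10240      # memory used by this process alone (unique set size)
shared_kib=6144        # a shared region, e.g. a library
sharers=3              # number of processes mapping the shared region

rss_kib=$((private_kib + shared_kib))             # RSS counts the shared region in full
pss_kib=$((private_kib + shared_kib / sharers))   # PSS counts only this process's share

echo "RSS: $rss_kib KiB"
echo "PSS: $pss_kib KiB"
```

Summing the RSS of all three sharers would count the 6144 KiB region three times, while summing their PSS counts it once.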
There is a smem
program that attempts to present memory usage in a meaningful
way. Check -p
and -t
options.
The top
program is a commonly available live viewer of system load and processes.
Exercise 32 Run top
. Press h
to see built-in help.
Sort by CPU use, then change sort column to memory.
Exercise 33 Learn what the process states R
, S
, D
, t
and Z
stand for by consulting man ps
.
The htop
program has been created as a convenient
top
replacement with more features.
htop
presents basic key shortcuts in the bottom bar.
h
displays help,
t
toggles tree view,
<
and >
choose sort column,
k
sends a signal to the process,
a
sets CPU affinity,
i
sets I/O priority,
/
searches and \
filters by name,
u
filters by user,
space selects,
tab switches between memory+CPU usage view and disk usage view.
htop
supports mouse and is quite customizable.
Exercise 34 Run htop
and sort by a selected column.
Exercise 35 Run htop
and switch view to tree view.
Then find htop
itself in the process list.
There are numerous top
-inspired programs. These include iotop
(for I/O),
atop
and glances
.
The pgrep
and pidof
commands return pids of
processes that match user-defined criteria.
By default pgrep regex
matches regex against the executable name.
With the -f
switch the full command line (executable name and arguments) is matched against regex.
See the manual for pgrep
to learn how to e.g., filter by user or output ppids instead of pids.
The pidof name
command outputs pids of processes whose executable name (trimmed to 15 characters)
is identical to the provided name.
Exercise 36 Run sleep 1h
. Look up its pid in another terminal.
Exercise 37 Run sleep 1337h
in one terminal and sleep 42h
in another. Look up the pid of the first one.
Exercise 38 Find the pid of gdm
, taking into account that there are other processes whose names start with gdm
.
Signals play a role for processes similar to
the one interrupts play for the operating system: they notify
a process about an external event by forcefully switching the control flow of
one of the process's threads to a signal handler.
The programmer can choose to block or ignore specified signals and may set up
one's own function as the signal handler.
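In the shell, handlers are installed with the trap builtin, which gives a compact way to see both a custom handler and an ignored signal. The sketch below runs in a separate sh process so that the signals do not affect the calling shell; it uses kill, a command for sending signals, to make the process signal itself. The choice of USR1 and HUP is arbitrary.

```shell
out=$(sh -c '
    trap "echo handled USR1" USR1   # install a handler for USR1
    trap "" HUP                     # ignore HUP entirely
    kill -USR1 $$                   # the shell sends USR1 to itself
    kill -HUP $$                    # ignored, so the shell keeps running
    echo "still running"
')
printf '%s\n' "$out"
```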
Signals are generated by the operating system in well-defined situations, and can also be sent to a process from userspace provided the sender has sufficient permissions.
Signals are told apart by their numbers and corresponding names (cf. man 7 signal
).
For convenience, "the X signal" is written as SIGX. For instance, "the TERM signal" is shortened to SIGTERM.
Signals are used to notify about errors, to notify about external events, and for
communication between processes.
The well-defined signals include the following signals:
nohup command [argument]...
runs the command after reconfiguring the process to ignore the HUP signal.
To send a signal one can use the kill [-signal] pid
command.
If no signal is specified, kill
sends a TERM signal.
A signal can be specified either by its number, e.g., kill -3 pid
, or by its name, e.g., kill -INT pid
.
Names and corresponding numbers can be displayed with kill -l
or kill -L
.
The kill
command identifies the target process only by its pid.
Negative pid values have a special meaning; see man kill
for details.
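A minimal sketch of sending a signal to a background process from the same shell follows. It relies on the common shell convention of reporting death by a signal as 128 plus the signal number; the variable names are made up.

```shell
sleep 60 &               # a long-running background process
pid=$!

kill -TERM "$pid"        # same effect as plain: kill "$pid"
status=0
wait "$pid" || status=$?

# A shell reports death by signal as 128 + signal number (TERM is 15).
echo "exit status: $status"
```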
Exercise 39 In one terminal run sleep 1h
. Kill the process from another terminal (by sending it the INT, TERM, or KILL signal).
The pkill
command functions almost the same as pgrep
— the difference
is that it sends a signal to matching processes rather than outputting their pids.
Signals in pkill
are specified as for kill
.
For instance, pkill -TERM bash
asks all processes whose name
contains bash to terminate.
The killall name
command sends a signal (specified as in kill
)
to all processes whose name (trimmed to 15 characters) is the same as the given
one.
Switches allow comparing against full executable name, and using regular
expressions.
Exercise 40 In one terminal run sleep 1h
. Kill the process with pkill
from another terminal.
Exercise 41 Run mousepad
. Send it a STOP signal. Check how mousepad is doing. Then send it a CONT signal.
Exercise 42 In one terminal run a DNS server with the named -g -c <(:)
command. From another terminal send a HUP signal to named
. How did it react?
The duty of the operating system is to decide which process is given resources to execute. Users may influence the decisions of the schedulers ([1], [2]).
Processes in Unix-like systems have niceness
that affects the share of CPU time assigned to a process on resource contention.
The niceness may be set in the range from -20 (highest priority) up to 19 (lowest), and is inherited from the parent.
The init process starts all processes with the niceness of 0.
Any user can raise the niceness, but only root can lower it.
By issuing nice [-n N] command [argument]...
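one can run the command with the niceness increased by N relative to the current one (so with the niceness of N when starting from the default of 0).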
It is possible to change the niceness of a process by issuing renice [-n] N pid
.
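The effect of nice can be checked with nice itself, since the coreutils nice prints the current niceness when run without arguments. This is a sketch with made-up variable names; the adjustment of 5 is arbitrary.

```shell
base=$(nice)                 # with no arguments, nice prints the current niceness
raised=$(nice -n 5 nice)     # run "nice" itself with the niceness raised by 5
echo "current: $base, child: $raised"
```

Raising the niceness needs no special permissions, so this runs as an ordinary user.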
Some Linux distros enable the autogroup
feature that automatically assigns
processes to separate process groups (cf. man 7 sched
). Niceness is
considered only within such a group.
sysctl -ar sched_autogroup_enabled
checks if the feature is in force.
It is possible to select which physical CPU threads are allowed to execute a process.
This is referred to as the processor affinity.
The taskset
program can be used to this end. In Linux the affinity is inherited by child processes.
taskset 0x03 command [argument]...
, or equivalent
taskset -c 0,1 command [argument]...
starts the command so that
it may only use physical threads 0 and 1.
The taskset -p [-c] pid
command outputs a mask (or list) of allowed
physical threads for the provided pid, and taskset -p [-c] spec pid
changes it to spec.
htop
can alter niceness (shortcuts F7/
F8) and affinity (shortcut a).
Exercise 43 Run sleep 1h
with the niceness of 10. Verify the niceness (using e.g., ps
or htop
).
Exercise 44 Run sleep 1h
and change the niceness to 15.
Exercise 45 In two terminals run taskset -pc 0 $$
(that lets this shell and its children use only the first physical thread).
Then, in both terminals, run openssl dhparam -text $((2**13))
(this starts some CPU-intensive computations).
In yet another terminal run htop
and keep observing CPU usage by the programs.
Gradually increase niceness of one of the processes. How does the CPU usage change?
The I/O priorities (this refers predominantly to disk access priorities) can be
set / altered with the ionice
command.
There are "three" I/O classes: 1 (realtime), 2 (best-effort) and 3 (idle), plus 0 (none).
The first two classes are split into 8 priorities
(ranging from 0, the highest, to 7, the lowest).
Switches [-c class_id] [-n priority]
select class and priority.
Similarly to taskset
, one may run a command with specified class, or,
by specifying the -p
switch, set / alter the I/O classes and priorities for
the specified pid.
htop
allows setting I/O classes and priorities (shortcut i).
Hardware that has resources (mainly the main memory) to which the access cost
(time or bandwidth) is different from different physical CPU threads
(NUMA)
is logically split by the operating system into nodes. A node is a group of
resources such that the access cost is identical among all resources within
the node, and the cost of accessing resources in another node is always
higher.
An example of a typical NUMA system is a two-socket mainboard. Memory modules
attached to the other socket have higher latency and lower bandwidth than
memory modules connected to this core's socket.
The numactl
command controls the NUMA policy of processes run on such systems.
numactl -H
displays the nodes (and the relative access costs), and
numactl -N x -m y command [argument]...
runs the command
on physical threads of node x and memory of node y.