
Copying, moving and erasing files
Sleeping, measuring and limiting time
Processes

Copying, moving and erasing files

To create an empty file, you can use touch file. To create a file with some contents, you can use echo contents > file or fortune > file.
The touch and echo commands, as well as the meaning of >, will be explained later on.
The same commands work on existing files: touch updates a file's timestamps, while redirecting output with > replaces its contents.

To display a file, use cat filename.
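A short session that puts these commands together might look as follows. It is a sketch assuming a Linux shell with GNU coreutils; it works in a scratch directory created by mktemp, and the file names are only illustrative.

```shell
cd "$(mktemp -d)"              # work in a fresh scratch directory
touch empty.txt                # create an empty file
echo "hello" > greeting.txt    # create a file with some contents
cat greeting.txt               # display it: hello
echo "bye" > greeting.txt      # > replaces the previous contents
cat greeting.txt               # bye
```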

Directories

mkdir dir creates a new directory.
mkdir -p dir1/dir2/dir3 creates directory dir1, and directory dir2 within dir1, and dir3 within dir1/dir2.
With the -p switch mkdir does not print an error if a directory already exists.

To remove an empty directory one can use rmdir dir.
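The behaviour of mkdir -p and rmdir can be tried out in a scratch directory. This is a sketch assuming a Linux shell; the directory names a, b and c are hypothetical.

```shell
cd "$(mktemp -d)"          # scratch directory
mkdir -p a/b/c             # creates a, a/b and a/b/c at once
mkdir -p a/b/c             # no error: -p tolerates existing directories
rmdir a 2>/dev/null || echo "rmdir refuses: a is not empty"
rmdir a/b/c a/b a          # deepest first, so each one is empty when removed
```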

Exercise 1 Create the following directory tree:

  .
  ├── foo
  │   ├── baz
  │   │   └── bar
  │   └── bar
  └── foobaz
      └── bar

Exercise 2 Remove the foo/baz directory.

Hard and soft links

Multiple directory entries can point to the same file. Each such entry is called a hard link.
A directory entry can also be a special file that stores a path to another file or directory. This is called a soft link or symbolic link (usually abbreviated as symlink).

Native Linux filesystems (e.g., ext4, XFS, Btrfs) support hard links to ordinary files, and soft links to arbitrary paths (e.g., a file or a directory).

To create a hard link, one can use ln source destination.
To create a symlink, one can use ln -s source destination.
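The difference between the two kinds of links shows up in the inode numbers reported by GNU stat: a hard link shares the inode of the original, while a symlink is a separate small file holding a path. A sketch assuming GNU coreutils; data.txt, hard.txt and soft.txt are illustrative names.

```shell
cd "$(mktemp -d)"                 # scratch directory
echo "original" > data.txt
ln data.txt hard.txt              # hard link: a second name for the same inode
ln -s data.txt soft.txt           # symlink: a separate file storing the path "data.txt"
stat -c '%n: inode %i, %h link(s)' data.txt hard.txt soft.txt
readlink soft.txt                 # prints the stored path: data.txt
```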

Exercise 3 Create a file file1 with some contents. Create a hard link called file2 of the file file1. Modify file2. Display file1.

Exercise 4 The ls command can display link count for each listed file. Discover how to do it.

The disk space used by a file is reclaimed once its link count drops to 0 (all directory entries that link to the file are erased) and the file is no longer open in any process.
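The second condition can be observed directly: a file whose last name has been removed stays readable through a descriptor that is still open. A sketch assuming a POSIX shell on Linux; descriptor 3 and note.txt are arbitrary choices.

```shell
cd "$(mktemp -d)"                 # scratch directory
echo "still here" > note.txt
exec 3< note.txt                  # keep the file open on descriptor 3 of this shell
rm note.txt                       # removes the last link: link count is now 0
content=$(cat <&3)                # the data is still readable through the open descriptor
echo "$content"                   # still here
exec 3<&-                         # close it; now the space can be reclaimed
```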

Exercise 5 Create in your home directory a symlink called TMP pointing to /tmp. Change directory to TMP. What does pwd output?

Exercise 6 Create in your home directory a symlink called loop that points to your home directory. Enter it. And enter it again.

Exercise 7 Create a symlink to a non-existent path. List the directory containing it.

The command readlink target prints the path stored in a symlink. The commands readlink -f target and realpath -e target resolve all symlinks and print a canonical path.
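With a chain of two symlinks the difference between plain readlink and full resolution becomes visible. A sketch assuming GNU coreutils; the names real/dir, link1 and link2 are hypothetical.

```shell
cd "$(mktemp -d)"                 # scratch directory
mkdir -p real/dir
ln -s real/dir link1              # link1 -> real/dir
ln -s link1 link2                 # link2 -> link1: a chain of two symlinks
readlink link2                    # one level only: link1
readlink -f link2                 # fully resolved, absolute path ending in real/dir
realpath -e link2                 # the same; -e fails if any component is missing
```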

The Windows NTFS filesystem supports links as well. They can be created, e.g., with the mklink command.

Copying and moving files

Within local machine

To copy files, one can use the cp command. To move (or rename) files, one can use the mv command.

The basic syntax is cp/mv source… destination.
Multiple source files can be provided if the destination is a directory.
If the destination is an existing file, it is overwritten without warning (unless the -i or -n switch is used).

By default cp refuses to copy a directory. Use -r to copy a directory recursively.

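When cp copies a file, it creates a new file owned by the user running cp, with the current time as its modification time and with permissions that are not fully preserved (the umask is applied to them).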
To copy recursively, and preserve dates, permissions and more, one can use the -a switch (that stands for --archive).

With the -l switch cp creates a hard link instead of copying a file. Notice that this can be combined with --recursive.
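The effect of -r, -a and -al can be compared by looking at modification times and inode numbers of the resulting files. A sketch assuming GNU coreutils (touch -d, stat -c); the directory names src, plain, archive and linked are illustrative.

```shell
cd "$(mktemp -d)"                      # scratch directory for the experiment
mkdir src && echo "v1" > src/f.txt
touch -d '2000-01-01 00:00' src/f.txt  # give the file an old modification time
cp -r src plain                        # plain recursive copy
cp -a src archive                      # archive copy: preserves timestamps, mode, ...
cp -al src linked                      # directory tree of hard links, no data copied
stat -c '%n  mtime=%y  inode=%i' src/f.txt plain/f.txt archive/f.txt linked/f.txt
```

Only plain/f.txt should show the current time; archive/f.txt keeps the year 2000, and linked/f.txt reports the same inode as src/f.txt.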

Exercise 8 Copy, using a single command, the files /etc/os-release and /etc/SUSE-brand to the current working directory.

Exercise 9 Run mkdir someDir && for F in file{1..3}; do echo $RANDOM$RANDOM > someDir/$F; done to create someDir directory with three files inside.

Exercise 10 Copy the someDir directory recursively under a new name.

Exercise 11 Move the newly copied directory into the someDir directory.

Exercise 12 Rename someDir to a name of your choice.

Exercise 13 Copy the renamed directory with the -al switches. Modify a selected file in any of the directories. Which files changed contents?
(You can cat filename or display modification dates with second accuracy using ls -l --time-style=+%H:%M:%S … or tree -D --timefmt=%H:%M:%S …)

[extra] Copying files to/from a remote machine

It is possible to copy files via SSH. Whenever one has SSH access to a remote machine, one can copy files with scp command.
scp accepts file as a path on the local machine and user@host:file as a path on a remote machine. Relative remote paths are relative to the user's home directory.
scp accepts the -r switch for copying recursively.

Microsoft Windows now ships with the scp command, but various file commanders are usually more convenient.

Most SSH servers also enable the SFTP protocol, which allows copying files more conveniently.
sftp user@host launches sftp command line. You can use ls and cd to navigate the remote filesystem, and get and put to copy files. Type help to see all supported commands.

Exercise 14 Create a file in /tmp directory in your computer. Copy the file to home directory of user student on another computer.

Exercise 15 Copy the file to /tmp directory on another computer.

Exercise 16 Copy the file to /tmp directory on another computer using sftp.

[extra] Rsync

The rsync program is widely used to copy files and directories. It efficiently compares source files with destination files and copies only the differences. It can copy data to/from remote machines, and can compress the data sent via network to increase throughput. rsync is also commonly used to make backups.

Removing files

The program that removes files is called rm.

By default rm does not remove directories (even empty ones), and when run from a terminal it asks for confirmation before removing write-protected files.

To remove a directory with rm (recursively, together with its contents), one has to add the -r switch.

To remove write-protected files without asking (and to stop printing errors when a file to be removed does not exist), the -f (--force) switch can be used.

A misused rm -rf … command is a notorious source of data loss. Beware especially of the asterisk and what it expands to.

rm accepts the -I and -i switches that ask for confirmation. -I asks once before removing more than three files or before removing recursively, while -i asks for each file.
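The default behaviour and the effect of -r and -f can be checked in a scratch directory. A sketch assuming GNU rm; the names d and missing.txt are hypothetical.

```shell
cd "$(mktemp -d)"                        # scratch directory
mkdir d && touch d/x
rm d 2>/dev/null || echo "plain rm refuses to remove the directory d"
rm -r d                                  # removes d together with its contents
rm missing.txt 2>/dev/null || echo "plain rm fails on a missing file"
rm -f missing.txt && echo "rm -f ignores the missing file (exit status 0)"
```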

Exercise 17 Create multiple files with touch file_{a..z}.
Remove all files with a single rm command with 1) no switches 2) -f switch 3) -I switch 4) -i switch.

Exercise 18 Repeat the remove command from the previous exercise when the files are gone.
Then repeat the command again with -f switch.

Exercise 19 Create a directory and remove it with rm.

Sleeping, measuring and limiting time

The sleep time command sleeps for the given time period.
The time period is in seconds by default (suffix s); the suffixes m, h and d select minutes, hours and days.
sleep 1d is equivalent to sleep 24h, sleep 1440m and sleep 86400[s].
Some sleep implementations allow decimal fractions, e.g., sleep 0.05m sleeps 3s.

The time command [argument]... command runs the provided command with arguments and prints the time it took to execute the command.

The timeout time command [argument]... command runs the provided command and, once the given real (wall-clock) time elapses, sends it a TERM signal. If the time limit was hit, timeout exits with status 124.
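The three commands can be combined in a short session. This sketch assumes bash (where time is a shell keyword that reports real, user and sys time on standard error) and GNU coreutils; the durations are arbitrary.

```shell
sleep 0.2                        # GNU sleep accepts fractions of a second
time sleep 1                     # bash's time keyword: reports real, user and sys time
status=0
timeout 1 sleep 10 || status=$?  # sleep receives TERM after 1 second
echo "timeout exit status: $status"   # 124: the time limit was hit
```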

Exercise 20 Run a command that sleeps for two seconds.

Exercise 21 Measure how long it takes to execute the command from the previous exercise.

Exercise 22 Tell how long it takes to execute the openssl dhparam -text 1536 command.

Exercise 23 Run the openssl dhparam -text 2048 command with 5s run limit.

Processes

Process vs program

A computer program is a sequence of instructions¹⁾. A process is an instance of a program: a particular execution of a program.

A process is understood as all that describes its execution — not only the sequence of instructions loaded into the main memory, but among others also the state of the CPU registers and the state of the main memory.

A separate process is created each time a program (even the same executable) is started (run) anew.

Parents, children, zombies…

In Unix-like systems new processes can be started only by existing ones. The only exception is the first process, called init, which is started by the kernel upon boot and always has the identifier 1.

The term child processes of a process x denotes processes that were started by x.
The parent process of a process x is the process that started x.

A process is identified by its process identifier — pid.
The operating system maintains for each process the parent process identifier — ppid.
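A shell exposes its own pid as $$ and its parent's pid as $PPID, which is enough to watch the parent/child relation. A sketch assuming a POSIX sh; the nesting of sh -c is only for illustration.

```shell
echo "this shell: pid=$$, parent pid=$PPID"
# The child sh first starts a grandchild sh, then prints its own pid.
# The grandchild reports its parent pid, which equals the child's pid.
sh -c 'sh -c "echo grandchild: ppid=\$PPID"; echo "child: pid=$$"'
```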

When a process x terminates, the operating system remembers x until it is reaped by its parent, that is, until the parent reads x's exit status.
Until the parent reaps x, it is called a defunct process or, more commonly, a zombie process.
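A zombie can be produced on purpose by giving a child a parent that never reaps it. In this sketch (assuming Linux with /proc and a POSIX sh; the timings 0.1 s, 1 s and 3 s are arbitrary), a shell starts a short sleep in the background and then replaces itself with a longer sleep via exec; the long sleep inherits the child but never calls wait().

```shell
dir=$(mktemp -d)                      # scratch directory for the pid file
# The inner sh records the short sleep's pid in "$1", then execs the long sleep.
sh -c 'sleep 0.1 & echo $! > "$1"; exec sleep 3' sh "$dir/child.pid" >/dev/null 2>&1 &
sleep 1                               # the short sleep has terminated by now
child=$(cat "$dir/child.pid")
state=$(awk '/^State:/ {print $2}' "/proc/$child/status")
echo "state of $child: $state"        # Z: zombie, waiting to be reaped
```

Once the long sleep exits, the zombie is reparented and reaped, and its /proc entry disappears.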

The parent is delivered a CHLD signal whenever any of its child processes changes state, which includes termination.

When a process a starts a process b, and b runs a process c, then once b terminates, the process c becomes an orphan process and is reparented to init (on Linux, possibly to the nearest ancestor marked as a child subreaper instead).

Getting information on currently running processes

List of processes

The ps command displays the list of processes and threads.
A choice of useful ps switches:

-e selects every process in the system
-l|-f|-F chooses long, full or extra full output format
-L|-T includes the threads²⁾ in results
-H shows process hierarchy (children names are listed below parents and are accordingly indented)

The ps program has two allowed command syntaxes — one typical for UNIX programs, one with its roots in BSD.
The ubiquitous ps aux command is an example of ps run with BSD-style options that show all processes in a specific format.
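Besides the predefined formats, ps lets one choose the columns with -o. A sketch assuming the procps implementation of ps; it inspects only the current shell ($$) to keep the output short.

```shell
ps -f -p $$                        # full format (-f) for the current shell only
ps -o pid,ppid,stat,comm -p $$     # user-chosen columns, UNIX-style options
```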

Exercise 24 List processes with ps command (with no arguments).

Exercise 25 List processes selecting an output format with more details (columns).

Exercise 26 List all processes.

Exercise 27 List all processes and threads. Find a process that has at least two threads.

The list of processes can also be presented as a tree (hierarchy) with the pstree command.
Some useful pstree switches include -p, -u, and -a that extend results by, respectively, pids, user names, and program arguments.

Exercise 28 Display the processes hierarchy.

Exercise 29 Display the processes hierarchy so that both pids and executable names are displayed.

Pseudo-filesystem procfs, mounted by default at /proc, represents detailed information on processes as files.
The /proc/self directory contains information on the process that accesses it.
See man 5 proc for details.
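A few procfs entries of the current shell illustrate what is available. A sketch assuming Linux; the selection of fields is arbitrary (see man 5 proc for the full list).

```shell
pid=$$                                             # inspect the current shell
grep -E '^(Name|State|PPid|Threads):' "/proc/$pid/status"
tr '\0' ' ' < "/proc/$pid/cmdline"; echo           # arguments are NUL-separated
ls -l "/proc/$pid/cwd" "/proc/$pid/exe"            # symlinks: working directory and executable
```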

Exercise 30 Print out the /proc/self/status file. Status of which process has been reported?

Exercise 31 Read through the manual and explore the /proc/pid/ directories.

It's hard to tell how much memory a process consumes.
The system can sum up the sizes of all virtual memory ranges defined for the process (virtual set size). But the system often does not assign physical main memory to a virtual range until the process accesses it.
The system can tell which ranges are present in the main memory (resident set size, RSS). But this excludes swap, and even disregarding swap, if some memory is shared by several processes (it may, for instance, contain a library for drawing windows), then the sum of the RSS values of these processes exceeds the memory they actually use.
In Linux, having sufficient permissions, one can tell which memory is private for a process (unique set size), and for each shared memory region add its size divided by the number of processes that share it, getting as a result a quite sane memory consumption metric (proportional set size).
But this abstracts from swap and compression…
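The kernel exposes the raw numbers behind these metrics in procfs. A sketch assuming Linux (smaps_rollup exists since 4.14); USS is approximated here as the private clean plus private dirty pages, which is an assumption about how to define it rather than an official kernel field.

```shell
pid=$$                                     # inspect the current shell
# Virtual (VmSize) and resident (VmRSS) sizes, plus swapped-out memory, in kB
grep -E '^(VmSize|VmRSS|VmSwap):' "/proc/$pid/status"
# smaps_rollup sums all mappings of the process, including Pss
if [ -r "/proc/$pid/smaps_rollup" ]; then
    grep -E '^(Rss|Pss|Swap):' "/proc/$pid/smaps_rollup"
    # USS taken here as the private (unshared) resident pages
    awk '/^Private_(Clean|Dirty):/ {kb += $2} END {print "USS:", kb, "kB"}' "/proc/$pid/smaps_rollup"
fi
```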

There is a smem program that attempts to present memory usage in a meaningful way. Check its -p and -t options.

[extra] Live process viewers

The top program is a commonly available live viewer of system load and processes.

Exercise 32 Run top. Press h to see built-in help. Sort by CPU use, then change sort column to memory.

Exercise 33 Learn what the process states R, S, D, t and Z stand for by consulting man ps.

The htop program has been created as a convenient top replacement with more features.
htop presents basic key shortcuts in the bottom bar: h displays help, t toggles tree view, < and > choose the sort column, k sends a signal to the process, a sets CPU affinity, i sets I/O priority, / searches and \ filters by name, u filters by user, space selects, and tab switches between the memory+CPU usage view and the disk usage view.
htop supports mouse and is quite customizable.

Exercise 34 Run htop and sort by a selected column.

Exercise 35 Run htop and switch view to tree view. Then find htop itself in the process list.

There are numerous top-inspired programs. These include iotop (for I/O), atop and glances.

Looking up process identifiers

The pgrep and pidof commands print pids of processes that match user-defined criteria.

By default pgrep regex matches regex against the executable name.
With the -f switch the full command line (executable name and arguments) is matched against regex.
See the pgrep manual to learn how to, e.g., filter by user (-u) or by parent pid (-P).

The pidof name command outputs pids of processes whose executable name (trimmed to 15 characters) is identical to the provided name.
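Matching by name versus matching the whole command line can be compared on a background process. A sketch assuming procps pgrep; sleep 987 is just a process with a recognisable command line.

```shell
sleep 987 &                        # a process to look for
target=$!
sleep 0.2                          # give it time to exec sleep
pgrep sleep                        # every process named sleep, possibly more than one
found=$(pgrep -fx 'sleep 987')     # -f: match the full command line, -x: match it exactly
echo "pgrep found $found, the shell started $target"
kill "$target"
```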

Exercise 36 Run sleep 1h. Look up its pid in another terminal.

Exercise 37 Run sleep 1337h in one terminal and sleep 42h in another. Look up the pid of the first one.

Exercise 38 Tell the pid of gdm, taking into consideration that there are other processes whose names start with gdm.

Signals

Signals play a role for processes similar to the one interrupts play for the operating system: they notify a process about an external event by forcefully switching the control flow of one of the process's threads to a signal handler.
The programmer can choose to block or ignore specified signals and may set up their own function as the signal handler.

Signals are generated by the operating system in well-defined situations, and can also be sent to a process from userspace provided the sender has sufficient permissions³⁾.

Signals are told apart by their numbers and corresponding names (cf. man 7 signal).
For convenience, "the X signal" is written as SIGX. For instance, "the TERM signal" is shortened to SIGTERM.
Signals are used to notify about errors, to notify about external events, and to let processes communicate.

Well-known signals include the following:

INT: request to interrupt process execution, generated among others by Ctrl+c,
TERM: request to terminate a process,
KILL: request to kill a process; this is one of two signals that are always handled by the operating system and cannot be caught, blocked or ignored by a program,
STOP: request to stop (pause) a process; the other signal that cannot be caught, blocked or ignored,
CONT: request to continue a stopped process,
ALRM: information that a specified time has just elapsed, generated once the program has asked for this signal (using the alarm system call),
SEGV: notifies that the process attempted an invalid memory access, e.g., to an address outside its memory,
USR1: signal with no predefined meaning, to be used for a programmer-defined purpose,
HUP: generated upon a hang-up (shutting down a [pseudo]terminal), that is, when a terminal emulator is closed or an SSH session terminates; shells pass this signal on to their children, and the default action for HUP terminates the process; nohup command [argument]... runs the command after reconfiguring the process to ignore the signal;
many server programs redefine this signal to reload their configuration and/or reopen their log files upon it.
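In a shell script, the trap builtin installs a handler for a signal, which makes it easy to watch USR1 and HUP being handled instead of terminating the process. A sketch assuming a POSIX sh; the messages are illustrative and the inner sh sends the signals to itself.

```shell
out=$(sh -c '
  trap "echo got USR1" USR1
  trap "echo got HUP, reloading configuration" HUP
  kill -USR1 $$        # $$ is the pid of this inner sh
  kill -HUP $$
  echo done
')
echo "$out"
```

Without the two trap lines, the first kill would terminate the inner shell, since the default action for USR1 is termination.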

Sending signals

To send a signal one can use the kill [-signal] pid command. If no signal is specified, kill sends a TERM signal.
A signal can be specified either by its number, e.g., kill -3 pid, or by its name, e.g., kill -INT pid.
Names and corresponding numbers can be displayed with kill -l or kill -L.
The kill command identifies the target process only by its pid (the kill builtin of shells additionally accepts job specifications such as %1).
Negative pid values have a special meaning; see man kill for details.
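A shell can send a signal to its own background job and then read how the job ended. A sketch assuming a POSIX sh; signal 0 only checks whether the process exists, and a process killed by signal n is reported with exit status 128 + n.

```shell
sleep 60 &                  # a victim process running in the background
pid=$!
kill -0 "$pid" && echo "process $pid exists"   # signal 0: existence/permission check only
kill "$pid"                 # no signal given: TERM is sent
status=0
wait "$pid" || status=$?    # reap the child and collect its exit status
echo "exit status: $status" # 143 = 128 + 15, i.e. terminated by signal 15 (TERM)
```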

Exercise 39 In one terminal run sleep 1h. Kill the process from another terminal (by sending it either INT, or TERM or KILL signal).

The pkill command works almost the same as pgrep; the difference is that it sends a signal to matching processes rather than printing their pids. Signals in pkill are specified as for kill.
For instance, pkill -TERM bash asks all processes whose name matches bash to terminate.

The killall name command sends a signal (specified as in kill) to all processes whose name (trimmed to 15 characters) is the same as the given one.
Switches allow comparing against the full executable name and using regular expressions.
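The state of a stopped and resumed process can be read from procfs, and pkill can then look the process up by its command line. A sketch assuming Linux and procps pkill; sleep 613 is an arbitrary, recognisable command line, and the short pauses give the signals time to take effect.

```shell
sleep 613 &
pid=$!
sleep 0.2                                        # let the background job start
kill -STOP "$pid"; sleep 0.2
stopped=$(awk '/^State:/ {print $2}' "/proc/$pid/status")   # T: stopped
kill -CONT "$pid"; sleep 0.2
resumed=$(awk '/^State:/ {print $2}' "/proc/$pid/status")   # S: sleeping again
echo "after STOP: $stopped, after CONT: $resumed"
pkill -TERM -fx 'sleep 613'                      # find it by its full command line
wait "$pid" 2>/dev/null || true                  # reap the terminated child
```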

Exercise 40 In one terminal run sleep 1h. Kill the process with pkill from another terminal.

Exercise 41 Run mousepad. Send it a STOP signal. Check how mousepad behaves. Then send it a CONT signal.

Exercise 42 In one terminal run a DNS server with the named -g -c <(:) command. From another terminal send a HUP signal to named. How did it react?

[extra] Controlling process scheduling

One of the duties of the operating system is to decide which process is given resources to execute. Users may influence the decisions of the schedulers ([1], [2]).

CPU time and affinity

Processes in Unix-like systems have a niceness value that affects the share of CPU time assigned to a process when processes compete for the CPU.
The niceness may be set in range from -20 (highest priority) up to 19 (lowest), and is inherited from the parent.
The init process starts all processes with the niceness of 0.
Any user can raise the niceness, but only root can lower it.

By issuing nice [-n N] command [argument]... one can run the command with the niceness of N.

It is possible to change the niceness of a process by issuing renice [-n] N pid.
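Run without a command, GNU nice prints the current niceness, so nesting it shows how the value is inherited and raised. A sketch assuming GNU coreutils; the increment 5 is arbitrary.

```shell
base=$(nice)               # niceness of this shell (usually 0)
echo "shell: $base"
raised=$(nice -n 5 nice)   # the inner nice runs with +5 and prints it
echo "under nice -n 5: $raised"
```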

Some Linux distros enable the autogroup feature, which automatically assigns processes to separate groups (cf. man 7 sched). Niceness is then considered only within such a group.
sysctl -ar sched_autogroup_enabled checks whether the feature is enabled.

It is possible to select which physical CPU threads are allowed to execute a process. This is referred to as the processor affinity.
The taskset program can be used to this end. In Linux the affinity is inherited by child processes.

taskset 0x03 command [argument]..., or the equivalent taskset -c 0,1 command [argument]..., starts the command so that it may only use physical threads 0 and 1.
The taskset -p [-c] pid command outputs a mask (or list) of allowed physical threads for the provided pid, and taskset -p [-c] spec pid changes it to spec.
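The affinity actually in force is also visible as Cpus_allowed_list in procfs, which lets one check what taskset did. A sketch assuming Linux and util-linux taskset; it pins a command to the first CPU the shell is currently allowed to use rather than assuming CPU 0 is available.

```shell
taskset -p $$                                    # affinity mask of the current shell
grep Cpus_allowed_list /proc/self/status         # /proc/self is grep, which inherited the affinity
first=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status | cut -d, -f1 | cut -d- -f1)
pinned=$(taskset -c "$first" awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)
echo "allowed after taskset -c $first: $pinned"
```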

htop can alter niceness (shortcuts F7/F8) and affinity (shortcut a).

Exercise 43 Run sleep 1h with the niceness of 10. Verify the niceness (using, e.g., ps or htop).

Exercise 44 Run sleep 1h and change the niceness to 15.

Exercise 45 In two terminals run taskset -pc 0 $$ (which lets this shell and its children use only the first physical thread).
Then, in both terminals, run openssl dhparam -text $((2**13)) (this starts some CPU-intensive computations).
In yet another terminal run htop and keep observing CPU usage by the programs.
Gradually increase niceness of one of the processes. How does the CPU usage change?

I/O

The I/O priorities (this refers predominantly to disk access priorities) can be set / altered with the ionice command.
There are "three" I/O classes: 1 (realtime), 2 (best-effort) and 3 (idle), plus 0 (none).
The first two classes are split into 8 priorities, ranging from the highest (0) to the lowest (7).
Switches [-c class_id] [-n priority] select class and priority.
Similarly to taskset, one may run a command with specified class, or, by specifying the -p switch, set / alter the I/O classes and priorities for the specified pid.
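Run without arguments, util-linux ionice prints the class and priority of the calling process, so nesting it shows the effect of the switches. A sketch assuming Linux and util-linux ionice; setting the idle class does not require root on current kernels, but sandboxed environments may restrict the underlying system call.

```shell
ionice                          # class and priority of this shell
ionice -c 2 -n 7 ionice         # best-effort class, lowest priority: best-effort: prio 7
ionice -c 3 ionice              # idle class: idle
```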

htop allows setting I/O classes and priorities (shortcut i).

NUMA

In some hardware the cost (time or bandwidth) of accessing resources, mainly the main memory, differs depending on which physical CPU thread accesses them; such an architecture is called NUMA (non-uniform memory access). The operating system logically splits such hardware into nodes. A node is a group of resources such that the access cost is identical among all resources within the node, and the cost of accessing resources from another node is always higher.
An example of a typical NUMA system is a two-socket mainboard. For a core in one socket, memory modules attached to the other socket have higher latency and lower bandwidth than memory modules connected to its own socket.

The numactl command runs processes with a chosen NUMA placement policy in such systems.
numactl -H displays the nodes (and the relative access costs), and numactl -N x -m y command [argument]... runs the command on physical threads of node x and memory of node y.

¹⁾ usually completed by specification of the runtime environment, including the initial memory state and dynamically loaded libraries

²⁾ -T stands for threads, and -L stems from light-weight process

³⁾ One has permissions to send signals to one's own processes.
