===== Copying, moving and erasing files =====

To create an empty file, you can use ''touch //file//''. To create a file with some contents, you can use ''echo //contents// > //file//'' or ''fortune > //file//''. \\
The ''touch'' and ''echo'' commands, as well as the meaning of ''>'', will be explained later on. \\
The same commands can be used to change the contents of a file.

To display a file, use ''cat //filename//''.

==== Directories ====

**''mkdir //dir//''** creates a new directory. \\
''mkdir -p //dir1/dir2/dir3//'' creates directory ''dir1'', directory ''dir2'' within ''dir1'', and ''dir3'' within ''dir1/dir2''. \\
With the ''-p'' switch ''mkdir'' does not print an error if a directory already exists.

To remove an empty directory one can use ''rmdir //dir//''.

~~Exercise.#~~ Create the following directory tree:

  .
  ├── foo
  │   ├── baz
  │   │   └── bar
  │   └── bar
  └── foobaz
      └── bar

~~Exercise.#~~ Remove the ''foo/baz'' directory.

==== Hard and soft links ====

Multiple directory entries can point to the same file. This is called a [[https://en.wikipedia.org/wiki/Hard_link|hard link]]. \\
A directory entry can also point to another directory entry. This is called a [[https://en.wikipedia.org/wiki/Symbolic_link|soft link or symbolic link]] (usually abbreviated as symlink).

Native Linux filesystems [[https://en.wikipedia.org/wiki/Comparison_of_file_systems#File_capabilities|support]] hard links for ordinary files, and soft links to an arbitrary path (e.g., a file or a directory).

To create a hard link, one can use **''ln //source// //destination//''**. \\
To create a symlink, one can use **''ln -s //source// //destination//''**.

~~Exercise.#~~ Create a file ''file1'' with some contents. Create a hard link called ''file2'' to the file ''file1''. Modify ''file2''. Display ''file1''.

~~Exercise.#~~ The ''ls'' command can display the link count for each listed file. Discover how to do it.
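
The two ''ln'' forms described above can be tried out safely in a scratch directory. The following is only a minimal sketch: the file names ''original'', ''hard'' and ''soft'' are arbitrary, and ''stat -c %h'' (which prints the hard link count) assumes GNU coreutils.

```shell
# Work in a throwaway directory so that nothing else is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

echo "hello" > original        # a regular file
ln original hard               # a second directory entry for the same file
ln -s original soft            # a symlink that stores the path "original"

# -ef is true when both paths refer to the same file (same device and inode).
[ original -ef hard ] && echo "hard: same file as original"

hard_links=$(stat -c %h original)   # GNU stat: number of hard links to the file
soft_target=$(readlink soft)        # the path stored in the symlink
echo "link count: $hard_links, symlink target: $soft_target"

rm original                    # removes one directory entry only
hard_content=$(cat hard)       # the data is still reachable through the other entry
echo "content via hard link: $hard_content"

# -e follows symlinks, so it fails for a symlink whose target is gone.
[ -e soft ] || echo "soft: dangling symlink"
```

Removing ''original'' leaves the data intact because ''hard'' still refers to the same file, while ''soft'' only remembers a name that no longer exists.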
The disk space used by a file is reclaimed once the link count drops to 0 (all directory entries that link to the file are erased) and the file is no longer open in any process.

~~Exercise.#~~ Create in your home directory a symlink called ''TMP'' pointing to ''/tmp''. Change directory to ''TMP''. What does ''pwd'' output?

~~Exercise.#~~ Create in your home directory a symlink called ''loop'' that points to your home directory. Enter it. And enter it again.

~~Exercise.#~~ Create a symlink to a non-existent path. List the directory containing it.

The ''readlink //path//'' command prints the target of a symlink. With the ''-f'' switch it resolves all symlinks in the path and prints a [[https://en.wikipedia.org/wiki/Canonicalization|canonical]] path.

The Windows NTFS filesystem supports links as well. Links can be created e.g., with the [[https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/mklink|mklink]] command.

==== Copying and moving files ====

=== Within local machine ===

To copy files, one can use the **''cp''** command. To move (or rename) files, one can use the **''mv''** command.

The basic syntax is ''cp/mv //source//… //destination//''. \\
Multiple source files can be provided if the destination is a directory. \\
If the destination is an existing file, it will be overwritten without warning (unless the ''-i'' or ''-n'' switch is used).

By default ''cp'' refuses to copy a directory. Use ''-r'' to copy a directory recursively.

When ''cp'' copies a file, it creates a new file with the current date, default permissions, etc. \\
To copy recursively and preserve dates, permissions and more, one can use the **''-a''** switch (which stands for ''--archive'').

With the ''-l'' switch ''cp'' creates a hard link instead of copying a file. Notice that this can be combined with ''--recursive''.

~~Exercise.#~~ Copy, using a single command, the files ''/etc/os-release'' and ''/etc/SUSE-brand'' to the current working directory.
~~Exercise.#~~ Run ''mkdir someDir && for F in file{1..3}; do echo $RANDOM$RANDOM > someDir/$F; done'' to create the ''someDir'' directory with three files inside.

~~Exercise.#~~ Copy the ''someDir'' directory recursively under a new name.

~~Exercise.#~~ Move the newly copied directory into the ''someDir'' directory.

~~Exercise.#~~ Rename ''someDir'' to a name of your choice.

~~Exercise.#~~ Copy the renamed directory with the ''-al'' switches. Modify a selected file in either of the directories. Which files changed contents? \\
(You can ''cat //filename//'' or display modification dates with second accuracy using ''ls -l --time-style=+%H:%M:%S …'' or ''tree -D --timefmt=%H:%M:%S …'')

=== [extra] Copying files to/from a remote machine ===

It is possible to copy files via SSH.

Whenever one has SSH access to a remote machine, one can copy files with the **''scp''** command. \\
''scp'' accepts ''//file//'' as a file path on the local machine and ''//user//@//host//://file//'' as a file path on a remote machine. Remote relative paths are relative to the home directory. \\
''scp'' accepts the ''-r'' switch for copying recursively.

Microsoft Windows now ships with the ''scp'' command, but usually [[https://en.wikipedia.org/wiki/Comparison_of_FTP_client_software#Protocol_support|various file commanders]] are more convenient.

Most SSH servers also enable the SFTP protocol, which allows copying files more conveniently. \\
**''sftp //user//@//host//''** launches the sftp command line. You can use ''ls'' and ''cd'' to navigate the remote filesystem, and ''get'' and ''put'' to copy files. Type ''help'' to see all supported commands.

~~Exercise.#~~ Create a file in the ''/tmp'' directory on your computer. Copy the file to the home directory of user ''student'' on another computer.

~~Exercise.#~~ Copy the file to the ''/tmp'' directory on another computer.

~~Exercise.#~~ Copy the file to the ''/tmp'' directory on another computer using ''sftp''.
=== Rsync ===

The ''[[https://en.wikipedia.org/wiki/Rsync|rsync]]'' program is widely used to copy files and directories. It efficiently compares source files with destination files and copies only the differences. It can copy data to/from remote machines, and can compress the data sent via network to increase throughput. ''rsync'' is also commonly used to make backups.

==== Removing files ====

The program that removes files is called **''rm''**.

By default ''rm'' won't remove directories (whether empty or not), and it asks for confirmation before removing write-protected files.

To remove a directory with ''rm'' (recursively, with its contents) one has to add the ''-r'' switch.

To remove write-protected files without being asked (and to stop printing warnings whenever a file to be removed does not exist) the ''-f'' (''--force'') switch can be used.

A misused ''rm -rf …'' command is a notorious source of data loss. Beware especially of the asterisk and what it expands to.

''rm'' accepts the **''-I''** and ''-i'' switches that ask for confirmation. ''-I'' asks once before removing more than three files or before removing recursively, while ''-i'' asks before each file.

~~Exercise.#~~ Create multiple files with ''touch file_{a..z}''. \\
Remove all files with a single ''rm'' command with 1) no switches 2) the ''-f'' switch 3) the ''-I'' switch 4) the ''-i'' switch.

~~Exercise.#~~ Repeat the remove command from the previous exercise when the files are gone. \\
Then repeat the command again with the ''-f'' switch.

~~Exercise.#~~ Create a directory and remove it with ''rm''.

===== Sleeping, measuring and limiting time =====

The ''**sleep** //time//'' command sleeps for the given time period. \\
The time period is by default in seconds (''s''); the ''m'', ''h'' and ''d'' units can also be specified. \\
''sleep 1d'' is equivalent to ''sleep 24h'', ''sleep 1440m'' and ''sleep 86400[s]''. \\
Some ''sleep'' implementations allow decimal fractions, e.g., ''sleep 0.05m'' sleeps 3s.
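
The unit equivalences above are plain arithmetic and can be checked in the shell. The sketch below also times a one-second sleep using ''date +%s''. It assumes a ''sleep'' that understands the ''s'' suffix (GNU coreutils does), and the variable names are arbitrary.

```shell
# Seconds per unit, matching the m, h and d suffixes of sleep.
minute=60
hour=$((60 * minute))
day=$((24 * hour))

echo "1d = $((day / hour))h = $((day / minute))m = ${day}s"
# prints: 1d = 24h = 1440m = 86400s

# Measure a short sleep with whole-second timestamps.
start=$(date +%s)
sleep 1s
elapsed=$(( $(date +%s) - start ))
echo "slept for about ${elapsed}s"
```

Because the timestamps are whole seconds, ''elapsed'' is only a coarse measurement.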
The ''time //command// [//argument//]...'' command runs the provided command with arguments and prints the time it took to execute the command.

The ''timeout //time// //command// [//argument//]...'' command runs the provided command and terminates it once the provided (real) time elapses.

~~Exercise.#~~ Run a command that sleeps for two seconds.

~~Exercise.#~~ Measure how long it took to execute the command in the previous exercise.

~~Exercise.#~~ Tell how long it takes to execute the ''openssl dhparam -text 1536'' command.

~~Exercise.#~~ Run the ''openssl dhparam -text 2048'' command with a 5s run limit.

===== Processes =====

==== Process vs program ====

A computer program is a sequence of instructions((usually completed by specification of the runtime environment, including the initial memory state and dynamically loaded libraries)).

A process is an instance of a program: a particular execution of a program.

A process is understood as all that describes its execution: not only the sequence of instructions loaded into the main memory, but also, among others, the state of the CPU registers and the state of the main memory.

A separate process is created each time a program (the same executable) is started (run) anew.

=== Parents, children, zombies… ===

In Unix-like systems new processes can be started only by existing ones (with the exception of the first one, called [[https://en.wikipedia.org/wiki/Init|init]], which always has the identifier 1 and is started by the kernel upon boot).

The term **child processes** of a process //x// denotes processes that were started by //x//. \\
The **parent process** of a process //x// is the process that started //x//.

A process is identified by its __p__rocess __id__entifier (**pid**). \\
The operating system maintains for each process the __p__arent __p__rocess __id__entifier (**ppid**).
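
The parent/child relation can be observed directly from a shell. In the minimal sketch below the shell starts a background ''sleep'' and then reads the child's ppid from ''/proc/<pid>/status''. Reading ''/proc'' is Linux-specific, and the variable names are arbitrary.

```shell
my_pid=$$          # pid of the current shell
my_ppid=$PPID      # pid of the shell's parent, recorded by the shell at startup

sleep 30 &         # start a child process in the background
child_pid=$!       # pid of the most recently started background process

# The kernel reports the parent of each process in /proc/<pid>/status (Linux only).
child_ppid=$(awk '/^PPid:/ { print $2 }' "/proc/$child_pid/status")

echo "shell: pid=$my_pid ppid=$my_ppid"
echo "child: pid=$child_pid ppid=$child_ppid"

kill "$child_pid"  # clean up the background sleep
```

The ppid reported for the child is the pid of the shell that started it.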
When a process //x// terminates, the operating system remembers //x// until it is reaped by its parent, that is, until the parent reads the return value of //x//. \\
Until the parent reaps //x//, it is called a defunct process or, more commonly, a [[https://en.wikipedia.org/wiki/Zombie_process|zombie]] process.

The parent is delivered a CHLD signal whenever any child process changes its state, which includes process termination.

When a process //a// starts a process //b//, and //b// starts a process //c//, then once //b// terminates the process //c// becomes an [[https://en.wikipedia.org/wiki/Orphan_process|orphan process]] and is reparented to init.

==== Getting information on currently run processes ====

=== List of processes ===

The **''ps''** command displays the list of processes and threads. \\
A choice of useful ''ps'' switches:
  * ''-e'' selects every process in the system
  * ''-l|-f|-F'' chooses long, full or extra full output format
  * ''-L|-T'' includes the threads((''-T'' stands for threads, and ''-L'' stems from [[https://en.wikipedia.org/wiki/Light-weight_process|light-weight process]])) in results
  * ''-H'' shows process hierarchy (children names are listed below parents and are accordingly indented)

The ''ps'' program has two allowed command syntaxes: one typical for UNIX programs, and one with its roots in BSD. \\
The ubiquitous ''ps aux'' command is an example of ''ps'' run with BSD-style options that show all processes in a specific format.

~~Exercise.#~~ List processes with the ''ps'' command (with no arguments).

~~Exercise.#~~ List processes selecting an output format with more details (columns).

~~Exercise.#~~ List all processes.

~~Exercise.#~~ List all processes and threads. Find a process that has at least two threads.

The list of processes can also be presented as a tree (hierarchy) with the **''pstree''** command.
\\ Some useful ''pstree'' switches include ''-p'', ''-u'', and ''-a'' that extend results by, respectively, pids, user names, and program arguments.

~~Exercise.#~~ Display the process hierarchy.

~~Exercise.#~~ Display the process hierarchy so that both pids and executable names are displayed.

The pseudo-filesystem ''procfs'', mounted by default at ''/proc'', represents detailed information on processes as files. \\
The ''/proc/self'' directory contains information on the calling process. \\
See ''man 5 proc'' for details.

~~Exercise.#~~ Print out the ''/proc/self/status'' file. Status of which process has been reported?

~~Exercise.#~~ Read through the manual and explore the ''/proc/****//pid//****/'' directories.

It's hard to tell how much memory a process consumes. \\
The system can sum up sizes of all virtual memory ranges defined for the process (//virtual set size//). But the system often does not assign physical main memory to a virtual range until the process accesses it. \\
The system can tell which ranges are present in the main memory (//resident set size//). But this excludes swap, and even disregarding swap, if some memory is shared by several processes (it may, for instance, contain a library for drawing windows), then the sum of RSS for the processes will exceed the memory actually used by the processes. \\
In Linux, having sufficient permissions, one can tell which memory is private for a process (//unique set size//), and for each shared memory region add its size divided by the number of processes that share it, getting as a result a quite sane memory consumption metric (//proportional set size//). \\
But this abstracts from swap and [[https://en.wikipedia.org/wiki/Zram|compression]]…

There is a ''smem'' program that attempts to present memory usage in a meaningful way. Check the ''-p'' and ''-t'' options.

=== [extra] Live process viewers ===

The **''top''** program is a commonly available live viewer of system load and processes.

~~Exercise.#~~ Run ''top''.
Press ''h'' to see built-in help. Sort by CPU use, then change the sort column to memory.

~~Exercise.#~~ Learn what the process states ''R'', ''S'', ''D'', ''t'' and ''Z'' stand for by consulting ''man ps''.

The **''[[https://htop.dev/|htop]]''** program has been created as a convenient ''top'' replacement with more features. \\
''htop'' presents basic key shortcuts in the bottom bar. ''h'' displays help, ''t'' toggles tree view, ''<'' and ''>'' choose the sort column, ''k'' sends a signal to the process, ''a'' sets CPU affinity, ''i'' sets I/O priority, ''/'' searches and ''\'' filters by name, ''u'' filters by user, //space// selects, //tab// switches between memory+CPU usage view and disk usage view. \\
''htop'' supports the mouse and is quite customizable.

~~Exercise.#~~ Run ''htop'' and sort by a selected column.

~~Exercise.#~~ Run ''htop'' and switch to tree view. Then find ''htop'' itself in the process list.

There are numerous ''top''-inspired programs. These include ''iotop'' (for I/O), ''atop'' and ''glances''.

=== Looking up process identifiers ===

The **''pgrep''** and **''pidof''** commands return pids of processes that match user-defined criteria.

By default ''pgrep //regex//'' matches the regex against the executable name. \\
With the ''-f'' switch the full command line (executable name and arguments) is matched against the regex. \\
See the manual for ''pgrep'' to learn how to e.g., filter by user or output ppids instead of pids.

The ''pidof //name//'' command outputs pids of processes whose executable name (trimmed to 15 characters) is identical to the provided name.

~~Exercise.#~~ Run ''sleep 1h''. Look up its pid in another terminal.

~~Exercise.#~~ Run ''sleep 1337h'' in one terminal and ''sleep 42h'' in another. Look up the pid of the first one.

~~Exercise.#~~ Tell the pid of ''gdm'', taking into consideration that there are other processes whose names start with ''gdm''.
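
A pid lookup can also be scripted. The minimal sketch below starts a background ''sleep'' and finds it again with ''pgrep''. It assumes the procps version of ''pgrep'', where ''-P'' limits matches to children of the given parent pid and ''-x'' requires the whole name to match. The variable names are arbitrary.

```shell
sleep 300 &                 # background child of this shell
expected=$!                 # the pid the shell reports for it

sleep 1                     # give the child a moment to start executing sleep

# Look only among the children of this shell for a process named exactly "sleep".
found=$(pgrep -P "$$" -x sleep)
echo "started pid $expected, pgrep found: $found"

kill "$expected"            # clean up
```

Restricting the search to children of the current shell keeps unrelated ''sleep'' processes that run elsewhere in the system out of the result.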
==== Signals ====

[[https://en.wikipedia.org/wiki/Signal_(IPC)|Signals]] play a role for processes similar to the one interrupts play for the operating system: they notify a process about an external event by forcefully switching the control flow of one of the process's threads to a signal handler. \\
The programmer can choose to block or ignore specified signals and may set up one's own function as the signal handler.

Signals are generated by the operating system in well-defined situations, and can also be sent to a process from userspace provided the sender has sufficient permissions((One has permissions to send signals to one's own processes.)).

Signals are told apart by their numbers and corresponding names (cf. ''man 7 signal''). \\
For convenience, "the //X// signal" is written as SIG//X//. For instance, "the TERM signal" is shortened to SIGTERM. \\
Signals are used to notify about errors, to notify about external events, and to communicate between processes.

The standard signals include the following:
  * INT: request to __int__errupt process execution, generated among others by //Ctrl+c//,
  * TERM: request to __term__inate a process,
  * KILL: request to __kill__ a process; this is one of two signals that are handled by the operating system and cannot be re-implemented by a programmer,
  * STOP: request to __stop__ (pause) a process; the other non-overrideable signal, \\ CONT: request to __cont__inue a stopped process,
  * ALRM: information that a specified time has just elapsed, generated once the program asked for this signal (using the __ala__r__m__ system call),
  * SEGV: notifies that the process attempted to access an address outside its memory,
  * USR1: signal with no predefined meaning, to be used for a programmer-defined purpose,
  * HUP: generated upon a __h__ang-__up__ (shutting down a [pseudo]terminal), that is, when a terminal emulator is closed or an SSH session terminates; shells pass this signal to children; the default HUP handler terminates the process; ''nohup //command// [//argument//]...'' runs the command after reconfiguring the process to ignore the signal; \\ many server programs handle this signal by reloading their configuration and/or reopening log files.

=== Sending signals ===

To send a signal one can use the ''**kill** [-//signal//] //pid//'' command. If no signal is specified, ''kill'' sends a TERM signal. \\
Signals can be specified either by number, e.g., ''kill -3 //pid//'', or by name, e.g., ''kill -INT //pid//''. \\
Names and corresponding numbers can be displayed with ''kill -l'' or ''kill -L''. \\
The ''kill'' command identifies the target process only by its pid. \\
Negative pid values have a special meaning; see ''man kill'' for details.

~~Exercise.#~~ In one terminal run ''sleep 1h''. Kill the process from another terminal (by sending it either the INT, TERM or KILL signal).

The ''**pkill**'' command works almost the same as ''pgrep''; the difference is that it sends a signal to matching processes rather than outputting their pids. Signals in ''pkill'' are specified as for ''kill''. \\
For instance, ''pkill -TERM bash'' requests all processes whose name contains //bash// to terminate.

The ''**killall** //name//'' command sends a signal (specified as in ''kill'') to all processes whose name (trimmed to 15 characters) is the same as the given one. \\
Switches allow comparing against the full executable name, and using regular expressions.

~~Exercise.#~~ In one terminal run ''sleep 1h''. Kill the process with ''pkill'' from another terminal.

~~Exercise.#~~ Run ''mousepad''. Send it a STOP signal. Check how mousepad is doing. Then send it a CONT signal.

~~Exercise.#~~ In one terminal run a DNS server with the ''named -g -c <(:)'' command. From another terminal send a HUP signal to ''named''. How did it react?

==== [extra] Controlling process scheduling ====

The duty of the operating system is to decide which process is given resources to execute.
Users may influence the decisions of the schedulers ([[https://en.wikipedia.org/wiki/Scheduling_(computing)#Process_scheduler|[1]]], [[https://en.wikipedia.org/wiki/I/O_scheduling|[2]]]).

=== CPU time and affinity ===

Processes in Unix-like systems have a [[https://en.wikipedia.org/wiki/Nice_(Unix)|niceness]] that affects the share of CPU time assigned to a process on resource contention. \\
The niceness may be set in the range from -20 (highest priority) up to 19 (lowest), and is inherited from the parent. \\
The init process starts all processes with the niceness of 0. \\
Any user can raise the niceness, but only root can lower it.

By issuing ''**nice** [-n //N//] //command// [//argument//]...'' one can run the command with the niceness of N.

It is possible to change the niceness of a running process by issuing ''**renice** [-n] //N// //pid//''.

Some Linux distros enable the ''autogroup'' feature that automatically assigns processes to separate process groups (cf. ''man 7 sched''). Niceness is considered only within such a group. \\
''sysctl -ar sched_autogroup_enabled'' checks if the feature is in force.

It is possible to select which physical CPU threads are allowed to execute a process. This is referred to as the [[https://en.wikipedia.org/wiki/Processor_affinity|processor affinity]]. \\
The **''taskset''** program can be used to this end. In Linux the affinity is inherited by child processes.

''taskset 0x03 //command// [//argument//]...'', or the equivalent ''taskset -c 0,1 //command// [//argument//]...'', starts the command so that it may only use physical threads 0 and 1. \\
The ''taskset -p [-c] //pid//'' command outputs a mask (or list) of allowed physical threads for the provided pid, and ''taskset -p [-c] //spec// //pid//'' changes it to //spec//.

''htop'' can alter niceness (shortcuts //F7//''''/''''//F8//) and affinity (shortcut //a//).

~~Exercise.#~~ Run ''sleep 1h'' with the niceness of 10. Verify the niceness (using e.g., ''ps'' or ''htop'').
~~Exercise.#~~ Run ''sleep 1h'' and change the niceness to 15.

~~Exercise.#~~ In two terminals run ''taskset -pc 0 $$'' (that lets this shell and its children use only the first physical thread). \\
Then, in both terminals, run ''openssl dhparam -text %%$((2**14))%% [//arbitrary text//]'' (this starts some CPU-intensive computations). \\
In yet another terminal run ''htop'' and keep observing CPU usage by the programs. \\
Gradually increase the niceness of one of the processes. How does the CPU usage change?

=== I/O ===

The I/O priorities (this refers predominantly to disk access priorities) can be set / altered with the **''ionice''** command. \\
There are three I/O classes: 1 (realtime), 2 (best-effort) and 3 (idle), plus 0 (none). \\
The first two classes are split into 8 priorities (ranging from the highest, of value 0, to the lowest, of value 7). \\
Switches ''[-c //class_id//] [-n //priority//]'' select class and priority. \\
Similarly to ''taskset'', one may run a command with a specified class, or, by specifying the ''-p'' switch, set / alter the I/O class and priority for the specified pid.

''htop'' allows setting I/O classes and priorities (shortcut //i//).

=== NUMA ===

Hardware that has resources (mainly the main memory) to which the access cost (time or bandwidth) differs between physical CPU threads ([[https://en.wikipedia.org/wiki/Non-uniform_memory_access|NUMA]]) is logically split by the operating system into nodes. A node is a group of resources such that the access cost is identical among all resources within the node, and the cost of accessing resources from another node is always higher. \\
An example of a typical NUMA system is a two-socket mainboard. From the point of view of a given core, memory modules attached to the other socket have higher latency and lower bandwidth than memory modules attached to the core's own socket.

The **''numactl''** command controls the NUMA policy of processes in such systems.
\\ ''numactl -H'' displays the nodes (and the relative access costs), and ''numactl -N //x// -m //y// //command// [//argument//]...'' runs the command on physical threads of node //x// and memory of node //y//. ~~META: language = en ~~