Jan Kończak

POSIX

POSIX (the Portable Operating System Interface standard), apart from standardising the shell and utilities, also defines functions, macros, and external variables to support application portability at the C-language source level.

POSIX specifies what should be present in the C standard library headers (e.g., stdio.h or stdlib.h), and also specifies a user-level API to the operating system (the two do intersect).

Libraries required by POSIX are neatly summarised on the "C POSIX library" Wikipedia page.

The unistd.h header defines a number of basic constants and functions for interfacing with the operating system.
The fcntl.h header defines constants and functions for file control.

Data types

POSIX defines certain data types that are usually fancy names for certain integer types.

For instance, ssize_t shall be used to store a signed size of arbitrary data (a byte count, or -1 to signal an error), pid_t to store process identifiers, uid_t to store user identifiers, time_t to store time in seconds, etc.

Programmers shall use these types, even if they know that time_t (as well as ssize_t) is in fact a long int in the POSIX implementation they use.
This contributes to code portability and legibility.

The C/C++ type system checks types after resolving typedefs, so using variables of such 'types' interchangeably won't even generate a compiler warning.

Return value conventions

Upon failure, most POSIX functions return -1 and set the errno variable to a value indicating the reason.

To access the errno (error number) variable, one shall #include <errno.h>.
errno is part of the C standard (since C89).
The possible values of errno after calling a function are explained in that function's documentation.
All standard values of errno are documented here.

To get a human-readable explanation of the error number errnum, one can use:

char *strerror(int errnum), defined by the C standard; it might not be thread-safe,
strerror_r, which requires the programmer to provide a buffer for the message, and strerror_l, which allows specifying a locale (i.e., the language); both are defined by POSIX and are thread-safe,
void perror(const char *str), a C standard function that always takes errnum from errno and prints "str: explanation" to standard error (or just the explanation if str is NULL or an empty string).

File descriptors

A program can (obviously) use multiple files concurrently. For each I/O operation (read, write, …) the programmer must indicate which file it concerns. To this end, the POSIX API assigns a non-negative integer to each file opened by the program. Such a number is called a file descriptor.
Notice that Unix introduced the idea that everything is a file: for POSIX, a file may be an ordinary file, but a directory is also a file, a pipe is also a file, a network connection is also a file, and so on.

Internally, the operating system maintains for each process an array of open files. The file descriptor is an index in this array.
You can learn more here.

POSIX assumes that each newly started program already has three files open. These files are called the standard streams (standard input, standard output and standard error) and occupy the first three indices in the file array, that is, the numbers 0, 1 and 2, or, more verbosely, the equivalent fancy constants STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO.

Reading & writing data

To read or write data, POSIX defines (header unistd.h):
ssize_t read (int fildes, void *buf, size_t nbyte)
ssize_t write(int fildes, const void *buf, size_t nbyte)

fildes is the file descriptor. That is, fildes is a number that indicates which file should be read/written.

buf is the location in memory from which write takes the data to be written, or into which read stores the data read from the file.

nbyte tells how many bytes shall be read/written.
Notice that buf must point to at least nbyte bytes of memory.

The functions return the number of bytes actually read/written, or -1 if an error occurred.
Both read and write may transfer fewer bytes than requested.
For ordinary files this usually means that either (upon read) the end of the file has been reached, or (upon write) the disk is full. For pipes, terminals or network connections, read often returns just the data that is currently available.

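If the file is in the (default) blocking mode, the thread calling read/write blocks until the operation completes (for read: until some data, the end of file, or an error is available).
For regular files, POSIX guarantees that concurrent read and write calls are atomic with respect to each other: each call sees either all or none of the effects of the other.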

Reading/writing advances the position in the file.
For some files (ordinary files among them), one may change the position within the file with (header unistd.h):
off_t lseek(int fd, off_t offset, int whence);
Here fd selects the file whose position should be changed. whence chooses whether the new position is given relative to the beginning of the file, the current position, or the end of the file, by providing, respectively, SEEK_SET, SEEK_CUR or SEEK_END. Finally, offset is the (possibly negative) distance from the chosen reference point. lseek returns the resulting position, measured in bytes from the beginning of the file, or -1 on error.

An attempt to read a file when the position is at (or beyond) the end of the file returns 0.
Do not confuse the value 0 that read returns at the end of file with the C stdio constant EOF, which is a negative value (usually -1).

A basic example of using read/write on the standard streams:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(void) {
  const char prompt[] = "Tell me your name: ";
  write(STDOUT_FILENO, prompt, strlen(prompt));
  char response[64];
  // read returns the number of bytes read, 0 at end of file, or -1 on error
  ssize_t readBytes = read(STDIN_FILENO, response, sizeof response);
  if (readBytes <= 0) {
    char msg[256];
    int len = snprintf(msg, sizeof msg, "\nCould not read your name: %s\n",
                       readBytes == -1 ? strerror(errno) : "EOF reached");
    write(STDERR_FILENO, msg, len);
    return 1;
  }
  write(STDOUT_FILENO, "Hello ", 6);
  write(STDOUT_FILENO, response, readBytes); // echoes exactly the bytes that were read
  return 0;
}

Exercise 1 Read standard input until the end of file, and write the read data to standard output. Test this both by using Ctrl+d to indicate end of file and by redirecting a file to the standard input.

Exercise 2 Read standard input until the end of file, and write the read data to standard output. When you reach the end of file, use lseek to set the position in the file to its beginning (that is, 0 bytes from SEEK_SET) and repeat reading and writing. Test this by redirecting some file to standard input.

Exercise 3 Instead of standard output, write to file descriptor 4. Test whether the program works when you tell the shell to open file descriptor 4 for your program with a 4>file redirection.

Creating, opening and closing files

To create or open a file, POSIX defines (header fcntl.h):
int open(const char *pathname, int flags)
int open(const char *pathname, int flags, mode_t mode)
and a creat function that is shorthand for open(pathname, O_WRONLY|O_CREAT|O_TRUNC, mode).

pathname is the path to the file (absolute, or relative to the current working directory).

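mode is used only if open creates a new file, and it defines the new file's permissions (further restricted by the process's umask). Either use an octal number (e.g., 0644), or the symbolic constants (such as S_IRUSR or S_IWUSR) described in the manual.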

flags is a rat's nest.
flags must contain exactly one of the following: O_RDONLY, O_WRONLY, or O_RDWR, which choose whether the file is opened for reading, writing or both.
flags may additionally contain other flags (combined with |), including the following:

O_APPEND sets the file position to the end of the file before every write
O_TRUNC (shall be used only in conjunction with O_WRONLY or O_RDWR) truncates an existing file (sets its size to 0)
O_CREAT tells open that if the file does not exist, it should be created
O_EXCL (shall be used only in conjunction with O_CREAT) makes open fail when the file exists
and at least a dozen other flags.

For example:

int fd1 = open("/tmp/foo", O_RDONLY);
if (fd1 == -1) perror("Opening /tmp/foo for reading failed");
 
int fd2 = open("/tmp/baz", O_WRONLY|O_APPEND);
if (fd2 == -1) perror("Opening /tmp/baz for appending (write-only) failed");
 
int fd3 = open("/tmp/bar", O_RDWR|O_CREAT|O_EXCL, 0600);
if (fd3 == -1) perror("Creating a new file /tmp/bar failed");
// if open succeeds, the file is open for reading and writing and has permissions of 0600

To close a file, POSIX defines (header unistd.h):
int close(int filedes)
which closes the file descriptor filedes.
On Linux, invoking close(fd) always closes fd, even if close returns -1, so a failed close must not be retried.

Exercise 4 Open a file with a hardcoded filename, read its contents and write them to standard output.

Exercise 5 Open the file specified as the first argument of your program, read its contents and write them to standard output.

Exercise 6 Open the file specified as the first argument of your program, read its contents and write them to standard output with line numbers (just like cat -n file).
Hint: memchr looks up a character (e.g., \n) in memory.

Exercise 7 Implement a program that checks whether two files have the same contents.
Hint: memcmp compares two memory areas.

Exercise 8 Implement a program that works like paste [file]...
Hint: the dirty solution reads a single character at a time.

Signals

Disclaimer: these materials cover only the very basics of signals. For comprehensive information, see the POSIX standard.

Sending signals

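To send a signal, one can use the function (header signal.h):
int kill(pid_t pid, int sig)
which sends the signal sig to the process pid. This function works just like the shell kill utility; e.g., kill(1234, SIGTERM) corresponds to kill -TERM 1234.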

Setting up signal handlers

To handle a signal delivered to the process, one can replace the default signal handler with a custom function.
The function must return void and take an int argument, that is, its prototype has to be as follows:
void your_function_name(int the_number_of_the_signal_delivered);

The following paragraph explains C syntax for function pointers.
Notice that in C, to represent the type of (pointer to) such function, one has to write void (*)(int).
To take (a pointer to) such function as a parameter (or declare a variable of this type), one has to write:
void (*some_name)(int)
So a function some_function_1 that takes a void (*)(int) function as an argument and returns, say, float, looks like this:
float some_function_1(void (*some_name)(int))
And a function some_function_2(…) that returns (a pointer to) a function of type void (*)(int) is written as:
void (*some_function_2(…))(int)

Example code that uses function pointers:

int foo(char a) {return a;}
int baz(char a) {return a;}
 
int (* getFoo     (void)  )(char) {return foo;}
int (* getFooOrBaz(long w))(char) {if(w) return foo; return baz;}
 
int (*savedFunc)(char); // a variable that stores (a pointer to) a function
int (*getFunc())(char){ // a function returning (a pointer to) a function
    return savedFunc;
}
 
void setFunc(int (*arg)(char)) {savedFunc = arg;}
int (*getOldSetNewFunc(int (*newFunc)(char)))(char) {
    int (*oldFunc)(char) = savedFunc;
    savedFunc = newFunc;
    return oldFunc;
}
 
void doSomething(){
    setFunc(&foo);  // both lines are correct, a function name is an implicit
    setFunc(baz);   // pointer to the function
 
    int result1 = savedFunc('a'); // calls the function
 
    int (*x)(char) = getFooOrBaz(0); // saves (a pointer to) a function returned by getFooOrBaz
 
    int result2 = x('a'); // calls the x (that points to baz) with the argument 'a'
 
    // getFooOrBaz(1) returns a function foo, then foo('a') is called and returns the result
    int result3 = getFooOrBaz(1)('a');
 
    // passes 'foo' (= result of getFooOrBaz(1)) to getOldSetNewFunc
    // and calls the result of getOldSetNewFunc ('baz') with argument 'a'
    int result4 = getOldSetNewFunc(getFooOrBaz(1))('a');
}

C++ has MUCH better support for functional programming.

To replace a signal handler, one can use the following function (header signal.h):
void (*signal(int sig, void (*func)(int)))(int)

That definition looks ugly, but hey, it's even a part of the C standard.
There's a way to make it look better: let's define a type that refers to functions taking an int and returning nothing:
typedef void (*sighandler_t)(int);
Now, the same signal function looks like this:
sighandler_t signal(int signum, sighandler_t func);
(The Linux manual prefers this way of writing the signal function.)

The signal function replaces the handler for signal signum with the function func and returns the previous handler (or SIG_ERR on failure). A single function can be used for multiple signals, so the number of the signal is passed to it as an argument when it is called.

See the following example:

#include <signal.h>                           
#include <stdio.h>                            
#include <unistd.h>                           
                                                                 // Calling fprintf in a signal handler
void handleSignals(int num) {                                    // is undefined behaviour, hence
  if (num == SIGINT)                                             // this program is not correct.
    fprintf(stderr, "You probably pressed Ctrl+c\n");            // It will still probably work as
  else if (num == SIGTERM)                                       // expected in this simple case.
    fprintf(stderr, "You probably ran 'kill %d'\n", getpid());
}
 
int main() {
  signal(SIGINT,  handleSignals);
  signal(SIGTERM, handleSignals);
  while (1)
    pause();
}

There are two special values for signal handlers: SIG_DFL and SIG_IGN. The former restores the default action for the signal, and the latter makes the process ignore the signal. For instance, when one wants the program to ignore the SIGHUP signal, one can simply write:
signal(SIGHUP, SIG_IGN);

Signal handlers

How they work

When a signal is delivered to a process by the operating system, the operating system stops one of the threads of the process, modifies the stack so that it looks as if the previous instruction called the signal handler, and sets the program counter (the CPU register that tells which instruction should be executed next) to the first instruction of the signal handler.
This means that a signal handler can be executed in any, possibly completely inconvenient, moment.
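When the signal handler returns, control flow returns to the place where it was interrupted by the signal.
A system call interrupted by a signal (e.g., a read waiting for input) either restarts after the handler returns or fails with -1 and errno set to EINTR. On Linux, handlers installed with signal make many interrupted calls restart, so, for instance, the read simply continues waiting. With sigaction, calls restart only if the SA_RESTART flag is set.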

Restrictions

Since a signal handler can be called anytime, possibly in the middle of a complex operation, only a subset of functions can be safely called from within a signal handler. Such functions are called async-signal-safe.
The Linux manual (signal-safety(7)) and the POSIX standard provide lists of async-signal-safe functions.

Even worse, "anytime" means even in the middle of an assignment: did you know that an innocent-looking i = 0; is typically compiled into two machine instructions if i is a long long int and the code runs on a 32-bit machine?
Thus, naturally, there are restrictions on accessing variables declared outside the signal handler from the handler code.
It is only guaranteed that variables of the volatile sig_atomic_t type (and lock-free atomic variables) can be safely accessed.
A handler may also use errno (e.g., by calling functions that set it), provided it saves the value of errno on entry and restores it before returning.

These restrictions are not enforced anyhow. If you don't follow these rules, you end up in the broad field of undefined behaviour.

[Extra]

To control more precisely what the operating system does upon signal delivery, one can use the sigaction function.
sigaction can also set up a signal handler that receives more information about the incoming signal.
Delivery of signals can be blocked (and later unblocked), e.g., by calling sigprocmask.
A signal handler can have its own set of signals that are blocked while it runs (e.g., to prevent nested signal handling).
There are a number of functions that wait until a signal occurs, e.g., sigwait, sigsuspend, pause.
kill sends a signal to a process, and the signal is delivered to an arbitrary thread of that process (one that does not block it).
raise delivers a signal to the current thread. pthread_kill delivers a signal to a specified thread of the current process.
While there is no POSIX function that delivers a signal to a specified thread of another process, there is a Linux-specific tgkill function that does that.

Exercises

Exercise 9 Write a program that prints Shutting down... and exits when it receives SIGINT (Ctrl+c).
The program should either sleep (sleep), or wait for an input (read / getchar), or explicitly wait for a signal (pause).

Exercise 10 Write a program that, upon receiving a USR1 signal:
• opens a file called signal_log,
• appends the current timestamp to it,
• closes the file.
Use the following code to write the current timestamp:

#include <time.h>
#include <unistd.h>
 
void writeDateTo(int fd) {
  struct timespec now;
  char buf[22];
  clock_gettime(CLOCK_REALTIME, &now);
  buf[20] = '\n';
  for (int i = 0; i < 10; ++i, now.tv_nsec /= 10)
    buf[19 - i] = '0' + now.tv_nsec % 10;
  buf[10] = '.';
  for (int i = 0; i < 10; ++i, now.tv_sec /= 10)
    buf[9 - i] = '0' + now.tv_sec % 10;
  write(fd, buf, 21);
}
