Blocking vs non-blocking operations

Most POSIX functions complete in a limited number of steps.

But when the user invokes certain functions, then in well-defined circumstances it makes sense to wait until a particular thing happens.

For instance, when the command php -S 0:8080 2>&1 | tee -a log is executed in a shell, it is desired that when the tee program attempts to read the output of the php program, the read function waits until php has written something.

When a function stops not because it is not given the CPU time but because it waits for something to happen, then it blocks.
Functions that may block are called blocking. (Cf. definition of blocking in POSIX standard.)

Blocking may take an indefinite time. When a blocking function is used, the programmer must always take into account that a call to the function may stop the invoking thread for an arbitrarily long time.

There is usually a way to invoke blocking functions in a non-blocking mode.
When a blocking function is used in non-blocking mode, it either does what it's supposed to do without waiting, or it returns -1 and sets errno to EWOULDBLOCK or EAGAIN.
When one uses non-blocking mode, one must handle the case when the function failed to do (a part of) what it was supposed to do.

For functions related to file descriptors, the blocking / non-blocking mode is selected by the O_NONBLOCK flag of a file descriptor.
To set/clear the O_NONBLOCK flag, one shall first read the flags with int flags = fcntl(fd, F_GETFL);, then set/clear the flag (e.g., flags |= O_NONBLOCK;) and finally set the new flags with fcntl(fd, F_SETFL, flags);.

Pipes & FIFOs

A pipe is a unidirectional communication channel – a pair of file descriptors such that any data written to the second descriptor can be read from the first descriptor. A pipe is created with the following function:
int pipe(int fildes[2])    (needs header: unistd.h)
The fildes[0] is opened for reading, and the fildes[1] is opened for writing.
Pipes can be used to send data from one process to another process, or from one thread to another thread of the same process (see footnote 1).

By default, pipes are blocking, that is reading data from a pipe will stall the thread that invoked read until some data is written to a pipe. Also, writing data to a pipe will block when sufficiently many bytes were already written and are not yet read from the pipe.
In non-blocking mode the write function may write only a part of the data.
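A partial write can be observed with a sketch like the one below. The helper name nonblocking_write_once is hypothetical; it assumes the requested size is larger than the pipe's capacity, which is system-dependent.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: creates a pipe that nobody reads from, switches its
 * write end to non-blocking mode and attempts a single write of `size` bytes.
 * Returns the number of bytes the pipe accepted, or -1 on error. */
long nonblocking_write_once(size_t size) {
  int fd[2];
  if (pipe(fd) == -1)
    return -1;
  long written = -1;
  char *data = calloc(size, 1);          /* `size` zero bytes to send */
  int flags = fcntl(fd[1], F_GETFL);
  if (data != NULL && flags != -1 &&
      fcntl(fd[1], F_SETFL, flags | O_NONBLOCK) != -1)
    written = write(fd[1], data, size);  /* may accept only a part of data */
  free(data);
  close(fd[0]);
  close(fd[1]);
  return written;
}
```

When the result is smaller than size, the caller has to keep the remaining bytes and retry later, for instance once poll reports the descriptor as writable.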

When all file descriptors that allowed writing to a pipe are closed, a read from the pipe will return 0.
When all file descriptors that allowed reading from a pipe are closed, a write to the pipe will first raise SIGPIPE, then return -1 and set errno to EPIPE (provided the process did not terminate upon SIGPIPE).
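Both end-of-pipe cases can be checked within a single process. The sketch below uses two hypothetical helper names; in the second helper SIGPIPE is ignored for the duration of the call so that the process survives the failed write.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Returns what read() reports on an empty pipe whose write end is closed. */
long read_after_writer_closed(void) {
  int fd[2];
  char c;
  if (pipe(fd) == -1)
    return -1;
  close(fd[1]);                  /* no descriptor can write any more */
  long n = read(fd[0], &c, 1);   /* returns at once instead of blocking */
  close(fd[0]);
  return n;
}

/* Returns the errno set by write() on a pipe whose read end is closed,
 * 0 if the write unexpectedly succeeded, or -1 if no pipe was created. */
int errno_after_reader_closed(void) {
  int fd[2];
  if (pipe(fd) == -1)
    return -1;
  void (*old)(int) = signal(SIGPIPE, SIG_IGN);  /* survive SIGPIPE */
  close(fd[0]);                                 /* nobody can read any more */
  errno = 0;
  ssize_t n = write(fd[1], "x", 1);
  int err = (n == -1) ? errno : 0;
  close(fd[1]);
  signal(SIGPIPE, old);                         /* restore the old disposition */
  return err;
}
```

The first helper is expected to return 0 and the second one EPIPE.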

To share a pipe between two processes, one must create a pipe in one process and fork – file descriptors are copied upon forking.
A FIFO file (or a named pipe) is a special file that allows opening either end of a pipe by providing a path to the file. A FIFO file can be created with mkfifo shell utility or by the mkfifo function.
A call to open on a path to a FIFO file is blocking. open returns only once at least one process invoked open with O_RDONLY and at least one process invoked open with O_WRONLY (see footnote 2). From that point on the file descriptors act as those of an (anonymous) pipe.
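A sketch of a FIFO used between a parent and a child is shown below. The helper name fifo_roundtrip and its parameters are hypothetical, and the path is assumed not to exist yet. Both calls to open wait for each other, as described above.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: creates a FIFO at `path`, sends `msg` through it from
 * a child process and stores what the parent read in `buf` (at most `cap`
 * bytes). Returns the number of bytes read, or -1 on error. */
long fifo_roundtrip(const char *path, const char *msg, char *buf, size_t cap) {
  if (mkfifo(path, 0600) == -1)
    return -1;
  pid_t pid = fork();
  if (pid == -1) {
    unlink(path);
    return -1;
  }
  if (pid == 0) {                        /* child: the writing end */
    int wfd = open(path, O_WRONLY);      /* blocks until a reader opens */
    if (wfd == -1 || write(wfd, msg, strlen(msg)) == -1)
      _exit(1);
    close(wfd);
    _exit(0);
  }
  int rfd = open(path, O_RDONLY);        /* blocks until the writer opens */
  long total = 0, n = 0;
  if (rfd == -1) {
    kill(pid, SIGTERM);                  /* do not leave the child waiting */
  } else {
    while ((size_t)total < cap &&
           (n = read(rfd, buf + total, cap - total)) > 0)
      total += n;                        /* read until EOF or buffer full */
    close(rfd);
  }
  waitpid(pid, NULL, 0);                 /* reap the child */
  unlink(path);                          /* remove the FIFO file */
  return (rfd == -1 || n == -1) ? -1 : total;
}
```

The read loop ends with read returning 0 once the child closes its descriptor, just as with an anonymous pipe.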

A pipe is unidirectional. The Unix domain socket is its bidirectional equivalent. See man 7 unix for details.

Exercise 1 Write a program that:
    • creates a pipe
    • forks
    • in the child process:
          · calculates a computationally expensive mathematical equation (say, "2+2")
          · writes the result to the pipe
          · terminates
    • in the parent process:
          · reads from pipe
          · writes result to the standard output

Exercise 2 Write a program that:
    • creates a pipe
    • creates three child processes
    • in each child process:
          · calculates a computationally expensive mathematical equation (say, "2+2")
          · writes the result to the pipe as a four-byte integer
          · terminates
    • in the parent process:
          · calculates a computationally expensive mathematical equation (say, "2+2")
          · reads from the pipe all three results (and calls wait() to reap defunct children)
          · writes the sum of all four results to the standard output

Exercise 3 Write a program that prints the result of ls -l in uppercase.
You may do this e.g., by: pipe, fork, in one process: dup2 and exec, in the other process: reading from pipe and changing case.
The toupper function (available from ctype.h) converts a single character to upper case.

Exercise 4 Write a program that does ps -eF | sort -nk6.

Signals

Disclaimer: these materials contain the very basics of signals. For comprehensive information, see the POSIX standard.

Sending signals

To send a signal, one can use the following function:
int kill(pid_t pid, int sig)    (needs header: signal.h)
This function works just like the shell kill utility.

Setting up signal handlers

To handle a signal delivered to the process, one can replace the default signal handler with a custom function.
The function must return void and take an int argument, that is its prototype has to be as follows:
void your_function_name(int the_number_of_the_signal_delivered);

The following paragraph explains C syntax for function pointers.
Notice that in C, to represent the type of (pointer to) such function, one has to write void (*)(int).
To take (a pointer to) such function as a parameter (or declare a variable of this type), one has to write:
void (*some_name)(int)
So a function some_function_1 that takes a void (*)(int) function as an argument and returns, say, float, looks like this:
float some_function_1(void (*some_name)(int))
And a function some_function_2(…) that returns a function of void (*)(int) type is written as:
void (*some_function_2(…))(int)

An example code that uses function pointers
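A minimal sketch of the declarations discussed above follows. The helper functions remember and remember_negated and the variable last_seen are made up for illustration, and the parameter list that the text elides for some_function_2 is assumed here to be a single int.

```c
/* Illustrative state and two functions matching the type void (*)(int). */
int last_seen = 0;
void remember(int x) { last_seen = x; }
void remember_negated(int x) { last_seen = -x; }

/* Takes a void (*)(int) function as an argument and returns a float. */
float some_function_1(void (*some_name)(int)) {
  some_name(21);                 /* call through the pointer */
  return last_seen * 2.0f;
}

/* Returns a function of the void (*)(int) type. */
void (*some_function_2(int negate))(int) {
  return negate ? remember_negated : remember;
}
```

With these definitions, some_function_1(remember) yields 42.0f, and some_function_2(1) returns a pointer that can be stored in a variable declared as void (*f)(int) and called as f(5).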

C++ has a MUCH better support for functional programming.

To replace a signal handler, one can use the following function:
void (*signal(int sig, void (*func)(int)))(int)    (needs header: signal.h)
That definition looks ugly, but hey, it's even a part of the C standard.
There's a way to make it look better: let's define a type that refers to functions taking an int and returning nothing:
typedef void (*sighandler_t)(int);
Now, the same signal function looks like this:
sighandler_t signal(int signum, sighandler_t func);
(This way of writing signal function is preferred in the Linux manual.)

The signal function replaces the handler for signal signum with the function func. A single function can be used for multiple signals, hence when the function is called, the signal number is passed as an argument.

See the following example:

#include <signal.h>                           
#include <stdio.h>                            
#include <unistd.h>                           
                                                                 // Calling fprintf in a signal
void handleSignals(int num) {                                    // handler is undefined behaviour,
  if (num == SIGINT)                                             // hence this program is not
    fprintf(stderr, "You probably pressed Ctrl+c\n");            // correct. It is still probably
  else if (num == SIGTERM)                                       // going to work as expected
    fprintf(stderr, "You probably ran 'kill %d'\n", getpid());   // in this simple case.
}
 
int main() {
  signal(SIGINT,  handleSignals);
  signal(SIGTERM, handleSignals);
  while (1)
    pause();
}

There are two special values for signal handlers: SIG_DFL and SIG_IGN. The former resets the signal handler to the default handler, and the latter ignores the signal. For instance, when one wants the program to ignore the HUP signal, one can simply write:
signal(SIGHUP, SIG_IGN);
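The value returned by signal is the previous handler, so a disposition can be changed temporarily and then restored. The sketch below uses a hypothetical helper name, survive_sighup, and relies on the default action for SIGHUP being to terminate the process.

```c
#include <signal.h>

/* Hypothetical helper: ignores SIGHUP, sends SIGHUP to the calling thread and
 * restores the previous disposition. Returns 1 if the process is still alive
 * to return, 0 if the handler could not be changed. */
int survive_sighup(void) {
  void (*previous)(int) = signal(SIGHUP, SIG_IGN);
  if (previous == SIG_ERR)
    return 0;
  raise(SIGHUP);                 /* ignored, so nothing happens */
  signal(SIGHUP, previous);      /* e.g. back to SIG_DFL */
  return 1;
}
```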

Signal handlers

How they work

When a signal is delivered to a process by the operating system, the operating system stops one of the threads of the process, modifies the stack so that it looks as if the previous instruction called the signal handler, and sets the program counter (the CPU register that tells which instruction should be executed next) to the first instruction of the signal handler.
This means that a signal handler can be executed in any, possibly completely inconvenient, moment.
When the signal handler returns, the control flow returns to the place where it's been interrupted by the signal.
By default, interrupted system calls restart – for instance, when a signal is received while waiting in read for input, then after returning from the signal handler the read is continued.

Restrictions

Since a signal handler can be called anytime, possibly in the middle of executing a complex action, only a subset of functions can be safely called from within the signal handler. Such functions are marked as async-signal-safe functions.
Linux manual and POSIX standard provide a list of async-signal-safe functions.

Even worse, "anytime" means even in the middle of an assignment – did you know that an innocent-looking i = 0; is split into two machine instructions if i is a long long int and you run this code on a 32-bit machine?
Thus, naturally, there are restrictions on accessing variables declared outside the signal handler from the handler code.
It is only guaranteed that variables of the volatile sig_atomic_t type (and lock-free atomic variables) can be safely accessed.
One may also use errno provided one saves it beforehand and restores its previous value after use.
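A handler that stays within these restrictions could look like the sketch below. The names got_usr1 and on_usr1 are made up for illustration. The handler calls only write, which is on the async-signal-safe list, and touches only a volatile sig_atomic_t variable.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

volatile sig_atomic_t got_usr1 = 0;   /* the only global the handler touches */

void on_usr1(int num) {
  int saved_errno = errno;            /* write() may overwrite errno */
  static const char msg[] = "got SIGUSR1\n";
  (void)num;                          /* the signal number is not needed */
  if (write(STDERR_FILENO, msg, sizeof msg - 1) == -1) {
    /* nothing can be safely done about a failed write here */
  }
  got_usr1 = 1;                       /* tell the main code what happened */
  errno = saved_errno;                /* restore the interrupted code's errno */
}
```

The handler can be installed with signal(SIGUSR1, on_usr1), and the main code then inspects got_usr1 at a moment of its choosing.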

These restrictions are not enforced in any way. If you don't follow these rules, you end up in the broad field of undefined behaviour.

[Extra]

To control actions done by the operating system on signal delivery more accurately, one can use the sigaction function.
The sigaction can also be used to set up a signal handler that gets more information on the incoming signal.
Delivery of signals can be blocked (and later unblocked) e.g., by calling sigprocmask.
Signal handlers can have their own list of blocked signals (to prevent nesting signals).
There are a number of functions that wait until a signal occurs, e.g., sigwait, sigsuspend, pause.
kill delivers a signal to an arbitrary thread of a given process.
raise delivers a signal to the current thread. pthread_kill delivers a signal to a specified thread of the current process.
While there is no POSIX function that delivers a signal to a specified thread of another process, there is a Linux-specific tgkill function that does that.

Exercises

Exercise 5 Write a program that prints Shutting down... and exits when it receives SIGINT (Ctrl+c).
The program should either sleep (sleep), or wait for an input (read / getchar), or explicitly wait for a signal (pause).

Exercise 6 Write a program that on getting a USR1 signal:
    • opens a file called signal_log,
    • appends current timestamp to it,
    • closes the file.
Use the following code to write the current timestamp:

#include <time.h>
#include <unistd.h>
 
void writeDateTo(int fd) {
  struct timespec now;
  char buf[22];
  clock_gettime(CLOCK_REALTIME, &now);
  buf[20] = '\n';
  for (int i = 0; i < 10; ++i, now.tv_nsec /= 10)
    buf[19 - i] = '0' + now.tv_nsec % 10;
  buf[10] = '.';
  for (int i = 0; i < 10; ++i, now.tv_sec /= 10)
    buf[9 - i] = '0' + now.tv_sec % 10;
  write(fd, buf, 21);
}

Shared memory

It is possible for two processes to use the same physical address range of the main memory.
Obviously any modification to such memory done by one process is visible to the other processes that access the memory.
Such memory is called shared memory.

Keep in mind that although two processes use the same physical addresses, the virtual addresses are usually different.
So do not store any pointers in the shared memory - they will not be valid for other processes sharing the memory. Storing offsets (differences between addresses) within a contiguous shared memory address range is fine.
There is an exception to this – when two processes share memory as a result of a fork, then the addresses naturally match.

To use shared memory, a program must explicitly request the kernel to set up a (possibly shared) memory address range, and tell the kernel what memory should be associated with the address range.
This is done using the mmap(…) function (needs header: sys/mman.h) with the MAP_SHARED flag.
On success, mmap returns an address to the start of the newly created address range.

To tell what memory should back the address range, one must either pass a file descriptor to mmap or ask it to allocate some memory.
The latter can be shared only by this process and its newly created child processes, and is done by adding the MAP_ANONYMOUS flag, passing the file descriptor of -1 and passing the offset of 0.
MAP_ANONYMOUS is not part of the POSIX standard. However, virtually any UNIX-like system supports it.

When the file descriptor passed to mmap refers to an ordinary file, the file is automatically copied by the operating system from disk to memory (a memory page is fetched upon first access within the page). Writing the changes back to the file must be done manually by calling the msync function.
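Mapping an ordinary file and flushing a change with msync can be sketched as below. The helper name set_first_byte is made up for illustration, and the file is assumed to exist and to be at least one byte long.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: overwrites the first byte of an existing, non-empty
 * file through a shared mapping. Returns 0 on success, -1 on error. */
int set_first_byte(const char *path, char value) {
  int fd = open(path, O_RDWR);
  if (fd == -1)
    return -1;
  char *map = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);                     /* the mapping remains valid without fd */
  if (map == MAP_FAILED)
    return -1;
  map[0] = value;                /* modifies the in-memory copy of the page */
  int r = msync(map, 1, MS_SYNC);   /* write the change back to the file */
  munmap(map, 1);
  return r;
}
```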

The file descriptor passed to mmap can also refer to a shared memory object. Such descriptors are returned by the shm_open function, which takes the same arguments as the open function, but the returned descriptor refers to a region of main memory associated with a given name (rather than with a disk file associated with a path).
(In portable code, the name shall be one word starting with a slash.)

In mmap one has to specify the size of the memory range. If this size is larger than the backing file, then mmap succeeds, but any accesses to the mapped addresses beyond the file result in a SIGBUS signal.
To ensure that the file is large enough, one can use the following functions:

  • ftruncate (needs header: unistd.h) resizes the file to a given size.
    If the file is larger, it is truncated.
  • posix_fallocate (needs header: fcntl.h) ensures that the file is at least of a given size.
    If the file is larger, it is left unchanged.
    Warning: posix_fallocate is guaranteed to work for ordinary files. The result of using posix_fallocate on a shared memory object is undefined.

To clean up the memory mapping, one has to use the munmap function. (Provided one wishes to flush data to a backing ordinary file, one must call msync before munmap.)

The shm_open, mmap, msync and munmap functions need #include <sys/mman.h> (memory management).
The shm_open function needs also #include <fcntl.h> for the file open mode flags (such as O_RDWR).

To compile programs that use shm_open with older glibc versions, one must add -lrt to compile options.

Exercise 7 Test the following program that uses shared memory. Run it concurrently in multiple terminals.
Answer the following questions:
      • what is the size of the shared memory object?
      • how does a struct help in laying out memory of the shared object?
Try to modify the following:
      • print the address of the shared memory
      • add a new field called counter in the struct myData and implement i num and d num commands to increment/decrement the counter
      • change the prot argument of mmap to PROT_READ (without PROT_WRITE) and check how the program works
      • change the size argument of mmap to 1024*1024 and try to access an address:
              · within mapping, but outside shared memory object (e.g., putchar(*((char*)data+1025*1024));)
              · outside mapping (e.g., putchar(*((char*)data-1));).
        what is the difference in handling those two accesses by the OS?

simple_cli.c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
 
#define CHECK(result, textOnFail)                                              \
  if (((long int)result) == -1) {                                              \
    perror(textOnFail);                                                        \
    exit(1);                                                                   \
  }
 
struct myData {
  int version;
  char text[1020];
};
 
int main() {
  int fd = shm_open("/os_cp", O_RDWR | O_CREAT, 0600);
  CHECK(fd, "shm_open failed");
  int r = ftruncate(fd, sizeof(struct myData));
  CHECK(r, "ftruncate failed");
  struct myData *data = mmap(NULL, sizeof(struct myData),
                             PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  CHECK(data, "mmap failed");
  close(fd);
 
  printf("commands:\n"
         "  r           - reads the text\n"
         "  w <text>    - writes new text\n"
         "  q           - quits\n");
 
  while (1) {
    printf("> ");
    fflush(stdout);
 
    int c; // int rather than char, so that EOF differs from every character
    char text[1022] = {0};
    scanf("%1021[^\n]", text);
    do { // this reads all remaining characters in this line including '\n'
      c = getchar();
      CHECK(c, "getchar EOF'ed");
    } while (c != '\n');
 
    if (!strlen(text)) // empty line
      continue;
 
    switch (text[0]) {
    case 'r':
      printf("version: %d\n   text: %s\n", data->version, data->text);
      break;
    case 'w':
      data->version++;
      strcpy(data->text, text + 2);
      break;
    case 'q':
      munmap(data, sizeof(struct myData));
      exit(0);
      break;
    }
  }
}

Exercise 8 The following program uses the same shared memory object named /os_cp. Run it in one terminal, and run in parallel the program from the previous exercise. Try to read the text.

writer.c
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>
 
#define CHECK(result, textOnFail)                                              \
  if (((long int)result) == -1) {                                              \
    perror(textOnFail);                                                        \
    exit(1);                                                                   \
  }
 
struct myData {
  int version;
  char text[1020];
};
 
volatile sig_atomic_t stopFlag = 0;
void ctrlC(int num) { stopFlag = 1; }
 
int main() {
  int fd = shm_open("/os_cp", O_RDWR | O_CREAT, 0600);
  CHECK(fd, "shm_open failed");
  int r = ftruncate(fd, sizeof(struct myData));
  CHECK(r, "ftruncate failed");
  struct myData *data = mmap(NULL, sizeof(struct myData),
                             PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  CHECK(data, "mmap failed");
  close(fd);
 
  signal(SIGINT, ctrlC);
  while (!stopFlag) {
    for (char letter = 'a'; letter <= 'z'; ++letter) {
      data->version++;
      for (int i = 0; i < 1020 - 1; ++i) {
        data->text[i] = letter;
      }
      if (stopFlag)
        break;
    }
  }
  munmap(data, sizeof(struct myData));
  return 0;
}
1) While using a pipe for communication between threads is inefficient, it has its use cases. For instance, it enables waking a thread that is waiting for I/O from multiple sources with select or poll.
2) POSIX defines only what happens with FIFO files opened with O_RDONLY and O_WRONLY. Result of using O_RDWR with a FIFO file is explicitly undefined in POSIX.
os_cp/pipes_signals.txt · Last modified: 2023/05/09 23:18 by jkonczak