Concurrent and System Programming

Lecturer: dr hab. inż. Paweł T. Wojciechowski, prof. PP

Web page in e-kursy: https://ekursy.put.poznan.pl/course/view.php?id=35705

Consultation: by e-mail

Over the past decades, Moore’s law has held: “the number of transistors in an integrated circuit doubles about every two years”. Each new generation of processors made programs run faster. However, this trend is changing. Future computer systems will have more and more processor cores, but each individual core will not be faster than the core from the previous year. Thus, to fully utilize the hardware capabilities, computations must be carried out concurrently.

A major focus of concurrent programming is the correct and efficient programming of those parts of the code that require coordination and synchronization, because these parts can substantially affect the correctness and performance of the concurrent program as a whole.

The aim of the course is to introduce concurrency control concepts and their implications for the design and implementation of systems on current computer architectures. A handout for students is available here.

Below is a list of topics (Winter term 2023) (Syllabus in Polish):

Introduction

  1. Processes and threads.

  2. Concurrency vs parallelism.

Synchronization

  1. A critical section and the need for coordination between concurrent threads, explained by solving a real-life example. The example is used to introduce the basic properties: mutual exclusion, deadlock freedom, starvation freedom (or lockout freedom), and bounded waiting.

  2. Two basic synchronization patterns: atomicity and condition synchronization.

  3. Atomicity implemented using a single lock vs fine-grained locking; deadlock on locks.

  4. Condition synchronization illustrated by a bounded buffer.

  5. A synchronization barrier.

  6. Two basic techniques to implement condition synchronization and mutual exclusion (locks): spinning (busy waiting) and blocking (at the thread scheduler).

  7. Correctness: safety (e.g., mutual exclusion, deadlock freedom) and liveness (e.g., starvation freedom, livelock freedom).

  8. Amdahl’s Law.
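
A short worked example for item 8: Amdahl’s Law bounds the speedup of a program of which only a fraction p can run in parallel. With n cores,

    S(n) = 1 / ((1 - p) + p / n),

so for p = 0.9 and n = 8 the speedup is 1 / (0.1 + 0.9/8) ≈ 4.7, and no number of cores can push it beyond 1 / (1 - p) = 10.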

Shared-Memory Architecture

  1. Modern hardware architectures: uniform memory access (UMA) vs nonuniform memory access (NUMA).

  2. Cores, caches (write-through vs write-back, cache misses), and interconnect.

  3. Temporal and spatial locality.

  4. A cache-coherent parallel system: directory-based vs snooping-based cache coherence protocols; the write-broadcast and write-invalidate approaches.

  5. Sequential consistency vs relaxed memory models.

  6. Sources of inconsistency: processor, cache, interconnect, and compilers.

  7. Synchronizing instructions in relaxed memory architectures: fences and special load and store instructions for memory access.

  8. Understanding concurrent programs executed on relaxed memory architectures: memory coherence, local order, global order, program order, instructions bypassing other instructions, fully synchronizing vs weaker synchronizing instructions (which allow weakening the local order for efficiency).

  9. TSO (Total Store Order) (e.g. x86) vs “more relaxed” machines (e.g. ARM).

  10. Atomic machine instructions (TAS, Swap, FAI, FAA, CAS, and LL/SC).

  11. Atomic read-modify-write of an arbitrary function using CAS or LL/SC (a CAS retry-loop sketch follows this list).

  12. The ABA problem in a linked-list stack and the fix using a counted pointer technique.
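
To make item 11 concrete, here is a minimal Java sketch (the class name Rmw and the atomic-maximum example are illustrative; since Java 8, AtomicInteger offers this pattern directly as getAndUpdate): an arbitrary read-modify-write of a pure function is made atomic by re-reading the shared value and retrying the CAS until no other thread has intervened in between.

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.function.IntUnaryOperator;

    class Rmw {
        // Atomically applies an arbitrary (side-effect-free) function f to a
        // shared integer using a CAS retry loop; returns the previous value.
        static int getAndUpdate(AtomicInteger shared, IntUnaryOperator f) {
            while (true) {
                int old = shared.get();                    // read
                int updated = f.applyAsInt(old);           // modify
                if (shared.compareAndSet(old, updated)) {  // write; retry on conflict
                    return old;
                }
            }
        }

        public static void main(String[] args) {
            AtomicInteger max = new AtomicInteger(0);
            getAndUpdate(max, old -> Math.max(old, 42));   // e.g., an atomic "maximum"
            System.out.println(max.get());                 // prints 42
        }
    }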

Deadlock

  1. Requirements for deadlock: exclusive use, hold and wait, irrevocability (no preemption), and circularity.

  2. A resource allocation graph (RAG).

  3. Methods to ensure deadlock freedom:

    • breaking some requirement for deadlock,

    • preventing deadlock by design,

    • detecting deadlock and recovery,

    • avoiding deadlock (i.e., allocating resources safely).

  4. RAG for avoiding deadlock.

  5. Dijkstra’s Banker’s algorithm for avoiding deadlock.
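
A compact Java sketch of the safety check at the heart of the Banker’s algorithm from item 5 (the matrix layout and names are illustrative): a state is safe if the processes can be ordered so that each one’s remaining need can be satisfied from the currently available resources plus those released by processes that finish earlier.

    // Safety check of the Banker's algorithm: available[r] is the free amount of
    // resource r, alloc[p][r] is what process p currently holds, and need[p][r]
    // is what it may still request (its declared maximum minus its allocation).
    class Banker {
        static boolean isSafe(int[] available, int[][] alloc, int[][] need) {
            int nProc = alloc.length, nRes = available.length;
            int[] work = available.clone();
            boolean[] finished = new boolean[nProc];
            int done = 0;
            while (done < nProc) {
                boolean progress = false;
                for (int p = 0; p < nProc; p++) {
                    if (finished[p]) continue;
                    boolean canFinish = true;
                    for (int r = 0; r < nRes; r++) {
                        if (need[p][r] > work[r]) { canFinish = false; break; }
                    }
                    if (canFinish) {
                        for (int r = 0; r < nRes; r++) work[r] += alloc[p][r]; // p releases everything
                        finished[p] = true;
                        done++;
                        progress = true;
                    }
                }
                if (!progress) return false;   // the remaining processes may deadlock
            }
            return true;                       // a safe completion order exists
        }
    }

A request is granted only if, after tentatively adding it to the requester’s allocation, isSafe still returns true; otherwise the requester is made to wait.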

Theory

  1. Designing safe concurrent objects.

  2. Sequential consistency and lack of composable orders.

  3. Linearizability and its formal definition (sequential, well formed, legal, and linearizable histories); linearizable objects.

  4. Comparison of sequential consistency and linearizability using examples.

  5. Properties of linearizability: compositionality (locality) and nonblocking, with proofs.

  6. Parallelizing a sorted linked list using hand-over-hand locking (lock coupling) for efficiency; a sketch follows this list.

  7. Transactions, serializability, and strict serializability.

  8. Comparison of ordering properties (sequential consistency, linearizability, serializability, and strict serializability).

  9. Liveness: blocking vs nonblocking.

  10. Variants of nonblocking progress: wait freedom (≡ starvation freedom), lock freedom (≡ livelock freedom), and obstruction freedom.

  11. Fairness (weak vs strong) by examples.

  12. The consensus number to describe the relative “power” of atomic primitives.

  13. Data races and the happens-before relation.

  14. Towards language-level memory models.
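
A minimal Java sketch of the hand-over-hand (lock-coupling) list mentioned in item 6 above; the class name, the sentinel keys, and the single add operation are an illustrative simplification. A thread acquires the lock of the next node before releasing the lock of its predecessor, so threads operating on disjoint parts of the list proceed in parallel.

    import java.util.concurrent.locks.ReentrantLock;

    // Hand-over-hand (lock-coupling) traversal of a sorted linked list with
    // sentinel head and tail nodes: at most two locks are held at a time, and
    // a node is always locked before its predecessor's lock is released.
    class CoupledList {
        private static final class Node {
            final int key; Node next;
            final ReentrantLock lock = new ReentrantLock();
            Node(int key, Node next) { this.key = key; this.next = next; }
        }
        private final Node head =
                new Node(Integer.MIN_VALUE, new Node(Integer.MAX_VALUE, null));

        boolean add(int key) {
            Node pred = head;
            pred.lock.lock();
            Node curr = pred.next;
            curr.lock.lock();
            try {
                while (curr.key < key) {       // advance, coupling the locks
                    pred.lock.unlock();        // release the old predecessor...
                    pred = curr;
                    curr = curr.next;
                    curr.lock.lock();          // ...after the next node is locked
                }
                if (curr.key == key) return false; // already present
                pred.next = new Node(key, curr);   // link in; both neighbours are locked
                return true;
            } finally {
                curr.lock.unlock();
                pred.lock.unlock();
            }
        }
    }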

Spin Locks

  1. Historical algorithms to solve mutual exclusion (implement locks):

    • Dekker’s algorithm for two threads and its generalization to n threads,

    • Peterson’s algorithm for two threads, with a proof sketch,

    • Lamport’s “bakery” algorithm.

  2. Algorithms using TAS:

    • test-and-set (TAS) lock,

    • test-and-test-and-set (TTAS) lock (sketched after this list),

    • a lock with exponential backoff.

  3. Mellor-Crummey and Scott’s ticket lock (with FAI).

  4. Extended interface: timeout, trylocks.

  5. A reentrant lock and its example implementation.

  6. The readers and writers problem (including fairness issues).

  7. Reader-writer lock and its example implementation using CAS and FAA.

  8. Other variants of locks: queued locks and sequence locks.

  9. The read-copy-update (RCU) strategy to avoid some locks; a general scheme for setting a grace period in order to reclaim memory safely.
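
A minimal Java sketch of the TTAS lock from item 2 above (class and field names are illustrative): a waiting thread spins on an ordinary read of the flag, which is served from its local cache, and issues the expensive atomic getAndSet only after it has seen the lock free. Adding exponential backoff after a failed attempt further reduces contention.

    import java.util.concurrent.atomic.AtomicBoolean;

    // Test-and-test-and-set (TTAS) spin lock: spin on a plain read of the flag
    // and attempt the atomic test-and-set only when the lock appears free,
    // which reduces interconnect traffic compared with a plain TAS lock.
    class TTASLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        void lock() {
            while (true) {
                while (held.get()) { /* spin on the locally cached copy */ }
                if (!held.getAndSet(true)) return; // test-and-set succeeded
            }
        }

        void unlock() {
            held.set(false);
        }
    }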

Scheduling, Semaphores, Monitors, and CCRs

  1. Thread scheduling: multiplexing concurrent (kernel) threads on a single core (coroutines, context blocks, a ready list, preemption, and routines: reschedule, yield, and transfer).

  2. Dijkstra’s semaphores and their blocking implementation (at the thread scheduler level).

  3. Binary vs counting semaphores.

  4. N-resource allocation with a semaphore.

  5. The producer-consumer problem and a bounded buffer.

  6. A bounded buffer implemented using semaphores (a sketch follows this list).

  7. Monitors and condition variables.

  8. Hoare’s classic monitor and its implementation using semaphores.

  9. A bounded buffer implemented using Hoare’s monitor.

  10. Monitors in Java/C#.

  11. Comparison of the “classic” and Java/C# monitors.

  12. The nested monitor problem.

  13. Conditional critical region (CCR).

  14. A bounded buffer implemented using CCR.
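
A Java sketch of item 6 above, a bounded buffer guarded by semaphores, using java.util.concurrent.Semaphore (field names are illustrative): the counting semaphores empty and full track free and occupied slots, while a binary semaphore protects the buffer indices.

    import java.util.concurrent.Semaphore;

    // Producers block on `empty` when the buffer is full; consumers block on
    // `full` when it is empty; `mutex` makes the slot updates atomic.
    class BoundedBuffer<T> {
        private final Object[] slots;
        private int in = 0, out = 0;
        private final Semaphore empty;                      // counts free slots
        private final Semaphore full;                       // counts occupied slots
        private final Semaphore mutex = new Semaphore(1);   // protects in/out

        BoundedBuffer(int capacity) {
            slots = new Object[capacity];
            empty = new Semaphore(capacity);
            full = new Semaphore(0);
        }

        void put(T item) throws InterruptedException {
            empty.acquire();                 // wait for a free slot
            mutex.acquire();
            slots[in] = item;
            in = (in + 1) % slots.length;
            mutex.release();
            full.release();                  // signal one occupied slot
        }

        @SuppressWarnings("unchecked")
        T take() throws InterruptedException {
            full.acquire();                  // wait for an occupied slot
            mutex.acquire();
            T item = (T) slots[out];
            out = (out + 1) % slots.length;
            mutex.release();
            empty.release();                 // signal one free slot
            return item;
        }
    }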

Barriers

  1. Example use of flags and barriers.

  2. The sense-reversing centralized barrier algorithm (sketched after this list).

  3. The dissemination barrier.

  4. Fuzzy barriers, which improve the performance of programs with barriers.

  5. Fuzzy variant of the sense-reversing barrier.

  6. Hardware AND barriers and the “Eureka” OR operation.

  7. Series-parallel execution in Cilk (an extension of C for parallel computing).
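
A minimal Java sketch of the sense-reversing centralized barrier from item 2 above (names are illustrative): every thread decrements a shared counter; the last one to arrive resets the counter and flips a shared sense flag, releasing the others, which spin until the flag matches the sense they expect in the current episode.

    import java.util.concurrent.atomic.AtomicInteger;

    // Sense-reversing centralized barrier for n threads.
    class SenseBarrier {
        private final int n;
        private final AtomicInteger count;
        private volatile boolean sense = false;     // global sense, flipped every episode
        private final ThreadLocal<Boolean> localSense =
                ThreadLocal.withInitial(() -> false);

        SenseBarrier(int n) { this.n = n; this.count = new AtomicInteger(n); }

        void await() {
            boolean mySense = !localSense.get();    // sense expected in this episode
            localSense.set(mySense);
            if (count.decrementAndGet() == 0) {     // last thread to arrive
                count.set(n);                       // reset the counter first...
                sense = mySense;                    // ...then release the waiting threads
            } else {
                while (sense != mySense) { /* spin */ }
            }
        }
    }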

Transactional Memory

  1. Transactional memory (TM) and its potential advantages (e.g. scalability, composability, and deadlock prevention).

  2. Conditional synchronization with retry and orElse.

  3. The design of software transactional memory (STM): progress guarantees, buffering of speculative updates (undo logging vs redo logging), conflict resolution (eager vs lazy), and access tracking (“ownership” records).

  4. Validation in STM and correctness issues (single lock atomicity, opacity).

  5. Inconsistent state due to interaction of STM transactions with nontransactional code (the “publication” and “privatization” examples).

  6. Hardware transactional memory (HTM); extension of the MESI cache coherence protocol.

  7. Advantages and disadvantages of HTM.

Nonblocking Algorithms for Concurrent Data Structures

  1. An example of a single-word atomic counter with CAS.

  2. Treiber’s lock-free stack algorithm (using CAS); a sketch follows this list.

  3. Michael and Scott’s lock-free queue (the M&S queue) algorithm (using CAS).

  4. Harris and Michael’s lock-free list (the H&M list): the general idea.

  5. Herlihy’s universal construction of a nonblocking data structure from its sequential specification.
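
A minimal Java sketch of Treiber’s lock-free stack from item 2 above: both operations retry a CAS on the top-of-stack pointer until they succeed. In a garbage-collected language the naive version below is not exposed to the ABA problem discussed earlier, because a popped node cannot be recycled while another thread still holds a reference to it.

    import java.util.concurrent.atomic.AtomicReference;

    // Treiber's lock-free stack: push and pop retry a CAS on the top pointer.
    class TreiberStack<T> {
        private static final class Node<T> {
            final T value; Node<T> next;
            Node(T value) { this.value = value; }
        }
        private final AtomicReference<Node<T>> top = new AtomicReference<>();

        void push(T value) {
            Node<T> node = new Node<>(value);
            do {
                node.next = top.get();                      // read the current top
            } while (!top.compareAndSet(node.next, node));  // retry until the CAS succeeds
        }

        T pop() {
            while (true) {
                Node<T> oldTop = top.get();
                if (oldTop == null) return null;            // stack is empty
                if (top.compareAndSet(oldTop, oldTop.next))
                    return oldTop.value;
            }
        }
    }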

Concurrency without Shared Data

  1. Concurrent access via explicit communication (a message-passing sketch follows this list).

  2. Active objects in the Ada programming language; rendezvous and selective rendezvous.

  3. A bounded buffer implemented using guards in Ada.

  4. Message passing in operating systems and as a means to implement remote procedure calls (RPC).

  5. Example of synchronous message passing (with semantics of the CSP calculus) in the Occam programming language.

  6. Actors and asynchronous message passing in the Erlang programming language.

  7. Mobile agents and asynchronous message passing (with semantics of the pi-calculus) in the Nomadic Pict programming language.
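
For item 1 above, a small Java analogue (not the Ada, Occam, or Erlang mechanisms covered in the lectures): two threads share no mutable data and interact only by exchanging messages over a channel, modelled here with a BlockingQueue.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // The producer sends requests over the channel; the consumer receives and
    // processes them. No data is shared apart from the channel itself.
    class MessagePassingDemo {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> channel = new ArrayBlockingQueue<>(10);

            Thread consumer = new Thread(() -> {
                try {
                    String msg;
                    while (!(msg = channel.take()).equals("stop")) {
                        System.out.println("received: " + msg);
                    }
                } catch (InterruptedException ignored) { }
            });
            consumer.start();

            channel.put("hello");   // blocks if the channel is full
            channel.put("world");
            channel.put("stop");    // ask the consumer to terminate
            consumer.join();
        }
    }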

This document was generated on Apr 17, 2024.