Shared Memory Concurrency
- The Little Book of Semaphores by Allen B. Downey (Introductory, GNU FDL)
- Concurrent Programming on Windows by Joe Duffy, 2008
- Java Concurrency in Practice by Brian Goetz, Tim Peierls, 2006 (Concurrency-interest -- Discussion list for JSR-166)
- The Art of Multiprocessor Programming by Maurice Herlihy, Nir Shavit, 2008
- Is Parallel Programming Hard, And, If So, What Can You Do About It? by Paul E. McKenney (CC-BY-SA)
- Programming on Parallel Machines by Norm Matloff -- covers GPU programming with a CUDA focus
- http://gcc.gnu.org/wiki/Atomic -- status of gcc's implementation of C++11 and C11 atomics. Also links to good background papers.
- gcc __atomic builtins, C++11 memory model aware, introduced in gcc 4.7 (the LLVM Atomic Instructions and Concurrency Guide has a much more readable, simplified description of the C++11 memory model)
- example: FreeBSD: stdatomic.h
- legacy gcc __sync builtins based on Intel Itanium Processor Specific Application Binary Interface (ABI) section 7.4 Synchronization Primitives
- To see what instruction __sync_synchronize generates for each architecture for a full memory barrier: look for "memory_barrier" in gcc/config/*/*.md
- Compare above with mb, rmb, wmb implementations in linux-2.6/arch/*/include/asm/system.h
- As of gcc-4.4.2, the libgomp OpenMP helper library implements atomic_write_barrier() with __sync_synchronize() on ia64, x86, mips, and s390, while powerpc uses "eieio", sparc uses "membar #StoreStore", and alpha uses "wmb".
- MemoryBarrier Macro
- _InterlockedCompareExchange Intrinsic Functions
- Synchronization and Multiprocessor Issues
- Interlocked Variable Access
- Multiprocessor Considerations for Kernel-Mode Drivers, Updated: October 31, 2004
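As a rough illustration of the gcc __atomic builtins listed above, the sketch below publishes a value with a release store and reads it with an acquire load. The names payload, ready, publish and try_consume are hypothetical and chosen only for this example; it is a minimal sketch of the release/acquire pattern, not code taken from any of the linked documents.

```c
#include <stdbool.h>

static int payload;   /* ordinary data, written before publication */
static int ready;     /* flag, accessed only through __atomic builtins */

/* Producer side: write the data first, then set the flag with release
   ordering so the payload store cannot be reordered after the flag store. */
void publish(int value)
{
    payload = value;
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

/* Consumer side: acquire load of the flag. If it observes 1, the payload
   written before the matching release store is guaranteed to be visible. */
bool try_consume(int *out)
{
    if (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
        return false;   /* nothing published yet */
    *out = payload;
    return true;
}
```

In a real program publish() and try_consume() would run on different threads, with the consumer polling or blocking until the flag is set. The legacy __sync_synchronize() builtin would also work here, but it is a full barrier and therefore stronger than this pattern needs.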
Static and Dynamic Checking
- Using Promela and Spin to verify parallel algorithms by Paul McKenney, August 1, 2007
- Verifying Multi-threaded C Programs with SPIN
- The Relacy Race Detector
- Valgrind: Helgrind: a thread error detector
- Cambridge Relaxed Memory Concurrency Group
- PPCMEM/ARMMEM: A Tool for Exploring the POWER and ARM Memory Models (lwn: Validating Memory Barriers and Atomic Instructions by Paul McKenney)
- (older) The JSR-133 Cookbook for Compiler Writers
- LLVM Atomic Instructions and Concurrency Guide
- Getting C++ Threads Right
- Double Checked Locking Is Broken
- Thread Pools, compare
- Qt: QtConcurrent::run() schedules work and QFutureWatcher monitors the results with signals and slots; compare with Java:
- Java: java.util.concurrent.Executor, javax.swing.SwingWorker (two different methods to schedule work). I can't even figure out how to get notifications from Executor with NIO.
- Ulrich Drepper
- Origins Of Concurrent Programming
- Paul E. McKenney (Read Copy Update) (SMP Scalability Papers) (RCU Papers)
- Is Parallel Programming Hard, And, If So, What Can You Do About It?, Jan. 2, 2011
- Sleepable RCU, October 9, 2006
- RCU and Unloadable Modules, January 14, 2007
- The design of preemptible read-copy-update, October 8, 2007
- What is RCU, Fundamentally? December 17, 2007
- What is RCU? Part 2: Usage, December 24, 2007
- RCU part 3: the RCU API, April 22, 2008
- Integrating and Validating dynticks and Preemptable RCU, April 22, 2008
- Hierarchical RCU, November 4, 2008
- Lockdep-RCU, February 1, 2010
- Mathieu Desnoyers' Ph.D. dissertation: Low-Impact Operating System Tracing
- Petra VM: Memory Consistency Models, February 7th, 2010 by luis
- The Old New Thing: High-performance multithreading is very hard, May 28, 2004
- The Linux "hwspinlock" framework: allows the implementation of synchronization primitives on systems where different cores are running different operating systems
- Hans Boehm: Threads and memory model for C++
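The "Double Checked Locking Is Broken" entry above concerns lazy initialization where the first, unlocked check has no memory ordering. One commonly cited repair, sketched here with gcc __atomic builtins and a pthread mutex, is to make the published pointer an acquire/release variable. The struct config type, get_config() and init_calls are hypothetical names introduced only for this sketch.

```c
#include <pthread.h>
#include <stdlib.h>

struct config { int value; };     /* hypothetical lazily built object */

static struct config *instance;   /* accessed only via __atomic builtins */
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int init_calls;            /* number of constructions, guarded by init_lock */

struct config *get_config(void)
{
    /* First check, without the lock: the acquire load pairs with the
       release store below, so a non-NULL pointer implies that the
       object's fields are visible too. */
    struct config *p = __atomic_load_n(&instance, __ATOMIC_ACQUIRE);
    if (p == NULL) {
        pthread_mutex_lock(&init_lock);
        /* Second check, under the lock: another thread may have won. */
        p = __atomic_load_n(&instance, __ATOMIC_RELAXED);
        if (p == NULL) {
            p = malloc(sizeof *p);
            if (p != NULL) {
                p->value = 42;    /* fully construct before publishing */
                init_calls++;
                __atomic_store_n(&instance, p, __ATOMIC_RELEASE);
            }
        }
        pthread_mutex_unlock(&init_lock);
    }
    return p;                     /* NULL only if malloc failed */
}
```

The broken variants discussed in the linked material typically read and write the pointer with plain accesses, which lets the compiler or CPU publish the pointer before the object is fully constructed.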
Software Transactional Memory
- TM & Languages Google Group produced the Draft Specification of Transactional Language Constructs for C++
- Transactional Memory in GCC (implements a modified version of the Intel TM ABI)
- A brief retrospective on transactional memory by Joe Duffy, January 2010
- http://lwn.net/Articles/336039/ -- Transactional Memory discussion
- "Transactional Memory: Architectural Support for Lock-Free Data Structures" Herlihy & Moss (1993)
- "Software Transactional Memory", Shavit and Touitou, 1997.
- "A Methodology for Implementing Highly Concurrent Data Objects", Herlihy, 1993.
- "A Methodology for Implementing Highly Concurrent Data Structures", Herlihy, 1990.
- "A Practical Multi-Word Compare-and-Swap Operation", Harris, Fraser, Pratt, 2002.
- "Impossibility and Universality Results for Wait-Free Synchronization", Herlihy, 1988.
- "Practical Lock-Free and Wait-Free LL/SC/VL Implementations Using 64-bit CAS", Michael, 2004.
- "Wait-Free Synchronization", Herlihy, 1991.
- Peter Sewell Computer Laboratory, University of Cambridge: "... looking for PhD students to work on the semantics of concurrent programs, focussed especially on the relaxed memory models of real-world multiprocessors and programming languages, or in other applied-semantics areas."
- urcu: Userspace RCU (read-copy-update)
- liburcu provides read-side access which scales linearly with the number of cores. It does so by allowing multiple copies of a given data structure to live at the same time, and by monitoring the data structure accesses to detect grace periods after which memory reclamation is possible.
- liburcu-cds provides efficient data structures based on RCU and lock-free algorithms including hash tables, queues, stacks, and doubly-linked lists.
- Concurrency Kit provides a plethora of concurrency primitives, safe memory reclamation mechanisms and lock-less and lock-free data structures designed to aid in the design and implementation of high performance concurrent systems.
- nbds: C implementations of several scalable non-blocking data structures for x86 and x86-64
- System Programming
- MSDN Parallel Computing Learning Resources
- Computer Architecture And Compilers
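Several of the papers and libraries above build on compare-and-swap. As a small, hedged illustration of the idea, here is a sketch of a CAS-based stack (usually called a Treiber stack) written with the gcc __atomic builtins. It is not the API of liburcu, Concurrency Kit, or nbds; struct node, struct stack, stack_push and stack_pop are hypothetical names, and popped nodes are deliberately never freed because safe reclamation is exactly the hard part those libraries address.

```c
#include <stdbool.h>
#include <stdlib.h>

struct node  { int value; struct node *next; };
struct stack { struct node *head; };

/* Push: link the new node to the current head, then try to swing the
   head to it. On failure the builtin stores the current head into
   n->next, so the loop simply retries with fresh state. */
bool stack_push(struct stack *s, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->value = value;
    n->next = __atomic_load_n(&s->head, __ATOMIC_RELAXED);
    while (!__atomic_compare_exchange_n(&s->head, &n->next, n, true,
                                        __ATOMIC_RELEASE, __ATOMIC_RELAXED))
        ;
    return true;
}

/* Pop: try to swing the head to its successor. On failure the builtin
   reloads the current head into old. */
bool stack_pop(struct stack *s, int *out)
{
    struct node *old = __atomic_load_n(&s->head, __ATOMIC_ACQUIRE);
    while (old != NULL &&
           !__atomic_compare_exchange_n(&s->head, &old, old->next, true,
                                        __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE))
        ;
    if (old == NULL)
        return false;             /* stack was empty */
    *out = old->value;
    /* The node is intentionally leaked: freeing and reusing nodes here
       would expose the ABA problem and use-after-free in racing pops. */
    return true;
}
```

A production version would pair this with a reclamation scheme such as the RCU grace periods described for liburcu above.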