Synchronization and Concurrency
Why Synchronization?
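Cores run independently and at their own pace, so the relative timing of operations in different threads is unpredictable. A race condition occurs when several threads access shared data, at least one of them writes it, and nothing coordinates the accesses: the result then depends on that timing. Synchronization imposes the ordering that correctness requires.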
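A shared `counter++` races because it is really three steps: a load, an add, and a store. The sketch below replays one unlucky interleaving of two "threads" step by step inside a single thread. This keeps the run deterministic and avoids an actual data race, which would be undefined behavior in C. The function `lost_update_demo` and the register names are hypothetical.

```c
/* Replays, in one thread, an interleaving of two "threads" that each
   execute counter++ as three separate steps: load, add, store. */
int lost_update_demo(void) {
    int counter = 0;          /* shared variable, initially 0 */

    int reg_a = counter;      /* thread A: load  (reads 0) */
    int reg_b = counter;      /* thread B: load  (also reads 0) */
    reg_a = reg_a + 1;        /* thread A: add   (1) */
    reg_b = reg_b + 1;        /* thread B: add   (1) */
    counter = reg_a;          /* thread A: store (counter = 1) */
    counter = reg_b;          /* thread B: store (counter = 1; A's update is lost) */

    return counter;           /* two increments were performed, yet the result is 1 */
}
```

`lost_update_demo()` returns 1 rather than 2. With real threads this outcome occurs only sometimes, which is what makes races hard to reproduce.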
Barrier Synchronization
A barrier ensures all threads reach a certain point before any proceed.
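CUDA and MPI provide barriers as built-in calls. To show what a barrier does internally, here is a minimal sketch for POSIX threads built from a mutex and a condition variable. POSIX also defines `pthread_barrier_t`, but it is an optional feature and is not available everywhere. All names (`simple_barrier`, `barrier_demo`, and so on) are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Reusable barrier: a generation counter lets the same barrier serve
   several phases and makes the wait loop robust to spurious wakeups. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_arrived;
    int count;            /* number of participating threads */
    int waiting;          /* threads that have arrived in the current phase */
    unsigned generation;  /* incremented each time the barrier opens */
} simple_barrier;

void simple_barrier_init(simple_barrier *b, int count) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_arrived, NULL);
    b->count = count;
    b->waiting = 0;
    b->generation = 0;
}

void simple_barrier_wait(simple_barrier *b) {
    pthread_mutex_lock(&b->lock);
    unsigned my_generation = b->generation;
    if (++b->waiting == b->count) {
        /* Last thread to arrive: open the barrier and wake everyone. */
        b->waiting = 0;
        b->generation++;
        pthread_cond_broadcast(&b->all_arrived);
    } else {
        /* Sleep until the phase this thread arrived in has ended. */
        while (my_generation == b->generation)
            pthread_cond_wait(&b->all_arrived, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

void simple_barrier_destroy(simple_barrier *b) {
    pthread_cond_destroy(&b->all_arrived);
    pthread_mutex_destroy(&b->lock);
}

/* Demo: each thread writes its own slot, waits at the barrier, then reads every slot. */
#define NTHREADS 4
static simple_barrier demo_barrier;
static int phase1[NTHREADS];
static int phase2[NTHREADS];

static void *demo_worker(void *arg) {
    int id = (int)(intptr_t)arg;
    phase1[id] = id + 1;                  /* phase 1: write own slot */
    simple_barrier_wait(&demo_barrier);   /* nobody reads until everyone has written */
    int sum = 0;
    for (int i = 0; i < NTHREADS; i++)    /* phase 2: read all slots */
        sum += phase1[i];
    phase2[id] = sum;
    return NULL;
}

/* Returns 1 if every thread saw all phase-1 writes after the barrier. */
int barrier_demo(void) {
    pthread_t threads[NTHREADS];
    simple_barrier_init(&demo_barrier, NTHREADS);
    for (intptr_t i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, demo_worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    simple_barrier_destroy(&demo_barrier);
    int expected = NTHREADS * (NTHREADS + 1) / 2;
    for (int i = 0; i < NTHREADS; i++)
        if (phase2[i] != expected)
            return 0;
    return 1;
}
```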
CUDA
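```cuda
// Inside a conditional, allowed only if the condition is identical for the whole block.
__syncthreads(); // every thread in the block must reach this call before any continues
```
MPI
```c
MPI_Barrier(MPI_COMM_WORLD); // returns only after every process in the communicator has called it
```
Mutual Exclusion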
Locks (Mutex)
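As a self-contained sketch of the lock pattern, assuming POSIX threads, several threads increment a shared counter. The mutex makes each read-modify-write indivisible with respect to the other threads. The names `shared_counter`, `counter_mutex`, and `run_locked_counter` are hypothetical.

```c
#include <pthread.h>

#define NUM_WORKERS 4
#define INCREMENTS_PER_WORKER 100000

static long shared_counter = 0;
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *increment_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_WORKER; i++) {
        pthread_mutex_lock(&counter_mutex);
        shared_counter++;                 /* critical section: one thread at a time */
        pthread_mutex_unlock(&counter_mutex);
    }
    return NULL;
}

/* Runs NUM_WORKERS threads and returns the final counter value. */
long run_locked_counter(void) {
    pthread_t threads[NUM_WORKERS];
    shared_counter = 0;
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&threads[i], NULL, increment_worker, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;  /* always NUM_WORKERS * INCREMENTS_PER_WORKER */
}
```

Without the lock/unlock pair, the result would usually fall short of 400000 because of lost updates.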
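```c
pthread_mutex_lock(&mutex);   // blocks until no other thread holds mutex
// critical section: at most one thread executes here at a time
pthread_mutex_unlock(&mutex); // releases mutex so a waiting thread can acquire it
```
Atomic Operations (CUDA)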
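```cuda
atomicAdd(&sum, value);               // atomically sum += value; returns the old value
atomicExch(&flag, value);             // atomically stores value into flag; returns the old value
atomicCAS(&flag, expected, desired);  // stores desired only if flag == expected; returns the old value
```
Atomic operations serialize conflicting updates to the same address, so many threads updating one variable can become a bottleneck. Use them sparingly.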
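These CUDA calls have close host-side analogues in C11's `<stdatomic.h>`: `atomic_fetch_add` corresponds to `atomicAdd`, and `atomic_compare_exchange_weak` to `atomicCAS`. The sketch below is plain C with POSIX threads, not CUDA. It shows a lock-free counter and the general compare-and-swap retry loop, used here to build an atomic maximum. `fetch_max_int`, `run_atomic_counter`, and the other helper names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

#define ATOMIC_WORKERS 4
#define ATOMIC_ITERS 100000

static atomic_long shared_atomic_counter;

static void *fetch_add_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ATOMIC_ITERS; i++)
        atomic_fetch_add(&shared_atomic_counter, 1);  /* indivisible read-modify-write */
    return NULL;
}

/* Increments a shared counter from several threads without a lock. */
long run_atomic_counter(void) {
    pthread_t threads[ATOMIC_WORKERS];
    atomic_store(&shared_atomic_counter, 0);
    for (int i = 0; i < ATOMIC_WORKERS; i++)
        pthread_create(&threads[i], NULL, fetch_add_worker, NULL);
    for (int i = 0; i < ATOMIC_WORKERS; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&shared_atomic_counter);
}

/* Atomically performs *target = max(*target, value) with a CAS loop.
   Returns the value observed before any update. */
int fetch_max_int(atomic_int *target, int value) {
    int observed = atomic_load(target);
    /* On failure (including spurious failure), the CAS writes the current
       value into 'observed', so each retry compares against fresh data. */
    while (observed < value &&
           !atomic_compare_exchange_weak(target, &observed, value)) {
        /* retry */
    }
    return observed;
}
```

One difference from CUDA is worth noting. `atomicCAS` returns the old value, while the C11 function returns a success flag and reports the old value through its second argument.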
Memory Consistency Models
Sequential Consistency
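The simplest and strongest model: every execution behaves as if all processors' memory operations were merged into one global interleaving, with each processor's own operations appearing in program order. All processors observe that same interleaving.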
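C11 atomics give sequentially consistent behavior for race-free programs that use only `memory_order_seq_cst`, which is the default for `atomic_load` and `atomic_store`. A minimal sketch, assuming POSIX threads, runs the classic store-buffering litmus test. Each thread sets its own flag and then reads the other thread's flag. Under sequential consistency, at least one thread must see the other's store, so both reads can never return 0. The names `count_both_zero`, `x`, `y`, `r1`, and `r2` are illustrative.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x, y;   /* flags, reset before every trial */
static int r1, r2;        /* what each thread observed */

static void *thread_one(void *arg) {
    (void)arg;
    atomic_store(&x, 1);        /* seq_cst store */
    r1 = atomic_load(&y);       /* seq_cst load */
    return NULL;
}

static void *thread_two(void *arg) {
    (void)arg;
    atomic_store(&y, 1);
    r2 = atomic_load(&x);
    return NULL;
}

/* Counts trials in which both loads returned 0. With seq_cst, whichever
   store comes first in the single global order precedes the other thread's
   load, so this count is always 0. */
int count_both_zero(int trials) {
    int both_zero = 0;
    for (int i = 0; i < trials; i++) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_create(&a, NULL, thread_one, NULL);
        pthread_create(&b, NULL, thread_two, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            both_zero++;
    }
    return both_zero;
}
```

If the four accesses used `memory_order_relaxed` instead, the both-zero outcome would be permitted, and on TSO hardware it can appear because of store buffering.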
Relaxed Consistency
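Relaxed models give weaker ordering guarantees in exchange for better performance, because they let hardware buffer and reorder memory operations:
- Total Store Ordering (TSO): each processor's writes pass through a store buffer, so a later read may complete before an earlier write (to a different address) becomes visible to other processors; writes still become visible in program order. x86 is a well-known TSO-like architecture.
- Release Consistency: synchronization operations are labeled acquire (a synchronizing read, e.g., obtaining a lock) or release (a synchronizing write, e.g., freeing it). Ordinary accesses may be reordered among themselves, but not above a preceding acquire or below a following release; writes made before a release are visible to any processor that later performs an acquire on the same synchronization variable.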
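C11's `memory_order_release` and `memory_order_acquire` follow the same idea as release consistency. The C11 model is specified differently in detail, so treat this as an analogy. A minimal message-passing sketch, assuming POSIX threads: the producer writes ordinary data and then sets a flag with a release store. The consumer spins on an acquire load of that flag, after which it is guaranteed to see the data. `payload`, `ready`, and `message_passing_demo` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* ordinary, non-atomic data */
static atomic_bool ready;    /* synchronization flag */

static void *producer_thread(void *arg) {
    (void)arg;
    payload = 42;                                               /* ordinary write */
    atomic_store_explicit(&ready, true, memory_order_release);  /* release: publishes the write above */
    return NULL;
}

static void *consumer_thread(void *arg) {
    int *out = arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire)) /* acquire: pairs with the release */
        ;                                                        /* spin until the flag is set */
    *out = payload;   /* ordered after the acquire, so it reads 42 */
    return NULL;
}

/* Runs one producer and one consumer; returns the value the consumer read. */
int message_passing_demo(void) {
    int seen = -1;
    pthread_t producer, consumer;
    payload = 0;
    atomic_store(&ready, false);
    pthread_create(&consumer, NULL, consumer_thread, &seen);
    pthread_create(&producer, NULL, producer_thread, NULL);
    pthread_join(producer, NULL);
    pthread_join(consumer, NULL);
    return seen;
}
```

Only the flag is atomic. The release/acquire pair is what makes the plain write to `payload` safely visible, which mirrors how release consistency orders ordinary accesses only relative to synchronization operations.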
False Sharing
When two threads modify different variables that happen to be on the same cache line:
- Thread 1 writes to variable A
- Cache coherence invalidates Thread 2's copy of the cache line
- Thread 2 writes to variable B, invalidating Thread 1's cache line
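- The line bounces between the two cores' caches on every write (a "ping-pong" effect), degrading performance even though the threads never touch the same variable
Solution: Pad and align per-thread data so that each frequently written variable occupies its own cache line (64 bytes on many current CPUs), or accumulate into thread-local variables and combine the results at the end.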
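A minimal sketch of the padding fix, assuming POSIX threads and a 64-byte cache line. Check the target: some CPUs use 128-byte lines. Each thread increments its own counter, and `alignas` places every counter on a separate cache line. The partial counts are combined only after the threads have joined. `padded_counter` and `padded_count_total` are hypothetical names.

```c
#include <pthread.h>
#include <stdalign.h>

#define CACHE_LINE 64        /* assumption: adjust for the target CPU */
#define PAD_THREADS 4
#define PAD_ITERS 1000000

/* alignas on the member raises the struct's alignment, and therefore its
   size, to CACHE_LINE, so adjacent array elements never share a line. */
struct padded_counter {
    alignas(CACHE_LINE) long value;
};

static struct padded_counter per_thread[PAD_THREADS];

static void *count_locally(void *arg) {
    struct padded_counter *mine = arg;
    for (int i = 0; i < PAD_ITERS; i++)
        mine->value++;       /* each thread writes only its own cache line */
    return NULL;
}

/* Combines the per-thread counts after all threads finish; no lock needed. */
long padded_count_total(void) {
    pthread_t threads[PAD_THREADS];
    for (int i = 0; i < PAD_THREADS; i++) {
        per_thread[i].value = 0;
        pthread_create(&threads[i], NULL, count_locally, &per_thread[i]);
    }
    long total = 0;
    for (int i = 0; i < PAD_THREADS; i++) {
        pthread_join(threads[i], NULL);
        total += per_thread[i].value;
    }
    return total;
}
```

Declaring the counters as a plain `long per_thread[PAD_THREADS]` would keep the result correct. However, all four counters would then likely share one cache line, which is exactly the false-sharing pattern described above.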
Concurrency Control Patterns
Producer-Consumer
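A runnable version of the pattern, assuming POSIX threads, is a bounded buffer. Here condition variables stand in for the `empty_slot` and `full_slot` signals used in this section's pseudocode; counting semaphores would be an equally valid choice. `bounded_buffer`, `bb_put`, `bb_get`, and `producer_consumer_sum` are hypothetical names.

```c
#include <pthread.h>

#define BUF_CAPACITY 8

/* Bounded FIFO buffer: not_full / not_empty play the roles of the
   empty_slot / full_slot signals; the mutex guards items and indices. */
typedef struct {
    int items[BUF_CAPACITY];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_full;   /* producers wait here */
    pthread_cond_t not_empty;  /* consumers wait here */
} bounded_buffer;

void bb_init(bounded_buffer *b) {
    b->head = b->tail = b->count = 0;
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->not_full, NULL);
    pthread_cond_init(&b->not_empty, NULL);
}

void bb_destroy(bounded_buffer *b) {
    pthread_cond_destroy(&b->not_empty);
    pthread_cond_destroy(&b->not_full);
    pthread_mutex_destroy(&b->lock);
}

void bb_put(bounded_buffer *b, int item) {
    pthread_mutex_lock(&b->lock);
    while (b->count == BUF_CAPACITY)    /* wait(empty_slot); loop handles spurious wakeups */
        pthread_cond_wait(&b->not_full, &b->lock);
    b->items[b->tail] = item;
    b->tail = (b->tail + 1) % BUF_CAPACITY;
    b->count++;
    pthread_cond_signal(&b->not_empty); /* signal(full_slot) */
    pthread_mutex_unlock(&b->lock);
}

int bb_get(bounded_buffer *b) {
    pthread_mutex_lock(&b->lock);
    while (b->count == 0)               /* wait(full_slot) */
        pthread_cond_wait(&b->not_empty, &b->lock);
    int item = b->items[b->head];
    b->head = (b->head + 1) % BUF_CAPACITY;
    b->count--;
    pthread_cond_signal(&b->not_full);  /* signal(empty_slot) */
    pthread_mutex_unlock(&b->lock);
    return item;
}

/* Demo: a producer thread sends 1..n; the calling thread consumes and sums. */
static bounded_buffer demo_buf;
static int demo_n;

static void *producer_main(void *arg) {
    (void)arg;
    for (int i = 1; i <= demo_n; i++)
        bb_put(&demo_buf, i);
    return NULL;
}

long producer_consumer_sum(int n) {
    pthread_t producer;
    long sum = 0;
    demo_n = n;
    bb_init(&demo_buf);
    pthread_create(&producer, NULL, producer_main, NULL);
    for (int i = 0; i < n; i++)
        sum += bb_get(&demo_buf);
    pthread_join(producer, NULL);
    bb_destroy(&demo_buf);
    return sum;
}
```

Because the buffer holds only eight items, the producer regularly blocks in `bb_put` until the consumer catches up.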
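```c
// Producer
wait(empty_slot);   // block until the buffer has a free slot
produce(data);      // place an item in the buffer
signal(full_slot);  // one more item is available

// Consumer
wait(full_slot);    // block until an item is available
consume(data);      // remove an item from the buffer
signal(empty_slot); // one more slot is free
// With several producers or consumers, also guard the buffer itself with a mutex.
```
Read-Write Lock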
- Multiple readers can hold lock simultaneously
- Writer requires exclusive access
- Useful when reads vastly outnumber writes
Deadlock
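Deadlock is possible only if all four of these conditions (the Coffman conditions) hold:
- Mutual exclusion: a resource can be held by at most one thread at a time
- Hold and wait: a thread holds at least one resource while waiting for another
- No preemption: a resource cannot be taken away from the thread holding it
- Circular wait: a cycle of threads exists in which each waits for a resource held by the next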
Prevention: Ensure that at least one condition can never hold. The most common technique breaks circular wait by always acquiring locks in a fixed global order.
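A minimal sketch of lock ordering, assuming POSIX threads, is a transfer between two accounts. Each account has its own mutex, and the two mutexes are always locked in increasing address order, regardless of the transfer's direction. Addresses are compared after conversion to `uintptr_t`, because C leaves relational comparison of pointers to unrelated objects undefined. `account`, `transfer`, and `opposite_transfers_total` are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    long balance;
} account;

/* Moves 'amount' between two distinct accounts while holding both locks.
   Locks are always taken in increasing address order, so two transfers in
   opposite directions can never each hold one lock while waiting for the
   other: circular wait cannot arise. */
void transfer(account *from, account *to, long amount) {
    account *first  = ((uintptr_t)from < (uintptr_t)to) ? from : to;
    account *second = (first == from) ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance   += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}

/* Demo: two threads transfer in opposite directions. Naively locking
   'from' then 'to' could deadlock here; the fixed order always finishes. */
#define TRANSFERS 100000
static account acct_a, acct_b;

static void *move_a_to_b(void *arg) {
    (void)arg;
    for (int i = 0; i < TRANSFERS; i++)
        transfer(&acct_a, &acct_b, 1);
    return NULL;
}

static void *move_b_to_a(void *arg) {
    (void)arg;
    for (int i = 0; i < TRANSFERS; i++)
        transfer(&acct_b, &acct_a, 1);
    return NULL;
}

/* Returns the combined balance after both threads finish (it is conserved). */
long opposite_transfers_total(void) {
    pthread_t t1, t2;
    pthread_mutex_init(&acct_a.lock, NULL);
    pthread_mutex_init(&acct_b.lock, NULL);
    acct_a.balance = 1000;
    acct_b.balance = 1000;
    pthread_create(&t1, NULL, move_a_to_b, NULL);
    pthread_create(&t2, NULL, move_b_to_a, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_mutex_destroy(&acct_a.lock);
    pthread_mutex_destroy(&acct_b.lock);
    return acct_a.balance + acct_b.balance;
}
```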
Practical Guidelines
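- Keep critical sections short: do expensive work such as I/O or heavy computation outside the lock
- Avoid nesting locks; where nesting is unavoidable, acquire locks in the same order everywhere
- Use lock-free data structures where they fit, preferably well-tested library implementations, since correct lock-free code is hard to write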
- Profile before optimizing — synchronization bottlenecks may not be where you expect
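To illustrate the first guideline, here is a minimal sketch assuming POSIX threads. The expensive computation runs before the lock is taken, and the mutex covers only the append to shared state. `expensive_compute` is a hypothetical stand-in for real work, and `record_result` is likewise hypothetical.

```c
#include <pthread.h>

#define MAX_RESULTS 1024

static long results[MAX_RESULTS];
static int  result_count = 0;
static pthread_mutex_t results_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for expensive, thread-private work: sum of squares 1..n. */
static long expensive_compute(long n) {
    long acc = 0;
    for (long i = 1; i <= n; i++)
        acc += i * i;
    return acc;
}

/* Computes without holding the lock, then locks only for the shared append.
   Returns 0 on success, -1 if the results array is full. */
int record_result(long n) {
    long value = expensive_compute(n);   /* no lock held: threads compute in parallel */
    int status = -1;
    pthread_mutex_lock(&results_lock);
    if (result_count < MAX_RESULTS) {    /* critical section: just the append */
        results[result_count++] = value;
        status = 0;
    }
    pthread_mutex_unlock(&results_lock);
    return status;
}
```

Moving `expensive_compute` inside the lock would still be correct, but it would serialize all threads for the full duration of the computation.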