Skip to content

Introduction to Parallel Computing

What is Parallel Computing?

Parallel computing: The use of two or more processors (computers), usually within a single system, working simultaneously to solve a single problem.

Distributed computing: Computing that involves multiple computers remote from each other that each have a role in a computation problem or information processing.

Parallel vs. Distributed Systems

AspectParallel SystemsDistributed Systems
MemoryTightly coupled shared memory (UMA, NUMA)Distributed memory (message passing, RPC)
ControlGlobal clock control (SIMD, MIMD)No global clock; synchronization algorithms needed
InterconnectBus, mesh, tree, hypercube (Tbps)Ethernet, token ring, switching network (Gbps)
GranularityFineCoarse
ReliabilityAssumedNot assumed
Main FocusPerformance (time and scale)Performance (cost, scalability), reliability, resource sharing

Why Parallel Computing?

  • Moore's Law: Transistor density doubles every ~18 months
  • Clock speed scaling has hit physical limits (power density, ILP tapped out)
  • Chip yield favors many smaller, simpler processors
  • Memory is not keeping pace with processor logic

Application Pull

  • Scientific computing: climate modeling, genomics, astrophysics, CFD
  • Engineering: semiconductor design, structural modeling, crash simulation
  • Business: financial modeling, web services, search engines
  • Simulation is the "third pillar of science" alongside theory and experiment

How to Write Parallel Programs

Task Parallelism

Partition various tasks carried out in solving the problem among cores.

  • Example: TA#1 grades questions 1-5, TA#2 grades questions 6-10

Data Parallelism

Partition the data used in solving the problem among cores. Each core carries out similar operations on its part of the data.

  • Example: Each TA grades 100 exam papers

Coordination

Cores need to coordinate:

  • Communication: Sending partial results between cores
  • Load balancing: Distributing work evenly
  • Synchronization: Ensuring cores don't get too far ahead

Key Insight

Sometimes the best parallel solution requires devising an entirely new algorithm, not just parallelizing a serial one. Example: tree-based global sum vs. master-slave global sum — factor of 100 improvement with 1000 cores.