- Programming with shared memory vs message passing
- Shared Memory Computer: UMA vs NUMA
---
title: Recap II
layout: center
---
# How to Design Parallel Programs/Applications?
<v-click>
Using Foster's Design Methodology.
</v-click>
<v-clicks>
- **Partitioning**: The process of dividing the computation and data into pieces.
- **Communication**: The process of determining how tasks will communicate with each other, distinguishing between local and global communication.
- **Agglomeration**: The process of grouping tasks into larger tasks to improve performance or simplify programming.
- **Mapping**: The process of assigning tasks to physical processors.
</v-clicks>
---
title: OpenMP I
---
# OpenMP
An API for Writing Multithreaded Applications.
- A set of compiler directives and library routines
- Greatly simplifies writing multi-threaded programs in C/C++, Fortran
- Standardized
OpenMP uses a multi-threaded, shared-address-space model:
- A master thread spawns a team of threads as needed.
- Parallelism is added incrementally until performance goals are met, i.e. the sequential program evolves into a parallel program.
### Assumptions
GNU GCC or Clang is already available on your machine; details about OpenMP support in each compiler can be found in its documentation.
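As a sketch of this incremental style, consider a hypothetical `scale` function: the directive below is the only change from the sequential version (with GCC or Clang, build with the `-fopenmp` flag).

```c
#include <stdio.h>

// Hypothetical example: scale an array by a constant. The pragma is the
// only change from the sequential version; compiled without -fopenmp,
// the directive is ignored and the code runs serially.
void scale(double *a, long n, double s) {
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        a[i] *= s;
}

int main(void) {
    double a[4] = {1, 2, 3, 4};
    scale(a, 4, 2.0);
    printf("%g %g %g %g\n", a[0], a[1], a[2], a[3]);
    return 0;
}
```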
---
title: OpenMP II
---
## Exercise-1
Your first OpenMP program.
Finish `hello.c` in `exercises/OpenMP`.
### Hint
Use the `#pragma omp parallel` directive to create a parallel construct.
Find and use the appropriate function declared in `<omp.h>`.
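If you get stuck, here is a minimal sketch of one possible shape for the program (assuming the goal is a greeting from each thread; compile with, e.g., `gcc -fopenmp hello.c -o hello`):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel                // fork a team of threads
    {
        int id = omp_get_thread_num();  // this thread's rank in the team
        printf("Hello from thread %d\n", id);
    }                                   // implicit barrier, then join
    return 0;
}
```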
---
title: OpenMP III
---
## Exercise-2
Try to parallelize the program that calculates the integral:
$$
\int_{0}^{1} \frac{4.0}{1 + x^2} \,dx = \pi
$$
Use the classical approximation: compute the sum of the areas of the rectangles below the curve.
Create a parallel version of the sequential pi program with a parallel construct, following the **SPMD (Single Program Multiple Data)** pattern.
See `pi.c` in `exercises/OpenMP`.
### Hint
In addition to a parallel construct, you will need the runtime library routines:
- `int omp_get_num_threads();` - number of threads in the team
- `int omp_get_thread_num();` - thread ID or rank
- `double omp_get_wtime();` - time in seconds elapsed since a fixed point in the past
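A sketch of how these routines are typically used (note that `omp_get_num_threads()` returns 1 when called outside a parallel region, so query the team size from inside the region):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    double t0 = omp_get_wtime();             // start of the timed section
    #pragma omp parallel
    {
        int nthrds = omp_get_num_threads();  // team size (inside the region)
        int id = omp_get_thread_num();       // this thread's rank, 0..nthrds-1
        if (id == 0)
            printf("running with %d threads\n", nthrds);
    }
    printf("elapsed: %f s\n", omp_get_wtime() - t0);
    return 0;
}
```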
---
title: OpenMP IV
---
### Solution with SPMD
See live demo.
This pattern is very general and has been used to support most (if not all) of the algorithm strategy patterns.
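Since the demo is not reproduced in these slides, here is a sketch of what an SPMD version might look like (assuming a fixed thread count and a cyclic distribution of loop iterations; the demo's `pi.c` may differ in detail):

```c
#include <stdio.h>
#include <omp.h>

#define NUM_THREADS 4
static long num_steps = 100000000;

int main(void) {
    double step = 1.0 / (double)num_steps;
    double sum[NUM_THREADS];               // one partial sum per thread
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();
        double x, s = 0.0;
        // Cyclic distribution: thread id handles i = id, id+nthrds, ...
        for (long i = id; i < num_steps; i += nthrds) {
            x = (i + 0.5) * step;
            s += 4.0 / (1.0 + x * x);
        }
        sum[id] = s;                       // neighboring slots share a cache line!
    }
    double pi = 0.0;
    for (int i = 0; i < NUM_THREADS; i++)
        pi += sum[i] * step;
    printf("pi = %.15f\n", pi);
    return 0;
}
```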
### Problem with SPMD
If independent data elements happen to sit on the same cache line, each update will cause the cache line to “slosh back and forth” between threads; this is called _false sharing_.
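One common mitigation is to pad the per-thread slots so that each falls on its own cache line. A minimal sketch, assuming 64-byte cache lines (the padding of `sum` and the indexing are the only changes relative to the sketch above):

```c
#include <stdio.h>
#include <omp.h>

#define NUM_THREADS 4
#define PAD 8                          // 8 doubles * 8 bytes = one 64-byte line
static long num_steps = 100000000;

int main(void) {
    double step = 1.0 / (double)num_steps;
    double sum[NUM_THREADS][PAD];      // each thread's slot on its own line
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();
        double x, s = 0.0;
        for (long i = id; i < num_steps; i += nthrds) {
            x = (i + 0.5) * step;
            s += 4.0 / (1.0 + x * x);
        }
        sum[id][0] = s;                // updates now hit distinct cache lines
    }
    double pi = 0.0;
    for (int i = 0; i < NUM_THREADS; i++)
        pi += sum[i][0] * step;
    printf("pi = %.15f\n", pi);
    return 0;
}
```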