- Programming with shared memory vs message passing
- Shared Memory Computer: UMA vs NUMA
---
title: Recap II
layout: center
---
# How to Design Parallel Programs/Applications?
<v-click>
Using Foster's Design Methodology.
</v-click>
<v-clicks>
- **Partitioning**: The process of dividing the computation and data into pieces.
- **Communication**: The process of determining how tasks will communicate with each other, distinguishing between local and global communication.
- **Agglomeration**: The process of grouping tasks into larger tasks to improve performance or simplify programming.
- **Mapping**: The process of assigning tasks to physical processors.
</v-clicks>
---
title: OpenMP I
---
# OpenMP
An API for Writing Multithreaded Applications.
- A set of compiler directives and library routines
- Greatly simplifies writing multi-threaded programs in C/C++, Fortran
- Standardized
OpenMP uses a multi-threaded, shared-address-space model:
- A master thread spawns a team of threads as needed.
- Parallelism is added incrementally until performance goals are met, i.e. the sequential program evolves into a parallel program.
### Assumptions
GNU GCC or Clang is already available on your machine; details about OpenMP support in each compiler can be found in its documentation.
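As a sketch of this incremental style, consider a hypothetical `scale` function: the directive below is the only change from the sequential version (with GCC or Clang, build with the `-fopenmp` flag).

```c
#include <stdio.h>

// Hypothetical example: scale an array by a constant. The pragma is the
// only change from the sequential version; compiled without -fopenmp,
// the directive is ignored and the code runs serially.
void scale(double *a, long n, double s) {
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        a[i] *= s;
}

int main(void) {
    double a[4] = {1, 2, 3, 4};
    scale(a, 4, 2.0);
    printf("%g %g %g %g\n", a[0], a[1], a[2], a[3]);
    return 0;
}
```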
---
title: OpenMP II
---
## Exercise-1
Your first OpenMP program.
Finish `hello.c` in `exercises/OpenMP`.
### Hint
Use the `#pragma omp parallel` directive to create a parallel construct.
Find and use the appropriate function declared in `<omp.h>`.
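If you get stuck, here is a minimal sketch of one possible shape for the program (assuming the goal is a greeting from each thread; compile with, e.g., `gcc -fopenmp hello.c -o hello`):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel                // fork a team of threads
    {
        int id = omp_get_thread_num();  // this thread's rank in the team
        printf("Hello from thread %d\n", id);
    }                                   // implicit barrier, then join
    return 0;
}
```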
---
title: OpenMP III
---
## Exercise-2
Try to parallelize the program that calculates the integral:
$$
\int_{0}^{1} \frac{4.0}{1 + x^2} \,dx = \pi
$$
Use the classical approximation: compute the sum of the areas of the rectangles below the curve.
Create a parallel version of the sequential pi program with a parallel construct, following the **SPMD (Single Program Multiple Data)** pattern.
See `pi.c` in `exercises/OpenMP`.
### Hint
In addition to a parallel construct, you will need the runtime library routines:
- `int omp_get_num_threads();` - number of threads in the team
- `int omp_get_thread_num();` - thread ID or rank
- `double omp_get_wtime();` - time in seconds elapsed since a fixed point in the past
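A sketch of how these routines are typically used (note that `omp_get_num_threads()` returns 1 when called outside a parallel region, so query the team size from inside the region):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    double t0 = omp_get_wtime();             // start of the timed section
    #pragma omp parallel
    {
        int nthrds = omp_get_num_threads();  // team size (inside the region)
        int id = omp_get_thread_num();       // this thread's rank, 0..nthrds-1
        if (id == 0)
            printf("running with %d threads\n", nthrds);
    }
    printf("elapsed: %f s\n", omp_get_wtime() - t0);
    return 0;
}
```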
---
title: OpenMP IV
---
### Solution with SPMD
See live demo.
This pattern is very general and has been used to support most (if not all) of the algorithm strategy patterns.
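Since the demo is not reproduced in these slides, here is a sketch of what an SPMD version might look like (assuming a fixed thread count and a cyclic distribution of loop iterations; the demo's `pi.c` may differ in detail):

```c
#include <stdio.h>
#include <omp.h>

#define NUM_THREADS 4
static long num_steps = 100000000;

int main(void) {
    double step = 1.0 / (double)num_steps;
    double sum[NUM_THREADS];               // one partial sum per thread
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();
        double x, s = 0.0;
        // Cyclic distribution: thread id handles i = id, id+nthrds, ...
        for (long i = id; i < num_steps; i += nthrds) {
            x = (i + 0.5) * step;
            s += 4.0 / (1.0 + x * x);
        }
        sum[id] = s;                       // neighboring slots share a cache line!
    }
    double pi = 0.0;
    for (int i = 0; i < NUM_THREADS; i++)
        pi += sum[i] * step;
    printf("pi = %.15f\n", pi);
    return 0;
}
```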
### Problem with SPMD
If independent data elements happen to sit on the same cache line, each update will cause the cache line to “slosh back and forth” between threads; this is called _false sharing_.
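One common mitigation is to pad the per-thread slots so that each falls on its own cache line. A minimal sketch, assuming 64-byte cache lines (the padding of `sum` and the indexing are the only changes relative to the sketch above):

```c
#include <stdio.h>
#include <omp.h>

#define NUM_THREADS 4
#define PAD 8                          // 8 doubles * 8 bytes = one 64-byte line
static long num_steps = 100000000;

int main(void) {
    double step = 1.0 / (double)num_steps;
    double sum[NUM_THREADS][PAD];      // each thread's slot on its own line
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();
        double x, s = 0.0;
        for (long i = id; i < num_steps; i += nthrds) {
            x = (i + 0.5) * step;
            s += 4.0 / (1.0 + x * x);
        }
        sum[id][0] = s;                // updates now hit distinct cache lines
    }
    double pi = 0.0;
    for (int i = 0; i < NUM_THREADS; i++)
        pi += sum[i][0] * step;
    printf("pi = %.15f\n", pi);
    return 0;
}
```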