diff --git a/slides/images/Blocking.png b/slides/images/Blocking.png deleted file mode 100644 index 9eba32da386649db88eec89ae019341448b9c8ef..0000000000000000000000000000000000000000 Binary files a/slides/images/Blocking.png and /dev/null differ diff --git a/slides/images/MPI-datatypes.png b/slides/images/MPI-datatypes.png deleted file mode 100644 index f71edd6a23f6c4514e9cc6f705b406b3e8b6cb71..0000000000000000000000000000000000000000 Binary files a/slides/images/MPI-datatypes.png and /dev/null differ diff --git a/slides/images/SPMD.png b/slides/images/SPMD.png deleted file mode 100644 index 0d10faeef597f4ed11e267788c177c284dd0779a..0000000000000000000000000000000000000000 Binary files a/slides/images/SPMD.png and /dev/null differ diff --git a/slides/images/buffered-mode.png b/slides/images/buffered-mode.png deleted file mode 100644 index a477e2aa93953dd6a342a12caf7632e2fa254116..0000000000000000000000000000000000000000 Binary files a/slides/images/buffered-mode.png and /dev/null differ diff --git a/slides/images/cluster.png b/slides/images/cluster.png deleted file mode 100644 index c270b447276972c9145773d4afd958d483b5e9da..0000000000000000000000000000000000000000 Binary files a/slides/images/cluster.png and /dev/null differ diff --git a/slides/images/communicator-split.png b/slides/images/communicator-split.png deleted file mode 100644 index 74a6db9b39bcdc1387be81ee70d7bf1aa1e528e5..0000000000000000000000000000000000000000 Binary files a/slides/images/communicator-split.png and /dev/null differ diff --git a/slides/images/communicator.png b/slides/images/communicator.png deleted file mode 100644 index b6337b1ca7aea67dfe9c8ad3ce1d47fadbdff100..0000000000000000000000000000000000000000 Binary files a/slides/images/communicator.png and /dev/null differ diff --git a/slides/images/deadlock-prevention.png b/slides/images/deadlock-prevention.png deleted file mode 100644 index 17b291f9377f2044153515cac6e5289219b7fb46..0000000000000000000000000000000000000000 Binary files a/slides/images/deadlock-prevention.png and /dev/null differ diff --git a/slides/images/distributed-memory.png b/slides/images/distributed-memory.png deleted file mode 100644 index 2eb1face240df5bc2f3cddccd05d34248e9092a4..0000000000000000000000000000000000000000 Binary files a/slides/images/distributed-memory.png and /dev/null differ diff --git a/slides/images/inf-runtime.png b/slides/images/inf-runtime.png new file mode 100644 index 0000000000000000000000000000000000000000..ac37b46b1f5c85a3fa18f40cedfe1760bbcef50e Binary files /dev/null and b/slides/images/inf-runtime.png differ diff --git a/slides/images/non-blocking-detailed.png b/slides/images/non-blocking-detailed.png deleted file mode 100644 index 6197ac43d870652ce87689c9b0b426962dd9e7ba..0000000000000000000000000000000000000000 Binary files a/slides/images/non-blocking-detailed.png and /dev/null differ diff --git a/slides/images/non-blocking.png b/slides/images/non-blocking.png deleted file mode 100644 index 29c214280d1b681e2646db8fc4149e21dd48b9fb..0000000000000000000000000000000000000000 Binary files a/slides/images/non-blocking.png and /dev/null differ diff --git a/slides/images/p-runtime.png b/slides/images/p-runtime.png new file mode 100644 index 0000000000000000000000000000000000000000..663bebe5c4d7bd039197f5c8e20b79af945ed50d Binary files /dev/null and b/slides/images/p-runtime.png differ diff --git a/slides/images/p2p.png b/slides/images/p2p.png deleted file mode 100644 index 1ef2c97938af84313d9378b09ecc909730e9da1e..0000000000000000000000000000000000000000 Binary 
files a/slides/images/p2p.png and /dev/null differ diff --git a/slides/images/runtime.png b/slides/images/runtime.png new file mode 100644 index 0000000000000000000000000000000000000000..de29bb7564308c98d41fbbc0f8f500f692e0b2f5 Binary files /dev/null and b/slides/images/runtime.png differ diff --git a/slides/images/shared-memory.png b/slides/images/shared-memory.png deleted file mode 100644 index 0c50e799c420c1333f9161e5d073f4c9f398137b..0000000000000000000000000000000000000000 Binary files a/slides/images/shared-memory.png and /dev/null differ diff --git a/slides/images/standard-mode.png b/slides/images/standard-mode.png deleted file mode 100644 index 15be2eae820eca35b8111568df51f1a82c3b0986..0000000000000000000000000000000000000000 Binary files a/slides/images/standard-mode.png and /dev/null differ diff --git a/slides/images/synchronous-mode.png b/slides/images/synchronous-mode.png deleted file mode 100644 index 6863b8060d20f66ef25b164fc269b374c2a4fd8e..0000000000000000000000000000000000000000 Binary files a/slides/images/synchronous-mode.png and /dev/null differ diff --git a/slides/pages/recap.md b/slides/pages/recap.md index 912fc8389a80cfdc9aab4872aec769a26b329c07..2405cb3f79139901a26446e6ffcb86fe186f004f 100644 --- a/slides/pages/recap.md +++ b/slides/pages/recap.md @@ -2,740 +2,77 @@ title: Recap I --- -# Recap +# Recap & Discussion -### Parallel Architecture - Shared Memory +### MPI vs OpenMP -<div class="container flex justify-center mt-5"> - <img src="/images/shared-memory.png" class="block w-lg"/> -</div> - -#### Pros and Cons +<br/> -- **Pros**: shared/direct memory access, fast, single OS instance -- **Cons**: cache coherence expensive, data races, complex hardware (NUMA) +- What are the differences between MPI and OpenMP? +- What are the advantages and disadvantages of each? +- Can we combine them? --- title: Recap II --- -### Parallel Architecture - Distributed Memory - -<div class="container flex justify-center mt-5"> - <img src="/images/distributed-memory.png" class="block w-lg"/> -</div> - -#### Pros and Cons - -- **Pros**: separated main memories, better scalability, no data races -- **Cons**: communication overhead, more system complexity +### Amdahl's Law ---- -title: Recap III ---- +**The runtime of a program**: -### Parallel Architecture - Cluster +- sequential part: $T_s$ +- parallelisable part: $T_p$ +- total execution time: $T = T_s + T_p$ +- serial fraction: $f = \frac{T_s}{T} \, (0 \le f \le 1)$ -<div class="container flex justify-center mt-5 mb-5"> - <img src="/images/cluster.png" class="block w-lg"/> +<div class="container flex justify-left"> + <img src="/images/runtime.png" class="block w-lg"/> </div> -- HPC market is at large dominated by distributed memory multicomputers: clusters and specialised supercomputers. -- _Usually_ separated machines, copies of the same OS (environment), connected via LAN. 
**The speedup with n-fold (n processors) parallelisation:**

- total execution time: $T_n = T_s + \frac{T_p}{n}$
- parallel speedup: $S_n = \frac{T}{T_n} = \frac{T}{T_s + \frac{T_p}{n}} = \frac{1}{f + \frac{1-f}{n}} = \frac{n}{(n-1)f + 1}$
- parallel efficiency: $E_n = \frac{T}{nT_n} = \frac{S_n}{n} = \frac{1}{(n-1)f + 1}$

<div class="container flex justify-left">
  <img src="/images/p-runtime.png" class="block w-lg"/>
</div>

---
title: Recap III
---

### Amdahl's Law II

**What happens when $n \rightarrow \infty$?**

<v-click>

- $T_{\infty} = T_s + \frac{T_p}{\infty} \rightarrow T_s$
- $S_{\infty} = \frac{T}{T_{\infty}} \rightarrow \frac{T}{T_s} = \frac{1}{f}$
- $E_{\infty} = 1$ if $f = 0$; otherwise $E_{\infty} = 0$

</v-click>

<v-click>

<div class="container flex justify-left mt-5">
  <img src="/images/inf-runtime.png" class="block w-lg"/>
</div>

</v-click>

---
title: Recap IV
---

### SPMD Model

Single Program Multiple Data:

<div class="container flex justify-center mt-5">
  <img src="/images/SPMD.png" class="block w-lg"/>
</div>

- Abstractions make programming and understanding easier
- Multiple instruction flows (instances) from a Single Program working on Multiple (different parts of) Data
- Instances could be threads (OpenMP) and/or processes (MPI)

---
title: MPI I
---

## MPI

A language-agnostic specification of a set of communication and I/O operations.

Standard bindings for C and Fortran. Non-standard bindings for other languages: C++, Java, Python, etc.

<v-click>

### Differences between MPI and OpenMP

Unlike OpenMP, MPI does not extend the base language, but provides a set of library functions and a specialised runtime. It also makes use of existing compilers.

</v-click>

<v-click>

### Documentation

- [The MPI Forum document archive (the standards)](https://www.mpi-forum.org/docs/)
- [OpenMPI documentation](https://www.open-mpi.org/doc/)
- [MPICH documentation](https://www.mpich.org/documentation/guides/)

</v-click>

---
title: MPI II
---

## General Structure of an MPI Program

Start-up, initialisation, finalisation, and shutdown in C:

```c
#include <mpi.h>

int main(int argc, char **argv) {
  // some code

  // initialization of the MPI runtime environment
  MPI_Init(&argc, &argv);

  // code that handles computation & communication

  // finalization of the MPI runtime environment; internal buffers are flushed etc.
  MPI_Finalize();

  // wrap-up code
  return 0;
}
```

MPI programs are compiled and executed using the wrapper commands `mpicc` and `mpirun`.

---
title: MPI III
---

## MPI Example: Hello World

Important functions:

**MPI_Init(int \*argc, char \*\*\*argv)**: initializes the MPI runtime, **must be called before most other MPI routines are called**.

**MPI_Comm_size(MPI_Comm comm, int \*size)**: indicates the number of processes involved in a communicator. For MPI\_COMM\_WORLD, it indicates the total number of processes available.

**MPI_Comm_rank(MPI_Comm comm, int \*rank)**: gives the rank of the process in the particular communicator's group.

**MPI_Get_processor_name(char \*name, int \*resultlen)**: returns the name of the processor on which it was called at the moment of the call.

**MPI_Finalize(void)**: cleans up the MPI library and prepares the process for termination, **must be called once before the process terminates**.

### Demo

See live demo in the tutorial.
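A minimal sketch of such a hello-world program (the live demo may differ slightly); the file name `hello.c` is arbitrary, and it can be built and launched with `mpicc hello.c -o hello` and `mpirun -np 4 ./hello`:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int size, rank, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Comm_size(MPI_COMM_WORLD, &size);  // total number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // rank of this process
    MPI_Get_processor_name(name, &name_len);

    printf("Hello from rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}
```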
- ---- -title: MPI IV ---- - -# Important Concepts in MPI - -### Ranks - -- The processes in any MPI program are initially indistinguishable -- **MPI\_Init** assigns each process a unique identity – rank -- Ranks range from 0 up to the total number of processes minus 1 - -<div class="container flex justify-center mt-5"> - <img src="/images/communicator.png" class="block w-lg"/> -</div> - - -Ranks are associated with the so-called **communicators**. - ---- -title: MPI IV ---- - -### Communicator - -<br/> - -- Logical contexts where communication takes place, an abstraction of a group of processes. -- Represent groups of MPI processes with some additional information. -- The most important one is the world communicator **MPI_COMM_WORLD** (by default). -- Ranks are always provided in MPI calls in combination with the corresponding communicator -- You can split or create your own communicators. - -<div class="container flex justify-center mt-5"> - <img src="/images/communicator-split.png" class="block w-lg"/> -</div> - - ---- -title: MPI V ---- - -## Point to Point Communication - -<div class="container flex justify-center mt-5"> - <img src="/images/p2p.png" class="block w-lg"/> +<div class="container flex justify-left mt-5"> + <img src="/images/inf-runtime.png" class="block w-lg"/> </div> -### Basic requirements - -What do we need to know/do in order to create a point to point communication? - -<v-clicks> - -- Send and receive operations (How?) -- Identification of both the sender and the receiver (To Whom?) -- Specification of what has to be sent/received (What?) - -</v-clicks> - ---- -title: MPI VI ---- - -### Sending & Receving Data - -**MPI_Send(void \*data, int count, MPI_Datatype type, int dest, int tag, MPI_Comm comm)** - -**MPI_Recv(void \*data, int count, MPI_Datatype type, int source, int tag, MPI_Comm comm, MPI_Status \*status)** - -- `data`: location of the send/receive buffer -- `count`: number of data elements to be sent/received -- `type`: the data type to be sent/received -- `source`/`dest`: rank of the sender/receiver or **MPI_ANY_SOURCE** for receiver. -- `tag`: additional information for distinguishing the messages or **MPI_ANY_TAG** for the receiver -- `comm`: communicator -- `status`: status of the receive operation or **MPI_STATUS_IGNORE** - -### Message Envelope - -Apart from its bare data, each message has a message envelope. This has enough information to distinguish messages from each other: **the source**, **destination**, **tag**, **communicator**. - ---- -title: MPI VII ---- - -## MPI Datatypes - -MPI cannot infer the type of elements in the supplied buffer at run time, so it has to be specified. - -**MPI datatype must match the language type(s) in the data buffer.** - -MPI datatypes in C: - -<div class="container flex justify-center mt-5"> - <img src="/images/MPI-datatypes.png" class="block w-lg"/> -</div> - - ---- -title: MPI VII ---- - -## MPI Example: Simple Point to Point Communication - -Important functions: - -- **MPI_Abort(MPI_Comm comm, int errorcode)**: makes a "best attempt" to abort all tasks in the group of comm. -- Default blocking send & receive (**MPI_Send** & **MPI_Recv**) - -### Error handling - -Most MPI calls in C return an integer error code. Failure is indicated by error codes other than **MPI_SUCCESS**. - -```c -if (MPI_SUCCESS != MPI_Send(...)) { - // error handling here -} -``` - -In general, error checking in simple programs is redundant. - - -### Demo - -See live demo in the tutorial. 
- ---- -title: MPI VIII -layout: center ---- - -## Exercise - -Pass around the token value by all processes in a ring-like fashion. - -E.g. for 4 processes: - -token -> 0 -> 1 -> 2 -> 3 -> 0 -> token - -Add all missing function calls in `exercises/MPI_examples/round_trip/round_trip.c` - -<v-click> - -**Question: what is bad about this program?** - </v-click> ---- -title: MPI IX ---- - -### Message Reception and Status - -The receive buffer must be able to fit the entire message: - -- send count <= receive count (<span class="text-green-500">OK</span>) -- send count > receive count (<span class="text-rose-500">Error: message truncation</span>) - -The MPI status object holds information about the received message. - -**MPI_Status status** object is a structure in C with freely accessible members: - -```c -status.MPI_SOURCE // message source rank -status.MPI_TAG // message tag -status.MPI_ERROR // receive status code -``` - -### Message Size Inquiry - -**MPI_Get_count (MPI_Status \*status, MPI_Datatype datatype, int \*count)** - -- Calculates how many **datatype** elements can be formed from the data in the message referenced by status -- Can be used with the status from **MPI_Recv** or **MPI_Probe** - - ---- -title: MPI Probe ---- - -### MPI Probe - -Blocks until a matching message appears without actually receiving the message: - -**MPI_Probe (int source, int tag, MPI_Comm comm, MPI_Status \*status)** - -One must still call **MPI_Recv** to receive the message and copy the data into the buffer. - -When probing tells you that there is a message, you can use **MPI_Get_count** to determine its size, allocate a large enough receive buffer, and do a regular receive to have the data copied. - -```c -if (rank == receiver) { - MPI_Status status; - MPI_Probe(sender, 0, comm, &status); - int count; - MPI_Get_count(&status,MPI_FLOAT,&count); - float recv_buffer[count]; - MPI_Recv(recv_buffer, count, MPI_FLOAT, sender, 0, comm, MPI_STATUS_IGNORE); -} else if (rank == sender) { - float buffer[buffer_size]; - MPI_Send(buffer, buffer_size, MPI_FLOAT, receiver, 0, comm); -} -``` - ---- -title: Operating Completion ---- - -### Operation Completion - -**MPI operations complete (or return from the function call) once the message buffer is no longer in use by the MPI library and is thus free for reuse**. 
Send operations complete:

- once the message is constructed **and**
  - placed completely onto the network **or**
  - buffered completely (by MPI, the OS, the network, …)

Receive operations complete:

- once the entire message has arrived and has been placed into the buffer

**Blocking MPI calls only return once the operation has completed**:

- **MPI_Send** and **MPI_Recv** are blocking

---
title: Blocking Calls
---

### Blocking Calls

Blocking send and receive calls (without buffering):

<div class="container flex justify-center mt-5">
  <img src="/images/blocking.png" class="block w-lg"/>
</div>

**Both MPI_Send and MPI_Recv calls are blocking**

---
title: Deadlock I
---

- The receive operation only returns after a matching message has arrived
- The send operation **might** be buffered (**implementation-specific**) and therefore return before the message is actually placed onto the network

**Deadlock !!!**

```c
if (rank == 0) {
  MPI_Recv(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
  MPI_Send(&b, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (rank == 1) {
  MPI_Recv(&a, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
  MPI_Send(&b, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
}
```

**Solution: non-symmetric calls (not scalable):**

```c
if (rank == 0) {
  MPI_Recv(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
  MPI_Send(&b, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (rank == 1) {
  MPI_Send(&b, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
  MPI_Recv(&a, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
}
```

---
title: Deadlock II
---

In contrast, the following code will often not cause any deadlock in practice:

```c
if (rank == 0) {
  MPI_Send(&b, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
  MPI_Recv(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
} else if (rank == 1) {
  MPI_Send(&b, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
  MPI_Recv(&a, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
}
```

This is known as an **eager send**, and it relies on the availability of some amount of internal buffer space (buffering).

### How can we expose this problem?

See live demo.

<v-click>

This tells us: <span class="text-rose-500 font-bold">Never rely on any implementation-specific behaviour!!!</span>

</v-click>

---
title: Deadlock III
---

### Another Solution to Deadlock

**Using Combined Send and Receive**:

```c
MPI_Sendrecv (
    void *senddata,
    int sendcount,
    MPI_Datatype sendtype,
    int dest,
    int sendtag,
    void *recvdata,
    int recvcount,
    MPI_Datatype recvtype,
    int source,
    int recvtag,
    MPI_Comm comm,
    MPI_Status *status
)
```

Sends one message and receives one message (in any order) **without deadlocking (unless unmatched)**

<span class="text-rose-500 font-bold">Send and receive buffers must not overlap!</span>

Don't want two buffers? -> Use **MPI_Sendrecv_replace**!

---
title: Message Ordering
---

### Message Ordering

**Order is preserved in a given communicator for point-to-point operations between any pair of processes**:

- Messages within some communicator to the same rank are non-overtaking
- Probe/receive returns the earliest matching message

**Order is not guaranteed for**:

- messages sent within different communicators
- messages arriving from different senders

See live demo.
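One way to apply the combined send/receive from the Deadlock III slide is a symmetric ring shift; this is only a sketch (it assumes an already initialised MPI program), not the tutorial demo:

```c
// Every rank sends its rank to the right neighbour and receives from the
// left neighbour.  The code is identical on all ranks, yet cannot deadlock.
int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

int right = (rank + 1) % size;
int left  = (rank - 1 + size) % size;

int sendval = rank, recvval = -1;
MPI_Sendrecv(&sendval, 1, MPI_INT, right, 0,
             &recvval, 1, MPI_INT, left,  0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
```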
- ---- -title: Non-Blocking Calls ---- - -### Non-Blocking Calls - -**Non-blocking MPI calls return immediately while the communication operation continues asynchronously in the background** - -Each non-blocking operation is represented by a **request**: +**What does this tell us?** -- In C: **MPI_Request** -Non-blocking operations are progressed by certain MPI calls but most notably by the **test** and **wait** MPI calls. - -Blocking MPI calls = non-blocking calls + immediate waiting - -**Benifits?** +</v-click> <v-click> -<span class="text-rose-500 font-bold">Overlay communications, prevent deadlocks</span> +**No parallel program can outrun the sum of its sequential parts!** </v-click> ---- -title: Non-Blocking Calls 2 ---- - -## Related Operations - -Non-blocking send and receive: - -```c -MPI_Isend (void *data, int count, MPI_Datatype dataType, int dest, int tag, -MPI_Comm comm, MPI_Request *request) - -MPI_Irecv (void *data, int count, MPI_Datatype dataType, int source, int tag, -MPI_Comm comm, MPI_Request *request) -``` - -Blocking wait for completion: - -```c -MPI_Wait (MPI_Request *request, MPI_Status *status) -``` - -The request is passed by reference, so that the wait routine can free it: - -- The wait call deallocates the request object, **and** -- sets the value of the variable to **MPI_REQUEST_NULL** - - ---- -title: Non-Blocking Calls 3 -layout: two-cols ---- - -**Equivalent to blocking calls**: - -<div class="container flex justify-center mt-5 mr-5"> - <img src="/images/non-blocking.png" class="block w-lg"/> -</div> - -::right:: - -**Other work can be done in between**: - -<div class="container flex justify-center mt-5 ml-5"> - <img src="/images/non-blocking-detailed.png" class="block w-lg"/> -</div> - ---- -title: Deadlock Prevention ---- - -## Deadlock Prevention - -**Non-blocking operations can be used to prevent deadlocks in symmetric code:** - -<div class="container flex justify-center mt-5"> - <img src="/images/deadlock-prevention.png" class="block w-lg"/> -</div> - -<span class="text-rose-500 font-bold">That is how MPI_Sendrecv is usually implemented.</span> - - ---- -title: Non-Blocking Request Testing ---- - -Using the following function to test if a given operation has completed: - -```c -MPI_Test (MPI_Request *request, int *flag, MPI_Status *status) -``` - -- **flag** will be set to **true** if the operation has completed, otherwise **false** -- **status**: only set if **flag** is **true** -- Can be (and usually is) called repeatedly inside a loop -- When the operations is completed, the **request** is freed and set to **MPI_REQUEST_NULL** - -<br/> - -### Null Request Object - -If **MPI_Request** is Null, both **MPI_Wait** and **MPI_Test** returns immediately. - -### Demo - -See live demo. - ---- -title: Non-Blocking Request Testing 2 ---- - -### Test and Wait on Many Requests - -**MPI_Waitany / MPI_Testany**: - -- Wait for one of the specified requests to complete and free it -- Test if one of the specified requests has completed and free it if it did - -**MPI_Waitall / MPI_Testall**: - -- Wait for all the specified requests to complete and free them -- Test if all of the specified requests have completed and free them if they have - -**MPI_Waitsome / MPI_Testsome** - -- Wait for any number of the specified requests to complete and free them -- Test if any number of the specified requests have completed and free these that have - -Use **MPI_STATUSES_IGNORE** to ignore status from -all/-some operations. - -See live demo. 
- ---- -title: Communication Modes ---- - -## Communication Modes - -Four send modes: - -- **Standard** -- **Synchronous** -- **Buffered** -- _Ready_ - -Only one receive mode: **Synchronous** - -Send modes differ in the relation between the **completion of the operation** and the **actual message transfer** - ---- -title: Send Modes I ---- - -### Standard Mode - -The call blocks until the message has **either** been transferred or **copied** to an internal buffer for later delivery. - -Representative: <span class="text-rose-500 font-bold">MPI_Send</span> - -<div class="container flex justify-center mt-5"> - <img src="/images/standard-mode.png" class="block w-lg"/> -</div> - ---- -title: Send Modes II ---- - -### Synchronous Mode - -The call blocks until a matching receive has been posted and the message reception has started. - -Representative: <span class="text-rose-500 font-bold">MPI_Ssend</span> - -<div class="container flex justify-center mt-5"> - <img src="/images/synchronous-mode.png" class="block w-lg"/> -</div> - ---- -title: Send Modes III ---- - -### Buffered Mode - -The call blocks until the message has been copied to a user-supplied buffer. Actual transmission may happen at a later point - -Representative: <span class="text-rose-500 font-bold">MPI_Bsend</span> - -<div class="container flex justify-center mt-5"> - <img src="/images/buffered-mode.png" class="block w-lg"/> -</div> - ---- -title: Send Modes IV ---- - -### Ready Mode (Don't use) - -The operation succeeds only if a matching receive has already been posted. - -Behaves as standard send in every other aspect - -Representative: <span class="text-rose-500 font-bold">MPI_Rsend</span> - -**Advice**: avoid using this function unless you are 100% sure of what you are doing. It's error-prone. - ---- -title: Send Modes Calls ---- - -**These modes can be combined with the concept of blocking and non-blocking:** - -- **MPI_Send**: blocking standard send -- **MPI_Isend**: non-blocking standard send -- **MPI_Ssend**: blocking synchronous send -- **MPI_Issend**: non-blocking synchronous send -- **MPI_Bsend**: blocking buffered send -- **MPI_Ibsend**: non-blocking buffered send -- **MPI_Rsend**: blocking ready-mode send -- **MPI_Irsend**: non-blocking ready-mode send - -**Buffered operations** require an explicitly provided user buffer using: - -- **MPI_Buffer_attach (void \*buf, int size)** -- **MPI_Buffer_detach (void \*buf, int \*size)** -- Buffer size must also consider the envelope size (**MPI_BSEND_OVERHEAD**) - ---- -title: Caveats on Send Modes ---- - -## Caveats - -One rarely needs anything else except the standard send. - -The synchronous send can be used to synchronise two ranks. - -**Simple correctness check:** - -- Replacing all blocking standard sends with blocking synchronous sends should not result in deadlock -- If program deadlocks, you are relying on the buffering behaviour (implementation-specific) of the standard send -> change your algorithm - -**Common Pitfalls**: - -- Pass pointers to pointers in MPI calls. -- Use flat multidimensional arrays, arrays of pointers do not work.