Commit e93d1a09 authored by FKHals

Add VRM Jobid to the send process infos

This only works if the "SLURM_VRM_JOBID" environment variable is set.
If it is not set, the message looks the same as before.
Consequently, if the variable is set, the locserv (test server)
will throw an error, since it cannot parse the extended message.

Also add a troubleshooting entry in the README.
parent ecb03886
1 merge request: !5 Add VRM Jobid to the send process infos
@@ -33,6 +33,20 @@ make all
 Now run the two scripts, first `./server.sh` and then `./client.sh`, in two separate terminal windows. The clients (the MPI processes) inform the server of their identity; the server then answers with modified ranks, which each client applies to complete its initialization.
+## Troubleshooting
+### MPI not found
+If running `./client.sh` causes the following error:
+```shell
+./client.sh: line 21: mpirun: Command not found.
+```
+then `mpirun` could not be found.
+In this case the `OMPI` and `PATH` variables are not set properly.
+They need to point to the Open MPI installation directory (as set via `--prefix` in the `configure` call during installation).
+---
 **Now follows the actual/default Open MPI README:**
 # Open MPI
@@ -94,48 +108,49 @@ $ ../configure --prefix=<path> |& tee config.out
 The rest of this file contains:
-* [General release notes about Open MPI](#general-notes)
-* [Platform-specific notes](#platform-notes)
-* [Compiler-specific notes](#compiler-notes)
-* [Run-time support notes](#general-run-time-support-notes)
-* [MPI functionality and features](#mpi-functionality-and-features)
-* [OpenSHMEM functionality and
-features](#openshmem-functionality-and-features)
-* [MPI collectives](#mpi-collectives)
-* [OpenSHMEM collectives](#openshmem-collectives)
-* [Network support](#network-support)
-* [Open MPI extensions](#open-mpi-extensions)
-* [Detailed information on building Open MPI](#building-open-mpi)
-* [Installation options](#installation-options)
-* [Networking support and options](#networking-support--options)
-* [Run-time system support and options](#run-time-system-support)
-* [Miscellaneous support
-libraries](#miscellaneous-support-libraries)
-* [MPI functionality options](#mpi-functionality)
-* [OpenSHMEM functionality options](#openshmem-functionality)
-* [Miscellaneous functionality
-options](#miscellaneous-functionality)
-* [Open MPI version and library numbering
-policies](#open-mpi-version-numbers-and-binary-compatibility)
-* [Backwards compatibility polices](#backwards-compatibility)
-* [Software version numbering](#software-version-number)
-* [Shared library version numbering](#shared-library-version-number)
-* [Information on how to both query and validate your Open MPI
-installation](#checking-your-open-mpi-installation)
-* [Description of Open MPI extensions](#open-mpi-api-extensions)
-* [Compiling the extensions](#compiling-the-extensions)
-* [Using the extensions](#using-the-extensions)
-* [Examples showing how to compile Open MPI applications](#compiling-open-mpi-applications)
-* [Examples showing how to run Open MPI applications](#running-open-mpi-applications)
-* [Summary information on the various plugin
-frameworks](#the-modular-component-architecture-mca)
-* [MPI layer frameworks](#mpi-layer-frameworks)
-* [OpenSHMEM component frameworks](#openshmem-component-frameworks)
-* [Run-time environment
-frameworks](#back-end-run-time-environment-rte-component-frameworks)
-* [Miscellaneous frameworks](#miscellaneous-frameworks)
-* [Other notes about frameworks](#framework-notes)
-* [How to get more help](#questions--problems)
+- [Custom Open MPI](#custom-open-mpi)
+- [Building](#building)
+- [Usage](#usage)
+- [Troubleshooting](#troubleshooting)
+- [MPI not found](#mpi-not-found)
+- [Open MPI](#open-mpi)
+- [Quick start](#quick-start)
+- [Table of contents](#table-of-contents)
+- [General notes](#general-notes)
+- [Platform Notes](#platform-notes)
+- [Compiler Notes](#compiler-notes)
+- [General Run-Time Support Notes](#general-run-time-support-notes)
+- [MPI Functionality and Features](#mpi-functionality-and-features)
+- [OpenSHMEM Functionality and Features](#openshmem-functionality-and-features)
+- [MPI Collectives](#mpi-collectives)
+- [OpenSHMEM Collectives](#openshmem-collectives)
+- [Network Support](#network-support)
+- [Open MPI Extensions](#open-mpi-extensions)
+- [Building Open MPI](#building-open-mpi)
+- [Installation Options](#installation-options)
+- [Networking support / options](#networking-support--options)
+- [Run-time system support](#run-time-system-support)
+- [Miscellaneous support libraries](#miscellaneous-support-libraries)
+- [MPI Functionality](#mpi-functionality)
+- [OpenSHMEM Functionality](#openshmem-functionality)
+- [Miscellaneous Functionality](#miscellaneous-functionality)
+- [Open MPI Version Numbers and Binary Compatibility](#open-mpi-version-numbers-and-binary-compatibility)
+- [Backwards Compatibility](#backwards-compatibility)
+- [Software Version Number](#software-version-number)
+- [Shared Library Version Number](#shared-library-version-number)
+- [Checking Your Open MPI Installation](#checking-your-open-mpi-installation)
+- [Open MPI API Extensions](#open-mpi-api-extensions)
+- [Compiling the extensions](#compiling-the-extensions)
+- [Using the extensions](#using-the-extensions)
+- [Compiling Open MPI Applications](#compiling-open-mpi-applications)
+- [Running Open MPI Applications](#running-open-mpi-applications)
+- [The Modular Component Architecture (MCA)](#the-modular-component-architecture-mca)
+- [MPI layer frameworks](#mpi-layer-frameworks)
+- [OpenSHMEM component frameworks](#openshmem-component-frameworks)
+- [Back-end run-time environment (RTE) component frameworks:](#back-end-run-time-environment-rte-component-frameworks)
+- [Miscellaneous frameworks:](#miscellaneous-frameworks)
+- [Framework notes](#framework-notes)
+- [Questions? Problems?](#questions--problems)
 Also, note that much, much more information is also available [in the
 Open MPI FAQ](https://www.open-mpi.org/faq/).
......
@@ -35,6 +35,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/socket.h>
+#include <sys/types.h>
 #include <sys/un.h>
 #include <unistd.h>
 #include <string.h>
@@ -57,6 +58,8 @@
 #define FD_STDIN 0
 #define BUFFLEN 128
+#define JOBID_ENV_VAR "SLURM_VRM_JOBID"
 /*
 ** Table for Fortran <-> C communicator handle conversion
 ** Also used by P2P code to lookup communicator based
@@ -141,11 +144,20 @@ static int get_modified_ranks(uint32_t jobid, uint32_t vpid, size_t size, opal_v
     pid_t pid = getpid();
+    const char * vrm_jobid = getenv(JOBID_ENV_VAR);
+    char vrm_jobid_with_leading_comma[sizeof(uint64_t) + 1] = "";
+    if (NULL != vrm_jobid) {
+        char comma = ',';
+        strncat(vrm_jobid_with_leading_comma, &comma, 1);
+        strcat(vrm_jobid_with_leading_comma, vrm_jobid);
+        printf("TEST: %s", vrm_jobid_with_leading_comma);
+    }
     char info_to_send[BUFFLEN];
     memset(info_to_send, 0, BUFFLEN);
     snprintf(info_to_send, BUFFLEN,
-            "{\"msg_type\": 128, \"msg_data\": \"%d,%u,%u,%zu\"}",
-            pid, vpid, jobid, size);
+            "{\"msg_type\": 128, \"msg_data\": \"%d,%u,%u,%zu%s\"}",
+            pid, vpid, jobid, size, vrm_jobid_with_leading_comma);
     // TODO: endianness
     uint32_t msg_length = strlen(info_to_send) + 1;
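
Review note: the suffix handling in this hunk can be sketched without the intermediate buffer. In the committed code that buffer is `sizeof(uint64_t) + 1` bytes (9 on common platforms), so the unbounded `strcat` would write past its end for job ID strings of eight or more characters. The following is a minimal sketch under stated assumptions, not the committed implementation: `build_process_info` is a hypothetical helper, and taking the job ID as a parameter instead of calling `getenv` inside is an assumption made so the function is easy to test.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not in the commit): writes the process info message
 * into out. If vrm_jobid is non-NULL, ",<vrm_jobid>" is appended to msg_data.
 * Returns 0 on success, -1 if the message did not fit into out_len bytes. */
static int build_process_info(char *out, size_t out_len, pid_t pid,
                              uint32_t vpid, uint32_t jobid, size_t size,
                              const char *vrm_jobid)
{
    int written = snprintf(out, out_len,
                           "{\"msg_type\": 128, \"msg_data\": \"%d,%" PRIu32
                           ",%" PRIu32 ",%zu%s%s\"}",
                           (int)pid, vpid, jobid, size,
                           vrm_jobid ? "," : "",
                           vrm_jobid ? vrm_jobid : "");
    /* snprintf returns the length it needed, so truncation is detectable. */
    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}
```

A caller would pass `getenv(JOBID_ENV_VAR)` as the last argument; when the variable is unset, `getenv` returns `NULL` and the message keeps its previous four-field form.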
......
@@ -84,6 +84,9 @@ int main(void) {
     uint32_t jobid = 0;
     size_t size = 0;
     int vars_read = sscanf(client_message, "{\"msg_type\": 128, \"msg_data\": \"%d,%u,%u,%zu\"}", &pid, &vpid, &jobid, &size);
+    if (4 != vars_read) {
+        errorExit("Message could not be parsed: Too many/few entries in msg_data!");
+    }
     printf("Sscanf read count: %d (should equal 4)\n", vars_read);
     printf("Spawned - PID: %d, vpid: %u, jobID: %u, size: %zu\n", pid, vpid, jobid, size);
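
Review note: the count check in this hunk catches messages with too few fields. However, `sscanf` returns only the number of assigned conversions, so a mismatch in the trailing `\"}` literal after the fourth conversion does not lower the result, and a message carrying the extra job ID would still yield 4. The sketch below shows one way a test server could accept the optional fifth field and verify the complete match with `%n`. It is an illustration under assumptions, not part of locserv: `parse_process_info`, `VRM_JOBID_MAX`, and the 31-character limit are all hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define VRM_JOBID_MAX 32 /* assumed bound, including the terminating NUL */

/* Hypothetical parser (not part of the commit). Returns the number of
 * msg_data fields read (4 or 5), or -1 if the message does not match. */
static int parse_process_info(const char *msg, int *pid, uint32_t *vpid,
                              uint32_t *jobid, size_t *size,
                              char vrm_jobid[VRM_JOBID_MAX])
{
    int end = -1;
    /* %n stores how many characters were consumed up to this point. */
    int fields = sscanf(msg,
                        "{\"msg_type\": 128, \"msg_data\": \"%d,%" SCNu32
                        ",%" SCNu32 ",%zu%n",
                        pid, vpid, jobid, size, &end);
    if (4 != fields || end < 0) {
        return -1;
    }

    const char *rest = msg + end;
    vrm_jobid[0] = '\0';
    if (',' == *rest) {
        int used = -1;
        /* Optional fifth field: everything up to the closing quote.
         * The width 31 must stay in sync with VRM_JOBID_MAX - 1. */
        if (1 != sscanf(rest, ",%31[^\"]%n", vrm_jobid, &used) || used < 0) {
            return -1;
        }
        rest += used;
        fields = 5;
    }
    /* Accept the message only if nothing but the closing "} remains. */
    return 0 == strcmp(rest, "\"}") ? fields : -1;
}
```

Checking the remainder explicitly is what distinguishes "exactly four fields" from "four fields followed by something else", which the return value of `sscanf` alone cannot express.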
......