Reference Implementation of the Process Management Interface Exascale (PMIx) standard
Add APIs and internal support for RM-network library interactions
Add a network support framework and appropriate APIs so that the RM can:
* precondition an application (e.g., adding a security token to the
app's environment) prior to launch
* set up the local network driver to support an application (e.g., for
"instant on" address resolution) prior to spawning the local processes
* pass directives in the environment of client processes prior to forking
* clean up after each child process terminates
* clean up after all local children for a given application have
terminated
[EXTENSION][SERVER-API]
[APPROVED]
Copyright 2016 Intel, Inc. All rights reserved.
This document is subject to all provisions relating to code contributions to the PMIx community as defined in the community’s LICENSE file. Code Components extracted from this document must include the License text as described in that file.
Part of the “Instant On” initiative relies on establishing a partnership between the resource manager (RM) and the networking library that allows the two to fully set up the messaging environment before an application’s processes are spawned. Completing this procedure enables applications to communicate without first discovering and exchanging network endpoint information.
There are two new APIs required to enable this support: PMIx_server_setup_application and PMIx_server_setup_local_support.
The pmix_info_t structures returned by PMIx_server_setup_application may consist of any combination of key-value pairs, and the RM shall handle them as follows:
* If the key is PMIX_SET_ENVAR, the provided string shall be included in the environment of each spawned application process in this nspace. Each environment variable shall be provided in a separate pmix_info_t structure.
* If the key is PMIX_UNSET_ENVAR, the environment of the application shall be searched and the provided envar removed if present. Each environment variable shall be provided in a separate pmix_info_t structure.
* All other key-value pairs shall be included in the job-level info provided at process start.
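The handling rules above can be illustrated with a minimal C sketch. It does not use the PMIx headers: setup_kv_t is a hypothetical stand-in for a pmix_info_t carrying a string value, the DEMO_* key strings are placeholders rather than the real values of PMIX_SET_ENVAR and PMIX_UNSET_ENVAR, and the "NAME=value" string format is an assumption. The function applies the envar directives to the calling process's environment, as an RM might do in a child between fork and exec, and counts the remaining entries that would be forwarded as job-level info.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv/unsetenv */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder key strings (assumptions, not the real PMIx key values). */
#define DEMO_SET_ENVAR   "demo.set.envar"
#define DEMO_UNSET_ENVAR "demo.unset.envar"

/* Hypothetical stand-in for a pmix_info_t holding a string value. */
typedef struct {
    const char *key;
    const char *value;  /* "NAME=value" for set, "NAME" for unset (assumed) */
} setup_kv_t;

/* Apply envar directives to the current environment.
 * Returns the number of entries that are not envar directives
 * (those would go into the job-level info), or -1 on error. */
int apply_setup_info(const setup_kv_t *kv, size_t n)
{
    int joblevel = 0;

    assert(kv != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(kv[i].key, DEMO_SET_ENVAR) == 0) {
            /* Split "NAME=value" at the first '='. */
            const char *eq = strchr(kv[i].value, '=');
            if (eq == NULL || eq == kv[i].value) {
                return -1;
            }
            size_t len = (size_t)(eq - kv[i].value);
            char *name = malloc(len + 1);
            if (name == NULL) {
                return -1;
            }
            memcpy(name, kv[i].value, len);
            name[len] = '\0';
            int rc = setenv(name, eq + 1, 1);  /* overwrite if present */
            free(name);
            if (rc != 0) {
                return -1;
            }
        } else if (strcmp(kv[i].key, DEMO_UNSET_ENVAR) == 0) {
            /* unsetenv succeeds even if the variable is absent. */
            if (unsetenv(kv[i].value) != 0) {
                return -1;
            }
        } else {
            joblevel++;  /* would be included in the job-level info */
        }
    }
    return joblevel;
}
```

One directive per array entry mirrors the requirement that each environment variable be provided in its own structure, so a caller never has to parse a list of variables out of a single value.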
Note: this API is not network-specific. Thus, as other precondition data is identified in the future, internal support can be extended to ensure all precondition data is included without changing this API.
The expected flow of operation is that the workload manager will call PMIx_server_setup_application from its head node (or system management node) once for each job to be launched. The returned information will then be included in the launch message containing the job description sent to each compute node. The compute node PMIx server will subsequently include the information in its call to PMIx_server_register_nspace so that the local client processes will receive it.
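The flow described above can be modeled with a small, self-contained C sketch. Nothing here calls the PMIx library: every demo_* type and function is a hypothetical stand-in, with demo_head_node_setup playing the role of the head-node call to PMIx_server_setup_application and demo_node_register playing the role of a compute node passing the received data to PMIx_server_register_nspace. The point is only the shape of the flow: one setup call per job, with the result fanned out to every node in the launch message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_INFO 8

/* Hypothetical stand-in for one returned key-value pair. */
typedef struct {
    const char *key;
    const char *value;
} demo_info_t;

/* Hypothetical launch message: job description plus setup data. */
typedef struct {
    char nspace[64];
    demo_info_t info[DEMO_MAX_INFO];
    size_t ninfo;
} demo_launch_msg_t;

/* Hypothetical record of what a compute node's server registered. */
typedef struct {
    char nspace[64];
    size_t ninfo_registered;
} demo_node_t;

/* Counts head-node setup calls so the "once per job" rule is visible. */
int demo_setup_calls = 0;

/* Head node: obtain setup data for the job and place it in the message. */
void demo_head_node_setup(const char *nspace, demo_launch_msg_t *msg)
{
    assert(nspace != NULL && msg != NULL);
    demo_setup_calls++;
    memset(msg, 0, sizeof(*msg));
    strncpy(msg->nspace, nspace, sizeof(msg->nspace) - 1);
    /* Illustrative payload only; real data comes from the library. */
    msg->info[0] = (demo_info_t){ "demo.set.envar", "DEMO_NET_TOKEN=abc123" };
    msg->ninfo = 1;
}

/* Compute node: include the received data when registering the nspace. */
void demo_node_register(demo_node_t *node, const demo_launch_msg_t *msg)
{
    assert(node != NULL && msg != NULL);
    strncpy(node->nspace, msg->nspace, sizeof(node->nspace) - 1);
    node->nspace[sizeof(node->nspace) - 1] = '\0';
    node->ninfo_registered = msg->ninfo;
}

/* Launch one job: a single setup call, then the same message to each node. */
void demo_launch_job(const char *nspace, demo_node_t *nodes, size_t nnodes)
{
    demo_launch_msg_t msg;

    demo_head_node_setup(nspace, &msg);
    for (size_t i = 0; i < nnodes; i++) {
        demo_node_register(&nodes[i], &msg);
    }
}
```

In this model, launching a job on three nodes increments demo_setup_calls once while each node records the same nspace and the same number of setup entries.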
Note: some components executed by the PMIx_server_setup_local_support call may require elevated (e.g., root) privileges.
Note: this API is not network-specific. Thus, as other setup operations are identified in the future, internal support can be extended to ensure all setup is accomplished without changing this API.
Several other operations are also required by this RFC, but are not done as part of exposed APIs – i.e., they are simple additions to internal procedures. These include:
* Passing network-specific environmental variables to the process at startup. A call into the PNET framework has been added to the PMIx_server_setup_fork function for this purpose.
* Cleanup of network library information at child process termination. A call into the PNET framework has been added to the PMIx server library upon detection of termination.
* Cleanup of network library information upon termination of all local child processes from a given application. A call into the PNET framework has been added to the PMIx server library for this purpose.
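The two cleanup points listed above differ only in when they fire: one on every child termination, the other when the last local child of an application has gone. A minimal sketch of that bookkeeping follows. The demo_app_t type and both functions are hypothetical and do not reflect the actual PNET framework or server internals; the counters stand in for the calls into the network support components. The sketch also assumes all local children are started before the first one terminates, whereas a real server would more likely know the expected local process count from registration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-application bookkeeping on one node. */
typedef struct {
    size_t nlocal_live;   /* local children still running */
    int child_cleanups;   /* times the per-child cleanup hook ran */
    int app_cleanups;     /* times the application-level cleanup hook ran */
} demo_app_t;

/* Record that a local child of this application has been started. */
void demo_child_started(demo_app_t *app)
{
    assert(app != NULL);
    app->nlocal_live++;
}

/* Handle detection of a local child's termination. */
void demo_child_terminated(demo_app_t *app)
{
    assert(app != NULL && app->nlocal_live > 0);

    /* Per-child cleanup runs on every termination. */
    app->child_cleanups++;
    app->nlocal_live--;

    /* Application-level cleanup runs once, after the last local child. */
    if (app->nlocal_live == 0) {
        app->app_cleanups++;
    }
}
```

With two children started, the first termination triggers only the per-child hook, and the second triggers both the per-child hook and the application-level hook.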
The PMIx library implementation is covered in the Add network support APIs pull request.
Ralph H. Castain
Intel, Inc.
Github: rhc54