Slurm support for PMIx was first included in v16.05 based on the PMIx v1.2 release. It has since been updated to support the PMIx v2.x series, as per the following table:
Slurm/PMIx Compatibility Matrix
Distributions provide separate RPMs for Slurm’s PMIx support. If installing from source, note that an appropriate version of PMIx must be installed prior to building Slurm! More details are provided below.
The following discussion assumes that you are using a version of Slurm that supports PMIx.
PMIx can be used in one of two ways: either directly via the PMIx APIs, or via the backward compatibility interfaces that support the PMI-1 and PMI-2 APIs. When backward compatibility is enabled at configure time (which is the default), PMIx provides both libpmi and libpmi2 libraries that contain the respective APIs and a copy of the PMIx library; each PMI call is translated into its PMIx equivalent. This was done at the request of users whose applications or libraries were hardcoded to dlopen “libpmi” or “libpmi2”. Unfortunately, Slurm also includes plugins for those versions of PMI, and the PMI-1 plugin is built by default (the PMI-2 plugin must be manually built and installed). Thus, there is a potential installation conflict between the Slurm and PMIx versions of libpmi and libpmi2.
We recommend installing Slurm and PMIx in different (non-default) locations to avoid the conflict. Alternatively, the distributions are modifying their packaging plan to move the PMI support for Slurm into a separate libpmi-slurm RPM, and to do the same with the PMI-1 and PMI-2 support from PMIx (the precise naming of these packages can be distro-dependent). These RPMs will then be set up to generate a conflict if someone attempts to install both of them in the same location. Note that this will not resolve the problem if, for example, Slurm is installed via RPM but PMIx is installed from source, because no conflict check is made in that case.
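Since no conflict check is made for source installs, it can help to look for duplicate libraries by hand. The following is a minimal sketch, not part of Slurm or PMIx: the helper name find_pmi_libs and the example prefixes are hypothetical, and it assumes the libraries live directly under lib or lib64 of each install prefix.

```shell
# List every libpmi / libpmi2 shared library found under the given
# install prefixes. Seeing the same library name under more than one
# prefix suggests the Slurm and PMIx copies could shadow each other.
find_pmi_libs() {
    for prefix in "$@"; do
        # Assumption: libraries sit directly in lib/ or lib64/.
        for libdir in "$prefix/lib" "$prefix/lib64"; do
            [ -d "$libdir" ] || continue
            for f in "$libdir"/libpmi.so* "$libdir"/libpmi2.so*; do
                # Unmatched globs stay literal, so check for existence.
                if [ -e "$f" ]; then
                    printf '%s\n' "$f"
                fi
            done
        done
    done
    return 0
}

# Example with hypothetical prefixes; substitute your own locations:
# find_pmi_libs /opt/slurm /opt/pmix /usr /usr/local
```

If the output shows libpmi or libpmi2 in a directory shared by both packages, one of the two installs has most likely overwritten the other.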
Further complicating the situation is the inherent incompatibility of the Slurm vs PMIx PMI-1 and PMI-2 implementations. If you build your application and link it against “libpmi2”, and that library is actually the one from PMIx, then it won’t work with Slurm’s PMI-2 plugin because the communication protocols are completely different. The same is true for the PMI-1 plugin. Thus, it is necessary that the PMIx plugin be invoked even when utilizing either PMI-1 or PMI-2 interfaces via the PMIx backward compatibility feature.
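To tell which implementation a dynamically linked application will pick up, one can inspect where its libpmi or libpmi2 dependency resolves. The sketch below assumes a Linux system with ldd and awk available; the helper name pmi_lib_paths is hypothetical. It will not report libraries that the application loads at runtime via dlopen.

```shell
# Print the resolved path of any libpmi / libpmi2 that an executable
# is dynamically linked against, based on ldd output lines of the form
#   libpmi2.so.0 => /some/path/libpmi2.so.0 (0x...)
pmi_lib_paths() {
    ldd "$1" | awk '$1 ~ /^libpmi2?\.so/ && $2 == "=>" { print $3 }'
}

# Example (./my_app is a placeholder for your executable):
# pmi_lib_paths ./my_app
```

A path under the PMIx install prefix means the job must be launched through the PMIx plugin; a path under the Slurm prefix means the matching Slurm PMI plugin is needed instead.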
There are a couple of reasons why you might want to use the PMIx backward compatibility in place of the native Slurm plugins. First, installing and using the PMIx libraries provides access to the PMIx APIs even if your underlying library doesn’t use them. This allows your application code to, for example, take advantage of event notification, allocation request support, and other PMIx features – all of which are directly accessible to the application.
Second, there are some launch performance enhancements implemented in the more recent PMIx plugin (starting with Slurm 17.11) which will be utilized even through the backward compatible PMI-1 or PMI-2 interfaces, but are not available if using the Slurm PMI plugins. These are explained further below.
So if you want to use PMIx backward compatibility, you need to:
Most of the standard PMI2 calls are covered by the backward compatibility libraries, so things like MPICH should work out-of-the-box (and tests confirm it does). However, MVAPICH2 added a PMI2 extension call to the Slurm PMI2 library that they use and PMIx doesn’t cover (as there really isn’t an easy equivalent, and they called it PMIX_foo which causes a naming conflict). Thus, MVAPICH2 users wanting to use the PMIx backward compatibility library must be careful to build MVAPICH2 against the PMIx PMI-2 header and not the one from Slurm.
Building from Source
Building Slurm with PMIx support from source is fairly simple to do. It begins with installing an appropriate PMIx release as per the above table. Instructions for obtaining and installing PMIx are provided elsewhere.
If PMIx was installed to a standard location (i.e., with a prefix of /usr or /usr/local), then Slurm will find it and build the PMIx plugin by default. Otherwise, Slurm should be configured with the --with-pmix=path-to-where-pmix-was-installed option. The PMIx plugin will be built and installed under the Slurm location.
Starting with Slurm 17.11, it is possible to build against multiple PMIx versions. For example, building against both PMIx versions 1.2 and 2.1 can be done by specifying --with-pmix=path-to-1.2:path-to-2.1 on the Slurm configure line. When submitting a job, the desired version can then be selected using either the --mpi=pmix_v1 or --mpi=pmix_v2 command line option for “srun”. If the non-version-specific --mpi=pmix is given, then the highest installed PMIx version will be used.
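The configure option described above takes a single prefix or a colon-separated list of prefixes. As an illustration only, the following sketch builds that option from a list of install locations; the helper name pmix_configure_opt and all paths shown are hypothetical.

```shell
# Join one or more PMIx install prefixes into a single --with-pmix
# option, separated by colons as the Slurm configure line expects
# when building against multiple PMIx versions.
pmix_configure_opt() {
    joined=""
    for p in "$@"; do
        # Add a colon only between entries, not before the first one.
        joined="${joined:+$joined:}$p"
    done
    printf '%s\n' "--with-pmix=$joined"
}

# Example with hypothetical install locations, run from the Slurm
# source directory:
# ./configure --prefix=/opt/slurm \
#     "$(pmix_configure_opt /opt/pmix-1.2 /opt/pmix-2.1)"
```

With a single prefix the helper produces the plain --with-pmix=path form, so the same call covers both the single-version and multi-version builds.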
Executing an application using the Slurm PMIx plugin (whether via the native PMIx or the backward compatibility APIs) requires that one add --mpi=pmix (or a version-specific directive) to the srun command line. Alternatively, a system administrator can designate the desired PMIx plugin as the default in the slurm.conf file.
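As a small illustration of the launch step, the wrapper below passes the chosen plugin name to srun. The wrapper name srun_pmix and the application ./my_app are placeholders. The comments also mention the MpiDefault parameter in slurm.conf, which is assumed here to be the setting an administrator would use to make PMIx the default; check your site's Slurm documentation before relying on it.

```shell
# Run a job step through the Slurm PMIx plugin.
# First argument: plugin name (pmix, pmix_v1 or pmix_v2);
# all remaining arguments are passed to srun unchanged.
srun_pmix() {
    plugin="$1"
    shift
    srun --mpi="$plugin" "$@"
}

# Examples (./my_app is a placeholder):
# srun --mpi=list                    # show which MPI plugin types are installed
# srun_pmix pmix -n 4 ./my_app       # use the highest installed PMIx version
# srun_pmix pmix_v2 -n 4 ./my_app    # pin the job step to the PMIx v2 plugin
#
# Assumed slurm.conf setting to avoid passing --mpi on every command line:
#   MpiDefault=pmix
```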
Using the UCX Extension
Starting with the Slurm 17.11 release, the PMIx plugin was extended to support several new features:
As the SC’17 presentation shows, the new features demonstrated good results on a small scale. Further validation at larger scales is underway.
All of the new features can be controlled at runtime on a per-jobstep basis using the following environment variables (envars):
The SC’17 presentation includes two backup slides explaining how to enable the point-to-point and collectives micro-benchmarks integrated into the PMIx plugin, which provide some basic reference numbers for the performance on your system.