Reference Implementation of the Process Management Interface for Exascale (PMIx) standard
The following versions of the PMIx Standard document are available:
Prior ad hoc versions of the standard were embodied in the header files of the corresponding releases of the PMIx Reference Implementation. These definitions have been superseded by the formal documents. Each version of the Standard includes information on all prior versions (e.g., the Version 2.0 document contains the definitions from Version 1) and clearly marks all additions/changes incorporated since the last release. Note that the PMIx Community chose not to release a Version 1 document due to the delay in getting the formal Standard document completed.
The PMIx developer community is currently at an early stage; thus, the process for extending the standard is relatively lightweight compared to those of more mature communities. In contrast, the process for modifying an existing definition in the standard is intentionally made very difficult. This both discourages any break in backward compatibility and pushes proposers to scrutinize and scrub their proposed extensions to ensure they provide adequate flexibility for future uses.
Given the dynamic, fast-moving nature of the community at this time, the current process for extending the standard consists of the following stages:
Note that the above process relies heavily on the level of collaboration in the current PMIx community. No formal voting process is involved, nor are there “membership” requirements that must be met before someone has a voice in the process. This is likely to change as the community grows and matures. However, the expectation is that the standard will also have matured by that time, and so a slower, more formal process may be more appropriate.
The process for modifying an existing definition in the standard uses the same first three steps as the extension process. However, the initial presentation of the RFC/prototype must include:
The amount of information provided should reflect the magnitude of the proposed change. For example, a minor modification in the behavior associated with an existing attribute would require less explanation than a change to an existing API. In many cases, proposals to modify definitions are converted into attribute extensions (i.e., the addition of new attribute definitions). This reflects the PMIx standard’s philosophy of building sufficient flexibility into each API (via an array of pmix_info_t directives) to accommodate future additional or modified behaviors without perturbing the API itself.
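The attribute-array philosophy can be illustrated with a minimal, self-contained sketch. It does not use the real PMIx headers: `sketch_info_t`, the `SKETCH_*` keys, and `sketch_parse_directives()` are hypothetical stand-ins for `pmix_info_t` and an API's internal directive handling. The point it shows is that a new behavior arrives as a new key in the array, so the function signature never has to change.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for pmix_info_t: a key plus an integer value. */
typedef struct {
    const char *key;
    int value;
} sketch_info_t;

/* Hypothetical attribute keys; the strings are illustrative only. */
#define SKETCH_TIMEOUT      "sketch.timeout"
#define SKETCH_COLLECT_DATA "sketch.collect"  /* imagine this key was added in a later release */

typedef struct {
    int  timeout_sec;   /* 0 means wait indefinitely */
    bool collect_data;
} sketch_opts_t;

/* The signature stays fixed: behaviors are selected by keys in info[]. */
sketch_opts_t sketch_parse_directives(const sketch_info_t info[], size_t ninfo)
{
    sketch_opts_t opts = { .timeout_sec = 0, .collect_data = false };
    for (size_t i = 0; i < ninfo; i++) {
        if (strcmp(info[i].key, SKETCH_TIMEOUT) == 0) {
            opts.timeout_sec = info[i].value;
        } else if (strcmp(info[i].key, SKETCH_COLLECT_DATA) == 0) {
            opts.collect_data = (info[i].value != 0);
        }
        /* Unrecognized keys are skipped here; whether an unknown key must
         * instead cause an error is a separate policy decision. */
    }
    return opts;
}
```

A caller written before `SKETCH_COLLECT_DATA` existed simply passes an array without that key and still compiles and behaves as before.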
Should the justification prove sufficiently convincing, a Notice of Impending Change (containing a summary of the proposed change and the justification) is sent out to the community’s mailing list alerting them to the proposed modification, and inviting comments. This provides an opportunity for users and implementors to voice their concerns and suggest modifications or alternative solutions. Three notices must be sent prior to a final review of the proposal.
Assuming no objections are raised, a final review of the proposal and its justification is conducted during the developers’ weekly telecon. The proposal can be accepted, rejected, or pushed back for modification at that time. If it is accepted, the change is made to the standard’s document, including both a description of the change and the justification for it.
No standards body can require an implementor to support something in their standard, and PMIx is no different in that regard. While an implementor of the PMIx library itself must at least include the standard PMIx headers and instantiate each function, they are free to return “not supported” for any function they choose not to implement.
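As a rough sketch of what "instantiate each function but return not supported" means for both sides, the example below models a library entry point that exists but declines to act, and a portable caller that treats that status as a recoverable outcome. All names are hypothetical; in the real library the statuses would be `pmix_status_t` values such as `PMIX_SUCCESS` and `PMIX_ERR_NOT_SUPPORTED`, and the numeric values used here are illustrative only.

```c
/* Hypothetical status codes standing in for pmix_status_t values. */
typedef int sketch_status_t;
#define SKETCH_SUCCESS            0
#define SKETCH_ERR_NOT_SUPPORTED (-1)

/* The symbol must exist so applications link, but this implementation
 * chooses not to provide the feature. */
sketch_status_t sketch_job_control(int target_rank)
{
    (void)target_rank;  /* unused: the call is not implemented */
    return SKETCH_ERR_NOT_SUPPORTED;
}

/* Which path the caller ended up taking. */
typedef enum { SKETCH_PATH_NATIVE, SKETCH_PATH_FALLBACK, SKETCH_PATH_ERROR } sketch_path_t;

/* A portable caller distinguishes "not supported" from a genuine failure. */
sketch_path_t sketch_try_job_control(int target_rank)
{
    sketch_status_t rc = sketch_job_control(target_rank);
    if (rc == SKETCH_SUCCESS) {
        return SKETCH_PATH_NATIVE;
    }
    if (rc == SKETCH_ERR_NOT_SUPPORTED) {
        return SKETCH_PATH_FALLBACK;  /* e.g., skip the feature or use another mechanism */
    }
    return SKETCH_PATH_ERROR;
}
```

Keeping "not supported" distinct from other errors is what lets one application binary run across environments with different levels of support.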
This also applies to host environments. Resource managers and other system management stack (SMS) components retain the right to decide whether to support a particular function. The PMIx community continues to look for ways to assist SMS implementors in these decisions by highlighting functions that are critical to basic application execution (e.g., PMIx_Get), while leaving vendors the flexibility to tailor their software to their target market segments.
This becomes more complicated for the attributes that provide information to the client process and/or control the behavior of a PMIx standard API. For example, the PMIX_TIMEOUT attribute can be used to specify the time (in seconds) before the requested operation should time out. The intent of this attribute is to let the client avoid “hanging” in a request that takes longer than the client wishes to wait, or that may never return (e.g., a PMIx_Fence that a blocked participant never enters).
If an application truly relies on the PMIX_TIMEOUT attribute in a call to PMIx_Fence, for example, it should set the required flag in the pmix_info_t for that attribute. This informs the library and its SMS host that they must return an immediate error if the attribute is not supported. If the flag is not set, the library and SMS host may treat the attribute as optional and ignore it when support is not available.
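The required-versus-optional rule can be modeled in a short, self-contained sketch. It deliberately avoids the real headers: `sketch_info_t`, `SKETCH_INFO_REQD`, the key strings, and `sketch_check_directives()` are hypothetical. In the actual PMIx headers (v2 and later) the corresponding pieces are the `flags` field of `pmix_info_t`, the `PMIX_INFO_REQD` directive, and the `PMIX_INFO_REQUIRED()` macro that sets it; consult `pmix_common.h` for your release. Here the library supports only `"sketch.collect"`, so a timeout directive (standing in for PMIX_TIMEOUT) is ignored when optional and rejected when required.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for pmix_info_t and its directive flag. */
#define SKETCH_INFO_REQD 0x1u

typedef struct {
    const char *key;
    int         value;
    unsigned    flags;  /* SKETCH_INFO_REQD marks the attribute as mandatory */
} sketch_info_t;

typedef int sketch_status_t;
#define SKETCH_SUCCESS            0
#define SKETCH_ERR_NOT_SUPPORTED (-1)

/* Attributes this hypothetical library/host pair happens to support. */
static const char *const supported_keys[] = { "sketch.collect" };

static bool is_supported(const char *key)
{
    for (size_t i = 0; i < sizeof supported_keys / sizeof supported_keys[0]; i++) {
        if (strcmp(key, supported_keys[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Check directives before starting the operation: an unsupported attribute
 * flagged as required yields an immediate error; an unsupported optional
 * attribute is ignored and counted in *nignored. */
sketch_status_t sketch_check_directives(const sketch_info_t info[], size_t ninfo,
                                        size_t *nignored)
{
    *nignored = 0;
    for (size_t i = 0; i < ninfo; i++) {
        if (is_supported(info[i].key)) {
            continue;
        }
        if (info[i].flags & SKETCH_INFO_REQD) {
            return SKETCH_ERR_NOT_SUPPORTED;
        }
        (*nignored)++;
    }
    return SKETCH_SUCCESS;
}
```

Failing before the operation starts is what lets an application that depends on a timeout detect the gap up front instead of hanging in a fence that never completes.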
It is therefore critical that users and application implementors:
While a PMIx library implementor, or an SMS component server, may choose to support a particular PMIx API, they are not required to support every attribute that might apply to it. Doing so would pose a significant barrier to entry, because a given API can have a broad range of applicable attributes, some of which may rarely be used. The PMIx community is attempting to help by distinguishing the attributes that are generally used (and are therefore more important to support) from those that a “complete implementation” would support.
Note that an environment that does not include support for a particular attribute/API pair is not “incomplete” or of lower quality than one that does include that support. Vendors must decide where to invest their time based on the needs of their target markets, and it is perfectly reasonable for them to perform cost/benefit decisions when considering what functions and attributes to support.
The flip side of that statement is also true: users who find that their current vendor does not support a function or attribute they require may raise that concern with their vendor and request that the implementation be expanded. Alternatively, users may utilize the PMIx Reference Server as a “shim” between their application and the host environment, as it might provide the desired support until the vendor can respond. Finally, in the extreme, one can exploit the portability of PMIx-based applications to change vendors.
The PMIx Standard is evolving fairly rapidly in response to milestones associated with the delivery of next-generation supercomputers. Accordingly, the timeline is focused on completing a broad array of features by the end of 2019. The standard is currently defined in three or four header files in each release, as shown below.
The initial version of the standard, released in late 2015, covers the basic functions required to launch and wire up a parallel application. This includes the following APIs:
Note that the last set of APIs (focused on error handlers) was subsequently replaced in v2 by a more generalized ability to handle events. In addition, PMIx_Init and PMIx_Finalize were modified in v2 to extend their flexibility and bring them into alignment with the PMIx standard practice of including attribute arrays to support future modifications of behavior.
The second version of the standard, released in mid-2017, extended the v1 release by adding support for workflow orchestration and tools.
Descriptions of these APIs are provided in the v2 RFCs shown below.
The third version of the standard, released in July 2018, focused on completion of “instant on” support, further support for tools and debuggers, and extension to support fabric and storage manager integration.
The fourth version of the standard is currently under development. The full set of new APIs has not yet been defined, but the standard is expected to be extended to provide schedulers with access to point-to-point communication cost information, provide general access to network topology graphs, complete debugger tool support, add initial support for storage requests, and support the new PMIx Groups concept (in collaboration with the MPI Sessions Working Group). In addition, Python bindings for the PMIx APIs will be introduced in this release.
Development of the Standard can be followed in the v4 RFCs, as listed below.
Provides guidance on the expectations PMIx places on various cluster subsystems, including required as well as desired levels of support.