OpenPMIx

Reference Implementation of the Process Management Interface Exascale (PMIx) standard


RFC0002

Title

PMIx Event Notification

Abstract

The PMIx Event Notification system provides a mechanism by which the resource manager can communicate system events to applications, thus providing applications with an opportunity to generate an appropriate response. In addition, applications can use the system to request that the resource manager notify their peers of internal events (e.g., computational errors and aborted operations), and notify the resource manager of events detected by the application.

Labels

[MODIFICATION] [EXTENSION] [ORGANIZATION]

Action

[APPROVED]

Copyright (c) 2016 Intel, Inc. All rights reserved.

This document is subject to all provisions relating to code contributions to the PMIx community as defined in the community’s LICENSE file. Code Components extracted from this document must include the License text as described in that file.

Description

The resource manager will be aware of a wide range of events that occur across the system. For the purposes of this discussion, only events that impact the allocated session being served by the PMIx server are considered. These events can be divided into two distinct classes:

Clients can register solely for job-specific events by including the PMIX_EVENT_JOB_LEVEL key in their call to PMIx_Register_event_handler. Providing this key explicitly indicates that environment events are not to be reported to this callback function.

Note that race conditions can cause the registration to come after events of possible interest (e.g., a memory ECC event that occurs after start of execution but prior to registration). RMs are free to cache events in this category for some time to mitigate this situation, but are not required to do so. Thus, applications must be aware that environment events prior to registration may not be included in notifications.

As above, clients can indicate a desire to register solely for environment events of a given type by including the PMIX_EVENT_ENVIRO_LEVEL key in their registration call.

The PMIx server will cache any environment and job-related events passed to it for a period of time to provide notification to clients that have not yet registered for them. Currently, the PMIx server uses a ring buffer to cache events. The size of the ring buffer defaults to 512 events (as of PMIx 2.0), but can be configured using the PMIx_server_cache_size info key during the call to the PMIx_server_init API. Job-related events will be retained until all local clients have received them, regardless of the size or number of events being cached in the ring buffer. Of course, it is possible that enough job-related events could occur to “flood” the ring buffer, thereby causing events to be lost. A long-term solution to the “flood” problem remains a work in progress.
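
The caching behavior described above can be illustrated with a small ring buffer. This is a minimal sketch under stated assumptions, not the PMIx server's actual code: the types event_cache_t and cached_event_t and the functions cache_init, cache_push, and cache_replay are hypothetical names invented for this example, the capacity is shrunk to 4 so that ejection is easy to see, and the special retention of job-related events is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the PMIx library's internals. */
#define CACHE_CAPACITY 4  /* tiny for demonstration; the text cites a default of 512 */

typedef struct {
    int code;  /* event status code */
    int seq;   /* arrival order, used here only to identify events */
} cached_event_t;

typedef struct {
    cached_event_t slots[CACHE_CAPACITY];
    size_t head;   /* index of the oldest cached event */
    size_t count;  /* number of events currently cached */
} event_cache_t;

static void cache_init(event_cache_t *c) {
    assert(c != NULL);
    c->head = 0;
    c->count = 0;
}

/* Cache one event. Returns 1 if the oldest event was ejected to make room. */
static int cache_push(event_cache_t *c, int code, int seq) {
    assert(c != NULL);
    size_t tail = (c->head + c->count) % CACHE_CAPACITY;
    c->slots[tail].code = code;
    c->slots[tail].seq = seq;
    if (c->count == CACHE_CAPACITY) {
        /* Buffer was full: tail == head, so the oldest slot was overwritten. */
        c->head = (c->head + 1) % CACHE_CAPACITY;
        return 1;
    }
    c->count++;
    return 0;
}

/* Replay cached events matching `code`, oldest first, to a late registrant. */
static size_t cache_replay(const event_cache_t *c, int code,
                           int *seq_out, size_t max) {
    assert(c != NULL && seq_out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < c->count && n < max; i++) {
        const cached_event_t *ev = &c->slots[(c->head + i) % CACHE_CAPACITY];
        if (ev->code == code) {
            seq_out[n++] = ev->seq;
        }
    }
    return n;
}
```

A caller would invoke cache_push for each incoming event and cache_replay when a client registers late. Once the buffer is full, each push silently drops the oldest entry, which is the "flood" risk noted above.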

Client application processes can also use the PMIx Event Notification system to request that the resource manager notify their peers of internal events, and to notify the resource manager of events detected by the application process. Examples of the latter include network communication errors that may not have been detected by the fabric manager itself (e.g., data corruption). The client must direct the notification to the appropriate target (RM or peers) using the corresponding range parameter.

Multiple event handlers registered against the same event are processed in a chain-like manner based on the order in which they were registered, as modified by directive. Registrations against specific event codes are processed first, followed by registrations against multiple event codes and then any default registrations. At each point in the chain, an event handler is called by the PMIx progress thread and given a function to call when that handler has completed its operation. The handler callback notifies PMIx that the handler is done, returning a status code to indicate the result of its work:

In addition to returning a status, each handler can return an array of pmix_info_t values that provide further information on the actions taken by that handler. The results are appended to the array of prior results, with the returned values combined into an array within a single pmix_info_t as follows:

There will always be a pmix_info_t entry in the results array for each prior handler, and the array within that entry will always contain at least one element, as shown in the following diagram:

[Figure: Event Notify Results]
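
The shape of the accumulated results can be sketched with plain C structures. This is an illustration only: kv_t is a hypothetical stand-in and not the real pmix_info_t, the names result_entry_t, results_t, and results_append are invented for this example, and storing the handler's status as the first element of each entry is an assumption made to satisfy the "at least one element" rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_HANDLERS 8
#define MAX_INFO     4

/* Hypothetical stand-in for a key/value pair; NOT the real pmix_info_t. */
typedef struct {
    char key[32];
    int  value;
} kv_t;

/* One entry per handler that has already run. */
typedef struct {
    char   handler_name[32];
    kv_t   items[MAX_INFO];  /* assumption: items[0] holds the returned status */
    size_t nitems;           /* therefore always >= 1 */
} result_entry_t;

typedef struct {
    result_entry_t entries[MAX_HANDLERS];
    size_t nentries;
} results_t;

/* Append one handler's status plus any extra values it chose to return.
 * Returns 0 on success, -1 if the fixed-size arrays would overflow. */
static int results_append(results_t *r, const char *name, int status,
                          const kv_t *extra, size_t nextra) {
    assert(r != NULL && name != NULL);
    if (r->nentries == MAX_HANDLERS || nextra > MAX_INFO - 1) {
        return -1;
    }
    result_entry_t *e = &r->entries[r->nentries++];
    snprintf(e->handler_name, sizeof e->handler_name, "%s", name);
    snprintf(e->items[0].key, sizeof e->items[0].key, "%s", "status");
    e->items[0].value = status;
    e->nitems = 1;
    for (size_t i = 0; i < nextra; i++) {
        e->items[e->nitems++] = extra[i];
    }
    return 0;
}
```

Each later handler in the chain would receive the results_t built so far and could walk entries[0..nentries) to see what its predecessors reported.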

The following set of data is provided to each registered event handler:

Note that the PMIx library itself does not register for event notifications. Internal events (e.g., unexpected client disconnect or message protocol failures) are resolved in code paths outside of the event notification system. However, such errors will generate events that can be received by the application and/or host resource manager if appropriate handlers have been registered.

Event Registration

Registration of event callbacks is accomplished via the PMIx_Register_event_handler API. The function takes the following set of parameters:

Registrations of event callbacks that do not provide an array of info keys (beyond the optional PMIX_EVENT_HDLR_NAME) are considered default registrations for purposes of servicing order.

RM-Host Registrations

The RM host daemon is not required to register for any PMIx notifications. The daemon will automatically be notified (without registration) of client connection and finalize, plus any client service requests (including requests to distribute client-generated notifications), via the appropriate server callback functions, if provided. However, internal PMIx server errors (e.g., message protocol violations) are reported to the host RM only if the RM daemon has registered for event notification. Such reports specify a NULL value for the target recipients.

Note that PMIx does request that the host RM daemon register for PMIx notifications so that any notifications targeted to the resource manager itself can be delivered.

Client Registrations

Application processes may request event notification via the PMIx_Register_event_handler API. Registrations are first recorded in the client’s notification callback stack based on the order in which calls to PMIx_Register_event_handler were issued, subject to adjustments per the provided info keys. This order will dictate the precedence given to event processing.
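
One way to picture the callback stack is as an array kept sorted by registration class, with ties broken by registration order. The following is a minimal sketch under stated assumptions: reg_class_t, registration_t, reg_stack_t, and reg_insert are hypothetical names invented here, the caller is assumed to have already classified each registration, and adjustments requested through info keys are not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGS 16

/* Servicing order from the text: single-code first, then multi-code,
 * then default registrations. Lower value means earlier service. */
typedef enum { REG_SINGLE = 0, REG_MULTI = 1, REG_DEFAULT = 2 } reg_class_t;

/* Hypothetical record of one registration; not a PMIx type. */
typedef struct {
    int         id;   /* identifier chosen by the caller */
    reg_class_t cls;  /* class decided by the caller */
} registration_t;

typedef struct {
    registration_t regs[MAX_REGS];
    size_t n;
} reg_stack_t;

/* Insert so the array stays ordered by class; within a class, earlier
 * registrations stay ahead of later ones (stable insertion).
 * Returns 0 on success, -1 if the stack is full. */
static int reg_insert(reg_stack_t *s, int id, reg_class_t cls) {
    assert(s != NULL);
    if (s->n == MAX_REGS) {
        return -1;
    }
    size_t pos = s->n;
    /* Shift only entries of a strictly later class out of the way. */
    while (pos > 0 && s->regs[pos - 1].cls > cls) {
        s->regs[pos] = s->regs[pos - 1];
        pos--;
    }
    s->regs[pos].id = id;
    s->regs[pos].cls = cls;
    s->n++;
    return 0;
}
```

Because the comparison is strict, a new registration always lands behind existing registrations of the same class, which preserves the order in which calls were issued.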

Once locally recorded, a registration request is sent to the local PMIx server for handling.

PMIx Server Registrations

The PMIx server acts as a proxy for client registrations. Once a registration request is received from a local client, the PMIx server records the registration and checks to see if the client is requesting notification of environmental events. If so, then the server checks to see if it is already registered with the host RM for matching events. If already registered, then no further action is required – otherwise, the PMIx server will register with the host RM for the specified events.
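
The proxy logic above amounts to a check-then-register step. The sketch below is illustrative only: server_t, host_has, and server_register are hypothetical names, events are matched by a single integer code for simplicity, and the counter upstream_calls stands in for whatever call the server would actually make to the host RM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CODES 32

/* Hypothetical server-side bookkeeping; not the PMIx server's real state. */
typedef struct {
    int    host_codes[MAX_CODES];  /* codes already registered with the host RM */
    size_t nhost;
    size_t nclient_regs;           /* client registrations recorded so far */
    size_t upstream_calls;         /* stand-in for registrations sent to the host RM */
} server_t;

static bool host_has(const server_t *s, int code) {
    for (size_t i = 0; i < s->nhost; i++) {
        if (s->host_codes[i] == code) {
            return true;
        }
    }
    return false;
}

/* Handle one client registration request.
 * Returns 0 on success, -1 if the host table is full. */
static int server_register(server_t *s, int code, bool environmental) {
    assert(s != NULL);
    s->nclient_regs++;               /* record the client's registration */
    if (!environmental) {
        return 0;                    /* job-level only: nothing to ask of the host RM */
    }
    if (host_has(s, code)) {
        return 0;                    /* already registered upstream: no further action */
    }
    if (s->nhost == MAX_CODES) {
        return -1;
    }
    s->host_codes[s->nhost++] = code;
    s->upstream_calls++;             /* would register with the host RM here */
    return 0;
}
```

With this arrangement, any number of local clients asking for the same environmental event results in a single upstream registration.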

Once registration is complete, the server acknowledges the request to the client and then transmits any matching cached events to the client for local notification. Cached events are retained until the ring buffer becomes full, at which point the oldest events are ejected first.

Notifications

RM Notifications

PMIx expects that all RM notifications pertaining to an allocated session will be distributed to the RM daemons within that allocation. Job-specific events, and events for which the PMIx server has registered, are to be delivered upon receipt to the local PMIx server via the PMIx_Notify_event function. All environmental events are to be delivered to the PMIx server only if that server has previously registered for matching events.

Once the PMIx server has been notified of an event, it performs the following operations:

Upon receipt of a notification message, the PMIx client will scan its list of registered callback functions to identify appropriate recipients according to the following precedence rules:

The scan is continued until either a callback returns PMIX_EVENT_ACTION_COMPLETE, thereby indicating that the event has been handled and no further action is required, or all relevant callbacks have been executed. Return of any other status indicates that the procedure is to continue, with the returned status added to the results array passed along with the event. These updates are presented in a form where the key is the “name” given to the event callback (provided during registration), and the value is the returned status. Thus, subsequent event handlers can scan the incoming info keys to see what prior event handlers reported.
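
The scan loop can be sketched as follows. This is a simplified, synchronous illustration and not the PMIx client's implementation: in PMIx each handler is given a completion function to call, whereas here each handler simply returns its status. ACTION_COMPLETE is a hypothetical stand-in for PMIX_EVENT_ACTION_COMPLETE, and handler_t, prior_result_t, run_chain, and the two sample handlers are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define ACTION_COMPLETE 1  /* hypothetical stand-in for PMIX_EVENT_ACTION_COMPLETE */
#define MAX_RESULTS     8

/* Name/status pair reported by a handler that has already run. */
typedef struct {
    const char *name;
    int         status;
} prior_result_t;

typedef int (*handler_fn)(int code, const prior_result_t *prior, size_t nprior);

typedef struct {
    const char *name;  /* the "name" given at registration */
    handler_fn  fn;
} handler_t;

/* Sample handler: notes the event and asks the chain to continue. */
static int log_handler(int code, const prior_result_t *prior, size_t nprior) {
    (void)code; (void)prior; (void)nprior;
    return 0;
}

/* Sample handler: declares the event fully handled. */
static int fix_handler(int code, const prior_result_t *prior, size_t nprior) {
    (void)code; (void)prior; (void)nprior;
    return ACTION_COMPLETE;
}

/* Call handlers in order until one returns ACTION_COMPLETE or the chain
 * ends. Any other status is recorded for later handlers to inspect.
 * Returns the number of handlers actually called. */
static size_t run_chain(const handler_t *chain, size_t n, int code,
                        prior_result_t *results, size_t *nresults) {
    assert(chain != NULL && results != NULL && nresults != NULL);
    size_t called = 0;
    *nresults = 0;
    for (size_t i = 0; i < n; i++) {
        int st = chain[i].fn(code, results, *nresults);
        called++;
        if (st == ACTION_COMPLETE) {
            break;  /* event handled: stop the scan */
        }
        if (*nresults < MAX_RESULTS) {
            results[*nresults].name = chain[i].name;
            results[*nresults].status = st;
            (*nresults)++;
        }
    }
    return called;
}
```

For a chain of log_handler, fix_handler, log_handler, the third handler is never invoked, and the only recorded result is the one from the first handler.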

Once the client has completed handling of an event, the received notification message is released. No return message is sent to the notifying server – it is assumed that any such action will be taken directly by an event handler if required.

Client-Based Notifications

The client may also choose to generate notifications, either by the application itself (e.g., informing its peers of some internal event) or by the PMIx client library for use by its host application. Examples of the latter include notification of loss of contact with the local PMIx server, which indicates that the process has become isolated and may be used to trigger a “suicide”.

Internal PMIx client library notifications are never transmitted to the local PMIx server. These notifications are only for use by the host application, and are provided based on registration by the application for events. Event registration by the client application does not differentiate between locally internal and external events. Thus, the user must differentiate by registering for specific internal error constants to separately respond to internal events. Currently supported internal events include:

Users are advised to check the release notes for their version for updates to this list.

Notifications generated by the application itself (via calls to PMIx_Notify_event) are transmitted to the local PMIx server for distribution. Since the PMIx server does not itself have the ability to communicate across nodes, it will pass the events on to the host RM daemon for distribution according to the provided pmix_data_range_t parameter.

Prototype Implementation

The PMIx library implementation is covered in the PMIx Event Notification – Reimplementation pull request. The prototype has been tested against Open MPI as referenced in the Enable the PMIx event notification capability pull request.

Author(s)

Ralph H. Castain
Intel, Inc.
Github: rhc54