OpenPMIx

Reference Implementation of the Process Management Interface Exascale (PMIx) standard


RFC0032

Extends:
RFC0010: Extension of Tool Interaction Support

Title

Extension of PMIx Logging Support

Abstract

The PMIx_Log API supports requests for logging information to a variety of channels. Initial support covered only stdout/stderr; this RFC extends that support to additional channels such as syslog and email (SMTP).

Labels

[EXTENSION][ATTRIBUTES]

Action

[APPROVED]

Copyright (c) 2018 Intel, Inc. All rights reserved.

This document is subject to all provisions relating to code contributions to the PMIx community as defined in the community’s LICENSE file. Code Components extracted from this document must include the License text as described in that file.

Description

RFC0010 added a new PMIx_Log_nb API for requesting that provided data be logged to a global data store or to standard output locations (e.g., stderr or syslog), subject to the services available from the host environment. A limited number of initial keys were provided by which the logging request could direct the data to specific channels on an as-available basis. The host RM interface was also extended with a “log” interface through which client requests that the PMIx library cannot directly support are passed to the host RM for handling, e.g., when the logging data is directed to the stderr or stdout channels.

The illustration below highlights some of the capabilities provided by the extended logging support. Note that the illustration is not intended to be comprehensive in its coverage as the number of possible use-cases is too large to capture in a single drawing.

[Figure: Debug stdio Fig]

Supported uses include:

Note that system libraries can also use this feature to record job-affecting events (e.g., network failures) that might have impacted the application during its execution, perhaps linking them to more detailed information stored in a RAS database.

Note that data can be “logged” without specifying the output channel. In this case, the PMIx library will default to logging a copy of the data to each available channel (in the reference implementation, this default behavior is controlled by an MCA parameter, in addition to the usual build/configuration constraints). The caller can optionally control the logging behavior by providing multiple channel attributes in order of desired priority, subject to the availability of each specified channel. For example, an application could ask that data be emailed to a given user, or else logged to a global syslog, or else logged to the local syslog, by specifying the PMIX_LOG_EMAIL attribute first, followed by the PMIX_LOG_GLOBAL_SYSLOG and PMIX_LOG_LOCAL_SYSLOG attributes, and including the PMIX_LOG_ONCE attribute to indicate that only one log channel should be used. If PMIX_LOG_ONCE is not included, the data will be logged to all three channels. In this case:

This provides flexibility with minimal code complexity when operating in multiple environments that support differing output channels.

Logging attributes can also utilize the “required” flag in the pmix_info_t structure to indicate that the data must be logged via the specified channel. If given, failure to complete the operation on that channel will result in return of the PMIX_ERR_OPERATION_FAILED error. Otherwise, use of a given channel is considered “optional” and errors are reported according to the above rules.

Specifying a prioritized list of logging channels on each call to PMIx_Log can impact the performance of the API itself, as it requires the PMIx library to scan the available channels to create an ordered list, which may in turn require multiple passes over the available options. The PMIx reference library provides an MCA parameter to help reduce this impact. A user can control the default order of channel delivery by setting the “plog_base_order” MCA parameter to a comma-delimited, prioritized list of channel names, where each name is formed from the corresponding attribute by extracting the characters following “PMIX_LOG_”, as follows:

PMIX_LOG_LOCAL_SYSLOG    -> local_syslog
PMIX_LOG_GLOBAL_SYSLOG   -> global_syslog
PMIX_LOG_EMAIL           -> email

and so on. Marking a given channel in the list as “required” can be done by adding “:req” to the channel name – e.g., plog_base_order = “local_syslog:req,global_syslog,email”. Parsing of this MCA parameter is case-insensitive, and the parser will accept any “required” flag that starts with “req” – e.g., “reqd” and “required”. Similarly, the PMIX_LOG_ONCE attribute can be set by default using the “plog_base_log_once” MCA parameter. Note that this is specific to the PMIx reference library and is not part of the standard – users are advised to check their local implementation for similar support.

Channels that are not recognized by the PMIx library will automatically be directed to the host RM for processing. This allows for RM-proprietary channel support without committing those channel names to the PMIx Standard.

Proposed Changes

The proposed changes covered by this RFC are organized into four categories: introduction of a new server “type”, API additions, new attributes, and clarification of behavior expected from prior attributes.

Gateway Servers

This RFC introduces the concept of a “gateway” server, i.e., a server designated at the time of PMIx_server_init through the new PMIX_SERVER_GATEWAY attribute. Gateway servers act as routers for requests that cannot be serviced on backend nodes; typical examples include logging to email or syslog on a central system management server.

#define PMIX_SERVER_GATEWAY                 "pmix.srv.gway"         // (bool) Server is acting as a gateway for PMIx requests
                                                                    //        that cannot be serviced on backend nodes
                                                                    //        (e.g., logging to email)

API Changes

One new API, a blocking form of the earlier PMIx_Log_nb API, is proposed:

PMIX_EXPORT pmix_status_t PMIx_Log(const pmix_info_t data[], size_t ndata,
                                   const pmix_info_t directives[], size_t ndirs);

In addition, a comment in pmix_server.h was modified to clarify the required behavior of the pmix_server_log_fn_t server module function:

/* Log data on behalf of a client. Calls to the host thru this
 * function must _NOT_ call the PMIx_Log API as this can
 * trigger an infinite loop. Instead, the implementation must
 * perform one of three operations:
 *
 * (a) transfer the data+directives to a "gateway" server
 *     where they can be logged. Gateways are designated
 *     servers on nodes (typically service nodes) where
 *     centralized logging is supported. The data+directives
 *     may be passed to the PMIx_Log API once arriving at
 *     that destination.
 *
 * (b) transfer the data to a logging channel outside of
 *     PMIx, but directly supported by the host
 *
 * (c) return an error to the caller indicating that the
 *     requested action is not supported
 */

New Attributes

New attributes covered by this RFC are shown below along with their intended usage as data vs directives. Note that data attributes specify a channel and the information to be logged via that channel, while directives affect the behavior of the API itself:

In addition, a query attribute was added by which a user can request the list of available channels:

#define PMIX_QUERY_LOG_CHANNELS      "pmix.qry.lchans"      // (char*) comma-delimited list of available logging channels

Clarifications

The definition and usage of several existing attributes have been clarified for this RFC. Updated definitions include:

Also clarified are existing attributes to be used in the data array versus the directives array of the PMIx_Log APIs. A list of data attributes includes:

The list of directives includes:

Advisories


Advice to users:
The available channel support on a system can be queried via the PMIx_Query_info_nb API using the PMIX_QUERY_LOG_CHANNELS key, should the application developer wish to tailor their code accordingly. The query will always report the channels directly supported by the PMIx library; however, channels supported by the host RM will be included only if the RM itself supports such queries.

The PMIx_Log API should never be used for streaming data as it is not a “performant” transport and can perturb the application since it involves the local PMIx server and host RM daemon.

There is some ambiguity regarding what information is provided in the “data” parameter vs the “directives” parameter when calling PMIx_Log. In general, information that is to be included in the log should be provided in the “data” array. Examples would include:

PMIX_INFO_LOAD(&data[0], PMIX_LOG_STDOUT, "my message string", PMIX_STRING);
PMIX_INFO_LOAD(&data[1], PMIX_LOG_STDERR, "my error message", PMIX_STRING);

Note that in the above examples, the attribute key defines the channel to be used for the provided data.

On the other hand, “directives” should be used to request behaviors of the PMIx_Log function. Examples might include:

PMIX_INFO_LOAD(&directives[0], PMIX_LOG_GENERATE_TIMESTAMP, NULL, PMIX_BOOL);
PMIX_INFO_LOAD(&directives[1], PMIX_LOG_XML_OUTPUT, NULL, PMIX_BOOL);
PMIX_INFO_REQUIRED(&directives[1]);

The first example instructs PMIx_Log to generate a timestamp indicating when the log message was created, and to include that in the log. The second example requires that the log be written in XML format – an error must be returned if XML output support is not available.


Advice to implementers:
Calls relayed to the host through the pmix_server_log_fn_t function provided in the pmix_server_module must NOT call the PMIx_Log API, as this can result in an infinite loop. Instead, the implementation must perform one of three operations:

* transfer the data+directives to a “gateway” server where they can be logged. Gateways are designated servers on nodes (typically service nodes) where centralized logging is supported (e.g., writing to a global syslog). The data+directives may be passed to the PMIx_Log API once arriving at that destination.
* transfer the data to a logging channel outside of PMIx, but directly supported by the host
* return an error to the caller indicating that the requested action is not supported

If the PMIx server cannot support the provided request, it will call up to the host for final disposition; this is why the host must not call back into the server.

A complete PMIx implementation shall respond to PMIx_Query_info_nb calls requesting logging channels supported by the RM with a comma-delimited string of channel keys – e.g., “pmix.log.lsys,pmix.log.email”, indicating that PMIX_LOG_LOCAL_SYSLOG and PMIX_LOG_EMAIL are supported by including the value of those keys in the returned string.


Prototype Implementation

The prototype implementation of the extended logging support is provided in the topic/log branch of the PMIx project repository, and the topic/log branch of the PMIx Reference RTE.

Author(s)

Ralph H. Castain
Intel, Inc.
Github: rhc54