
slurm

Improved interaction with the SLURM ecosystem.

This module includes: functions to separate labelled SLURM logs.

Functions

separate_labelled_log

separate_labelled_log(
    fp: TextIO, remove_trailing_whitespaces: bool = False
) -> dict[int | str, list[str]]

Separate SLURM labelled log by process IDs.

Almost every line in a SLURM labelled log starts with the process number followed by a colon:

22:  USING DEFAULTS : area_radius1 =   3360.00000000000
26:  USING DEFAULTS : area_radius1 =   3360.00000000000
 0:  USING DEFAULTS : area_radius1 =   3360.00000000000
 0:  GETIN area_radius1 =    3360.00000000000
12:  USING DEFAULTS : area_rotation_pre =  0.000000000000000E+000
12:  USING DEFAULTS : area_rotation =  0.000000000000000E+000

This function reads such a log file and groups lines coming from the same process. If a line doesn't have a label, which can happen when it is an error coming from srun or sbatch, it is placed under the slurm key.

This function might take a lot of memory for big logs, because it loads all the processed lines into one dictionary. Consider using separate_labelled_log_to_files() to write per-process files on the fly.

Parameters:

  • fp

    (TextIO) –

    text I/O stream with the log's content

  • remove_trailing_whitespaces

    (bool, default: False ) –

    remove trailing whitespaces and empty lines

Returns: dict: labelled log lines grouped by process.

Source code in ipsl_common/slurm.py
def separate_labelled_log(
    fp: TextIO, remove_trailing_whitespaces: bool = False
) -> dict[int | str, list[str]]:
    """
    Separate SLURM labelled log by process IDs.

    Almost every line in a SLURM labelled log starts with the process number
    followed by a colon:

    ```
    22:  USING DEFAULTS : area_radius1 =   3360.00000000000
    26:  USING DEFAULTS : area_radius1 =   3360.00000000000
     0:  USING DEFAULTS : area_radius1 =   3360.00000000000
     0:  GETIN area_radius1 =    3360.00000000000
    12:  USING DEFAULTS : area_rotation_pre =  0.000000000000000E+000
    12:  USING DEFAULTS : area_rotation =  0.000000000000000E+000
    ```

    This function reads such a log file and groups lines coming from the same
    process. If a line doesn't have a label, which can happen when it is an error
    coming from `srun` or `sbatch`, it is placed under the `slurm` key.

    This function might take a lot of memory for big logs, because it loads all
    the processed lines into one dictionary. Consider using
    `separate_labelled_log_to_files()` to write per-process files on the fly.

    Args:
        fp: text I/O stream with the log's content
        remove_trailing_whitespaces: remove trailing whitespaces and empty lines
    Returns:
        dict: labelled log lines grouped by process.
    """
    processed_log = {}

    for line in fp:
        # Group only process-labelled lines
        if m := _LABELLED_LOG_LINE.search(line):
            process_id = int(m.group(1))
            process_msg = m.group(2)
            if process_id not in processed_log:
                processed_log[process_id] = []
            if remove_trailing_whitespaces:
                process_msg = process_msg.rstrip()
                if not process_msg:
                    continue
            processed_log[process_id].append(process_msg)
        # Otherwise, append under the `slurm` key
        else:
            if "slurm" not in processed_log:
                processed_log["slurm"] = []
            if remove_trailing_whitespaces:
                line = line.rstrip()
                if not line:
                    continue
            processed_log["slurm"].append(line.rstrip("\n"))
    return processed_log

separate_labelled_log_to_files

separate_labelled_log_to_files(
    fp: TextIO,
    output_dir: Path,
    output_name: str,
    remove_trailing_whitespaces: bool = False,
) -> None

Separate SLURM labelled log by process IDs, on the fly, into dedicated log files.

This function does not load the log content into memory like separate_labelled_log() does. Instead, it processes the log line by line and writes to dedicated log files. Use this function for big logs.

Parameters:

  • fp

    (TextIO) –

    text I/O stream with the log's content

  • output_dir

    (Path) –

    directory where dedicated logs will be placed (must exist)

  • output_name

    (str) –

    the name of the output file (must contain {process_id} as a placeholder for the process ID)

  • remove_trailing_whitespaces

    (bool, default: False ) –

    remove trailing whitespaces and empty lines

Source code in ipsl_common/slurm.py
def separate_labelled_log_to_files(
    fp: TextIO,
    output_dir: Path,
    output_name: str,
    remove_trailing_whitespaces: bool = False,
) -> None:
    """
    Separate SLURM labelled log by process IDs, on the fly, into dedicated log files.

    This function does not load the log content into memory like
    `separate_labelled_log()` does. Instead, it processes the log line by line
    and writes to dedicated log files. Use this function for big logs.

    Args:
        fp: text I/O stream with the log's content
        output_dir: directory where dedicated logs will be placed (must exist)
        output_name: the name of the output file (must contain `{process_id}` as a placeholder for the process ID)
        remove_trailing_whitespaces: remove trailing whitespaces and empty lines
    """
    output_dir = Path(output_dir)
    if not output_dir.exists():
        raise ValueError("The `output_dir` must exist")
    if not output_dir.is_dir():
        raise ValueError("The `output_dir` must be a directory")
    if "{process_id}" not in output_name:
        raise ValueError(
            "The `output_name` must contain `{process_id}` placeholder for the process ID."
        )
    output_files = {}
    for line in fp:
        # Write lines to per-process output files
        if m := _LABELLED_LOG_LINE.search(line):
            process_id = int(m.group(1))
            process_msg = m.group(2)
            if process_id not in output_files:
                output_files[process_id] = open(
                    output_dir / output_name.format(process_id=process_id), "w"
                )
            if remove_trailing_whitespaces:
                process_msg = process_msg.rstrip()
                if not process_msg:
                    continue
            output_files[process_id].write(f"{process_msg}\n")
        # Otherwise, write to the `slurm` file
        else:
            if "slurm" not in output_files:
                output_files["slurm"] = open(
                    output_dir / output_name.format(process_id="slurm"), "w"
                )
            line = line.rstrip("\n")
            if remove_trailing_whitespaces:
                line = line.rstrip()
                if not line:
                    continue
            output_files["slurm"].write(f"{line}\n")

    for f in output_files.values():
        f.close()
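
For illustration, the `{process_id}` placeholder in `output_name` is filled with `str.format()`, so a name like the hypothetical one below yields one file per process plus a `slurm` file for unlabelled lines:

```python
# Hypothetical output name; any string containing "{process_id}" works.
output_name = "job-1234.{process_id}.log"

# Per-process file for process 0, and the bucket for unlabelled lines:
print(output_name.format(process_id=0))        # job-1234.0.log
print(output_name.format(process_id="slurm"))  # job-1234.slurm.log
```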