
slurm

Improved interaction with the SLURM ecosystem.

This module includes: functions to separate labelled SLURM logs.

Functions

separate_labelled_log

separate_labelled_log(
    fp: TextIO, remove_trailing_whitespaces: bool = False
) -> dict[int | str, list[str]]

Separate SLURM labelled log by process IDs.

Almost every line in a SLURM labelled log starts with the process number followed by a colon:

22:  USING DEFAULTS : area_radius1 =   3360.00000000000
26:  USING DEFAULTS : area_radius1 =   3360.00000000000
 0:  USING DEFAULTS : area_radius1 =   3360.00000000000
 0:  GETIN area_radius1 =    3360.00000000000
12:  USING DEFAULTS : area_rotation_pre =  0.000000000000000E+000
12:  USING DEFAULTS : area_rotation =  0.000000000000000E+000

This function reads such a log file and groups lines coming from the same process. If a line doesn't have a label, which can happen when it is an error coming from srun or sbatch, it is placed under the slurm key.

This function might take a lot of memory for big logs, because it loads all the processed lines into one dictionary. Consider using separate_labelled_log_to_files() to write per-process files on the fly.

Parameters:

  • fp

    (TextIO) –

    text I/O stream with the log's content

  • remove_trailing_whitespaces

    (bool, default: False ) –

    remove trailing whitespaces and empty lines

Returns: dict: labelled log lines grouped by process.

Source code in ipsl_common/slurm.py
def separate_labelled_log(
    fp: TextIO, remove_trailing_whitespaces: bool = False
) -> dict[int | str, list[str]]:
    """
    Separate SLURM labelled log by process IDs.

    Almost every line in a SLURM labelled log starts with the process number
    followed by a colon:

    ```
    22:  USING DEFAULTS : area_radius1 =   3360.00000000000
    26:  USING DEFAULTS : area_radius1 =   3360.00000000000
     0:  USING DEFAULTS : area_radius1 =   3360.00000000000
     0:  GETIN area_radius1 =    3360.00000000000
    12:  USING DEFAULTS : area_rotation_pre =  0.000000000000000E+000
    12:  USING DEFAULTS : area_rotation =  0.000000000000000E+000
    ```

    This function reads such a log file and groups lines coming from the same
    process. If a line doesn't have a label, which can happen when it is an error
    coming from `srun` or `sbatch`, it is placed under the `slurm` key.

    This function might take a lot of memory for big logs, because it loads all
    the processed lines into one dictionary. Consider using
    `separate_labelled_log_to_files()` to write per-process files on the fly.

    Args:
        fp: text I/O stream with the log's content
        remove_trailing_whitespaces: remove trailing whitespaces and empty lines
    Returns:
        dict: labelled log lines grouped by process.
    """
    processed_log = {}

    for line in fp:
        # Group only process-labelled lines
        if m := _LABELLED_LOG_LINE.search(line):
            process_id = int(m.group(1))
            process_msg = m.group(2)
            if process_id not in processed_log:
                processed_log[process_id] = []
            if remove_trailing_whitespaces:
                process_msg = process_msg.rstrip()
                if not process_msg:
                    continue
            processed_log[process_id].append(process_msg)
        # Otherwise, append under the `slurm` key
        else:
            if "slurm" not in processed_log:
                processed_log["slurm"] = []
            if remove_trailing_whitespaces:
                line = line.rstrip()
                if not line:
                    continue
            processed_log["slurm"].append(line.rstrip("\n"))
    return processed_log

separate_labelled_log_to_files

separate_labelled_log_to_files(
    fp: TextIO,
    output_dir: Path,
    output_name: str,
    remove_trailing_whitespaces: bool = False,
) -> None

Separate SLURM labelled log by process IDs, on the fly, into dedicated log files.

This function does not load the log content into memory like separate_labelled_log() does. Instead, it processes the log line by line and writes to dedicated log files. Use this function for big logs.

Parameters:

  • fp

    (TextIO) –

    text I/O stream with the log's content

  • output_dir

    (Path) –

    directory where dedicated logs will be placed (must exist)

  • output_name

    (str) –

    the name of the output file (must contain {process_id} as a placeholder for the process ID)

  • remove_trailing_whitespaces

    (bool, default: False ) –

    remove trailing whitespaces and empty lines

Source code in ipsl_common/slurm.py
def separate_labelled_log_to_files(
    fp: TextIO,
    output_dir: Path,
    output_name: str,
    remove_trailing_whitespaces: bool = False,
) -> None:
    """
    Separate SLURM labelled log by process IDs, on the fly, into dedicated log files.

    This function does not load the log content into memory like
    `separate_labelled_log()` does. Instead, it processes the log line by line
    and writes to dedicated log files. Use this function for big logs.

    Args:
        fp: text I/O stream with the log's content
        output_dir: directory where dedicated logs will be placed (must exist)
        output_name: the name of the output file (must contain `{process_id}` as a placeholder for the process ID)
        remove_trailing_whitespaces: remove trailing whitespaces and empty lines
    """
    output_dir = Path(output_dir)
    if not output_dir.exists():
        raise ValueError("The `output_dir` must exist")
    if not output_dir.is_dir():
        raise ValueError("The `output_dir` must be a directory")
    if "{process_id}" not in output_name:
        raise ValueError(
            "The `output_name` must contain `{process_id}` placeholder for the process ID."
        )
    output_files = {}
    for line in fp:
        # Write lines to per-process output files
        if m := _LABELLED_LOG_LINE.search(line):
            process_id = int(m.group(1))
            process_msg = m.group(2)
            if process_id not in output_files:
                output_files[process_id] = open(
                    output_dir / output_name.format(process_id=process_id), "w"
                )
            if remove_trailing_whitespaces:
                process_msg = process_msg.rstrip()
                if not process_msg:
                    continue
            output_files[process_id].write(f"{process_msg}\n")
        # Otherwise, write to the `slurm` file
        else:
            if "slurm" not in output_files:
                output_files["slurm"] = open(
                    output_dir / output_name.format(process_id="slurm"), "w"
                )
            line = line.rstrip("\n")
            if remove_trailing_whitespaces:
                line = line.rstrip()
                if not line:
                    continue
            output_files["slurm"].write(f"{line}\n")

    for f in output_files.values():
        f.close()
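
For illustration, the `{process_id}` placeholder in `output_name` is filled with `str.format()`, so a name like the hypothetical one below yields one file per process plus a `slurm` file for unlabelled lines:

```python
# Hypothetical output name; any string containing "{process_id}" works.
output_name = "job-1234.{process_id}.log"

# Per-process file for process 0, and the bucket for unlabelled lines:
print(output_name.format(process_id=0))        # job-1234.0.log
print(output_name.format(process_id="slurm"))  # job-1234.slurm.log
```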