SLURM ACCOUNTING (sacct)
Created: April 2019
Updated: November 2019
CAVEAT: This document was originally developed by referencing SLURM 18.08.1, the version used on Turing. I also consulted the newer source (master branch, around July 2019). Newer versions may introduce additional features, or features incompatible with this version. Please take this document with a grain of salt, and always consult the manual pages, source code, etc. in case of doubt.
Update 2019-11-06: The SLURM man page for sacct now contains a description of the accounting fields. Please look at https://slurm.schedmd.com/sacct.html#lbAF .
UNDERSTANDING SLURM ACCOUNTING FIELDS
SLURM accounting can produce a great many fields. The ones that matter for this discussion are:
JobID
: The "cooked" job ID. Please see the discussion below.
JobIDRaw
: The "raw" job ID. In a vast majority of cases, the JobIDRaw field is identical to JobID except in the case of array jobs. Please see the discussion below.
TimelimitRaw
: The raw value of time limit, in minutes.
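As a minimal sketch of how these fields might be read programmatically, the snippet below parses pipe-delimited text in the shape produced by `sacct --parsable2 --format=JobID,JobIDRaw,User,TimelimitRaw`. The sample text and the helper names (`parse_sacct`, `time_limit`) are made up for illustration, and treating a non-numeric TimelimitRaw as "no limit recorded" is an assumption, not documented behavior.

```python
import csv
import io
from datetime import timedelta

# Made-up sample in the shape of `sacct --parsable2` output (pipe-delimited,
# header row first). The values are illustrative, not real accounting data.
SAMPLE = """\
JobID|JobIDRaw|User|TimelimitRaw
8918299|8918299|alice|60
8918299.batch|8918299.batch||
8918300_1|8918301|bob|1440
"""


def parse_sacct(text):
    """Return one dict per accounting record, keyed by field name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|",
                            quoting=csv.QUOTE_NONE)
    return list(reader)


def time_limit(record):
    """Convert TimelimitRaw (minutes) to a timedelta.

    Assumption: blank or non-numeric values are reported as None.
    """
    raw = record["TimelimitRaw"].strip()
    return timedelta(minutes=int(raw)) if raw.isdigit() else None


records = parse_sacct(SAMPLE)
for rec in records:
    print(rec["JobID"], rec["JobIDRaw"], time_limit(rec))
```

In the third sample record, JobID and JobIDRaw differ, which is the array-job case discussed below.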
About SLURM Job IDs
SLURM produces one or more records in the accounting database for every job. When a user submits a job to SLURM, SLURM assigns that job a unique job number, like this:
$ sbatch calculation.job
Submitted batch job 8918299
However, internally within SLURM, there can be one or more "job steps" created and executed while this job is being launched and executed. (Things get even more complicated with the newer "heterogeneous job" feature, in which various parts of a job can require very different resources. See this documentation for more information.) The combination of all the job steps constitutes the entire job. Each job step generates its own record in the SLURM accounting database.
Summary on Job ID
A single SLURM job generates a "master record", which logs the overall execution of the job.
In addition, there can be zero or more extra records generated by the "job steps" triggered during the course of that job.
The master record includes the resource usage (CPU, memory, etc.) of the child "job steps".
The master job record is characterized by a plain number in the JobIDRaw field. Further, its User field must not be empty.
The rest of this section goes into greater detail about the various kinds of JobID.
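The master/step relationship summarized above can be sketched as a grouping of records by the number before the first dot in JobIDRaw. This is only an illustration: `group_by_job` and the sample records are hypothetical, and records are assumed to be dicts with at least the JobIDRaw and User fields.

```python
from collections import defaultdict


def group_by_job(records):
    """Group accounting records under the job number they belong to.

    A record with a ".something" suffix is filed as a step; a plain-number
    record with a non-empty User is taken as the master record; anything
    else is kept aside under "other".
    """
    groups = defaultdict(lambda: {"master": None, "steps": [], "other": []})
    for rec in records:
        base, _, step = rec["JobIDRaw"].partition(".")
        if step:
            groups[base]["steps"].append(rec)
        elif rec.get("User"):
            groups[base]["master"] = rec
        else:
            groups[base]["other"].append(rec)
    return dict(groups)


# Illustrative records only; not taken from a real accounting database.
records = [
    {"JobIDRaw": "8918299", "User": "alice"},
    {"JobIDRaw": "8918299.batch", "User": ""},
    {"JobIDRaw": "8918299.0", "User": ""},
]
groups = group_by_job(records)
print(groups["8918299"]["master"], len(groups["8918299"]["steps"]))
```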
Observed Job ID Patterns
Several regex patterns have been observed in the JobID field (from Turing accounting):
- [0-9]+
- [0-9]+_[0-9]+ (for job arrays)
- [0-9]+\.[0-9]+
- [0-9]+\.batch
In all cases, JobIDRaw is the same as JobID, except in the case of /[0-9]+_[0-9]+/, where JobIDRaw is a running number [0-9]+. This is the case where the submitter specifies an array of jobs.
From the sacct source code in SLURM (src/sacct/print.c) one can find that there are other patterns too (look for the string "case PRINT_JOBIDRAW:").
The key function in print.c is print_fields. In particular, look at the lengthy case statement where it handles the PRINT_JOBID and PRINT_JOBIDRAW cases.
A job can be of different types:
- JOB
- JOBSTEP
- JOBCOMP
A JOBSTEP can have several subtypes:
- SLURM_BATCH_SCRIPT, in which case JobIDRaw will obtain the .batch suffix.
- SLURM_EXTERN_CONT, in which case JobIDRaw will obtain the .extern suffix. Apparently, this is meant to indicate "external" type of job steps, described further below.
- Many others; in these cases, JobIDRaw is printed in the [0-9]+\.[0-9]+ pattern.
- Other types (usually with index numbers like 0, 1, 2, ...).
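The suffix rules above can be turned into a small classifier. This is a sketch based only on the patterns listed in this section; the function name `record_kind` and the labels it returns are my own, not SLURM terminology.

```python
import re

# Patterns taken from the JobIDRaw forms described above. Order matters
# only for readability; the patterns do not overlap.
RECORD_PATTERNS = [
    ("job", re.compile(r"^[0-9]+$")),
    ("batch step", re.compile(r"^[0-9]+\.batch$")),
    ("extern step", re.compile(r"^[0-9]+\.extern$")),
    ("numbered step", re.compile(r"^[0-9]+\.[0-9]+$")),
]


def record_kind(job_id_raw):
    """Return a rough label for an accounting record given its JobIDRaw."""
    for kind, pattern in RECORD_PATTERNS:
        if pattern.match(job_id_raw):
            return kind
    return "unknown"


for example in ["5947279", "5947279.batch", "5947279.extern", "5947279.0"]:
    print(example, "->", record_kind(example))
```

Anything that falls through to "unknown" is worth checking against print.c, since the source suggests there are more patterns than the ones observed here.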
Vanilla Job
A "vanilla" job entry corresponds to a single job submitted by a user to SLURM. This will not be a job array.
- Regexp match: JobID ~ /^[0-9]+$/.
From my observation, only simple single-core jobs (no MPI, no job array, or other fancy stuff) generate no extra "child records" for job steps in the SLURM accounting database.
However, some job records with this type of JobID have no "User" field set. These are not vanilla jobs either.
Array Job
An "array" job entry corresponds to a single job as part of a job array submitted by a user to SLURM.
- Regexp match: JobID ~ /^[0-9]+_[0-9]+$/.
The Job ID contains two numbers separated by an underscore. The number before the underscore refers to the job ID as reported by sbatch upon the submission of the job. The number after the underscore is the index of the task within the array.
NOTE: Newer versions of SLURM allow a textual word instead of a number to identify one job in an array. Such a text-based job label (instead of an integer) is marked by square brackets around the job suffix:
- Characteristics (textual array label): JobID ~ /^[0-9]+_\[.*\]+$/.
Heterogeneous Job
A heterogeneous job entry corresponds to a part of a heterogeneous job submitted by a user to SLURM.
- Regexp match: JobID ~ /^[0-9]+\+[0-9]+$/.
The Job ID contains two numbers separated by a plus sign. The number before the plus sign refers to the job ID as reported by sbatch upon the submission of the job.
This will not be a job array.
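The three top-level JobID forms described so far (plain, array, heterogeneous) can be told apart with the regexes given in each subsection. The sketch below reuses those regexes as written; the names `JOBID_FORMS` and `parse_job_id`, and the group names inside the patterns, are hypothetical.

```python
import re

# One pattern per JobID form, copied from the subsections above with named
# groups added so the pieces can be extracted.
JOBID_FORMS = {
    "plain": re.compile(r"^(?P<base>[0-9]+)$"),
    "array": re.compile(r"^(?P<base>[0-9]+)_(?P<index>[0-9]+)$"),
    "array_label": re.compile(r"^(?P<base>[0-9]+)_\[(?P<label>.*)\]+$"),
    "heterogeneous": re.compile(r"^(?P<base>[0-9]+)\+(?P<offset>[0-9]+)$"),
}


def parse_job_id(job_id):
    """Return (form, parts) for a cooked JobID, or ("other", {}) if no match."""
    for form, pattern in JOBID_FORMS.items():
        match = pattern.match(job_id)
        if match:
            return form, match.groupdict()
    return "other", {}


for example in ["8918299", "8918299_7", "8918299+1", "8918299.batch"]:
    print(example, "->", parse_job_id(example))
```

Note that a "plain" JobID is not necessarily a vanilla job: as mentioned earlier, the User field also has to be non-empty.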
Job Step: Batch script
This corresponds to the execution of the batch script (submitted to sbatch) when more than one CPU core was requested by the job.
Characteristics of SLURM_BATCH_SCRIPT accounting records:
- Regexp match: JobIDRaw ~ /^[0-9]+\.batch$/
- The record does NOT have a user ID (field User)
- JobName is always batch
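The three characteristics above can be combined into a single check. This is a sketch under the assumption that each record is a dict holding the JobIDRaw, User, and JobName fields as strings; `looks_like_batch_step` is a made-up helper name.

```python
import re

BATCH_STEP = re.compile(r"^[0-9]+\.batch$")


def looks_like_batch_step(record):
    """True if the record shows all three observed batch-step traits."""
    return (
        BATCH_STEP.match(record.get("JobIDRaw", "")) is not None
        and record.get("User", "") == ""          # no user ID on the step
        and record.get("JobName", "") == "batch"  # JobName is always "batch"
    )


# Illustrative records only.
step = {"JobIDRaw": "8918299.batch", "User": "", "JobName": "batch"}
job = {"JobIDRaw": "8918299", "User": "alice", "JobName": "calculation.job"}
print(looks_like_batch_step(step), looks_like_batch_step(job))
```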
Job Step: External
SLURM_EXTERN_CONT is apparently a way to account for "external processes". It is still not 100% obvious what this means, but from reading the source code, two kinds of activity fall under this category:
- Job prologue
- Direct SSH access into an allocated compute node: in this case, the pam_slurm_adopt module will determine which SLURM job (if any) the ssh session belongs to and attribute that portion of the computation to that job.
Some other steps were also observed, whose JobIDRaw has the form NNNNN.N.
I wonder if these "job steps" are due to calls to "srun" within the batch script, because the job names are indicative: pw.x, pmi_proxy, etc.
(Example job: 5947279, Nov 2018.)
Job Step: All the others
These correspond to job steps that were launched by srun or another similar mechanism instead of the job script.
A prime example is the mpirun launch, which will record a new job step.
Job Completion
JOBCOMP appears to mark a job completion.
I am not sure whether this kind of record appears in the Turing accounting; it may appear only when a specific "job completion" task is specified.
Questions & (Possible) Answers
- Why is there a separate "NNNNN.batch" record? Perhaps this record is made when the job is multi-node. It appears to me that the ".batch" record accounts for the batch script itself (which runs only on node #0 of the allocated resources).
The Takeaway
Why all this complicated explanation? My original goal was to find the accounting records that cover the whole-job statistics without getting bogged down in the minute details of each job. This is what I found after this exploration:
We only need to include accounting records where the JobIDRaw field contains only whole integers (i.e. matching regex ^[0-9]+$). Further, the User field must not be empty.
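A minimal sketch of this selection rule follows. It assumes records are dicts with JobIDRaw and User fields, and it also applies the non-empty User condition stated in the summary near the top of this section. The helper names and the sample records are hypothetical.

```python
import re

WHOLE_JOB = re.compile(r"^[0-9]+$")


def is_master_record(record):
    """Plain-integer JobIDRaw and a non-empty User field."""
    return bool(WHOLE_JOB.match(record["JobIDRaw"])) and record["User"] != ""


def master_records(records):
    """Keep only the records that describe a whole job."""
    return [rec for rec in records if is_master_record(rec)]


# Illustrative records only; not real accounting data.
records = [
    {"JobIDRaw": "8918299", "User": "alice"},
    {"JobIDRaw": "8918299.batch", "User": ""},
    {"JobIDRaw": "8918299.0", "User": ""},
    {"JobIDRaw": "8918301", "User": "bob"},
    {"JobIDRaw": "8918302", "User": ""},
]
selected = master_records(records)
print([rec["JobIDRaw"] for rec in selected])
```

Filtering on JobIDRaw rather than JobID means that each array task is kept as its own whole-job record, since its JobIDRaw is a plain running number.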
References
sacct manual page: https://slurm.schedmd.com/sacct.html
SLURM administrator's documentation contains helpful bits and pieces to decipher the accounting records; unfortunately, in themselves they are not sufficient.
- Accounting: https://slurm.schedmd.com/accounting.html .
- Job Launch design guide: https://slurm.schedmd.com/job_launch.html .
This guide describes at a high level the processes which occur in order to initiate a job including the daemons and plugins involved in the process. It describes the process of job allocation, step allocation, task launch and job termination.
In SLURM, launching a job is a multistep process. The various "job steps" described in this guide eventually make their own entries in the SLURM accounting database.
Working Notes
These are my private working notes:
- daily-notes/2019/20190326.slurm-acct.txt
- daily-notes/2019/20190411.slurm-acct.txt
- daily-notes/2019/20190430.slurm-acct-201811.txt
- docs/kb/turing-slurm/20180106.SLURM-accounting.txt