|
|
|
@ -1,7 +1,10 @@ |
|
|
|
|
SLURM ACCOUNTING (sacct) |
|
|
|
|
======================== |
|
|
|
|
|
|
|
|
|
CAVEAT: |
|
|
|
|
Created: April 2019<br> |
|
|
|
|
Updated: November 2019 |
|
|
|
|
|
|
|
|
|
**CAVEAT:** |
|
|
|
|
This document was originally developed by referencing SLURM |
|
|
|
|
18.08.1 used on Turing. |
|
|
|
|
I also tried to consult the newer version (master branch |
|
|
|
@ -11,6 +14,12 @@ incompatible with this version. |
|
|
|
|
Please take this document with a grain of salt, and always consult the |
|
|
|
|
manual pages, source code, etc. in case of doubt. |
|
|
|
|
|
|
|
|
|
*Update 2019-11-06*: |
|
|
|
|
SLURM man page now contains the description of the accounting fields. |
|
|
|
|
Please look at |
|
|
|
|
<https://slurm.schedmd.com/sacct.html#lbAF> . |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
UNDERSTANDING SLURM ACCOUNTING FIELDS |
|
|
|
|
------------------------------------- |
|
|
|
@ -21,7 +30,10 @@ SLURM accounting can produce very many fields. |
|
|
|
|
The "cooked" job ID. Please see the discussion below. |
|
|
|
|
|
|
|
|
|
`JobIDRaw`: |
|
|
|
|
The "raw" job ID. Please see the discussion below. |
|
|
|
|
The "raw" job ID. |
|
|
|
|
In the vast majority of cases, the `JobIDRaw` field is identical to `JobID` |
|
|
|
|
except in the case of array jobs. |
|
|
|
|
Please see the discussion below. |
|
|
|
|
|
|
|
|
|
`TimelimitRaw`: |
|
|
|
|
The raw value of time limit, in minutes. |
|
|
|
@ -76,7 +88,7 @@ A `JOBSTEP` can have several subtypes: |
|
|
|
|
* `SLURM_BATCH_SCRIPT`, in which case JobIDRaw will obtain the `.batch` suffix. |
|
|
|
|
* `SLURM_EXTERN_CONT`, in which case JobIDRaw will obtain the `.extern` suffix. |
|
|
|
|
Apparently, this is meant to indicate "external" type of job steps, |
|
|
|
|
including. |
|
|
|
|
described further below. |
|
|
|
|
* many others; but in this case, it will print JobIDRaw in `[0-9]+\.[0-9]+` |
|
|
|
|
pattern |
|
|
|
|
* Other types (usually it will have index numbers like 0, 1, 2, ...) |
|
|
|
@ -87,7 +99,7 @@ A `JOBSTEP` can have several subtypes: |
|
|
|
|
A "vanilla" job entry corresponds to a single job submitted by a user to SLURM. |
|
|
|
|
This will not be a job array. |
|
|
|
|
|
|
|
|
|
* Characteristics : `JobID ~ /^[0-9]+$/`. |
|
|
|
|
* Regexp match : `JobID ~ /^[0-9]+$/`. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
#### Array Job |
|
|
|
@ -95,7 +107,7 @@ This will not be a job array. |
|
|
|
|
An "array" job entry corresponds to a single job as part of a job |
|
|
|
|
array submitted by a user to SLURM. |
|
|
|
|
|
|
|
|
|
* Characteristics : `JobID ~ /^[0-9]+_[0-9]+$/`. |
|
|
|
|
* Regexp match : `JobID ~ /^[0-9]+_[0-9]+$/`. |
|
|
|
|
|
|
|
|
|
The Job ID contains two numbers separated by an underscore. |
|
|
|
|
The number before the underscore refers to the job ID as reported by |
|
|
|
@ -114,7 +126,7 @@ square brackets around the job suffix: |
|
|
|
|
A heterogeneous job entry corresponds to a part of a heterogeneous job |
|
|
|
|
submitted by a user to SLURM. |
|
|
|
|
|
|
|
|
|
* Characteristics : `JobID ~ /^[0-9]+\+[0-9]+$/`. |
|
|
|
|
* Regexp match: `JobID ~ /^[0-9]+\+[0-9]+$/`. |
|
|
|
|
|
|
|
|
|
The Job ID contains two numbers separated by a plus sign. |
|
|
|
|
The number before the plus sign refers to the job ID as reported by |
|
|
|
@ -130,7 +142,7 @@ sbatch) when more than one CPU cores were requested by the job. |
|
|
|
|
|
|
|
|
|
Characteristics of SLURM_BATCH_SCRIPT accounting records: |
|
|
|
|
|
|
|
|
|
* JobIDRaw =~ /^[0-9]+\.batch$/ |
|
|
|
|
* Regexp match: `JobIDRaw ~ /^[0-9]+\.batch$/` |
|
|
|
|
|
|
|
|
|
* The record does NOT have a user ID (field `User`) |
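
The regexp matches listed in the sections above can be combined into a small classifier. The following is only a minimal sketch in Python: the function name `classify_record` and the labels it returns are hypothetical, and the patterns are the ones quoted in this document (step-level patterns are tested against `JobIDRaw`, job-level patterns against `JobID`). It has not been checked against every SLURM version.

```python
import re

# Step-level patterns, tested against JobIDRaw (labels are hypothetical).
STEP_PATTERNS = [
    ("batch", re.compile(r"[0-9]+\.batch")),
    ("extern", re.compile(r"[0-9]+\.extern")),
    ("numbered_step", re.compile(r"[0-9]+\.[0-9]+")),
]

# Job-level patterns, tested against the "cooked" JobID.
JOB_PATTERNS = [
    ("vanilla", re.compile(r"[0-9]+")),
    ("array", re.compile(r"[0-9]+_[0-9]+")),
    ("heterogeneous", re.compile(r"[0-9]+\+[0-9]+")),
]


def classify_record(job_id, job_id_raw):
    """Return a label for one accounting record, or "unknown"."""
    # Check job steps first: a step of an array job still carries a
    # step suffix in JobIDRaw.
    for label, pattern in STEP_PATTERNS:
        if pattern.fullmatch(job_id_raw):
            return label
    for label, pattern in JOB_PATTERNS:
        if pattern.fullmatch(job_id):
            return label
    return "unknown"
```

`fullmatch` is used instead of `^...$` so that a trailing newline in the input cannot slip through. Records that fit none of the patterns (for example, an array job ID written with square brackets) are reported as `"unknown"` rather than guessed.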
|
|
|
|
|
|
|
|
@ -177,7 +189,7 @@ that may be only when a specific "job completion" task is specified. |
|
|
|
|
#### Questions & (Possible) Answers |
|
|
|
|
|
|
|
|
|
* Why is there a separate "NNNNN.batch" record? |
|
|
|
|
It is perhaps when the job is multi-node. |
|
|
|
|
Perhaps this record is created when the job is multi-node. |
|
|
|
|
It appears to me that the ".batch" record is for accounting the batch script |
|
|
|
|
itself (which will run only on node #0 of the allocated resources). |
|
|
|
|
|
|
|
|
@ -192,7 +204,7 @@ This is what I found after this exploration: |
|
|
|
|
|
|
|
|
|
> We only need to include accounting records where the `JobIDRaw` field |
|
|
|
|
> contains only whole integers (i.e. matching regex `^[0-9]+$`). |
|
|
|
|
|
|
|
|
|
> Further, |
|
|
|
|
|
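
The rule quoted above (keep only records whose `JobIDRaw` is a whole integer) can be sketched as a small filter. This is an illustration under stated assumptions, not a definitive implementation: it assumes the accounting data is pipe-delimited text with a header line (as produced, for example, by `sacct --parsable2`), and the function name `whole_job_records` and the sample rows are hypothetical.

```python
import csv
import io
import re

# JobIDRaw values made of digits only, i.e. no ".batch", ".extern" or
# ".N" step suffix.
WHOLE_JOB = re.compile(r"[0-9]+")


def whole_job_records(text, delimiter="|"):
    """Keep only the rows whose JobIDRaw field is a whole integer."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [row for row in reader if WHOLE_JOB.fullmatch(row["JobIDRaw"])]


# Made-up sample data, for illustration only.
sample = (
    "JobID|JobIDRaw|User\n"
    "1234|1234|alice\n"
    "1234.batch|1234.batch|\n"
    "1235_1|1236|bob\n"
    "1235_1.batch|1236.batch|\n"
)
kept = whole_job_records(sample)
```

In this sample, the two `.batch` rows are dropped and the job-level rows (including the array job entry, whose `JobIDRaw` is a plain integer) are kept.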
|
|
|
|
|
|
|
|
|
## References |
|
|
|
|