* Minor updates Nov 2019.

master
Wirawan Purwanto 5 years ago
parent 0d8b081ac7
commit 837e1c7d9f
  1. 30
      slurm/20190411.Slurm-accounting.md

@@ -1,7 +1,10 @@
 SLURM ACCOUNTING (sacct)
 ========================
-CAVEAT:
+Created: April 2019<br>
+Updated: November 2019
+**CAVEAT:**
 This document was originally developed by referencing SLURM
 18.08.1 used on Turing.
 I also tried to consult the newer version (master branch
@@ -11,6 +14,12 @@ incompatible with this version.
 Please use a grain of salt when reading, and always consult with
 manual pages, source code, etc in case of doubt.
+*Update 2019-11-06*:
+SLURM man page now contains the description of the accounting fields.
+Please look at
+<https://slurm.schedmd.com/sacct.html#lbAF> .
 UNDERSTANDING SLURM ACCOUNTING FIELDS
 -------------------------------------
@@ -21,7 +30,10 @@ SLURM accounting can produce very many fields.
 The "cooked" job ID. Please see the discussion below.
 `JobIDRaw`:
-The "raw" job ID. Please see the discussion below.
+The "raw" job ID.
+In a vast majority of cases, the `JobIDRaw` field is identical to `JobID`
+except in the case of array jobs.
+Please see the discussion below.
 `TimelimitRaw`:
 The raw value of time limit, in minutes.
@@ -76,7 +88,7 @@ A `JOBSTEP` can have several subtypes:
 * `SLURM_BATCH_SCRIPT`, in which case JobIDRaw will obtain the `.batch` suffix.
 * `SLURM_EXTERN_CONT`, in which case JobIDRaw will obtain the `.extern` suffix.
 Apparently, this is meant to indicate "external" type of job steps,
-including.
+described further below.
 * many others; but in this case, it will print JobIDRaw in `[0-9]+\.[0-9]+`
 pattern
 * Other types (usually it will have index numbers like 0, 1, 2, ...)
@@ -87,7 +99,7 @@ A `JOBSTEP` can have several subtypes:
 A "vanilla" job entry corresponds to a single job submitted by a user to SLURM.
 This will not be a job array.
-* Characteristics : `JobID ~ /^[0-9]+$/`.
+* Regexp match : `JobID ~ /^[0-9]+$/`.
 #### Array Job
@@ -95,7 +107,7 @@ This will not be a job array.
 An "array" job entry corresponds to a single job as part of a job
 array submitted by a user to SLURM.
-* Characteristics : `JobID ~ /^[0-9]+_[0-9]+$/`.
+* Regexp match : `JobID ~ /^[0-9]+_[0-9]+$/`.
 The Job ID contains two numbers separated by an underscore.
 The number before the underscore refers to the job ID as reported by
@@ -114,7 +126,7 @@ square brackets around the job suffix:
 A heterogenous job entry corresponds to a part of a heterogenous job
 submitted by a user to SLURM.
-* Characteristics : `JobID ~ /^[0-9]+\+[0-9]+$/`.
+* Regexp match: `JobID ~ /^[0-9]+\+[0-9]+$/`.
 The Job ID contains two numbers separated by a plus sign.
 The number before the underscore refers to the job ID as reported by
@@ -130,7 +142,7 @@ sbatch) when more than one CPU cores were requested by the job.
 Characteristics of SLURM_BATCH_SCRIPT accounting records:
-* JobIDRaw =~ /^[0-9]+\.batch$/
+* Regexp match: `JobIDRaw ~ /^[0-9]+\.batch$/`
 * The record does NOT have user ID (field `User`)
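
The regexp matches quoted in the hunks above can be collected into one small classifier. This is a minimal sketch and not part of the committed document: the function name `classify_job_id` and the category labels are hypothetical, and it assumes the caller passes `JobID` for the first three patterns and `JobIDRaw` for the `.batch` pattern, as the document describes.

```python
import re

# Patterns copied from the document; the labels are illustrative only.
_PATTERNS = [
    ("vanilla", re.compile(r"^[0-9]+$")),                # JobID of a plain job
    ("array", re.compile(r"^[0-9]+_[0-9]+$")),           # JobID of an array job entry
    ("heterogeneous", re.compile(r"^[0-9]+\+[0-9]+$")),  # JobID of a heterogeneous job entry
    ("batch_step", re.compile(r"^[0-9]+\.batch$")),      # JobIDRaw of a SLURM_BATCH_SCRIPT record
]


def classify_job_id(job_id: str) -> str:
    """Return the label of the first pattern that matches the whole ID."""
    for label, pattern in _PATTERNS:
        # fullmatch rejects a trailing newline, which a bare `$` would tolerate.
        if pattern.fullmatch(job_id):
            return label
    return "other"  # e.g. ".extern" records or numbered job steps
```

Anything that falls through to `"other"`, such as `.extern` records or numbered steps like `12345.0`, would need its own pattern if it has to be told apart.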
@@ -177,7 +189,7 @@ that may be only when a specific "job completion" task is specified.
 #### Questions & (Possible) Answers
 * Why there is a separate "NNNNN.batch" record?
-It is perhaps when the job is multi-node.
+Perhaps, this record was made when the job is multi-node.
 It appears to me that the ".batch" record is for accounting the batch script
 itself (which will run only on node #0 of the allocated resources).
@@ -192,7 +204,7 @@ This is what I found after this exploration:
 > We only need to include accounting records where the `JobIDRaw` field
 > contains only whole integers (i.e. matching regex `^[0-9]+$`).
+> Further,
 ## References
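
The rule quoted in this hunk (keep only records whose `JobIDRaw` is a whole integer) could be applied to pipe-delimited accounting output roughly as follows. This is a sketch under assumptions: the input is taken to be header-first, `|`-separated text such as `sacct --parsable2` produces, the helper name `whole_job_records` is hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
import re

# Regex quoted in the document: JobIDRaw made of digits only.
WHOLE_JOB = re.compile(r"^[0-9]+$")


def whole_job_records(text: str) -> list:
    """Parse pipe-delimited text with a header row and keep whole-job rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    return [row for row in reader if WHOLE_JOB.fullmatch(row["JobIDRaw"])]


# Made-up sample rows; real output would carry many more fields.
SAMPLE = """JobIDRaw|JobName|State
1001|demo|COMPLETED
1001.batch|batch|COMPLETED
1001.extern|extern|COMPLETED
1002|demo2|FAILED
1002.0|step|FAILED
"""

if __name__ == "__main__":
    for record in whole_job_records(SAMPLE):
        print(record["JobIDRaw"], record["State"])
```

The `.batch`, `.extern`, and numbered step rows are dropped because their `JobIDRaw` contains a dot suffix.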
