# Snakemake: A General Introduction
Main website: https://snakemake.github.io/

Snakemake documentation: https://snakemake.readthedocs.io/en/stable/

Tutorials:

* Official tutorial (complete steps): https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html#tutorial
* Basic tutorial: https://snakemake.readthedocs.io/en/stable/tutorial/basics.html#tutorial-basics
## Standalone

This is the easiest way to run Snakemake. You will be allocated a single node, with all of its resources reserved for you; there is no cross-node parallelization in this mode. You can use the job script below:
```bash
#!/bin/bash
#SBATCH --exclusive

enable_lmod
module load container_env snakemake

crun snakemake
```
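Depending on the Snakemake version in the module, a bare invocation may stop with an error asking for a core count. If that happens, passing one explicitly should work; using Slurm's SLURM_CPUS_ON_NODE variable here is just one reasonable choice on an exclusive node, not a required setting:

```bash
# Give Snakemake every CPU that Slurm allocated on this (exclusive) node.
crun snakemake --cores "$SLURM_CPUS_ON_NODE"
```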
## Break Rules into Individual Slurm Jobs

This is my recommended way of running Snakemake: it uses cluster resources most effectively. It is a little more complicated, so I have written a wrapper script to do most of the legwork; you can launch it like this:
```bash
#!/bin/bash
#SBATCH -c 2

enable_lmod
module load container_env snakemake

snakemake.helper -j 10
```
Please note that this job should only take 1 or 2 cores: it is a master job that does not do the real work itself; the additional jobs that do the work are launched by it.

Instead of running "crun snakemake" directly, you run "snakemake.helper". That is the helper script I mentioned; it launches the rule jobs on the cluster for you.
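While the master job is running, the rule jobs it submits show up as ordinary Slurm jobs under your account, so you can monitor them with the usual tools, for example:

```bash
# Lists the master job plus any rule jobs the helper has submitted for you.
squeue -u "$USER"
```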
For cluster resources, I only enforce "threads" from the Snakemake rules; any other resource (mem, disk, ...) might or might not be enforced by Snakemake itself (I am not sure), but I will not enforce it at the scheduler level. For example:
```
rule map_reads:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        "results/mapped/{sample}.bam"
    threads: 2
    shell:
        "bwa mem {input} | samtools view -b - > {output}"
```
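As a side note on those other resources: Snakemake rules can also declare things like memory through the standard resources keyword. The variant below is purely hypothetical and, as noted above, only the thread count would be reflected in the Slurm allocation here:

```
rule map_reads:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        "results/mapped/{sample}.bam"
    threads: 2
    resources:
        mem_mb=4000   # seen by Snakemake's own scheduler; not enforced by Slurm here
    shell:
        "bwa mem {input} | samtools view -b - > {output}"
```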
When threads is set in a rule, I will launch the corresponding Slurm job with a matching "--cpus-per-task" value. When it is not set explicitly, the rule will always run single-threaded.
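For the curious, the effect is roughly what you would get from Snakemake's generic cluster submission interface in classic Snakemake releases; this is only a sketch of the idea, not what the helper literally runs:

```bash
# Sketch only: forward each rule's thread count to Slurm through Snakemake's
# generic --cluster option; -j 10 caps how many rule jobs are out at once.
snakemake -j 10 --cluster "sbatch --cpus-per-task={threads}"
```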
My Snakemake module also supports MPI mode, but from what I observe, your tasks usually involve running some code over a lot of inputs rather than running a single multi-node program on a single input, so the second mode above should be the most useful to you. If you do need to run MPI, please let me know; it will require some additional setup, especially when combined with "--use-conda".
Snakemake's conda integration is supported and tested, and you should be able to install any conda package you want; unless it requires MPI, it usually works. When using conda, please make sure an environment file is given in the rule and launch Snakemake with "--use-conda":
```
# Snakefile
rule map_reads:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        "results/mapped/{sample}.bam"
    threads: 2
    conda:
        "envs/mapping.yaml"
    shell:
        "bwa mem {input} | samtools view -b - > {output}"
```

```bash
# job_script.sh
crun snakemake --use-conda             # for standalone
snakemake.helper -j 10 --use-conda     # for running in scheduler mode
```
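For completeness, the environment file referenced above could look something like this; the channel list and version pins are illustrative assumptions, so adjust them to whatever your rule actually needs:

```yaml
# envs/mapping.yaml -- illustrative example, not a required configuration
channels:
  - bioconda
  - conda-forge
dependencies:
  - bwa=0.7.17
  - samtools=1.17
```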
Running other container modules is also possible; just let me know what you need. That said, if you can install what you need with conda, please use conda first, since you can do that yourself.
You can find my sample job scripts in /home/jsun/snakemake; please let me know if you have any questions or issues.