Merge remote-tracking branch 'odu-git/master'
=============================================
Branch: master
Author: Wirawan Purwanto
Commit: efa580644c
Files changed in this merge:
1. R/20230421.R-torch.md
2. carpentries/20231204.Overview-of-Carpentries.md
3. cloud-platforms/20230413.cloud-ai-platforms.md
4. containers/20230828.Basil-container-tools.md
5. deep-learning/20231002.Finetuning-LLMs.md
6. deep-learning/20231204.PyTorch-install.md
7. latex/20230217.bibtex-bibliography.md
8. learning-platforms/20240318.grading-tools.md
9. nvidia/20230707.NVIDIA-SMI.md
10. portals/20230202.NSF-dashboards.md
11. python/20230317.python-tutorials-1.md
12. python/20230821.python-containers.md
13. sci-cfd/20230531.cfl3d-install.md
14. workflow/20230123.snakemake-software.md
15. workflow/20230124.snakemake-intro.md
16. workflow/20231012.workflow-tools.md

Using Torch with R
==================
Reference Literature
--------------------
### Book: "Deep Learning and Scientific Computing with R torch"
- Author: Sigrid Keydana
- Publish-date: 2023-04-01
- Publisher: CRC Press
- Publisher-version: https://doi.org/10.1201/9781003275923
- License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
- Github-source: https://github.com/skeydan/Deep-Learning-and-Scientific-Computing-with-R-torch
- Blog: https://blogs.rstudio.com/ai/posts/2023-04-05-deep-learning-scientific-computing-r-torch/
- Free-web-book: https://skeydan.github.io/Deep-Learning-and-Scientific-Computing-with-R-torch/

Overview of the Carpentries (Software Carpentry, Data Carpentry, ...)
=====================================================================
"Software Carpentry September 2012 Introduction"
https://www.youtube.com/watch?v=hIGweDdrZ20&ab_channel=softwarecarpentry
* Greg Wilson's intro on the essentiality of computing skills for researchers.
* How effective learning works for novice programmers
(short- and long-term memory, chunking, ...).
This seems to be part of a series of videos in this playlist:
"Live Lecture September 2012"
https://www.youtube.com/watch?v=hIGweDdrZ20&list=PLhFTuW7KWApwoo2DHzpWLhA19zenWF1LF&ab_channel=softwarecarpentry
(28 video snippets)

Platforms and Solutions for AI/ML in the Cloud
==============================================
H2O.ai (India-based)
https://h2o.ai/
Solutions / example use cases: https://h2o.ai/solutions/use-case/

Basil: A Tool for Semi-Automatic Containerization, Deployment, and Execution of Scientific Applications on Cloud Computing and Supercomputing Platforms
-------------------------------------------------------------------------------------------------------------------------------------------------------
Project site: https://icompute.us/entry
NSF Award: 2314203
"[CSSI?] Elements: Basil: A Tool for Semi-Automatic Containerization,
Deployment, and Execution of Scientific Applications on Cloud
Computing and Supercomputing Platforms"
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2314203
From the abstract:
> "...this project is developing BASIL - a tool for
> semi-automatically containerizing the scientific applications,
> frameworks, and workflows. This project will deliver BASIL through
> a web portal, as a command-line tool, and through APIs. BASIL has
> a broad applicability across multiple domains of deep societal
> impact such as artificial intelligence, drug discovery, and
> earthquake engineering. By enabling the preservation of valuable
> legacy software and making them usable for several years in
> future, BASIL will save cost and time in software rewriting and
> software installations, and thus contribute towards advancing the
> prosperity of the society. The project will result in educational
> content on "Introduction to Containerization" and students engaged
> in the project will develop valuable skills in the areas of
> national interest such as supercomputing/High Performance
> Computing (HPC) and cloud computing."

Finetuning Large Language Models
================================
"Finetuning Large Language Models"
https://www.deeplearning.ai/short-courses/finetuning-large-language-models/
https://learn.deeplearning.ai/finetuning-large-language-models/lesson/1/introduction
Taught by Sharon Zhou
Free course, registration required.
From the course website:
"Join our new short course, Finetuning Large Language Models! Learn
from Sharon Zhou, Co-Founder and CEO of Lamini, and instructor for the
GANs Specialization and How Diffusion Models Work.
When you complete this course, you will be able to:
* Understand when to apply finetuning on LLMs
* Prepare your data for finetuning
* Train and evaluate an LLM on your data
With finetuning, you're able to take your own data to train the model
on it, and update the weights of the neural nets in the LLM, changing
the model compared to other methods like prompt engineering and
Retrieval Augmented Generation. Finetuning allows the model to learn
style, form, and can update the model with new knowledge to improve
results."

PyTorch Installation
====================
Selecting PyTorch target (CPU/GPU) for conda install
----------------------------------------------------
See: https://pytorch.org/get-started/locally/
There is a selector box on the page above to choose the exact commands
to use to install PyTorch for your platform and compute target (CPU or CUDA).
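After installation, a quick sanity check of which build landed (CPU-only vs CUDA) can be done from Python. This helper is our own illustration, not part of PyTorch:

```python
import importlib.util

def torch_build_summary() -> str:
    """Report whether torch is importable and, if so, whether CUDA is usable."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"torch {torch.__version__} ({device})"

print(torch_build_summary())
```

`torch.cuda.is_available()` returns `True` only when a CUDA build of PyTorch sees a working GPU driver, so it distinguishes a CPU-only install from a GPU-capable one.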

Tools and Utilities for BibTeX Citations
========================================
[2023-02-17]
Utilities for Generating BibTeX Citations
-----------------------------------------
When a lot of citations are involved, this becomes a troublesome task.
Having some tools would make our lives easier.
### Converting from text-plain citations to BibTeX entries
https://text2bib.economics.utoronto.ca/
From the site:
> This site converts a list of references in a wide range of styles to BibTeX.
> Minimal requirements for input file:
>
> * Either references are separated by blank lines or each line
> is a separate reference or each reference starts with
> `\bibitem{}`, `\bibitem{<label>}`, `\item`, `\bigskip`, or `\smallskip`.
> * Each reference either starts (after possibly one of the separator strings
> mentioned in the previous point) with a list of authors,
> which is followed by either a year or a title, or starts with a year;
> if the authors are followed by a year, the next string is the title.
>
> The conversion is not perfect, but is very good for many reference styles
### Harvesting DOI Metadata as BibTeX Entries
https://www.doi2bib.org/
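Services like doi2bib rely on DOI content negotiation: `doi.org` returns a BibTeX entry when asked for the `application/x-bibtex` media type. The same trick can be scripted directly; a minimal sketch in Python (the helper names here are our own):

```python
import urllib.request

def bibtex_request(doi: str) -> urllib.request.Request:
    # doi.org serves BibTeX when asked for it via the Accept header
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/x-bibtex"},
    )

def doi_to_bibtex(doi: str) -> str:
    # network access required; raises HTTPError for an unknown DOI
    with urllib.request.urlopen(bibtex_request(doi)) as resp:
        return resp.read().decode("utf-8")
```

For example, `doi_to_bibtex("10.1201/9781003275923")` should return a BibTeX entry for the R torch book referenced above.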

(Automatic) Grading Tools
=========================
Submitty
--------
https://github.com/Submitty/Submitty
Summary: "Homework Submission, Automated Grading, and TA grading system"
Some capabilities:
* Common multiple-choice answers
* Simple-response answers (text, numeric)
* PDF submission + annotating capability
* Notebook submission (supports code)

Notes on NVIDIA-SMI
===================
NVIDIA-SMI for NVLink query
---------------------------
"Exploring NVIDIA NVLink `nvidia-smi` Commands"
https://www.exxactcorp.com/blog/HPC/exploring-nvidia-nvlink-nvidia-smi-commands
Example commands to consider:
```
$ nvidia-smi nvlink --help
$ nvidia-smi nvlink --status
```
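For monitoring scripts, the same queries can be wrapped programmatically. A minimal Python sketch (the helper names are our own; assumes `nvidia-smi` is on `PATH`):

```python
import subprocess

def nvlink_status_cmd(gpu_index=None):
    """Build the nvidia-smi invocation for querying NVLink status.

    gpu_index: optionally restrict the query to one GPU via `-i`.
    """
    cmd = ["nvidia-smi", "nvlink", "--status"]
    if gpu_index is not None:
        cmd += ["-i", str(gpu_index)]
    return cmd

def nvlink_status(gpu_index=None):
    # returns the raw text report; fails without an NVIDIA driver present
    return subprocess.run(
        nvlink_status_cmd(gpu_index),
        check=True, capture_output=True, text=True,
    ).stdout
```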

NSF-funded Dashboards & Portals
===============================
"NSF-powered dashboards you should bookmark"
--------------------------------------------
https://beta.nsf.gov/science-matters/nsf-powered-dashboards-you-should-bookmark
2022-11-03 - Jason Bates
## Impact of insects and diseases on forest health
Hosted by Purdue University, the Alien Forest Pest Explorer combines information from multiple sources to show the impact of different forest insects and diseases -- and the potential for further damage. The dashboards, also supported by the U.S. Forest Service, overlay pest data with related data about the status and health of the host tree species in forests and has resolution down to the county level.

Python Tutorials & Short Courses (vol. 1)
=========================================
Mosh's Python for Beginners - Learn Python in 1 Hour
----------------------------------------------------
https://www.youtube.com/watch?v=kqtD5dpn9C8&ab_channel=ProgrammingwithMosh
Length: 1 hr
Key contents:
* Download & install Python
* Download & install PyCharm as Python IDE
* Three basic data types: strings, numbers, boolean (True/False)
* Variables
* Simple input (from terminal)
* String concatenation (`+` operator)
* Type conversion
* Simple calculator program
* String methods
"Python Tutorial - Python Full Course for Beginners"
----------------------------------------------------
https://www.youtube.com/watch?v=_uQrJ0TkZlc&ab_channel=ProgrammingwithMosh
Table of contents:
```
00:00:00 Introduction
00:01:49 Installing Python 3
00:06:10 Your First Python Program
00:08:11 How Python Code Gets Executed
00:11:24 How Long It Takes To Learn Python
00:13:03 Variables
00:18:21 Receiving Input
00:22:16 Python Cheat Sheet
00:22:46 Type Conversion
00:29:31 Strings
00:37:36 Formatted Strings
00:40:50 String Methods
00:48:33 Arithmetic Operations
00:51:33 Operator Precedence
00:55:04 Math Functions
00:58:17 If Statements
01:06:32 Logical Operators
01:11:25 Comparison Operators
01:16:17 Weight Converter Program
01:20:43 While Loops
01:24:07 Building a Guessing Game
01:30:51 Building the Car Game
01:41:48 For Loops
01:47:46 Nested Loops
01:55:50 Lists
02:01:45 2D Lists
02:05:11 My Complete Python Course
02:06:00 List Methods
02:13:25 Tuples
02:15:34 Unpacking
02:18:21 Dictionaries
02:26:21 Emoji Converter
02:30:31 Functions
02:35:21 Parameters
02:39:24 Keyword Arguments
02:44:45 Return Statement
02:48:55 Creating a Reusable Function
02:53:42 Exceptions
02:59:14 Comments
03:01:46 Classes
03:07:46 Constructors
03:14:41 Inheritance
03:19:33 Modules
03:30:12 Packages
03:36:22 Generating Random Values
03:44:37 Working with Directories
03:50:47 Pypi and Pip
03:55:34 Project 1: Automation with Python
04:10:22 Project 2: Machine Learning with Python
04:58:37 Project 3: Building a Website with Django
```
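For a taste of the exercises listed above (e.g. the "Weight Converter Program"), here is a minimal sketch of our own:

```python
KG_PER_LB = 0.45359237  # exact definition of the international pound

def lbs_to_kg(pounds: float) -> float:
    """Convert pounds to kilograms."""
    return pounds * KG_PER_LB

def kg_to_lbs(kg: float) -> float:
    """Convert kilograms to pounds."""
    return kg / KG_PER_LB

print(lbs_to_kg(170))  # a 170 lb person weighs about 77.1 kg
```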

Python Containers on ODU HPC
============================
This article lists the Python containers available on the ODU HPC platform for all HPC users.
> The background for the containerized Python can be found on the main documentation page for [Python on HPC Environment](/Software/Python).
Python containers with TensorFlow and PyTorch
---------------------------------------------
Since TensorFlow and PyTorch have become wildly popular with the adoption of AI/ML in many scientific disciplines, RCS provides a handful of TensorFlow and PyTorch containers for general use.
Looking for containers with newer Python?
-----------------------------------------
We have a few modules that contain newer versions of Python:
| Module name | Current Python version |
|---------------------------|-------------------------|
| `python3/2023.2-py39` | Intel Python 3.9.16 |
| `python3/2023.2-py310` | Intel Python 3.10.11 |

CFL3D Installation Notes - ODU Wahab Cluster
============================================
About the CFL3D Software
------------------------
* Software home page: https://software.nasa.gov/software/LAR-16003-1
* Git repo: https://github.com/nasa/CFL3D
* Documentation: https://nasa.github.io/CFL3D/
Installation on ODU Cluster
---------------------------
* Base container: intel/2023.0 (ICC + Intel MPI)
* Following build instruction: https://nasa.github.io/CFL3D/Cfl3dv6/cfl3dv6_build.html#make
* Configuration: (This is called "Installation" stage in their lingo.)
From the `build` subfolder, issue: `./Install -noredirect -linux_compiler_flags=Intel`
* Build:
- `make cfl3d_seq`
- `make cfl3d_mpi`
  - ... and so on; see the help text printed by `make` with no target.
Usage Instructions
------------------
(Initially written for Dr. Adem Ibrahim, 2023-05-31)
Dr. Ibrahim,
Below are instructions for running CFL3D on our cluster:
The software is currently installed in your home directory at the following path:
~/CFL3D/bin
**Prerequisites for running CFL3D**
This software was built on top of the "intel/2023.0" container,
so the first thing you must do is invoke the following command in the shell:
```bash
module load container_env intel/2023.0
```
For serial runs, the main input file MUST be named `cfl3d.inp`.
Assuming that this input file exists in the current directory,
you can run the *serial* CFL3D executable this way:
```bash
crun.intel ~/CFL3D/bin/cfl3d_seq
```
There is an MPI (parallel) version of CFL3D, called `cfl3d_mpi`,
which has been installed into the same folder.
Here is an example Slurm job script to run CFL3D in serial (sequential) mode:
```bash
#!/bin/bash
#SBATCH --job-name cfl3d
#SBATCH --ntasks 1
module load container_env intel/2023.0
crun.intel ~/CFL3D/bin/cfl3d_seq
```
CFL3D has a lot of sample calculations located here:
https://nasa.github.io/CFL3D/Cfl3dv6/cfl3dv6_testcases.html
### Demo: Flat Plate Steady Flow
Source: https://nasa.github.io/CFL3D/Cfl3dv6/cfl3dv6_testcases.html#flatplate
Here are the commands I invoked:
```bash
module load container_env intel/2023.0
mkdir -p ~/LIONS/Cfl3dv6/examples
cd ~/LIONS/Cfl3dv6/examples
# download and unpack the input files
wget https://nasa.github.io/CFL3D/Cfl3dv6/2DTestcases/Flatplate/Flatplate.tar.Z
tar xvf Flatplate.tar.Z
cd Flatplate/
# split the input files and generate the unformatted grid file,
# which is grdflat5.bin
crun.intel ~/CFL3D/bin/splitter < split.inp_1blk
# copy the main input file as "cfl3d.inp" before running:
cp grdflat5.inp cfl3d.inp
srun crun.intel ~/CFL3D/bin/cfl3d_seq
```
Files will be unpacked to a subfolder called `Flatplate`,
and this folder is also where the calculation takes place.
The main output goes to a file named `cfl3d.out`.
> FIXME: Run in parallel. Still has issue on Wahab.
### Update 2023-06-02
A few notes:
1. The CFL3D software can only run on Wahab, as only that cluster's hardware
   is new enough for the instruction sets used in the code.
   Please do not run this on Turing; it will quit with an error message.
2. With version 6, there is no more need to recompile CFL3D
every time you want to run with a different physical system (model).
The code now allocates arrays dynamically, so `precfl3d` is not needed anymore.
3. I included the source code in <<#TODO>> directory in case you want to play
around and modify the source code.
4. The code was built to NOT read from stdin. Please do not run it this way:
```
crun.intel ~/CFL3D/bin/cfl3d_seq < MY_INPUT.inp ### WON'T WORK
```
Instead, run it in two steps:
```
cp MY_INPUT.inp cfl3d.inp
crun.intel ~/CFL3D/bin/cfl3d_seq
```

# Snakemake: A General Introduction
Main website: https://snakemake.github.io/
Documentation: https://snakemake.readthedocs.io/en/stable/
Tutorials:
* Official tutorial (complete steps)
https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html#tutorial
- Basic tutorial:
https://snakemake.readthedocs.io/en/stable/tutorial/basics.html#tutorial-basics
## Standalone execution

This is the easiest way to run Snakemake: you are allocated a single node,
and all of its resources are yours. There is no cross-node parallelization
in this mode. You can use the job script below:

```bash
#!/bin/bash
#SBATCH --exclusive

enable_lmod
module load container_env snakemake

crun snakemake
```
## Breaking rules into individual Slurm jobs

This is my recommended way of running this package, as it utilizes
resources most effectively. It is a little more complicated, so I have
written a wrapper script to do most of the legwork; you can launch it
like this:

```bash
#!/bin/bash
#SBATCH -c 2

enable_lmod
module load container_env snakemake

snakemake.helper -j 10
```

Please note that this job should only take 1 or 2 cores: it is a master
script and does not do the real work; additional jobs will be launched
by this job. Instead of running `crun snakemake` directly, you run
`snakemake.helper`. It is the helper script I mentioned: it will launch
jobs on the cluster for you.
For cluster resources, I only enforce `threads` in the Snakemake rules;
any other resource (memory, disk, ...) might or might not be enforced by
Snakemake, and I do not enforce it at the scheduler level.

```
rule map_reads:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        "results/mapped/{sample}.bam"
    threads: 2
    shell:
        "bwa mem {input} | samtools view -b - > {output}"
```
When `threads` is set in a rule, I launch the Slurm job with a matching
`--cpus-per-task`. When it is not explicitly set, the rule always runs
single-threaded.

My Snakemake module also supports MPI mode, but from what I observe your
tasks usually involve running some code on many inputs, rather than
running a single multi-node code on one input, so the job-per-rule mode
above should be the most useful to you. If you do need MPI, please let
me know; it requires additional setup, especially when combined with
`--use-conda`.

Snakemake's conda integration is supported and tested; you should be
able to install any conda package you want. Unless it requires MPI, it
usually works. When using conda, please make sure an environment file is
given and launch Snakemake with `--use-conda`:
```
# Snakefile
rule map_reads:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        "results/mapped/{sample}.bam"
    threads:
        2
    conda:
        "envs/mapping.yaml"
    shell:
        "bwa mem {input} | samtools view -b - > {output}"
```

```
# job_script.sh
crun snakemake --use-conda            # for standalone
snakemake.helper -j 10 --use-conda    # for running in scheduler mode
```
Running other container modules is also possible; please just let me
know what you need. If you can install what you need with conda, please
use conda first, since you can do that yourself.

You can find my sample job scripts in /home/jsun/snakemake. Please let
me know if you have any questions or issues.

# Snakemake: A General Introduction
Snakemake is a workflow tool for managing and executing
interrelated sets of computation/analysis steps.
For more information, please visit
[Snakemake documentation](https://snakemake.readthedocs.io/en/stable/)
## Snakemake in a Nutshell
Imagine that you have a complex computation that consists of a series
of tasks that must be executed in order
(each taking the output from the previous step as its input):
input preprocessing => simulation => output postprocessing
In the example above, a complete computation consists of:
* step 1: input preprocessing of some sort, such as generating
a set of input files from a few parameters;
* step 2: compute-heavy simulation, potentially taking hours or days
on many CPU cores and/or GPUs;
* step 3: output postprocessing, such as calculating some statistics
from the simulation, determining molecular properties from
the simulation, creating a report with graph panels.
If there is only one computation to do, then it's a no-brainer to do the
three steps "by hand", i.e. creating up to three Slurm job scripts and
running them in sequence.
What if there are 1000 of such computations to perform?
What if the set of steps are complex (e.g. one step requires a few
inputs from different prior computations), or long?
Is there a better way than sitting in front of a terminal submitting
1000+ jobs *in a particular order*?
The answer is, **yes!**
This is where we need *workflow tools* such as Snakemake.
In Snakemake, a complete workflow is stored in a specially formatted text
file named `Snakefile`.
Each processing step is expressed as a *rule* inside the snakefile.
Each rule may depend on one or more other rules; the input and output
file declarations determine these dependencies.
> A more thorough introduction to Snakemake is beyond the scope of this article.
> Readers are referred to the articles and tutorials linked at the end
> of this article to learn more.
> We assume that you have a basic idea of workflow tools in order to use Snakemake.
{.is-info}
## Snakemake on ODU HPC
On Wahab and Turing, Snakemake is installed and there are two modes to
run this software:
- Cluster (job-based) execution
- Single-node (standalone) execution
We will describe cluster execution first, which is more scalable and powerful.
### Mode 1: Cluster Execution (recommended)
In the [Cluster (job-based) execution](https://snakemake.readthedocs.io/en/stable/executing/cluster.html),
Snakefile will turn each rule into an individual Slurm job and execute
them via the job scheduler.
This will allow the HPC resources to be used most efficiently,
therefore we recommend using this way to run your workflow, whenever possible.
We created a helper script called `snakemake.helper` to help executing
a snakefile workflow through a job scheduler.
Here is an example invocation (written as a Slurm job script):
```
#!/bin/bash
#SBATCH --cpus-per-task 2
#SBATCH --job-name Example_snakemake_cluster
enable_lmod
module load container_env snakemake
snakemake.helper -j 10
```
This script should only take 1 or 2 cores to run, since it is *only* a master script.
It does not do the real work (i.e. it does not itself run any of the rules).
When processing the input snakefile, Snakemake will spawn additional Slurm jobs
(where one snakefile rule == one Slurm job)
and monitor their completion in order to push the workflow forward to its completion.
Pros:
- Very scalable, this method can use as many CPU cores and compute nodes as
specified in the snakefile (or by the cluster policy);
- Complete control over resource specification per rule;
Cons:
- May not be efficient for small/simple rules.
> Important: In the cluster execution model, do not run `crun snakemake` directly!
> Run the `snakemake.helper` helper script instead.
{.is-info}
> For cluster resources, we only enforce `threads` in the snakefile rules.
> Any other resource (memory, disk, ...) might or might not be enforced by
> Snakemake.
{.is-info}
### Mode 2: Single-node (stand-alone) execution
In the single-node execution mode, you will acquire all the CPU
resources in a single node and can use all the cores to perform as
many tasks as can be parallelized to shorten the execution time.
(Snakemake will figure out what rules can be executed in parallel and
try to execute them concurrently, subject to the CPU core
constraints).
Use the following template to start your own job running Snakemake in
the single-node mode:
```
#!/bin/bash
#SBATCH --exclusive
#SBATCH --job-name Example_snakemake_1node
enable_lmod
module load container_env snakemake
crun snakemake
```
Pros:
- The easiest way to run Snakemake (no need to think about threads, etc.);
- You will have complete access to all compute resources on a single node
(CPU, memory, ...).
Cons:
- No parallelization across multiple nodes,
thus limiting the parallel scalability;
- Rules will be invoked within the same container as the Snakemake program;
if your program requires software in other containers, this will not work.
(Currently a containerized program [those with `crun` in its invocation]
cannot execute another program located inside a different container).
### Specifying threads for a rule

Only the `threads` declaration in a snakefile rule is translated into a
scheduler resource request:

```
rule map_reads:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        "results/mapped/{sample}.bam"
    threads: 2
    shell:
        "bwa mem {input} | samtools view -b - > {output}"
```

When `threads` is set in a rule, the corresponding Slurm job is launched
with a matching `--cpus-per-task`. When it is not explicitly set, the
rule always runs single-threaded.

The Snakemake module also supports MPI mode; however, typical workflows
involve running some code over many inputs rather than a single
multi-node code over one input, so the cluster execution mode above
should cover most needs. If you do need MPI, please contact us; it
requires additional setup, especially when combined with `--use-conda`.

### Using conda environments

Snakemake's conda integration is supported and tested; you should be
able to install any conda package you need. Unless it requires MPI, it
usually works. When using conda, please make sure an environment file is
given for the rule and launch Snakemake with `--use-conda`:

```
# Snakefile
rule map_reads:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        "results/mapped/{sample}.bam"
    threads:
        2
    conda:
        "envs/mapping.yaml"
    shell:
        "bwa mem {input} | samtools view -b - > {output}"
```

```
# job_script.sh
crun snakemake --use-conda            # for standalone execution
snakemake.helper -j 10 --use-conda    # for cluster execution
```

Running programs from other container modules is also possible; please
let us know what you need. If you can install what you need with conda,
please use conda first, since you can do that yourself.

Sample job scripts can be found in /home/jsun/snakemake. Please let us
know if you have any questions or issues.

Survey of Workflow Tools
========================
DAGMan - Built-in workflow tool for HTCondor
--------------------------------------------
Built for high-throughput computing.
Nextflow
--------
Supposedly very popular!
Snakemake
---------
Supposedly easy to understand and learn for the first time.
Common Workflow Language (CWL)
------------------------------
Aiming to be the common standard of expressing workflows.