SLURM HPC Plugin

https://slurm.schedmd.com/documentation.html

Select this HPC environment with --execenv=hpc.slurm. In it, SIERRA runs experiments spread across the multiple nodes allocated by the SLURM scheduler. The following table describes the SLURM-SIERRA interface. SIERRA uses some SLURM environment variables to configure experiments during stages 1 and 2; if these are not defined, SIERRA will throw an error.
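The error-on-missing behavior can be pictured with a small check like the one below. This is only a sketch and not SIERRA's actual code: the function name `check_slurm_env` is hypothetical, and it assumes that the four `SLURM_*` variables listed in the interface table are the ones that must be defined.

```python
import os

# Assumption: the SLURM_* variables from the interface table are the ones
# that must be defined; the list SIERRA actually checks may differ.
REQUIRED_VARS = (
    "SLURM_CPUS_PER_TASK",
    "SLURM_TASKS_PER_NODE",
    "SLURM_JOB_NODELIST",
    "SLURM_JOB_ID",
)


def check_slurm_env(environ=None):
    """Raise RuntimeError naming every required variable that is missing.

    `environ` defaults to the real process environment; passing a dict
    makes the check easy to exercise outside a SLURM allocation.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise RuntimeError("Missing SLURM variables: " + ", ".join(missing))
```

Running a check of this kind at the top of a job script, before invoking SIERRA, surfaces a misconfigured allocation early instead of partway through stage 1.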

SLURM-SIERRA interface

| Environment variable | SIERRA context | Command line override |
|---|---|---|
| PARALLEL | Used to transfer environment variables into the GNU parallel environment. | N/A |
| PARALLEL_SHELL | Used to set the shell in which GNU parallel executes all commands. Overwritten by SIERRA to /bin/bash. | N/A |
| LD_LIBRARY_PATH | Exported by SIERRA via PARALLEL to child GNU parallel processes. Can be undefined when SIERRA starts. | N/A |
| PYTHONPATH | Exported by SIERRA via PARALLEL to child GNU parallel processes. Can be undefined when SIERRA starts. | N/A |
| PATH | Exported by SIERRA via PARALLEL to child GNU parallel processes. Can be undefined when SIERRA starts. | N/A |
| SLURM_CPUS_PER_TASK | Used to set the number of threads per experimental node for each allocated compute node. | N/A |
| SLURM_TASKS_PER_NODE | Used to set the number of parallel jobs per allocated compute node. | --exec-jobs-per-node |
| SLURM_JOB_NODELIST | Used to obtain the list of nodes allocated to a job, which SIERRA can direct GNU parallel to use for experiments. | N/A |
| SLURM_JOB_ID | Used to create the UUID nodelist file passed to GNU parallel, guaranteeing no collisions (i.e., no simultaneous SIERRA invocations sharing allocated nodes) if multiple jobs are started from the same directory. | N/A |
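To make the roles of these variables more concrete, here is a minimal sketch of how values like these could be turned into GNU parallel inputs. It is not SIERRA's implementation: the helper names, the `<job id>-nodelist.txt` file naming scheme, and the choice to read only the first entry of SLURM_TASKS_PER_NODE are all assumptions made for illustration. The `--env` option is a real GNU parallel option, and GNU parallel does read default options from the PARALLEL variable.

```python
import os
import re


def jobs_per_node(tasks_per_node, override=None):
    """Return the number of parallel jobs to run on each node.

    SLURM_TASKS_PER_NODE looks like "4" or "2(x3),1". This sketch reads
    only the leading count (an assumption). `override` stands in for a
    value given on the command line via --exec-jobs-per-node.
    """
    if override is not None:
        return override
    match = re.match(r"\d+", tasks_per_node)
    if match is None:
        raise ValueError(f"Unparseable SLURM_TASKS_PER_NODE: {tasks_per_node!r}")
    return int(match.group())


def nodelist_path(job_id, directory="."):
    """Build a nodelist file path that is unique per SLURM job.

    Hypothetical naming scheme: embedding SLURM_JOB_ID keeps two jobs
    started from the same directory from writing to the same file.
    """
    return os.path.join(directory, f"{job_id}-nodelist.txt")


def parallel_env_options(names=("LD_LIBRARY_PATH", "PYTHONPATH", "PATH")):
    """Build a value for PARALLEL that forwards the named variables."""
    return " ".join(f"--env {name}" for name in names)
```

For example, `jobs_per_node("2(x3),1")` yields 2, while passing `override=8` ignores the SLURM value entirely, mirroring the command line override column in the table.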