This is a quickstart on the platform. In this page you will learn:
- creating a job dependency with Slurm
- using Picas pilot framework from Spider
7.1. Slurm job dependencies¶
A job can be given the constraint that it only starts after another job has finished. Lets say that you have two Slurm jobs, A and B. You want job B to start after job A has successfully completed. Here are the steps:
Submit job A and keep the returned job ID:
Submit job B with a condition to start after job A, by providing its assigned job ID. This way job B only starts after Job A has successfully completed:
sbatch --dependency=afterok:<jobID_A> jobB.sh
We can also tell Slurm to run job B, even if job A fails:
sbatch --dependency=afterany:<jobID_A> jobB.sh
If you want job B to start after several other jobs have completed, use the delimiter ‘:’ as:
sbatch --dependency=afterok:<jobID_A:jobID_C:jobID_D> jobB.sh
7.1.1. Job dependency example¶
Now we show an example of how to use the dependencies. In this example it is shown how to run a job that depends on another job to finish before it starts.
First make two files,
hello.sh which contains
#!/bin/bash date sleep 30 date echo "hello world"
bye.sh which contains
#!/bin/bash date echo "bye!"
And make the scripts executable with
chmod +x hello.sh and
chmod +x bye.sh.
sleep 30 command stops the code for 30 seconds and then the
date command shows you the time, so you can see that the second job waits for the dependent to finish before starting.
Now you can submit the jobs with:
jobid=$(sbatch --parsable hello.sh) sbatch --dependency=afterok:$jobid bye.sh Submitted batch job 2560644
The first command submits a job and returns only the JobID value with
--parsable, and saves this value to a variable called
jobid. The second command only starts after the job with jobid has finished in a success state with
afterok. For more information on the dependency flag, see the SLURM man-pages.
Now you can see your jobs in the queue with:
squeue -u homer JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 2560644 normal bye.sh homer PD 0:00 1 (Dependency) 2560643 normal hello.sh homer R 0:11 1 wn-hb-04
bye.sh script is pending (PD), waiting until
hello.sh is finished, which is running (R). A bit later the jobs are finished:
squeue -u homer JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
And now we can see the output of the log files:
cat slurm-2560643.out Wed Aug 17 11:27:25 CEST 2022 Wed Aug 17 11:27:55 CEST 2022 hello world
cat slurm-2560644.out Wed Aug 17 11:27:56 CEST 2022 bye!
And we see that the first job slept for 30 seconds and the second job waited until the first was finished!
For more information on job dependencies, see also the
-d, --dependency section in the man page of the sbatch command.
When you run many jobs on Spider it can be difficult to keep track of the state of these jobs, especially when you start running hundreds to thousands of jobs. Although Slurm offers some functionality for tracking the status of the jobs, via the Slurm job ID, in many cases a Pilot job framework, such as PiCaS, is necessary for this purpose.
Access on PiCaS is not provided by default to the Spider projects. To request for PiCaS access, please contact our our helpdesk.
If you already have access on PiCaS, then you can use it directly from Spider, i.e. you can establish a connection to your CouchDB database and use the python PiCaS client either from the login node or the worker nodes.
To connect with your PiCaS database you need to provide your credentials
(username, password, database name). It is possible to specify the password on the
command line, however for security reasons this should be avoided on shared systems
(like the login node) because it can allow other local users to read the password (e.g. with
ps command). Also to avoid having to type these credentials
every time your client connects to your database or using them within your jobs,
we advice you to authenticate to PiCaS with the steps below.
- Create a PiCaS configuration directory in your home directory. Here we will call this directory
picas_cfg, but you are free to give it any other name.
mkdir /home/[USERNAME]/picas_cfg chmod go-rwx /home/[USERNAME]/picas_cfg
- Check the settings of your directory with
ls -la. The output should be similar to:
ls -la /home/homer/picas_cfg drwx------ 1 homer homer 3 May 7 08:33 picas_cfg
- Create a new file called
cd /home/[USERNAME]]/picas_cfg touch picasconfig.py
- Add the following lines to the
PICAS_HOST_URL="https://picas.surfsara.nl:6984" PICAS_DATABASE="[YOUR_DATABASE_NAME]" PICAS_USERNAME="[YOUR_USERNAME]" PICAS_PASSWORD="[YOUR_PASSWORD]"
- Storing cleartext passwords in any medium is dangerous, so we need to make sure it is not readable by others. Save the
picasconfig.pyfile and for additional security set it to read-write (rw) access for you only:
chmod go-rw /home/[USERNAME]/picas_cfg/picasconfig.py
- Check the permissions of your
ls -la. The output should be similar to:
ls -la /home/homer/picas_cfg/picasconfig.py -rw------- 1 homer homer 126 May 7 08:33 picasconfig.py
- Finally, add the
picas_cfgdirectory to your PYTHONPATH environment variable so that python can locate it. We recommend that you set this variable in your /home/[USERNAME]]/.bashrc file by adding the following lines to it:
PYTHONPATH=/home/[USERNAME]/picas_cfg:$PYTHONPATH export PYTHONPATH
You are now ready to start using your PiCaS credentials without having to type them each time you or your jobs need to connect to the PiCaS server. Good practices to build worflows with PiCaS can be found in PiCaS example.
Still need help? Contact our helpdesk
7.3. Cron jobs¶
If you need to automate (part of) your workflow it is possible to set up cronjobs on Spider. Please note that cronjobs on Spider can be used for testing purposes only and we do not offer this functionality as part of our service. If you wish to use cron jobs for production workflows please contact our helpdesk.
There are some restrictions when setting up a cronjob on Spider. The Spider login
is automatically directed to two different login nodes
ui-02.spider.surfsara.nl and cronjobs will be linked to the UI where they where created.
If you would like to make changes in your cronjob you need to login directly to the login node (ui-01 or ui-02)
where it was created (tip: to check which node you are on, you can type the command
may also affect your workflows in case of maintenance on the login node you run your cronjobs.
For a more sustainable option for automated jobs see the next subsection.
7.3.1. Recurring jobs¶
Cron is tied to a single machine, which in the case of Spider is a user-interface (ui) node or a worker node (wn). If a cron job is setup on one of these machines and this machine is not available, your job will not run.
For more robust recurring jobs, there is
scrontab, a SLURM integration of cron.
scrontab has identical syntax to
cron. However, the job you submit is added to the SLURM database and run on the defined schedule when the resources are available. The job will only start once the resources are available, so your job may not always run at the exact same time (unlike
cron). Your job will be scheduled to run on a worker node, regardless of where it was submitted.
Therefore we advice users to use
scrontab instead of
crontab when they want to set up recurring jobs. The
scrontab documentation can be found here.
Jobs submitted with scrontab are treated as regular jobs in the SLURM queue and will keep running until they finished or killed. To avoid filling up the queue with test jobs that run long, please use short lived jobs like
hello world when testing.