Run your first job#
In this example, we use the WGS-Germline-Snps-Indels workflow and provide you with all the resources you will need to run your first job on SeqsLab.
The goal of this quickstart guide is to provide a hands-on example of how you can use SeqsLab to run your workflows. For a more in-depth explanation of the test drive process, see Test drive SeqsLab.
Important
This guide assumes that you have some familiarity with genomic data analysis. It would also be helpful to have some experience in using a command line interface (CLI) tool.
Step 1. Test drive SeqsLab#
SeqsLab is available as a managed application on Azure Marketplace. You can test drive SeqsLab and try out its features for free.
Go to the Azure Marketplace listing.
Click Test Drive.
A login window displays. Sign in to Microsoft Azure Marketplace using a Microsoft email address.
A confirmation message displays. Enable the checkbox to grant access to your basic profile information and then click Continue.
Step 2. Pull and run SeqsLab CLI#
The SeqsLab CLI is deployed as a Docker container. Before you can use the CLI, you must first complete the following steps:
On a command line application, run:
docker pull seqslabmain.azurecr.io/seqslab_cli
Find and define the subdomain name of the SeqsLab API as PRIVATE_NAME. For example, the domain atgenomix.seqslab.net uses atgenomix as its PRIVATE_NAME.
Run:
export PRIVATE_NAME="yourCompanyName"
Run the SeqsLab CLI:
docker run --rm --name cli \
  -e PRIVATE_NAME="testdrive" \
  -e WORKSPACE="tdwestus2" \
  --privileged \
  -it seqslabmain.azurecr.io/seqslab_cli
Set up the environment keyring and install the required packages inside the SeqsLab CLI:
dbus-run-session -- bash
echo $RANDOM | gnome-keyring-daemon --unlock
apt install zip wget -y
Sign in with your Microsoft Entra ID account and password.
Note
Go to the Access information section of the marketplace test drive page to get the login credentials.
Optional: Set up Multi-Factor Authentication (MFA) for the given Microsoft Entra ID account. You can skip this step because the test drive expires in two days.
seqslab auth signin -i
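The PRIVATE_NAME value described in this step is the first label of the SeqsLab API domain. As a minimal sketch, it can be derived with shell parameter expansion instead of being typed by hand. The API_DOMAIN variable is a hypothetical helper name, and the domain is the example used above.

```shell
# API_DOMAIN is a hypothetical helper variable; replace the value with your own domain.
API_DOMAIN="atgenomix.seqslab.net"

# Strip everything from the first dot onward, keeping only the subdomain.
PRIVATE_NAME="${API_DOMAIN%%.*}"
export PRIVATE_NAME

echo "PRIVATE_NAME=${PRIVATE_NAME}"   # prints PRIVATE_NAME=atgenomix
```

Run this on the host before starting the container if you want to pass the derived value with -e PRIVATE_NAME="${PRIVATE_NAME}".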
Step 3. Register the sample and reference files#
Register the FASTQ sample files:
wget https://seqslabbundles.blob.core.windows.net/static/data/sample.json
cat sample.json | seqslab datahub register-blob \
  --stdin file-blob --workspace "${WORKSPACE}"
Register the reference files:
wget https://seqslabbundles.blob.core.windows.net/static/data/static.json
cat static.json | seqslab datahub register-blob \
  --stdin file-blob --workspace "${WORKSPACE}"
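Both registrations in this step follow the same download-then-register pattern, so they can be wrapped in one helper. The following is a sketch only: register_manifest and DRY_RUN are hypothetical names introduced here, and the seqslab command line is copied from the step above. With DRY_RUN=1 (the default in this sketch) the commands are printed and not executed, which lets you review them first.

```shell
# URL and manifest names come from the commands in this step.
BASE_URL="https://seqslabbundles.blob.core.windows.net/static/data"
WORKSPACE="${WORKSPACE:-tdwestus2}"

# register_manifest NAME (hypothetical helper): download one manifest and
# pipe it to register-blob. With DRY_RUN=1 the commands are only printed.
register_manifest() {
  manifest="$1"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "wget ${BASE_URL}/${manifest}"
    echo "cat ${manifest} | seqslab datahub register-blob --stdin file-blob --workspace ${WORKSPACE}"
  else
    wget "${BASE_URL}/${manifest}" || return 1
    cat "${manifest}" | seqslab datahub register-blob \
      --stdin file-blob --workspace "${WORKSPACE}"
  fi
}

# Process the sample manifest first, then the reference manifest.
for manifest in sample.json static.json; do
  register_manifest "${manifest}" || { echo "failed: ${manifest}" >&2; break; }
done
```

Set DRY_RUN=0 inside the SeqsLab CLI container to perform the actual downloads and registrations.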
Step 4. Register a tool#
Get the workflow from Dockstore:
mkdir working_dir && \
  wget https://dockstore.org/api/workflows/18394/zip/158152 \
  -O working_dir/workflow.zip
cd working_dir && unzip workflow.zip
Register the workflow as a new tool:
export TOOL_ID="wgs_germline_gatk4_snp_indel_`date '+%Y%m%d%H%M%S'`"
seqslab tools tool \
  --id "${TOOL_ID}" \
  --name "testdrive-${TOOL_ID}" \
  --description "test drive workflow"
Register a new tool version:
seqslab tools version \
  --descriptor-type WDL \
  --tool-id "${TOOL_ID}" \
  --id "1.0" \
  --workspace "${WORKSPACE}" \
  --images '[{"image_type": "docker", "image_name": "atgenomix/seqslab_runtime-1.5_ubuntu-20.04_preprocessgatk4-4.2.0.0:2022-10-05-02-36", "registry_host": "tmuhwus24434bacr.azurecr.io", "size": 3349109245, "checksum": "sha256:020be4ed428c7dfec67d6c640aaec07d239406c1222295b22a8da6be2ec38ad1"}]'
Upload the tool files to the newly created tool version:
seqslab tools file \
  --descriptor-type WDL \
  --tool-id "${TOOL_ID}" \
  --version-id "1.0" \
  --working-dir `pwd`/GATK-Germline-Snps-Indels/ \
  --file-info execs/gatk.parallel.hg19.0713.execs.json
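Because TOOL_ID embeds a timestamp, running the export again produces a different ID, so every command in this step and the next must reuse the value from the same shell session. The sketch below generates the ID with the same scheme, written with $(...) in place of backticks, and checks its shape before it is used. The grep check is an illustration added here and is not part of the SeqsLab CLI.

```shell
# Same ID scheme as in this step: fixed prefix plus a 14-digit timestamp.
TOOL_ID="wgs_germline_gatk4_snp_indel_$(date '+%Y%m%d%H%M%S')"
export TOOL_ID

# Sanity check the shape of the ID before passing it to any command.
if echo "${TOOL_ID}" | grep -Eq '^wgs_germline_gatk4_snp_indel_[0-9]{14}$'; then
  echo "TOOL_ID=${TOOL_ID}"
else
  echo "unexpected TOOL_ID: ${TOOL_ID}" >&2
fi
```

Printing the value once makes it easy to copy it back into a new session with export TOOL_ID="..." if the container is restarted.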
Step 5. Run a job#
Create a run request file:
seqslab jobs request \
  --run-name testdrive \
  --working-dir `pwd` \
  --execs GATK-Germline-Snps-Indels/execs/gatk.parallel.hg19.0713.execs.json \
  --workflow-url \
  "https://testdrive.seqslab.net/trs/v2/tools/${TOOL_ID}/versions/1.0/WDL/files/" \
  --runtimes \
  GermlineSnpsIndelsGatk4Hg19=acu-m64l
Submit a workflow run:
seqslab jobs run \
  --workspace "${WORKSPACE}" \
  --working-dir `pwd` \
  --response-path run.json
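The workflow URL in the run request combines the API host, the tool ID, and the version ID. As a sketch, it can be assembled from variables so that it stays consistent with the tool registered earlier. The assumption here is that the host is always PRIVATE_NAME followed by .seqslab.net, which matches the testdrive and atgenomix examples in this guide but is not confirmed for other deployments. VERSION_ID and WORKFLOW_URL are hypothetical variable names.

```shell
# Fall back to the test drive values when the variables are not already set.
PRIVATE_NAME="${PRIVATE_NAME:-testdrive}"
TOOL_ID="${TOOL_ID:-yourToolID}"
VERSION_ID="1.0"   # the version registered with "seqslab tools version --id"

# Assumption: the API host is <PRIVATE_NAME>.seqslab.net.
WORKFLOW_URL="https://${PRIVATE_NAME}.seqslab.net/trs/v2/tools/${TOOL_ID}/versions/${VERSION_ID}/WDL/files/"

echo "${WORKFLOW_URL}"
```

The result can then be passed to the request command as --workflow-url "${WORKFLOW_URL}".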
Step 6. Check the job status#
Check the status of the workflow run:
seqslab jobs run-state --run-id "yourRunID"
When the job reaches the COMPLETE state, get the detailed information of your workflow run:
seqslab jobs get --run-id "yourRunID"
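Checking the state by hand can be replaced with a small polling loop. The sketch below is generic: poll_until_complete is a hypothetical helper that re-runs any command until its output contains COMPLETE. It assumes that the run-state output includes the state name as plain text, and the failure patterns (ERROR, CANCELED) are borrowed from the GA4GH WES state vocabulary. This guide only confirms the COMPLETE state, so adjust the patterns to what your CLI actually prints.

```shell
# poll_until_complete CMD [ARGS...] (hypothetical helper)
# Re-runs CMD until its output contains COMPLETE.
# Returns 0 on COMPLETE, 2 on an assumed failure state, 1 on timeout.
poll_until_complete() {
  interval="${POLL_INTERVAL:-60}"   # seconds between checks
  max_tries="${MAX_TRIES:-120}"     # give up after this many checks
  tries=0
  while [ "${tries}" -lt "${max_tries}" ]; do
    output="$("$@" 2>&1)" || true
    case "${output}" in
      *COMPLETE*) echo "${output}"; return 0 ;;
      *ERROR*|*CANCELED*) echo "${output}" >&2; return 2 ;;   # assumed state names
    esac
    tries=$((tries + 1))
    sleep "${interval}"
  done
  echo "no terminal state after ${max_tries} checks" >&2
  return 1
}

# Example usage inside the SeqsLab CLI container:
# poll_until_complete seqslab jobs run-state --run-id "yourRunID"
```

The defaults check once a minute for up to two hours. Set POLL_INTERVAL and MAX_TRIES to change that.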
Step 7. Download the results#
Download the workflow output files to the local machine:
seqslab datahub download \
--workspace seqslabwus2 \
--dst ~/Downloads/ \
--self-uri drs://api.seqslab.net/drs_ODlEMzEKxhxwc43 drs://api.seqslab.net/drs_Otr1u9pIYAe2JLr