Run your first job

In this example, we use the WGS-Germline-Snps-Indels workflow and provide you with all the resources you will need to run your first job on SeqsLab.

This quickstart guide gives you a hands-on example of how to use SeqsLab to run your workflows. For a more in-depth explanation of the test drive process, see Test drive SeqsLab.

Important

This guide assumes that you have some familiarity with genomic data analysis. Some experience with command-line interface (CLI) tools is also helpful.

Step 1. Test drive SeqsLab

SeqsLab is available as a managed application on Azure Marketplace. You can test drive SeqsLab and try out its features for free.

  1. Go to the Azure Marketplace listing.

  2. Click Test Drive.
    A login window displays.

  3. Sign in to Azure Marketplace using a Microsoft email address.
    A confirmation message displays.

  4. Select the checkbox to grant access to your basic profile information and then click Continue.

Step 2. Pull and run SeqsLab CLI

The SeqsLab CLI is deployed as a Docker container. Before you can use the CLI, you must first complete the following steps:

  1. In a terminal, run docker pull seqslabmain.azurecr.io/seqslab_cli.

  2. Find and define the subdomain name of the SeqsLab API as PRIVATE_NAME. For example, the domain atgenomix.seqslab.net uses atgenomix as its PRIVATE_NAME.

  3. Run export PRIVATE_NAME="yourCompanyName".

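  4. Run the SeqsLab CLI. The following command uses the test drive values, testdrive for PRIVATE_NAME and tdwestus2 for WORKSPACE: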

    docker run --rm --name cli \
        -e PRIVATE_NAME="testdrive" \
        -e WORKSPACE="tdwestus2" \
        --privileged \
        -it seqslabmain.azurecr.io/seqslab_cli
    
  5. Set up the environment keyring and install the required packages inside the SeqsLab CLI:

    dbus-run-session -- bash
    echo $RANDOM | gnome-keyring-daemon --unlock
    apt install zip wget -y
    
  6. Sign in with your Azure AD account and password:

    seqslab auth signin -i

    Note

    Go to the Access information section of the marketplace test drive page to get the login credentials.

  7. Optional: Set up Multi-Factor Authentication (MFA) for the given Azure AD account. You can skip this step because the test drive expires in two days.
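The subdomain rule described in step 2 can be expressed with shell parameter expansion. This is a minimal sketch: API_HOST is a hypothetical variable introduced here for illustration only, and the SeqsLab CLI does not read it.

```shell
# Hypothetical variable: the full domain of your SeqsLab API.
API_HOST="atgenomix.seqslab.net"

# Remove everything from the first dot onward, leaving only the subdomain.
PRIVATE_NAME="${API_HOST%%.*}"
export PRIVATE_NAME

echo "PRIVATE_NAME=${PRIVATE_NAME}"
```

With the example domain from step 2, this prints PRIVATE_NAME=atgenomix.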
    

Step 3. Register the sample and reference files

  1. Register the FASTQ sample files:

    wget https://seqslabbundles.blob.core.windows.net/static/data/sample.json
    cat sample.json | seqslab datahub register-blob \
        --stdin file-blob --workspace "${WORKSPACE}"
    
  2. Register the reference files:

    wget https://seqslabbundles.blob.core.windows.net/static/data/static.json
    cat static.json | seqslab datahub register-blob \
        --stdin file-blob --workspace "${WORKSPACE}"
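Both registrations above follow the same download-then-register pattern. As a sketch, the two steps could be wrapped in a small helper. The function names manifest_name and register_manifest are hypothetical and not part of the SeqsLab CLI; the seqslab invocation repeats the one shown above, reading the manifest from standard input.

```shell
# Hypothetical helpers for illustration; not part of the SeqsLab CLI.

# Print the file name at the end of a URL.
manifest_name() {
    printf '%s\n' "${1##*/}"
}

# Download a manifest, then register its contents in the workspace.
# Assumes WORKSPACE is set, as it is inside the SeqsLab CLI container.
register_manifest() {
    url="$1"
    file="$(manifest_name "$url")"
    wget -q "$url" -O "$file" || return 1
    seqslab datahub register-blob \
        --stdin file-blob --workspace "${WORKSPACE}" < "$file"
}

# Usage, inside the SeqsLab CLI container:
# register_manifest https://seqslabbundles.blob.core.windows.net/static/data/sample.json
# register_manifest https://seqslabbundles.blob.core.windows.net/static/data/static.json
```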
    

Step 4. Register a tool

  1. Get the workflow from Dockstore:

    mkdir working_dir && \
        wget https://dockstore.org/api/workflows/18394/zip/158152 \
        -O working_dir/workflow.zip
    cd working_dir && unzip workflow.zip
    
  2. Register the workflow as a new tool:

    export TOOL_ID="wgs_germline_gatk4_snp_indel_$(date '+%Y%m%d%H%M%S')"
    seqslab tools tool \
        --id "${TOOL_ID}" \
        --name "testdrive-${TOOL_ID}" \
        --description "test drive workflow"
    
  3. Register a new tool version:

    seqslab tools version \
        --descriptor-type WDL  \
        --tool-id "${TOOL_ID}" \
        --id "1.0" \
        --workspace "${WORKSPACE}" \
        --images '[{"image_type": "docker", "image_name": "atgenomix/seqslab_runtime-1.5_ubuntu-20.04_preprocessgatk4-4.2.0.0:2022-10-05-02-36", "registry_host": "tmuhwus24434bacr.azurecr.io", "size": 3349109245, "checksum": "sha256:020be4ed428c7dfec67d6c640aaec07d239406c1222295b22a8da6be2ec38ad1"}]'
    
  4. Upload the tool files to the newly created tool version:

    seqslab tools file \
        --descriptor-type WDL \
        --tool-id "${TOOL_ID}" \
        --version-id "1.0" \
        --working-dir "$(pwd)/GATK-Germline-Snps-Indels/" \
        --file-info execs/gatk.parallel.hg19.0713.execs.json
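The --images argument in step 3 is a long single-line JSON string, which is easy to mistype. One possible way to keep it reviewable is to assemble it from named variables. The variable names below are hypothetical; the values are copied from the command in step 3.

```shell
# Timestamped tool ID, built the same way as in step 2.
TOOL_ID="wgs_germline_gatk4_snp_indel_$(date '+%Y%m%d%H%M%S')"

# Hypothetical variable names; the values come from the --images argument in step 3.
IMAGE_NAME="atgenomix/seqslab_runtime-1.5_ubuntu-20.04_preprocessgatk4-4.2.0.0:2022-10-05-02-36"
REGISTRY_HOST="tmuhwus24434bacr.azurecr.io"
IMAGE_SIZE=3349109245
IMAGE_CHECKSUM="sha256:020be4ed428c7dfec67d6c640aaec07d239406c1222295b22a8da6be2ec38ad1"

# Assemble the JSON array expected by --images.
IMAGES_JSON=$(printf '[{"image_type": "docker", "image_name": "%s", "registry_host": "%s", "size": %s, "checksum": "%s"}]' \
    "$IMAGE_NAME" "$REGISTRY_HOST" "$IMAGE_SIZE" "$IMAGE_CHECKSUM")

echo "TOOL_ID=${TOOL_ID}"
echo "$IMAGES_JSON"
```

The resulting string can then be passed as --images "${IMAGES_JSON}" in place of the inline literal.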
    

Step 5. Run a job

  1. Create a run request file:

    seqslab jobs request \
        --run-name testdrive \
        --working-dir "$(pwd)" \
        --execs GATK-Germline-Snps-Indels/execs/gatk.parallel.hg19.0713.execs.json \
        --workflow-url \
            "https://testdrive.seqslab.net/trs/v2/tools/${TOOL_ID}/versions/1.0/WDL/files/" \
        --runtimes \
            GermlineSnpsIndelsGatk4Hg19=acu-m64l
    
  2. Submit a workflow run:

    seqslab jobs run \
        --workspace "${WORKSPACE}" \
        --working-dir "$(pwd)" \
        --response-path run.json
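The --workflow-url value in step 1 combines the API subdomain, the tool ID, and the version ID. The sketch below builds it from variables. It assumes the host always follows the PRIVATE_NAME.seqslab.net pattern seen in the test drive URL, which this guide does not state explicitly. VERSION_ID and WORKFLOW_URL are hypothetical names, and the fallback TOOL_ID is a placeholder.

```shell
# Fall back to the test drive values when the variables are not already set.
PRIVATE_NAME="${PRIVATE_NAME:-testdrive}"
TOOL_ID="${TOOL_ID:-wgs_germline_gatk4_snp_indel_20230101000000}"  # placeholder ID
VERSION_ID="1.0"  # hypothetical variable; matches the version registered in Step 4

# Assumption: the API host is <PRIVATE_NAME>.seqslab.net.
WORKFLOW_URL="https://${PRIVATE_NAME}.seqslab.net/trs/v2/tools/${TOOL_ID}/versions/${VERSION_ID}/WDL/files/"

echo "$WORKFLOW_URL"
```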
    

Step 6. Check the job status

  1. Check the status of the workflow run:

    seqslab jobs run-state --run-id "yourRunID"
    
  2. When the job reaches the COMPLETE state, get detailed information about your workflow run:

    seqslab jobs get --run-id "yourRunID"
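Instead of re-running the status command by hand, you could poll it in a loop. The helper below is a generic sketch: poll_until is a hypothetical function, and it assumes that the status command prints the state name (for example, COMPLETE) as plain text, which this guide does not confirm.

```shell
# Hypothetical helper: poll_until PATTERN INTERVAL MAX_TRIES COMMAND [ARGS...]
# Runs COMMAND up to MAX_TRIES times, waiting INTERVAL seconds between tries,
# and succeeds as soon as the command output contains PATTERN.
poll_until() {
    pattern="$1"
    interval="$2"
    max_tries="$3"
    shift 3
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # Capture whatever the command prints, even if it exits non-zero.
        output="$("$@" 2>&1 || true)"
        echo "attempt ${try}: ${output}"
        case "$output" in
            *"$pattern"*) return 0 ;;
        esac
        try=$((try + 1))
        sleep "$interval"
    done
    return 1
}

# Usage, inside the SeqsLab CLI container (checks once a minute, up to 120 times):
# poll_until COMPLETE 60 120 seqslab jobs run-state --run-id "yourRunID"
```

The match is a plain substring test, so adjust the pattern if the real output uses a different format.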
    

Step 7. Download the results

Download the workflow output files to your local machine. The workspace name and DRS URIs below are examples; replace them with the values from your own run:

    seqslab datahub download \
        --workspace seqslabwus2 \
        --dst ~/Downloads/ \
        --self-uri drs://api.seqslab.net/drs_ODlEMzEKxhxwc43 drs://api.seqslab.net/drs_Otr1u9pIYAe2JLr
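The --self-uri option takes one or more DRS URIs of the form drs://host/id. A small sketch for checking that shape before starting a download is shown below. The is_drs_uri function is a hypothetical helper that performs only a basic pattern check, not a full validation.

```shell
# Hypothetical helper: succeed only if the argument looks like drs://<host>/<id>.
is_drs_uri() {
    case "$1" in
        drs://?*/?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the example URIs from the download command above.
for uri in \
    "drs://api.seqslab.net/drs_ODlEMzEKxhxwc43" \
    "drs://api.seqslab.net/drs_Otr1u9pIYAe2JLr"
do
    if is_drs_uri "$uri"; then
        echo "ok: $uri"
    else
        echo "not a DRS URI: $uri" >&2
    fi
done
```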