Developing a tool

The tool development process can be broken down into three main steps:

1. Choose a standardized workflow language.

WDL (Workflow Description Language) is a widely adopted standardized workflow language that SeqsLab supports natively. When drafting your WDL workflows, you can follow the OpenWDL development guide. For more information, see WDL workflows on SeqsLab and WDL best practice guidelines.
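Before you containerize anything, it can help to check the draft WDL locally. The sketch below wraps two womtool subcommands that are distributed with Cromwell. validate checks a WDL file for errors, and inputs prints a JSON template of the workflow's inputs. The WOMTOOL_JAR path, the function names, and the DRY_RUN switch (which prints commands instead of running them) are illustrative assumptions and are not part of SeqsLab.

```shell
#!/usr/bin/env bash
# Sketch: local WDL checks with womtool (released alongside Cromwell).
# Assumption: WOMTOOL_JAR points at a downloaded womtool JAR; the default
# path below is hypothetical.
WOMTOOL_JAR="${WOMTOOL_JAR:-$HOME/tools/womtool.jar}"

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# validate_wdl FILE: ask womtool whether FILE is valid WDL.
validate_wdl() {
  run java -jar "$WOMTOOL_JAR" validate "$1"
}

# inputs_template FILE: print a JSON template of FILE's workflow inputs.
inputs_template() {
  run java -jar "$WOMTOOL_JAR" inputs "$1"
}
```

For example, after sourcing the script, validate_wdl main.wdl && inputs_template main.wdl > inputs.json checks the workflow and writes a starting inputs.json that you can fill in.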

2. Containerize the runtime environment.

To ensure consistent behavior across environments, containerize the runtime environment for your tools as described in the Cromwell guide for running WDL container workflows. SeqsLab also provides a base Docker runtime image that was built and tested by Atgenomix. We recommend using it as the base for your own Docker runtime image to ensure that the image runs on SeqsLab. The SeqsLab runtime images are available from a publicly accessible Azure Container Registry.

To pull the base image, run the following command:

docker pull seqslabmain.azurecr.io/seqslab_runtime-1.4_ubuntu-18.04:latest

Alternatively, reference the image in the FROM instruction of your Dockerfile and add your own build steps:

FROM seqslabmain.azurecr.io/seqslab_runtime-1.4_ubuntu-18.04:latest
RUN ...
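The build step itself can be wrapped in a small function. This is a minimal sketch. It assumes that your Dockerfile, with the FROM line above, sits in the build context directory. The function name build_runtime_image and the timestamp tag format are hypothetical. The tag format is modeled on the 2022-03-01-01-03 tag used in the registration example later in this section.

```shell
#!/usr/bin/env bash
# Sketch: build a runtime image from a Dockerfile whose FROM line uses the
# SeqsLab base image. Function name and tag format are hypothetical.

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# build_runtime_image NAME [TAG] [CONTEXT_DIR]
# Builds NAME:TAG from CONTEXT_DIR/Dockerfile. TAG defaults to the current
# UTC time as YYYY-MM-DD-HH-MM; CONTEXT_DIR defaults to the current directory.
build_runtime_image() {
  local name="$1"
  local tag="${2:-$(date -u +%Y-%m-%d-%H-%M)}"
  local context="${3:-.}"
  # --pull makes Docker check for a newer copy of the base image (tagged latest).
  run docker build --pull -t "${name}:${tag}" "$context"
}
```

Because the base image is tagged latest, using --pull helps keep rebuilt tool images on the current base instead of a stale local copy.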

A SeqsLab tool can use one or more Docker runtime images, depending on how finely you divide its runtime environment. Atgenomix recommends setting this granularity at the sub-workflow level, which sits between one image per task and a single image for the whole tool. For example, all WDL tasks in a sub-workflow can run on one Docker runtime image, while the main workflow uses one or more images across its sub-workflows.

You can also register your Docker runtime images in the SeqsLab workspace container registry by following these steps.

a. List existing workspaces.

seqslab workspace list
[
    {
        "id": "/subscriptions/xxxxxxxxx-aaaa-bbbb-cccc-zzzzzzzzzzzz/resourceGroups/seqslabwus2",
        "name": "seqslabwus2",
        "location": "westus2"
    }
]

b. Show the container registry access information for the workspace.

seqslab workspace container-registry --workspace seqslabwus2
{
    "host": "seqslabwus2.azurecr.io",
    "account": "seqslabwus2",
    "password": "Xaade134=fJaQNmODNkhN1pE73+MWVEKx3"
}

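c. Log in to the workspace container registry. Passing the password on standard input with --password-stdin, rather than with -p, keeps it out of the docker command-line arguments.

echo 'Xaade134=fJaQNmODNkhN1pE73+MWVEKx3' | docker login seqslabwus2.azurecr.io -u seqslabwus2 --password-stdin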

d. Re-tag and push the built Docker runtime images to the container registry.

docker tag germline-gatk4-snpindel-1.0_ubuntu-18.04:2022-03-01-01-03 seqslabwus2.azurecr.io/germline-gatk4-snpindel-1.0_ubuntu-18.04:2022-03-01-01-03
docker push seqslabwus2.azurecr.io/germline-gatk4-snpindel-1.0_ubuntu-18.04:2022-03-01-01-03

e. Check whether the images were registered successfully by running the SeqsLab CLI tools images command:

seqslab tools images --workspace seqslabwus2
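You can also script steps c through e. The following sketch takes the workspace name, registry host, and account reported in step b as arguments. It reads the password from a REGISTRY_PASSWORD environment variable (a hypothetical name), logs in with docker login --password-stdin, re-tags and pushes each image, and then lists the workspace images. It uses only the seqslab tools images command shown in step e. The names register_images and target_ref are hypothetical, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: automate steps c-e (log in, re-tag and push, list images).
# Assumption: REGISTRY_PASSWORD (hypothetical name) holds the password from
# "seqslab workspace container-registry"; host and account are passed in.

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# target_ref HOST IMAGE: prefix a local image reference with the registry host.
target_ref() {
  printf '%s/%s\n' "$1" "$2"
}

# register_images WORKSPACE HOST ACCOUNT IMAGE...
register_images() {
  local workspace="$1" host="$2" account="$3" image ref
  shift 3
  # Step c: send the password on stdin instead of using docker login -p.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'docker login %s -u %s --password-stdin\n' "$host" "$account"
  else
    if [ -z "${REGISTRY_PASSWORD:-}" ]; then
      echo "REGISTRY_PASSWORD is not set" >&2
      return 1
    fi
    printf '%s' "$REGISTRY_PASSWORD" |
      docker login "$host" -u "$account" --password-stdin || return 1
  fi
  # Step d: re-tag each local image for the workspace registry and push it.
  for image in "$@"; do
    ref="$(target_ref "$host" "$image")"
    run docker tag "$image" "$ref" || return 1
    run docker push "$ref" || return 1
  done
  # Step e: list the images registered in the workspace.
  run seqslab tools images --workspace "$workspace"
}
```

For example, with REGISTRY_PASSWORD exported, register_images seqslabwus2 seqslabwus2.azurecr.io seqslabwus2 germline-gatk4-snpindel-1.0_ubuntu-18.04:2022-03-01-01-03 repeats steps c through e for the example image. The script stops at the first failed tag or push.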

3. Choose an execution engine.

When the workflows and Docker runtime images are ready, the final step is to test them with an execution engine. The SeqsLab platform extends the Cromwell execution engine with an Azure backend, which provides a container-based Spark cluster on Azure Batch and Azure Kubernetes Service (AKS) compute infrastructure.

Typical tool development

The following diagram shows a typical development flow: writing the WDL draft, preparing the runtime images, and testing with a local Cromwell backend. The expected outputs are a standardized WDL workflow, one or more Docker runtime images, and an inputs.json file.

[Diagram: TRS overview]
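For the local testing step, Cromwell's run mode executes a single workflow with an inputs file and then exits. The following is a minimal sketch. It assumes that CROMWELL_JAR points at a Cromwell release JAR (the default path is hypothetical) and that Docker is available locally, so tasks that declare a docker runtime attribute can run in their images. The function name run_local and the DRY_RUN switch are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: run a workflow once on a local Cromwell backend.
# Assumption: CROMWELL_JAR points at a Cromwell release JAR; the default
# path below is hypothetical.
CROMWELL_JAR="${CROMWELL_JAR:-$HOME/tools/cromwell.jar}"

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# run_local WDL INPUTS_JSON: fail early if either file is missing, then
# execute the workflow in Cromwell "run" mode.
run_local() {
  local wdl="$1" inputs="$2" f
  for f in "$wdl" "$inputs"; do
    if [ ! -f "$f" ]; then
      printf 'missing file: %s\n' "$f" >&2
      return 1
    fi
  done
  run java -jar "$CROMWELL_JAR" run "$wdl" --inputs "$inputs"
}
```

A successful local run with run_local main.wdl inputs.json is a reasonable check that the WDL workflow, runtime images, and inputs.json work together before you move to the SeqsLab Azure backend.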