Automated jobs#

Automated jobs let you run reproducible data processing tasks, such as read alignment and variant detection on sequencing data, using a range of bioinformatics tools. You can automate the execution of your WDL workflows on one or more distributed CPU/GPU clusters to turn data into results quickly.


Create and manage an automated job using the CLI#

Prerequisites#

  • Running instance of the SeqsLab CLI

  • Registered tool

Execution#

Command: jobs request

Description: Creates a run-request.json file

When the run-request.json file is ready, use the jobs run command to launch all WES runs defined by the run-request.json file in the working directory. The jobs run command then writes a JSON response file that indicates the run_id and run_name of the submitted runs.

Example:

seqslab jobs request \
    --working-dir /home/ubuntu/src/ \
    --workflow-url https://dev-api.seqslab.net/trs/v2/tools/trs_wgs_snp_indel/versions/1.0/WDL/files/ \
    --workspace seqslabwus2 \
    --execs execs/germline-gatk4-snpindel.json \
    --name demo-run

seqslab jobs run \
    --workspace seqslabwus2 \
    --working-dir /home/ubuntu/src/ \
    --response-path result.json
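
The two commands above can also be driven from a script. The following is a minimal Python sketch, assuming the seqslab executable is on the PATH and already signed in. The helper names (build_request_cmd, build_run_cmd, submit) are hypothetical and not part of the SeqsLab CLI; only the flags shown in the example above are used.

```python
import subprocess


def build_request_cmd(working_dir, workflow_url, workspace, execs, name):
    """Return the argv list for `seqslab jobs request` (flags taken from the example above)."""
    return [
        "seqslab", "jobs", "request",
        "--working-dir", working_dir,
        "--workflow-url", workflow_url,
        "--workspace", workspace,
        "--execs", execs,
        "--name", name,
    ]


def build_run_cmd(workspace, working_dir, response_path):
    """Return the argv list for `seqslab jobs run`."""
    return [
        "seqslab", "jobs", "run",
        "--workspace", workspace,
        "--working-dir", working_dir,
        "--response-path", response_path,
    ]


def submit(request_cmd, run_cmd, runner=subprocess.run):
    """Run both commands in order; check=True raises on the first non-zero exit."""
    for cmd in (request_cmd, run_cmd):
        runner(cmd, check=True)


# Example call (values copied from the example above):
# submit(
#     build_request_cmd("/home/ubuntu/src/",
#                       "https://dev-api.seqslab.net/trs/v2/tools/trs_wgs_snp_indel/versions/1.0/WDL/files/",
#                       "seqslabwus2", "execs/germline-gatk4-snpindel.json", "demo-run"),
#     build_run_cmd("seqslabwus2", "/home/ubuntu/src/", "result.json"),
# )
```

Passing the arguments as a list avoids shell quoting issues, and the runner parameter lets you substitute a stub when testing without a live workspace.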

Monitoring#

Command: jobs run-state

Description: Checks the status of a job

Example:

seqslab jobs run-state --run-id run_DdtSfRfOr2AVTSe
{"run_id": "run_DdtSfRfOr2AVTSe", "state": "COMPLETE"}
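
Because a run can take hours, checking the state is often done in a loop. The sketch below polls jobs run-state until the run finishes. It assumes the command prints a single JSON object on stdout, as in the example above. Only the COMPLETE state appears in this guide; the other terminal states are taken from the GA4GH WES specification, and treating them as terminal for SeqsLab is an assumption. The function names are hypothetical.

```python
import json
import subprocess
import time

# COMPLETE is shown in this guide. The remaining states come from the GA4GH WES
# specification; that SeqsLab reports them is an assumption.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def fetch_state(run_id):
    """Call `seqslab jobs run-state` and return the "state" field of its JSON output."""
    result = subprocess.run(
        ["seqslab", "jobs", "run-state", "--run-id", run_id],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["state"]


def wait_for_run(run_id, fetch=fetch_state, interval=60.0, timeout=24 * 3600,
                 sleep=time.sleep):
    """Poll until the run reaches a terminal state.

    The timeout counts only the time spent sleeping between polls.
    """
    waited = 0.0
    while True:
        state = fetch(run_id)
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout:
            raise TimeoutError(f"{run_id} still in state {state} after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Example call:
# final_state = wait_for_run("run_DdtSfRfOr2AVTSe", interval=120)
```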

Command: jobs get

Description: Retrieves the job details

Running this command returns the detailed WES run information in JSON format. The response includes the following information:

  • Basic attributes: run_id, run_name, state, start_time, and end_time

  • logs section: Contains execution details for each WDL task, including the rendered command, start_time, end_time, exit_code, storage_url, and outputs

  • outputs section: Maps the fully qualified name (FQN) of each main-workflow output to its local file name and to its DRS self-URI

Example:

seqslab jobs get --run-id run_DdtSfRfOr2AVTSe
{
    "id": "run_DdtSfRfOr2AVTSe",
    "name": "2022_02_11_WGS_22010402",
    "outputs": {
        "outputs": {
            "GermlineCalling.outFileBam": "1.aligned.duplicates_marked.recalibrated.bam",
            "GermlineCalling.outFileVcf": "1.vcf.gz",
            "GermlineCalling.outFileGvcf": "1.g.vcf.gz",
            ...
        },
        "datasets": {
            "GermlineCalling.outFileBam": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.preprocessing.ApplyBQSR:x.outFileRecalibratedBam_run_DdtSfRfOr2AVTSe_IpFV",
            "GermlineCalling.outFileVcf": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.calling.CaseGenotype:x.outFileVcf_run_DdtSfRfOr2AVTSe_HzjU",
            "GermlineCalling.outFileGvcf": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.calling.HaplotypeCaller:x.outFileGvcf_run_DdtSfRfOr2AVTSe_HVdM",
            ...
        }
    },
    "logs": [
        {
            "name": "Fastp:x",
            "id": 10184,
            "cmd": "set -e -o pipefail\n\n/usr/local/seqslab/fastp/fastp \\\n  --trim_poly_g \\\n  --cut_tail \\\n  --thread \"16\" \\\n  --in1 \"1.R1.fastq.gz\" \\\n  --in2 \"1.R2.fastq.gz\" \\\n  --out1 \"1.trimmed.R1.fastq.gz\" \\\n  --out2 \"1.trimmed.R2.fastq.gz\" \\\n  --report_title \"1\" \\\n  --json \"1.fastp.json\" \\\n  --html \"1.fastp.html\"",
            "start_time": "2023-06-27T07:48:28Z",
            "end_time": "2023-06-27T08:00:57Z",
            "exit_code": 0,
            "outputs": {
                "outputs": {
                    "GermlineCalling.mapping.Fastp.outFileR1Fq": "1.trimmed.R1.fastq.gz",
                    "GermlineCalling.mapping.Fastp.outFileR2Fq": "1.trimmed.R2.fastq.gz",
                    ...
                },
                "datasets": {
                    "GermlineCalling.mapping.Fastp.outFileR1Fq": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.mapping.Fastp:x.outFileR1Fq_run_DdtSfRfOr2AVTSe_spO5",
                    "GermlineCalling.mapping.Fastp.outFileR2Fq": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.mapping.Fastp:x.outFileR2Fq_run_DdtSfRfOr2AVTSe_A4cN",
                    ...
                }
            },
            "stdout": [
                "https://atgxtestws62fccstorage.blob.core.windows.net/seqslab/wes/run_OVbeipQ18wyYtqa/GermlineCalling.mapping.Fastp_x/stdout/0-stdout.log?st=2023-06-30T01%3A50%3A21Z&se=2023-06-30T02%3A50%3A21Z&sp=racwle&spr=https&sv=2020-06-12&sr=d&sdd=4&sig=s/x39tqVY0GvgUdpW4OiABiHXw08dKqFVLL%2BixN53pg%3D",
                ...
            ],
            "stderr": [
                "https://atgxtestws62fccstorage.blob.core.windows.net/seqslab/wes/run_OVbeipQ18wyYtqa/GermlineCalling.mapping.Fastp_x/stderr/0-stderr.log?st=2023-06-30T01%3A50%3A21Z&se=2023-06-30T02%3A50%3A21Z&sp=racwle&spr=https&sv=2020-06-12&sr=d&sdd=4&sig=eGC0Pu9U3k3pFgm6Zp4Li87ZhrOrAlT7DUK3q/IFP/Q%3D",
                ...
            ],
            "activity": [
                "https://atgxtestws62fccstorage.blob.core.windows.net/seqslab/wes/run_OVbeipQ18wyYtqa/audit.log?st=2023-06-30T01%3A50%3A21Z&se=2023-06-30T02%3A50%3A21Z&sp=racwle&spr=https&sv=2020-06-12&sr=b&sig=7kAJ9R7R8yEssuyCJq9ZmqjuDsr6YI6VEEflYLCrun8%3D"
            ]
        },
        ...
    ],
    "state": "COMPLETE",
    "request": {
        "id": 283,
        "name": "2022_01_18_2_WGS_22010402",
        "description": null,
        "workflow_type": "WDL",
        "workflow_type_version": "1.0",
        "workflow_params": { ...
        },
        "workflow_backend_params": { ...
        },
        "workflow_url": "https://api.seqslab.net/trs/v2/tools/trs_wgs_snp_indel/versions/1.0/WDL/files/",
        "tags": []
    },
    "start_time": "2022-02-11T10:41:28Z",
    "end_time": "2022-02-11T13:47:20Z"
}
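
The response above is easier to work with once it is reduced to the fields you need. The following is a minimal sketch, assuming the response has already been parsed into a Python dict (for example with json.loads) and that it has the shape shown in the example. The function names are hypothetical. It joins the two halves of the outputs section on the output FQN and computes the duration of each task in the logs section.

```python
from datetime import datetime


def collect_outputs(run):
    """Join outputs.outputs (file names) and outputs.datasets (DRS self-URIs) on the FQN."""
    section = run.get("outputs", {})
    files = section.get("outputs", {})
    datasets = section.get("datasets", {})
    return {
        fqn: {"file": files.get(fqn), "self_uri": datasets.get(fqn)}
        for fqn in sorted(set(files) | set(datasets))
    }


def _parse(timestamp):
    # Timestamps in the example look like 2023-06-27T07:48:28Z.
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")


def summarize_tasks(run):
    """Return (task name, exit code, duration in seconds) for each entry in logs."""
    rows = []
    for task in run.get("logs", []):
        elapsed = _parse(task["end_time"]) - _parse(task["start_time"])
        rows.append((task["name"], task["exit_code"], elapsed.total_seconds()))
    return rows
```

For the Fastp:x task in the example, summarize_tasks reports an exit code of 0 and a duration of 749 seconds. The self_uri values returned by collect_outputs are the DRS self-URIs of the run outputs.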

Output retrieval#

Command: datahub download

Description: Retrieves the job output after the job reaches the COMPLETE state

Running this command downloads the pipeline run output files to a directory on the local computer. You can specify multiple DRS self-URIs or multiple DRS IDs.

Example:

seqslab datahub download \
    --workspace seqslabwus2 \
    --dst ~/Downloads/ \
    --self-uri drs://api.seqslab.net/drs_ODlEMzEKxhxwc43 drs://api.seqslab.net/drs_Otr1u9pIYAe2JLr
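
When the list of self-URIs is produced by a script, the download command can be assembled programmatically. This is a minimal sketch, assuming the seqslab executable is on the PATH; the helper names are hypothetical, and only the flags shown in the example above are used.

```python
import os
import subprocess


def build_download_cmd(workspace, dst, self_uris):
    """Return the argv list for `seqslab datahub download` with one or more self-URIs."""
    uris = list(self_uris)
    if not uris:
        raise ValueError("at least one DRS self-URI is required")
    return [
        "seqslab", "datahub", "download",
        "--workspace", workspace,
        # No shell is involved when argv is a list, so "~" is expanded here.
        "--dst", os.path.expanduser(dst),
        "--self-uri", *uris,
    ]


def download(workspace, dst, self_uris, runner=subprocess.run):
    """Run the download; check=True raises if the CLI exits with a non-zero status."""
    runner(build_download_cmd(workspace, dst, self_uris), check=True)


# Example call (values copied from the example above):
# download("seqslabwus2", "~/Downloads/",
#          ["drs://api.seqslab.net/drs_ODlEMzEKxhxwc43",
#           "drs://api.seqslab.net/drs_Otr1u9pIYAe2JLr"])
```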