Automated jobs
Automated jobs enable you to perform reproducible data processing tasks, such as read alignment and variant detection on sequencing data, using various bioinformatics tools. You can automate the execution of your WDL workflows in one or more distributed CPU/GPU clusters to quickly convert data to results.
Create and manage an automated job using the CLI
Execution
Command: jobs request
Description: Creates a run-request.json file
When the run-request.json files are ready, you can use the jobs run command to launch all WES runs based on the run-request.json file in the working directory. The jobs run command then responds with a JSON file indicating the submitted run_id and run_name.
Example:
seqslab jobs request \
--working-dir /home/ubuntu/src/ \
--workflow-url https://dev-api.seqslab.net/trs/v2/tools/trs_wgs_snp_indel/versions/1.0/WDL/files/ \
--workspace seqslabwus2 \
--execs execs/germline-gatk4-snpindel.json \
--name demo-run
seqslab jobs run \
--workspace seqslabwus2 \
--working-dir /home/ubuntu/src/ \
--response-path result.json
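The response file written by jobs run can be read back in a script to pick up the submitted runs. The following Python sketch is illustrative only: read_submitted_runs is a hypothetical helper, and the file layout (a single object or a list of objects, each carrying run_id and run_name) is an assumption based on the description above, not a documented schema.

```python
import json
from pathlib import Path


def read_submitted_runs(response_path):
    """Return (run_id, run_name) pairs from the file written by `jobs run`."""
    data = json.loads(Path(response_path).read_text())
    # Assumption: the file holds one object or a list of objects, each with
    # run_id and run_name keys. Adjust this if your file is shaped differently.
    entries = data if isinstance(data, list) else [data]
    return [(entry["run_id"], entry.get("run_name")) for entry in entries]
```

With the example above, read_submitted_runs("result.json") would return the IDs to pass to the monitoring commands, provided the file matches the assumed layout.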
Monitoring
Command: jobs run-state
Description: Checks the status of a job
Example:
seqslab jobs run-state --run-id run_DdtSfRfOr2AVTSe
{"run_id": "run_DdtSfRfOr2AVTSe", "state": "COMPLETE"}
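For scripted pipelines, the status check above can be wrapped in a polling loop. The following is a minimal Python sketch, not part of the SeqsLab CLI: wait_for_run and fetch_state_via_cli are hypothetical helper names, the sketch assumes the command prints only the JSON object shown above, and the failure states are an assumption borrowed from the GA4GH WES state names. Only COMPLETE is confirmed by the example output.

```python
import json
import subprocess
import time

# Assumption: only "COMPLETE" appears in the example output above.
# The other names follow the GA4GH WES state enumeration and may not
# match what your deployment reports.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def fetch_state_via_cli(run_id):
    """Call `seqslab jobs run-state` and return the reported state string."""
    result = subprocess.run(
        ["seqslab", "jobs", "run-state", "--run-id", run_id],
        capture_output=True, text=True, check=True,
    )
    # Assumes stdout contains only the JSON object, as in the example.
    return json.loads(result.stdout)["state"]


def wait_for_run(run_id, fetch_state=fetch_state_via_cli,
                 interval_s=60.0, max_polls=1000):
    """Poll until the run reaches a terminal state, then return that state."""
    for _ in range(max_polls):
        state = fetch_state(run_id)
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval_s)
    raise TimeoutError(f"{run_id} not finished after {max_polls} polls")
```

The fetch_state parameter is injectable so the loop can be exercised without the CLI installed; in normal use, wait_for_run("run_DdtSfRfOr2AVTSe") calls the CLI once per interval.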
Command: jobs get
Description: Retrieves the job details
Running this command returns the detailed WES run information in JSON format. The response includes the following information:
Basic attributes: run_id, run_name, state, start_time, and end_time
logs section: Contains execution details for each WDL task, including the rendered command, start_time, end_time, exit_code, storage_url, and outputs
outputs section: Contains the WDL main-workflow outputs, mapping each fully qualified name (FQN) to its local file name and to its DRS self-URI
Example:
seqslab jobs get --run-id run_DdtSfRfOr2AVTSe
{
"id": "run_DdtSfRfOr2AVTSe",
"name": "2022_02_11_WGS_22010402",
"outputs": {
"outputs": {
"GermlineCalling.outFileBam": "1.aligned.duplicates_marked.recalibrated.bam",
"GermlineCalling.outFileVcf": "1.vcf.gz",
"GermlineCalling.outFileGvcf": "1.g.vcf.gz",
...
},
"datasets": {
"GermlineCalling.outFileBam": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.preprocessing.ApplyBQSR:x.outFileRecalibratedBam_run_DdtSfRfOr2AVTSe_IpFV",
"GermlineCalling.outFileVcf": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.calling.CaseGenotype:x.outFileVcf_run_DdtSfRfOr2AVTSe_HzjU",
"GermlineCalling.outFileGvcf": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.calling.HaplotypeCaller:x.outFileGvcf_run_DdtSfRfOr2AVTSe_HVdM",
...
}
},
"logs": [
{
"name": "Fastp:x",
"id": 10184,
"cmd": "set -e -o pipefail\n\n/usr/local/seqslab/fastp/fastp \\\n --trim_poly_g \\\n --cut_tail \\\n --thread \"16\" \\\n --in1 \"1.R1.fastq.gz\" \\\n --in2 \"1.R2.fastq.gz\" \\\n --out1 \"1.trimmed.R1.fastq.gz\" \\\n --out2 \"1.trimmed.R2.fastq.gz\" \\\n --report_title \"1\" \\\n --json \"1.fastp.json\" \\\n --html \"1.fastp.html\"",
"start_time": "2023-06-27T07:48:28Z",
"end_time": "2023-06-27T08:00:57Z",
"exit_code": 0,
"outputs": {
"outputs": {
"GermlineCalling.mapping.Fastp.outFileR1Fq": "1.trimmed.R1.fastq.gz",
"GermlineCalling.mapping.Fastp.outFileR2Fq": "1.trimmed.R2.fastq.gz",
...
},
"datasets": {
"GermlineCalling.mapping.Fastp.outFileR1Fq": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.mapping.Fastp:x.outFileR1Fq_run_DdtSfRfOr2AVTSe_spO5",
"GermlineCalling.mapping.Fastp.outFileR2Fq": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.mapping.Fastp:x.outFileR2Fq_run_DdtSfRfOr2AVTSe_A4cN",
...
}
},
"stdout": [
"https://atgxtestws62fccstorage.blob.core.windows.net/seqslab/wes/run_OVbeipQ18wyYtqa/GermlineCalling.mapping.Fastp_x/stdout/0-stdout.log?st=2023-06-30T01%3A50%3A21Z&se=2023-06-30T02%3A50%3A21Z&sp=racwle&spr=https&sv=2020-06-12&sr=d&sdd=4&sig=s/x39tqVY0GvgUdpW4OiABiHXw08dKqFVLL%2BixN53pg%3D",
...
],
"stderr": [
"https://atgxtestws62fccstorage.blob.core.windows.net/seqslab/wes/run_OVbeipQ18wyYtqa/GermlineCalling.mapping.Fastp_x/stderr/0-stderr.log?st=2023-06-30T01%3A50%3A21Z&se=2023-06-30T02%3A50%3A21Z&sp=racwle&spr=https&sv=2020-06-12&sr=d&sdd=4&sig=eGC0Pu9U3k3pFgm6Zp4Li87ZhrOrAlT7DUK3q/IFP/Q%3D",
...
],
"activity": [
"https://atgxtestws62fccstorage.blob.core.windows.net/seqslab/wes/run_OVbeipQ18wyYtqa/audit.log?st=2023-06-30T01%3A50%3A21Z&se=2023-06-30T02%3A50%3A21Z&sp=racwle&spr=https&sv=2020-06-12&sr=b&sig=7kAJ9R7R8yEssuyCJq9ZmqjuDsr6YI6VEEflYLCrun8%3D"
]
},
...
],
"state": "COMPLETE",
"request": {
"id": 283,
"name": "2022_01_18_2_WGS_22010402",
"description": null,
"workflow_type": "WDL",
"workflow_type_version": "1.0",
"workflow_params": { ...
},
"workflow_backend_params": { ...
},
"workflow_url": "https://api.seqslab.net/trs/v2/tools/trs_wgs_snp_indel/versions/1.0/WDL/files/",
"tags": []
},
"start_time": "2022-02-11T10:41:28Z",
"end_time": "2022-02-11T13:47:20Z"
}
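The logs section of this response lends itself to a quick per-task summary. The sketch below assumes the response has been saved and loaded as a Python dict (for example with json.load) and that each logs entry carries the name, exit_code, start_time, and end_time fields shown in the example. The helper names parse_time, summarize_tasks, and failed_tasks are hypothetical and not part of the SeqsLab CLI.

```python
from datetime import datetime


def parse_time(value):
    """Parse timestamps such as 2023-06-27T07:48:28Z (second precision)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def summarize_tasks(run):
    """Return (task name, exit code, duration in seconds) per logs entry."""
    rows = []
    for task in run.get("logs", []):
        start, end = task.get("start_time"), task.get("end_time")
        duration = None  # left as None when a timestamp is missing
        if start and end:
            delta = parse_time(end) - parse_time(start)
            duration = int(delta.total_seconds())
        rows.append((task["name"], task.get("exit_code"), duration))
    return rows


def failed_tasks(run):
    """Names of tasks that reported a non-zero exit code."""
    return [name for name, code, _ in summarize_tasks(run)
            if code not in (0, None)]
```

For the Fastp:x entry in the example, the start and end times are 12 minutes 29 seconds apart, so summarize_tasks would report a duration of 749 seconds with exit code 0.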
Output retrieval
Command: datahub download
Description: Retrieves the job output when the job reaches the COMPLETE state
Running this command downloads the pipeline run output files to a directory on the local computer. You can specify multiple DRS self-URIs or multiple DRS IDs.
Example:
seqslab datahub download \
--workspace seqslabwus2 \
--dst ~/Downloads/ \
--self-uri drs://api.seqslab.net/drs_ODlEMzEKxhxwc43 drs://api.seqslab.net/drs_Otr1u9pIYAe2JLr
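When a run produces many outputs, the list of self-URIs can be assembled from the jobs get response instead of being typed by hand. The Python sketch below is one possible approach, not a SeqsLab-provided helper: build_download_command is a hypothetical name, and it assumes the values under outputs.datasets in the jobs get response are DRS self-URIs that --self-uri accepts.

```python
def build_download_command(run, workspace, dst):
    """Build the argv for `seqslab datahub download` from a jobs get response."""
    # Assumption: outputs.datasets maps each output FQN to a DRS self-URI.
    datasets = run.get("outputs", {}).get("datasets", {})
    uris = [uri for uri in datasets.values() if uri.startswith("drs://")]
    if not uris:
        raise ValueError("no DRS self-URIs found under outputs.datasets")
    # Only flags shown in the example above are used here.
    return [
        "seqslab", "datahub", "download",
        "--workspace", workspace,
        "--dst", dst,
        "--self-uri", *uris,
    ]
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting problems with the URIs.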