Workflow execution

After the WGS-Germline-Snps-Indels workflow is registered in the SeqsLab tool registry, the next step is to submit a workflow run. The following sections describe how to submit the run, check its status, and download the results.

Run a job

The first step is to create a run request file that includes all the workflow run configuration.

Use the following jobs request command:

seqslab jobs request \
    --run-name testdrive \
    --working-dir "$(pwd)" \
    --execs GATK-Germline-Snps-Indels/execs/gatk.parallel.hg19.0713.execs.json \
    --workflow-url \
        "https://testdrive.seqslab.net/trs/v2/tools/${TOOL_ID}/versions/1.0/WDL/files/" \
    --runtimes \
        GermlineSnpsIndelsGatk4Hg19=acu-m64l

The next step is to use the jobs run command to submit a workflow run.

seqslab jobs run \
    --workspace "${WORKSPACE}" \
    --working-dir "$(pwd)" \
    --response-path run.json

The command writes a JSON file to the path given by --response-path (run.json in this example). The file contains the run_id and run_name, which you can use to check the status and result of the workflow run.

cat `pwd`/run.json
[
    {
        "run_id": "yourRunID",
        "run_name": "testdrive"
    }
]
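
If the later steps are scripted, the run ID can be read from the response file instead of being copied by hand. The following is a minimal Python sketch, assuming the response has the list-of-objects shape shown above. The function name read_run_ids is illustrative and not part of the SeqsLab CLI.

```python
import json


def read_run_ids(response_text):
    """Map each run_name to its run_id, given the text of a jobs run response."""
    entries = json.loads(response_text)
    return {entry["run_name"]: entry["run_id"] for entry in entries}


# Sample text copied from the run.json example above; in practice, read the
# file written to --response-path, e.g. open("run.json").read().
sample = '[{"run_id": "yourRunID", "run_name": "testdrive"}]'
print(read_run_ids(sample))
```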

Check the status

After the workflow run is submitted, use the jobs run-state command to check the status of the workflow run.

seqslab jobs run-state --run-id "yourRunID"
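
A workflow run can take hours, so it may be convenient to poll the state rather than check it manually. The sketch below shows one way to structure such a loop in Python. It is an assumption-laden outline: the output format of jobs run-state is not described here, so the state lookup is passed in as a callable (get_state) that you would implement yourself, for example by invoking the CLI and parsing its output. COMPLETE is the only state name taken from this guide; any failure states must be added to done_states, otherwise a failed run is only detected when the timeout expires.

```python
import time


def wait_for_state(get_state, done_states=("COMPLETE",), interval_s=30.0,
                   timeout_s=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Call get_state() repeatedly until it returns one of done_states."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in done_states:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"run still in state {state!r} after {timeout_s} s")
        sleep(interval_s)


# Demonstration with a fake state source. "SOME_INTERMEDIATE_STATE" is a
# placeholder, not a real SeqsLab state name.
fake_states = iter(["SOME_INTERMEDIATE_STATE", "SOME_INTERMEDIATE_STATE", "COMPLETE"])
final = wait_for_state(lambda: next(fake_states), sleep=lambda seconds: None)
print(final)
```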

When the workflow run reaches the COMPLETE state, use the jobs get command to retrieve detailed information about the workflow run.

seqslab jobs get --run-id "yourRunID"

The response contains an output section describing all the output files of the workflow run, and a logs section describing the execution details of all WDL workflows and tasks.

{
  "id": "yourRunID",
  "name": "testdrive",
  "outputs": [
    {
      "fqn": "GermlineSnpsIndelsGatk4.outFileBam",
      "cloud": [
        "drs://testdrive.seqslab.net/drs_C5rmw6tL8DRv6IS"
      ],
      "local": [
        "HG003.hg19.aligned.duplicates_marked.recalibrated.bam"
      ]
    },
    {
      "fqn": "GermlineSnpsIndelsGatk4.outFileDuplicationMetrics",
      "cloud": [
        "drs://testdrive.seqslab.net/drs_7bV8rdUw9NqE0Yp"
      ],
      "local": [
        "HG003.hg19.duplicate_metrics"
      ]
    },
    {
      "fqn": "GermlineSnpsIndelsGatk4.outFileBai",
      "cloud": [
        "drs://testdrive.seqslab.net/drs_48xz41QXQWjGCna"
      ],
      "local": [
        "HG003.hg19.aligned.duplicates_marked.recalibrated.bai"
      ]
    },
    {
      "fqn": "GermlineSnpsIndelsGatk4.outFileVCFIdx",
      "cloud": [
        "drs://testdrive.seqslab.net/drs_f0nXUfDyr0vpNkW"
      ],
      "local": [
        "HG003.hg19.aligned.duplicates_marked.recalibrated.g.vcf.gz.tbi"
      ]
    },
    {
      "fqn": "GermlineSnpsIndelsGatk4.outFileVCF",
      "cloud": [
        "drs://testdrive.seqslab.net/drs_ibLt5Yf5sGqq0UD"
      ],
      "local": [
        "HG003.hg19.aligned.duplicates_marked.recalibrated.g.vcf.gz"
      ]
    },
    {
      "fqn": "GermlineSnpsIndelsGatk4.outFileBqsrReport",
      "cloud": [
        "drs://testdrive.seqslab.net/drs_F7a6uCH6ALaNZDz"
      ],
      "local": [
        "HG003.hg19.recal_data.csv"
      ]
    }
  ],
  "logs": [
    {
      "id": 33,
      "name": "bwamem-x-a1004-run-4hyaycrojam5kjw",
      "cmd": "set -o pipefail\nset -e\n\n# set the bash variable needed for the command-line\nbash_ref_fasta=${ref_fasta}\n\n# bwa reference preload.\n# ${bwa_path}/bwa shm ${ref_fasta}\n\n${bwa_path}${bwa_commandline} \\\n  -R \"@RG\\tID:${sampleName}\\tLB:${sampleName}\\tSM:${sampleName}\\tPL:ILLUMINA\" \\\n  ${inFileFastqR1} ${inFileFastqR2} \\\n| \\\nsamtools view -1 - > ${sampleName}.bam",
      "start_time": "2022-06-29T08:51:27Z",
      "end_time": "2022-06-29T09:48:38Z",
      "exit_code": 0,
      "outputs": [
        {
          "fqn": "GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.BwaMem.output_bam",
          "cloud": [
            "drs://testdrive.seqslab.net/drs_mRX1tMOM7vjZWmw"
          ],
          "local": [
            "HG003.bam"
          ]
        }
      ],
      "stdout": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.BwaMem_x/stdout",
      "stderr": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.BwaMem_x/stderr",
      "activity": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/audit.log"
    },
    {
      "id": 34,
      "name": "markduplicates-x-46998-run-4hyaycrojam5kjw",
      "cmd": "${gatk_path} --java-options \"-Dsamjdk.compression_level=${compression_level} -Xmx${maxMem}G -Xms${command_mem_gb}G\" \\\n  MarkDuplicates \\\n  --INPUT ${input_bam} \\\n  --OUTPUT ${output_bam_basename}.bam \\\n  --METRICS_FILE ${metrics_filename} \\\n  --VALIDATION_STRINGENCY SILENT \\\n  --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 \\\n  --ASSUME_SORT_ORDER \"queryname\" \\\n  --CREATE_MD5_FILE true \\\n  --READ_NAME_REGEX null",
      "start_time": "2022-06-29T09:48:53Z",
      "end_time": "2022-06-29T09:54:44Z",
      "exit_code": 0,
      "outputs": [
        {
          "fqn": "GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.MarkDuplicates.output_bam",
          "cloud": [
            "drs://testdrive.seqslab.net/drs_NRLyAzQMA758gYq"
          ],
          "local": [
            "HG003.hg19.aligned.unsorted.duplicates_marked.bam"
          ]
        },
        {
          "fqn": "GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.MarkDuplicates.duplicate_metrics",
          "cloud": [
            "drs://testdrive.seqslab.net/drs_7bV8rdUw9NqE0Yp"
          ],
          "local": [
            "HG003.hg19.duplicate_metrics"
          ]
        }
      ],
      "stdout": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.MarkDuplicates_x/stdout",
      "stderr": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.MarkDuplicates_x/stderr",
      "activity": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/audit.log"
    },
    {
      "id": 35,
      "name": "sortandfixtags-x-5afb7-run-4hyaycrojam5kjw",
      "cmd": "set -o pipefail\n\n${gatk_path} --java-options \"-Dsamjdk.compression_level=${compression_level} -Xmx${maxMem}G -Xms${command_mem_gb_sort}G\" \\\n  SortSam \\\n  --INPUT ${input_bam} \\\n  --OUTPUT /dev/stdout \\\n  --SORT_ORDER \"coordinate\" \\\n  --CREATE_INDEX false \\\n  --CREATE_MD5_FILE false \\\n| \\\n${gatk_path} --java-options \"-Dsamjdk.compression_level=${compression_level} -Xmx${maxMem}G -Xms${command_mem_gb_fix}G\" \\\n  SetNmMdAndUqTags \\\n  --INPUT /dev/stdin \\\n  --OUTPUT ${output_bam_basename}.bam \\\n  --CREATE_INDEX true \\\n  --CREATE_MD5_FILE true \\\n  --REFERENCE_SEQUENCE ${ref_fasta}",
      "start_time": "2022-06-29T09:54:46Z",
      "end_time": "2022-06-29T10:02:35Z",
      "exit_code": 0,
      "outputs": [
        {
          "fqn": "GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.output_bam_index",
          "cloud": [
            "drs://testdrive.seqslab.net/drs_BCijeRE3KqjzLTJ"
          ],
          "local": [
            "HG003.hg19.aligned.duplicate_marked.sorted.bai"
          ]
        },
        {
          "fqn": "GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.output_bam_md5",
          "cloud": [
            "drs://testdrive.seqslab.net/drs_jqEqh57s8K9CkYK"
          ],
          "local": [
            "HG003.hg19.aligned.duplicate_marked.sorted.bam.md5"
          ]
        },
        {
          "fqn": "GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.output_bam",
          "cloud": [
            "drs://testdrive.seqslab.net/drs_HEtUywpf2B8KRWh"
          ],
          "local": [
            "HG003.hg19.aligned.duplicate_marked.sorted.bam"
          ]
        }
      ],
      "stdout": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.SortAndFixTags_x/stdout",
      "stderr": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.SortAndFixTags_x/stderr",
      "activity": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/audit.log"
    },
    {
      "id": 38,
      "name": "baserecalibrator-x-fcf7c-run-4hyaycrojam5kjw",
      "cmd": "${gatk_path} --java-options \"-Xmx${maxMem}G -Xms${command_mem_gb}G\" \\\n  BaseRecalibrator \\\n  -R ${ref_fasta} \\\n  -I ${input_bam} \\\n  --use-original-qualities \\\n  -O ${recalibration_report_filename} \\\n  --known-sites ${dbSNP_vcf} \\\n  --known-sites ${sep=\" --known-sites \" known_indels_sites_VCFs}",
      "start_time": "2022-06-29T10:02:46Z",
      "end_time": "2022-06-29T10:15:26Z",
      "exit_code": 0,
      "outputs": [
        {
          "fqn": "GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator.recalibration_report",
          "cloud": [
            "drs://testdrive.seqslab.net/drs_F7a6uCH6ALaNZDz"
          ],
          "local": [
            "HG003.hg19.recal_data.csv"
          ]
        }
      ],
      "stdout": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator_x/stdout",
      "stderr": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator_x/stderr",
      "activity": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/audit.log"
    },
    {
      "id": 42,
      "name": "applybqsr-x-1b41e-run-4hyaycrojam5kjw",
      "cmd": "${gatk_path} --java-options \"-Xmx${maxMem}G -Xms${command_mem_gb}G\" \\\n  ApplyBQSR \\\n  -R ${ref_fasta} \\\n  -I ${input_bam} \\\n  -O ${output_bam_basename}.bam \\\n  -bqsr ${recalibration_report} \\\n  --static-quantized-quals 10 --static-quantized-quals 20 --static-quantized-quals 30 \\\n  --add-output-sam-program-record \\\n  --create-output-bam-md5 \\\n  --use-original-qualities",
      "start_time": "2022-06-29T10:15:36Z",
      "end_time": "2022-06-29T10:25:08Z",
      "exit_code": 0,
      "outputs": [
        {
          "fqn": "GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.ApplyBQSR.recalibrated_bam",
          "cloud": [
            "drs://testdrive.seqslab.net/drs_vmemeS8iMrLHnaM"
          ],
          "local": [
            "HG003.hg19.aligned.duplicates_marked.recalibrated.bam"
          ]
        }
      ],
      "stdout": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.ApplyBQSR_x/stdout",
      "stderr": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.PreProcessingForVariantDiscovery_GATK4.ApplyBQSR_x/stderr",
      "activity": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/audit.log"
    },
    {
      "id": 43,
      "name": "PreProcessingForVariantDiscovery_GATK4",
      "cmd": "WorkflowLifeCycle",
      "start_time": "2022-06-29T08:51:15Z",
      "end_time": "2022-06-29T10:25:09Z",
      "exit_code": 0,
      "outputs": [],
      "stdout": "",
      "stderr": "",
      "activity": ""
    },
    {
      "id": 44,
      "name": "indexbam-x-aa9e1-run-4hyaycrojam5kjw",
      "cmd": "set -e -o pipefail\n\nsamtools index -@ 16 -b ${inFileBam} ${outFileBamPrefix}.bai\nmv ${inFileBam} ${outFileBamPrefix}.bam",
      "start_time": "2022-06-29T10:25:16Z",
      "end_time": "2022-06-29T10:26:29Z",
      "exit_code": 0,
      "outputs": [
        {
          "fqn": "GermlineSnpsIndelsGatk4.IndexBam.outFileBai",
          "cloud": [
            "drs://testdrive.seqslab.net/drs_48xz41QXQWjGCna"
          ],
          "local": [
            "HG003.hg19.aligned.duplicates_marked.recalibrated.bai"
          ]
        },
        {
          "fqn": "GermlineSnpsIndelsGatk4.IndexBam.outFileBam",
          "cloud": [
            "drs://testdrive.seqslab.net/drs_C5rmw6tL8DRv6IS"
          ],
          "local": [
            "HG003.hg19.aligned.duplicates_marked.recalibrated.bam"
          ]
        }
      ],
      "stdout": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.IndexBam_x/stdout",
      "stderr": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.IndexBam_x/stderr",
      "activity": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/audit.log"
    },
    {
      "id": 47,
      "name": "haplotypecaller-x-9e122-run-4hyaycrojam5kjw",
      "cmd": "set -e\n\n${samtools_path} index -b ${input_bam}\n\n${gatk_path} --java-options \"-Xmx${maxMem}G -Xms${command_mem_gb}G ${java_opt}\" \\\n  HaplotypeCaller \\\n  -L chr20 \\\n  -R ${ref_fasta} \\\n  -I ${input_bam} \\\n  -O ${output_filename} \\\n  -contamination ${default=\"0\" contamination} \\\n  -G StandardAnnotation -G StandardHCAnnotation ${true=\"-G AS_StandardAnnotation\" false=\"\" make_gvcf} \\\n  -GQB 10 -GQB 20 -GQB 30 -GQB 40 -GQB 50 -GQB 60 -GQB 70 -GQB 80 -GQB 90 \\\n  ${true=\"-ERC GVCF\" false=\"\" make_gvcf} \\\n  ${bamout_arg}\n\n# Cromwell doesn't like optional task outputs, so we have to touch this file.\ntouch ${vcf_basename}.bamout.bam",
      "start_time": "2022-06-29T10:25:26Z",
      "end_time": "2022-06-29T11:09:24Z",
      "exit_code": 0,
      "outputs": [
        {
          "fqn": "GermlineSnpsIndelsGatk4.HaplotypeCallerGvcf_GATK4.HaplotypeCaller.output_vcf",
          "cloud": [
            "drs://testdrive.seqslab.net/drs_ibLt5Yf5sGqq0UD"
          ],
          "local": [
            "HG003.hg19.aligned.duplicates_marked.recalibrated.g.vcf.gz"
          ]
        },
        {
          "fqn": "GermlineSnpsIndelsGatk4.HaplotypeCallerGvcf_GATK4.HaplotypeCaller.output_vcf_index",
          "cloud": [
            "drs://testdrive.seqslab.net/drs_f0nXUfDyr0vpNkW"
          ],
          "local": [
            "HG003.hg19.aligned.duplicates_marked.recalibrated.g.vcf.gz.tbi"
          ]
        },
        {
          "fqn": "GermlineSnpsIndelsGatk4.HaplotypeCallerGvcf_GATK4.HaplotypeCaller.bamout",
          "cloud": [
            "drs://testdrive.seqslab.net/drs_N8Bkj8zwDHF9FSu"
          ],
          "local": [
            "HG003.hg19.aligned.duplicates_marked.recalibrated.g.vcf.gz.bamout.bam"
          ]
        }
      ],
      "stdout": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.HaplotypeCallerGvcf_GATK4.HaplotypeCaller_x/stdout",
      "stderr": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/GermlineSnpsIndelsGatk4.HaplotypeCallerGvcf_GATK4.HaplotypeCaller_x/stderr",
      "activity": "abfss://seqslab@tdwestus262fccstorage.dfs.core.windows.net/wes/run_4hyAyCRoJaM5Kjw/audit.log"
    },
    {
      "id": 48,
      "name": "HaplotypeCallerGvcf_GATK4",
      "cmd": "WorkflowLifeCycle",
      "start_time": "2022-06-29T10:25:12Z",
      "end_time": "2022-06-29T11:09:25Z",
      "exit_code": 0,
      "outputs": [],
      "stdout": "",
      "stderr": "",
      "activity": ""
    },
    {
      "id": 49,
      "name": "germlinesnpsindelsgatk4-7556f-run-4hyaycrojam5kjw",
      "cmd": "{\"started\":\"2022-06-29T08:51:27\",\"initializing\":\"2022-06-29T08:51:32\",\"running\":\"2022-06-29T09:48:34\",\"terminated\":\"2022-06-29T11:09:27\"}",
      "start_time": "2022-06-29T08:51:27Z",
      "end_time": "2022-06-29T11:09:27Z",
      "exit_code": 0,
      "outputs": [],
      "stdout": "",
      "stderr": "",
      "activity": ""
    },
    {
      "id": 50,
      "name": "GermlineSnpsIndelsGatk4",
      "cmd": "WorkflowLifeCycle",
      "start_time": "2022-06-29T08:51:13Z",
      "end_time": "2022-06-29T11:09:27Z",
      "exit_code": 0,
      "outputs": [],
      "stdout": "",
      "stderr": "",
      "activity": ""
    }
  ],
  "state": "COMPLETE",
  "request": {
    "id": 13,
    "name": "testdrive",
    "description": null,
    "workflow_type": "WDL",
    "workflow_type_version": "1.0",
    "workflow_params": {},
    "workflow_backend_params": {}
  },
  "start_time": "2022-06-29T08:51:06Z",
  "end_time": "2022-06-29T11:10:38Z"
}
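
The start_time and end_time fields in the logs section make it easy to see where the run spent its time. The following Python sketch computes the duration of each log entry, assuming the timestamp format shown in the response above. The function name task_durations is illustrative, and the sample data is a two-entry excerpt of the logs section reduced to the fields that are used.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # matches e.g. 2022-06-29T08:51:27Z


def task_durations(logs):
    """Return (name, seconds) pairs for each log entry, longest first."""
    rows = []
    for entry in logs:
        start = datetime.strptime(entry["start_time"], TIME_FORMAT)
        end = datetime.strptime(entry["end_time"], TIME_FORMAT)
        rows.append((entry["name"], int((end - start).total_seconds())))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Excerpt of the "logs" array from the jobs get response above.
logs = [
    {"name": "markduplicates-x-46998-run-4hyaycrojam5kjw",
     "start_time": "2022-06-29T09:48:53Z", "end_time": "2022-06-29T09:54:44Z"},
    {"name": "bwamem-x-a1004-run-4hyaycrojam5kjw",
     "start_time": "2022-06-29T08:51:27Z", "end_time": "2022-06-29T09:48:38Z"},
]
for name, seconds in task_durations(logs):
    print(f"{name}: {seconds // 60} min {seconds % 60} s")
```

Note that the workflow-level entries (those whose cmd is WorkflowLifeCycle) span their child tasks, so you may want to filter them out before summing durations.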

Download the results

All output files generated by the workflow run are registered on the SeqsLab platform. To investigate the data further, use the URIs reported in the outputs section of the jobs get response.

For example, the URIs for the WDL FQNs GermlineSnpsIndelsGatk4.outFileVCF and GermlineSnpsIndelsGatk4.outFileBam are drs://testdrive.seqslab.net/drs_ibLt5Yf5sGqq0UD and drs://testdrive.seqslab.net/drs_C5rmw6tL8DRv6IS, respectively.
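
When there are many outputs, looking up URIs by FQN can be scripted. The following Python sketch builds an FQN-to-URI mapping from the outputs section, assuming the response shape shown earlier. The function name output_uris is illustrative, and the sample data is an excerpt of the response limited to the two outputs named above.

```python
def output_uris(run_info):
    """Map each output FQN to its list of cloud (DRS) URIs."""
    return {item["fqn"]: item["cloud"] for item in run_info["outputs"]}


# Excerpt of the jobs get response, limited to two outputs.
run_info = {
    "outputs": [
        {"fqn": "GermlineSnpsIndelsGatk4.outFileBam",
         "cloud": ["drs://testdrive.seqslab.net/drs_C5rmw6tL8DRv6IS"],
         "local": ["HG003.hg19.aligned.duplicates_marked.recalibrated.bam"]},
        {"fqn": "GermlineSnpsIndelsGatk4.outFileVCF",
         "cloud": ["drs://testdrive.seqslab.net/drs_ibLt5Yf5sGqq0UD"],
         "local": ["HG003.hg19.aligned.duplicates_marked.recalibrated.g.vcf.gz"]},
    ]
}
uris = output_uris(run_info)
print(uris["GermlineSnpsIndelsGatk4.outFileVCF"][0])
```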

Lastly, use the datahub download command to download the output files.

To download the GermlineSnpsIndelsGatk4.outFileVCF file, use the following command:

seqslab datahub download \
    --workspace "${WORKSPACE}" \
    --dst ~/Downloads/ \
    --uri "drs://testdrive.seqslab.net/drs_ibLt5Yf5sGqq0UD"

To download the GermlineSnpsIndelsGatk4.outFileBam file, use the following command:

seqslab datahub download \
    --workspace "${WORKSPACE}" \
    --dst ~/Downloads/ \
    --uri "drs://testdrive.seqslab.net/drs_C5rmw6tL8DRv6IS"
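
To download several outputs in one go, the same command can be generated for each URI. The Python sketch below only builds and prints the commands, using the flags shown above (--workspace, --dst, --uri). The function name download_commands and the fallback workspace name my-workspace are hypothetical.

```python
import os
import shlex


def download_commands(uris, workspace, dst="~/Downloads/"):
    """Build one `seqslab datahub download` argument list per DRS URI."""
    # "~" is not expanded when a command is run without a shell, so expand it here.
    dst = os.path.expanduser(dst)
    return [
        ["seqslab", "datahub", "download",
         "--workspace", workspace, "--dst", dst, "--uri", uri]
        for uri in uris
    ]


uris = [
    "drs://testdrive.seqslab.net/drs_ibLt5Yf5sGqq0UD",
    "drs://testdrive.seqslab.net/drs_C5rmw6tL8DRv6IS",
]
# "my-workspace" is a placeholder used only when WORKSPACE is not set.
workspace = os.environ.get("WORKSPACE", "my-workspace")
for cmd in download_commands(uris, workspace):
    print(shlex.join(cmd))
    # To run the download: subprocess.run(cmd, check=True)
```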