(jobs:automated)=

# Automated jobs

Automated jobs enable you to perform reproducible data processing tasks, such as read alignment and variant detection on sequencing data, using various bioinformatics tools. You can automate the execution of your WDL workflows in one or more distributed CPU/GPU clusters to quickly convert data to results.

---

## Create and manage an automated job using the CLI

### Prerequisites

- Running instance of the SeqsLab [CLI](cli:tutorial-getting-started)
- Registered [tool](cli:tutorial-trs)

### Execution

*Command:* `jobs request`  

*Description:* Creates a `run-request.json` file  

When the `run-request.json` file is ready, use the `jobs run` command to launch all Workflow Execution Service (WES) runs defined in the `run-request.json` file in the working directory. The `jobs run` command then responds with a JSON file that indicates the submitted `run_id` and `run_name`.

*Example:*  

```
seqslab jobs request \
    --working-dir /home/ubuntu/src/ \
    --workflow-url https://dev-api.seqslab.net/trs/v2/tools/trs_wgs_snp_indel/versions/1.0/WDL/files/ \
    --workspace seqslabwus2 \
    --execs execs/germline-gatk4-snpindel.json \
    --name demo-run

seqslab jobs run \
    --workspace seqslabwus2 \
    --working-dir /home/ubuntu/src/ \
    --response-path result.json
```

### Monitoring

*Command:* `jobs run-state`  

*Description:* Checks the status of a job  

*Example:*  

```
seqslab jobs run-state --run-id run_DdtSfRfOr2AVTSe
{"run_id": "run_DdtSfRfOr2AVTSe", "state": "COMPLETE"}
```
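For long-running workflows, the status check above can be wrapped in a polling loop. The following is a minimal sketch, not part of the CLI: the function name `wait_for_run` and the default 60-second interval are hypothetical, and the failure states listed are an assumption based on the GA4GH WES state enumeration, so adjust them to whatever your deployment actually reports.

```shell
# Poll `seqslab jobs run-state` until the run reaches a terminal state.
# Assumption: terminal states other than COMPLETE follow the GA4GH WES
# State enumeration (EXECUTOR_ERROR, SYSTEM_ERROR, CANCELED).
wait_for_run() {
    run_id=$1
    interval=${2:-60}   # seconds between polls (hypothetical default)
    while :; do
        # Pull the value of "state" out of the one-line JSON response.
        state=$(seqslab jobs run-state --run-id "$run_id" |
            sed -n 's/.*"state": *"\([^"]*\)".*/\1/p')
        case "$state" in
            COMPLETE)
                echo "$run_id finished: $state"
                return 0 ;;
            EXECUTOR_ERROR|SYSTEM_ERROR|CANCELED)
                echo "$run_id stopped: $state" >&2
                return 1 ;;
            "")
                # No state could be parsed, so stop instead of looping forever.
                echo "could not read the state of $run_id" >&2
                return 2 ;;
        esac
        sleep "$interval"
    done
}
```

Calling `wait_for_run run_DdtSfRfOr2AVTSe 120` would then block until the run ends, returning 0 only when the state is `COMPLETE`.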

*Command:* `jobs get`  

*Description:* Retrieves the job details

Running this command returns the detailed WES run information in JSON format. The response includes the following information:

- Basic attributes: `run_id`, `run_name`, `state`, `start_time`, and `end_time`
- *logs* section: Contains execution details for each WDL task, including the rendered command (`cmd`), `start_time`, `end_time`, `exit_code`, `storage_url`, and `outputs`
- *outputs* section: Maps the fully qualified name (FQN) of each WDL main-workflow output to its DRS self-URI and local file name

*Example:*  
```
seqslab jobs get --run-id run_DdtSfRfOr2AVTSe
{
    "id": "run_DdtSfRfOr2AVTSe",
    "name": "2022_02_11_WGS_22010402",
    "outputs": {
        "outputs": {
            "GermlineCalling.outFileBam": "1.aligned.duplicates_marked.recalibrated.bam",
            "GermlineCalling.outFileVcf": "1.vcf.gz",
            "GermlineCalling.outFileGvcf": "1.g.vcf.gz",
            ...
        },
        "datasets": {
            "GermlineCalling.outFileBam": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.preprocessing.ApplyBQSR:x.outFileRecalibratedBam_run_DdtSfRfOr2AVTSe_IpFV",
            "GermlineCalling.outFileVcf": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.calling.CaseGenotype:x.outFileVcf_run_DdtSfRfOr2AVTSe_HzjU",
            "GermlineCalling.outFileGvcf": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.calling.HaplotypeCaller:x.outFileGvcf_run_DdtSfRfOr2AVTSe_HVdM",
            ...
        }
    },
    "logs": [
        {
            "name": "Fastp:x",
            "id": 10184,
            "cmd": "set -e -o pipefail\n\n/usr/local/seqslab/fastp/fastp \\\n  --trim_poly_g \\\n  --cut_tail \\\n  --thread \"16\" \\\n  --in1 \"1.R1.fastq.gz\" \\\n  --in2 \"1.R2.fastq.gz\" \\\n  --out1 \"1.trimmed.R1.fastq.gz\" \\\n  --out2 \"1.trimmed.R2.fastq.gz\" \\\n  --report_title \"1\" \\\n  --json \"1.fastp.json\" \\\n  --html \"1.fastp.html\"",
            "start_time": "2023-06-27T07:48:28Z",
            "end_time": "2023-06-27T08:00:57Z",
            "exit_code": 0,
            "outputs": {
                "outputs": {
                    "GermlineCalling.mapping.Fastp.outFileR1Fq": "1.trimmed.R1.fastq.gz",
                    "GermlineCalling.mapping.Fastp.outFileR2Fq": "1.trimmed.R2.fastq.gz",
                    ...
                },
                "datasets": {
                    "GermlineCalling.mapping.Fastp.outFileR1Fq": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.mapping.Fastp:x.outFileR1Fq_run_DdtSfRfOr2AVTSe_spO5",
                    "GermlineCalling.mapping.Fastp.outFileR2Fq": "drs://staging.seqslab.net/drs_test_2023-06-27-07-35:GermlineCalling.mapping.Fastp:x.outFileR2Fq_run_DdtSfRfOr2AVTSe_A4cN",
                    ...
                }
            },
            "stdout": [
                "https://atgxtestws62fccstorage.blob.core.windows.net/seqslab/wes/run_OVbeipQ18wyYtqa/GermlineCalling.mapping.Fastp_x/stdout/0-stdout.log?st=2023-06-30T01%3A50%3A21Z&se=2023-06-30T02%3A50%3A21Z&sp=racwle&spr=https&sv=2020-06-12&sr=d&sdd=4&sig=s/x39tqVY0GvgUdpW4OiABiHXw08dKqFVLL%2BixN53pg%3D",
                ...
            ],
            "stderr": [
                "https://atgxtestws62fccstorage.blob.core.windows.net/seqslab/wes/run_OVbeipQ18wyYtqa/GermlineCalling.mapping.Fastp_x/stderr/0-stderr.log?st=2023-06-30T01%3A50%3A21Z&se=2023-06-30T02%3A50%3A21Z&sp=racwle&spr=https&sv=2020-06-12&sr=d&sdd=4&sig=eGC0Pu9U3k3pFgm6Zp4Li87ZhrOrAlT7DUK3q/IFP/Q%3D",
                ...
            ],
            "activity": [
                "https://atgxtestws62fccstorage.blob.core.windows.net/seqslab/wes/run_OVbeipQ18wyYtqa/audit.log?st=2023-06-30T01%3A50%3A21Z&se=2023-06-30T02%3A50%3A21Z&sp=racwle&spr=https&sv=2020-06-12&sr=b&sig=7kAJ9R7R8yEssuyCJq9ZmqjuDsr6YI6VEEflYLCrun8%3D"
            ]
        },
        ...
    ],
    "state": "COMPLETE",
    "request": {
        "id": 283,
        "name": "2022_01_18_2_WGS_22010402",
        "description": null,
        "workflow_type": "WDL",
        "workflow_type_version": "1.0",
        "workflow_params": { ...
        },
        "workflow_backend_params": { ...
        },
        "workflow_url": "https://api.seqslab.net/trs/v2/tools/trs_wgs_snp_indel/versions/1.0/WDL/files/",
        "tags": []
    },
    "start_time": "2022-02-11T10:41:28Z",
    "end_time": "2022-02-11T13:47:20Z"
}
```
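To pick the DRS self-URI of one main-workflow output out of a saved `jobs get` response, a text-based lookup by FQN is often enough. The sketch below makes several assumptions: the response has been saved to a file, each FQN and its URI sit on one line as in the example above, and the helper name `drs_uri_for` is hypothetical. It matches text rather than parsing JSON, so use a real JSON parser if the layout differs.

```shell
# Print the DRS self-URI recorded for one output FQN in a saved
# `seqslab jobs get` response.
# Usage: drs_uri_for <FQN> <file>
drs_uri_for() {
    fqn=$1
    file=$2
    # Escape dots so the FQN is matched literally, not as a regex wildcard.
    pattern=$(printf '%s' "$fqn" | sed 's/\./\\./g')
    # Match "<FQN>": "drs://..." and keep only the URI of the first hit.
    grep -o "\"$pattern\": *\"drs://[^\"]*\"" "$file" |
        sed 's/.*"\(drs:[^"]*\)"$/\1/' |
        head -n 1
}
```

For example, after `seqslab jobs get --run-id run_DdtSfRfOr2AVTSe > run.json`, running `drs_uri_for GermlineCalling.outFileVcf run.json` prints the URI listed for that output. Because the quoted FQN must match in full, task-level keys such as `GermlineCalling.mapping.Fastp.outFileR1Fq` are not picked up by a main-workflow lookup.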

### Output retrieval

*Command:* `datahub download`  

*Description:* Retrieves the job output when the job reaches the `COMPLETE` state  

Running this command downloads the pipeline run output files to a directory on the local computer. You can specify multiple DRS self-URIs or multiple DRS IDs.  

*Example:*  
```
seqslab datahub download \
    --workspace seqslabwus2 \
    --dst ~/Downloads/ \
    --self-uri drs://api.seqslab.net/drs_ODlEMzEKxhxwc43 drs://api.seqslab.net/drs_Otr1u9pIYAe2JLr
```
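When a run produces many outputs, it can be easier to keep the DRS self-URIs in a text file and pass them all in a single call. The following is a rough sketch under stated assumptions: the helper name `download_uris` and the one-URI-per-line file format are hypothetical, and only the `datahub download` options shown above are used.

```shell
# Download every DRS self-URI listed in a text file (one URI per line).
# Usage: download_uris <workspace> <destination-dir> <uri-file>
download_uris() {
    workspace=$1
    dst=$2
    uri_file=$3
    set --   # reset the positional parameters; they now collect the URIs
    while IFS= read -r uri || [ -n "$uri" ]; do
        if [ -n "$uri" ]; then
            set -- "$@" "$uri"
        fi
    done < "$uri_file"
    if [ "$#" -eq 0 ]; then
        echo "no URIs found in $uri_file" >&2
        return 1
    fi
    seqslab datahub download \
        --workspace "$workspace" \
        --dst "$dst" \
        --self-uri "$@"
}
```

A call such as `download_uris seqslabwus2 ~/Downloads/ uris.txt` then expands to one `datahub download` command with all listed URIs after `--self-uri`. Blank lines in the file are skipped.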
