(cli:run-sheet-routine)=
# Use the SeqsLab Run Sheet with the CLI

The Run Sheet contains all the mapping and configuration information about the data, workflows, and pipeline execution.  

The SeqsLab CLI can take the Run Sheet as a parameter, which reduces the entire sequencing data workflow, from taking the sequencer output to retrieving the analysis results, to just a few SeqsLab CLI commands. You can also eventually transform this flow into a fully automated process.

## Objective

This tutorial will help you use the SeqsLab [Run Sheet](cli:run-sheet) with the SeqsLab CLI to automate your sequencing data processing flows.

## Prerequisites

Before you begin, you will need the following:

- A SeqsLab Run Sheet
- A running instance of the SeqsLab CLI tool. For details, see [](cli:tutorial-getting-started).

## 1. Upload and register sample files

As previously [explained](cli:tutorial-drs), you can use the SeqsLab CLI to upload either individual files or entire directories to the Data Hub using the ***datahub upload*** command. Alternatively, you can use the [SeqsLab Run Sheet](cli:run-sheet) to upload sample FASTQ files by preparing the Run Sheet file and then running the ***datahub upload-runsheet*** command. Doing so writes the `upload_response.json` content to `stdout`. The CLI uses the return code `0` to indicate that all files in the **src** path were uploaded successfully. A non-zero return code means that some of the files failed to upload, typically because of a network issue. When this happens, run the command again to complete the upload process.

The SeqsLab platform uses the [Azure Block List API](https://docs.microsoft.com/en-us/rest/api/storageservices/put-block-list) (![external link](../images/external-link.png)) whenever you run the SeqsLab CLI ***datahub upload*** command. This enables files to be programmatically broken up into blocks, uploaded in parallel, and re-assembled in the cloud storage as a [block blob](https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs) (![external link](../images/external-link.png)). As such, even if the ***datahub upload*** command is executed multiple times, all successfully uploaded blocks are kept in the Azure cloud storage as cache and only the failed blocks are re-transmitted, resulting in a highly efficient and fault-resilient data transmission.

```
seqslab datahub upload-runsheet \
    --run-sheet /home/run-2022-02-26.csv \
    --input-dir /volume/fastq/2022-02-14/ \
    --workspace seqslabwus2 > upload.json
```
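
The "run it again on a non-zero return code" advice above can be scripted. The following is a minimal Python sketch, not part of the SeqsLab CLI: the helper name `run_until_success`, the attempt limit, and the wait time are all illustrative assumptions. It also assumes that the final successful run prints the complete response, so the output file is overwritten on every attempt.

```python
import subprocess
import sys
import time


def run_until_success(cmd, stdout_path, max_attempts=5, wait_seconds=30):
    """Run cmd, writing its stdout to stdout_path, until it exits with 0.

    Returns the number of attempts used. Raises RuntimeError when every
    attempt ends with a non-zero return code.
    """
    for attempt in range(1, max_attempts + 1):
        # Re-open the file each time so it only holds the latest response.
        with open(stdout_path, "w") as out:
            result = subprocess.run(cmd, stdout=out)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} exited with {result.returncode}", file=sys.stderr)
        if attempt < max_attempts:
            time.sleep(wait_seconds)
    raise RuntimeError(f"command still failing after {max_attempts} attempts")


# The upload command from this section, expressed as an argument list.
UPLOAD_CMD = [
    "seqslab", "datahub", "upload-runsheet",
    "--run-sheet", "/home/run-2022-02-26.csv",
    "--input-dir", "/volume/fastq/2022-02-14/",
    "--workspace", "seqslabwus2",
]

# Example call (left commented out so importing this file has no side effects):
# run_until_success(UPLOAD_CMD, "upload.json")
```

Because already-uploaded blocks are cached on the storage side, repeating the whole command is expected to be cheap, which is why the sketch retries the full command rather than tracking individual files.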

Running the ***datahub upload-runsheet*** command provides an `upload_response.json` object for each uploaded sample file, as shown below. Apart from automatically populating the storage-related fields, the command also fills out the metadata fields based on the Sample Sheet information that was extracted from the Run Sheet.

```
{
    "name": "NA12878-R1_001_R1.fastq.gz",
    "mime_type": "application/gzip",
    "file_type": "fastq.gz",
    "size": 136614814,
    "created_time": "2022-03-03T06:08:23.405391",
    "access_methods": [
        {
            "type": "https",
            "access_url": {
                "url": "https://seqslabapi32b21storage.blob.core.windows.net/seqslab/drs/usr_gNGAlr1m0EYMbEx/seqslab/seqslab/mntcbdh/TestSample/2022_01_18_2/FASTQ/22010262-3M_S39_R1_001.fastq.gz",
                "headers": {
                    "Authorization": null
                }
            },
            "access_tier": "hot",
            "region": "westus2"
        }
    ],
    "checksums": [
        {
            "checksum": "73c643e2d4d473ab339af2360599086b23890a249e3bbc38ca8344606ba109d9",
            "type": "sha256"
        }
    ],
    "status": "complete",
    "description": null,
    "metadata": {
        "dates": [
            {
                "date": "20230803",
                "type": {
                    "value": "sequencing"
                }
            }
        ],
        "types": [
            {
                "method": {
                    "value": "NextSeq FASTQ Only"
                },
                "platform": {
                    "value": "Illumina"
                }
            }
        ],
        "privacy": "",
        "licenses": [],
        "contributors": [],
        "extra_properties": [
            {
                "values": "NA12878",
                "category": "Sample_ID"
            },
            {
                "values": "WGSPanCancerTest",
                "category": "Description"
            },
            {
                "values": "{$.extra_properties[?category=Date][values]}-{$.extra_properties[?category=Description][values]}-{$.extra_properties[?category=Sample_ID][values]}-{$.extra_properties[?category=Pair][values]}",
                "category": "DRS_ID"
            },
            {
                "values": "2023-08-03_WGS_01",
                "category": "Run_Name"
            },
            {
                "values": "WGSPanCancerTest/inputRead/1",
                "category": "Read1_Label"
            },
            {
                "values": "WGSPanCancerTest/inputRead/2",
                "category": "Read2_Label"
            },
            {
                "values": "https://api.seqslab.net/trs/v2/tools/WGSPanCancerTest/versions/0.1.0/WDL/files/",
                "category": "Workflow_URL"
            },
            {
                "values": "phenopacket_9k55CH8eowf6PLA",
                "category": "phenopacketID"
            },
            {
                "values": "biosample_34sjeicjekqk3ji4",
                "category": "BiosampleID"
            },
            {
                "values": "https://api.seqslab.net/trs/v2/tools/WGSPanCancerTest/versions/0.1.0/",
                "category": "DiseaseID"
            },
            {
                "values": "1",
                "category": "Order_Overall"
            },
            {
                "values": "1",
                "category": "Pair"
            },
            {
                "values": "5",
                "category": "IEMFileVersion"
            },
            {
                "values": "2023/08/03",
                "category": "Date"
            }
        ],
        "primary_publication": [],
        "alternate_identifiers": []
    },
    "tags": [
        "2022-03-03_WGS_NA12878/wgs/inputRead"
    ],
    "aliases": [],
    "id": "2022_01_18_2_PGS_22010262-3M_1"
}
```
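
To check a response like the one above before registering it, you can flatten the `extra_properties` list into a dictionary. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the saved output is valid JSON holding either a single object or a list of objects with the fields shown above.

```python
import json


def load_upload_records(text):
    """Parse upload output holding either one JSON object or a list of them."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def extra_properties(record):
    """Flatten metadata.extra_properties into a {category: values} dict."""
    props = record.get("metadata", {}).get("extra_properties", [])
    return {p["category"]: p["values"] for p in props}


def summarize(record):
    """Pick out the fields that are most useful for a quick sanity check."""
    props = extra_properties(record)
    checksums = {c["type"]: c["checksum"] for c in record.get("checksums", [])}
    return {
        "id": record.get("id"),
        "name": record.get("name"),
        "status": record.get("status"),
        "sample_id": props.get("Sample_ID"),
        "run_name": props.get("Run_Name"),
        "sha256": checksums.get("sha256"),
    }
```

For example, `[summarize(r) for r in load_upload_records(open("upload.json").read())]` gives one short summary per uploaded file, which makes it easier to spot a record whose `status` is not `complete`.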

After the sample files are uploaded, you can then use the ***datahub register*** command to complete the DRS registration process.

```
seqslab datahub register \
    file-blob \
    --workspace seqslabwus2 \
    --stdin < upload.json > register.json
```

## 2. Execute a job with WES

After the sample files are uploaded and registered as DRS objects, you can proceed to execute a job using the Workflow Execution Service (WES). The SeqsLab CLI provides a ***jobs request-runsheet*** command to create a `run-request.json` for each WES run defined in a [Run Sheet](cli:run-sheet). If a Run Sheet defines multiple WES runs, then the ***jobs request-runsheet*** command generates multiple `run-request.json` files. When the `run-request.json` files are ready, you can use the ***jobs run*** command to launch all WES runs based on the `run-request.json` files in the working directory. The ***jobs run*** command then responds with a JSON file indicating the submitted *run_id* and *run_name*.

The following is an example:

```
mkdir /home/run-2022-02-26/

seqslab jobs request-runsheet \
    --working-dir /home/run-2022-02-26/ \
    --run-sheet /home/run-2022-02-26.csv

seqslab jobs run \
    --workspace seqslabwus2 \
    --working-dir /home/run-2022-02-26/ \
    --response-path result.json
```
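
To feed the submitted runs into later steps, you need the *run_id* values from the response file. The exact layout of `result.json` is not shown in this tutorial, so the following Python sketch makes only one assumption: that each submitted run appears somewhere in the JSON as an object with a `run_id` key and, optionally, a `run_name` key. The function names are hypothetical.

```python
import json


def collect_runs(node):
    """Recursively gather run_id/run_name pairs from parsed JSON."""
    found = []
    if isinstance(node, dict):
        if "run_id" in node:
            found.append({"run_id": node["run_id"], "run_name": node.get("run_name")})
        for value in node.values():
            found.extend(collect_runs(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_runs(item))
    return found


def load_runs(path):
    """Read a response file and return every run it mentions."""
    with open(path) as handle:
        return collect_runs(json.load(handle))
```

Searching the structure recursively avoids depending on whether the file holds a single object or a list, at the cost of being less strict than a parser written against the real schema.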

## 3. Monitor a job run
You can use the ***jobs run-state*** command to check the status of each run until it reaches the COMPLETE state.

```
seqslab jobs run-state --run-id run_DdtSfRfOr2AVTSe
{"run_id": "run_DdtSfRfOr2AVTSe", "state": "COMPLETE"}
```
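
Checking the state repeatedly can be automated with a polling loop. The following Python sketch is an illustration under stated assumptions: it assumes that ***jobs run-state*** prints only the JSON object shown above to `stdout`, and that failed runs report one of the GA4GH WES error states. Only `COMPLETE` is confirmed by this tutorial; the failure state names, the function names, and the polling interval are assumptions.

```python
import json
import subprocess
import time

# Assumption: failure states follow the GA4GH WES State enum.
FAILED_STATES = {"EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def query_state(run_id):
    """Call `seqslab jobs run-state` and return the reported state string."""
    result = subprocess.run(
        ["seqslab", "jobs", "run-state", "--run-id", run_id],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["state"]


def wait_for_completion(run_id, get_state=query_state, interval=60, max_polls=720):
    """Poll until the run is COMPLETE; raise on failure states or timeout."""
    for _ in range(max_polls):
        state = get_state(run_id)
        if state == "COMPLETE":
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"run {run_id} ended in state {state}")
        time.sleep(interval)
    raise TimeoutError(f"run {run_id} not COMPLETE after {max_polls} polls")
```

The `get_state` parameter is injectable so that the loop can be exercised without a live workspace, for example by passing a function that returns canned states.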

If you want to get the full run information, you can use the ***jobs get*** command. Running this command returns the detailed WES run information in JSON format. The response includes basic attributes like the *run_id*, *run_name*, *state*, *start_time*, and *end_time* for run monitoring. It also includes a logs section containing a list of detailed execution information for each WDL task, such as the *rendered command*, *start_time*, *end_time*, *exit_code*, *storage_url*, and *outputs*. Lastly, it includes an outputs section that lists, for each WDL main-workflow level output, the mapping between its FQN, DRS self-URI, and local file name.

```
seqslab jobs get --run-id run_DdtSfRfOr2AVTSe
{
    "id": "run_DdtSfRfOr2AVTSe",
    "name": "2022_02_11_WGS_22010402",
    "outputs": [
        {
            "fqn": "WGS.sampleMutect2Vcf",
            "cloud": [
                "drs://api.seqslab.net/drs_FxjCfOIBJ8mm89L"
            ],
            "local": [
                "22010402_Mutect2_tumor.vcf.gz"
            ]
        },
        ...
    ],
    "logs": [
        {
            "id": 1558,
            "name": "bwa-x-4643c-run-ddtsfrfor2avtse",
            "cmd": "set -e -o pipefail\n\n/home/tools/bwa-0.7.17/bwa \\\n    mem -M -t 14 ${refFa} \\\n    -R \"@RG\\tID:NextSeq550_${day}\\tSM:${sampleName}\\tPL:NextSeq\\tPI:550\" \\\n    ${inFileFastq} > \\\n    ${outPathSam} 2>> ${outPathLog}\n\n/home/tools/samtools-1.9/samtools \\\n    view -bS \\\n    ${outPathSam} \\\n    -o tmp.bam\n\n/home/tools/samtools-1.9/samtools \\\n    sort tmp.bam \\\n    -o ${outPathBam}",
            "start_time": "2022-02-11T10:41:43Z",
            "end_time": "2022-02-11T11:10:47Z",
            "stdout": "stdout",
            "stderr": "stderr",
            "activity": "../audit.log",
            "storage_url": "abfss://seqslab@seqslabapi32b21storage.dfs.core.windows.net/outputs/wes/run_DdtSfRfOr2AVTSe/WGS.NIPT.Bwa_x/",
            "exit_code": 0,
            "outputs": [
                {
                    "fqn": "WGS.NIPT.Bwa.outFileBam",
                    "cloud": [
                        "drs://api.seqslab.net/drs_DjKkaETD7x7gZBA"
                    ],
                    "local": [
                        "22010402.bam"
                    ]
                },
                {
                    "fqn": "WGS.NIPT.Bwa.outFileLog",
                    "cloud": [
                        "drs://api.seqslab.net/drs_9Er0LWEMDbbCokV"
                    ],
                    "local": [
                        "22010402_Bwa.log"
                    ]
                }
            ]
        },
        ...
    ],
    "state": "COMPLETE",
    "request": {
        "id": 283,
        "name": "2022_01_18_2_WGS_22010402",
        "description": null,
        "workflow_type": "WDL",
        "workflow_type_version": "1.0",
        "workflow_params": { ...
        },
        "workflow_backend_params": { ...
        },
        "workflow_url": "https://api.seqslab.net/trs/v2/tools/trs_wgs_snp_indel/versions/1.0/WDL/files/",
        "tags": []
    },
    "start_time": "2022-02-11T10:41:28Z",
    "end_time": "2022-02-11T13:47:20Z"
}
```
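
Once the ***jobs get*** response is parsed into a dictionary, the logs and outputs sections can be condensed into a short report. The following Python sketch relies only on the field names shown above (`logs`, `outputs`, `fqn`, `cloud`, `exit_code`, `start_time`, `end_time`); the function names are hypothetical. Note that the example response above is abbreviated with `...`, so the sketch applies to the full response returned by the CLI, not to the excerpt as printed.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # matches values such as 2022-02-11T10:41:43Z


def elapsed_seconds(start, end):
    """Return the number of whole seconds between two timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds())


def task_report(run):
    """Return (task name, exit code, seconds) for every entry in run['logs']."""
    return [
        (task["name"], task["exit_code"],
         elapsed_seconds(task["start_time"], task["end_time"]))
        for task in run.get("logs", [])
    ]


def output_uris(run):
    """Map each workflow-level output FQN to its list of DRS self-URIs."""
    return {out["fqn"]: out["cloud"] for out in run.get("outputs", [])}
```

Applied to the BWA task in the example, `elapsed_seconds("2022-02-11T10:41:43Z", "2022-02-11T11:10:47Z")` returns `1744`, that is, about 29 minutes.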

## 4. Retrieve results
Once the run reaches the COMPLETE state, you can retrieve the run result using the ***datahub download*** command. Doing so downloads the pipeline run output files to the local machine. This command can take either multiple DRS self-URIs or multiple DRS IDs, and then downloads them into a destination directory.  

```
seqslab datahub download \
    --workspace seqslabwus2 \
    --dst ~/Downloads/ \
    --self-uri drs://api.seqslab.net/drs_ODlEMzEKxhxwc43 drs://api.seqslab.net/drs_Otr1u9pIYAe2JLr 
```
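
When a run produces many outputs, typing the self-URIs by hand becomes error-prone. The following Python sketch builds the ***datahub download*** argument list from the `outputs` section of a parsed ***jobs get*** response. It uses only the flags shown above (`--workspace`, `--dst`, `--self-uri`); the function name `build_download_command` is hypothetical.

```python
def build_download_command(run, workspace, dst):
    """Build the argument list for `seqslab datahub download`.

    `run` is a parsed `jobs get` response; every DRS self-URI listed under
    outputs[].cloud is included once, in order of first appearance.
    """
    uris = []
    for output in run.get("outputs", []):
        for uri in output.get("cloud", []):
            if uri not in uris:
                uris.append(uri)
    if not uris:
        raise ValueError("run has no workflow-level outputs to download")
    return [
        "seqslab", "datahub", "download",
        "--workspace", workspace,
        "--dst", dst,
        "--self-uri", *uris,
    ]
```

The returned list can be passed to `subprocess.run`. Because no shell is involved in that case, a destination such as `~/Downloads/` should first be expanded with `os.path.expanduser`.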
