(cli:tutorial-trs-execs)=
# Configure the execution file

After [developing a tool](cli:tutorial-trs-dev), the WDL directory (containing all imported OpenWDL files), `inputs.json` file, and registered Docker runtime images should be ready. At this point, you can proceed to the next step of the [tool onboarding process](cli:tutorial-trs), which is preparing the SeqsLab `execs.json` file.

The following diagram provides an overview of the entire process, which includes several manual steps.
 
![TRS-execs](../images/TRS-execs2.png)

## Create the `execs.json` file
The first step is to generate an `execs.json` template using the SeqsLab CLI ***tools execs*** command.  

```
seqslab tools execs \
    --working-dir /home/ubuntu/seqslab_workflows/src/ \
    --inputs inputs/inputs_germline-gatk4-snpindel_hg38.json \
    --main-wdl wdl/germline-gatk4-snpindel.wdl \
    --output execs/germline-gatk4-snpindel.json
```   

The `execs.json` file extends `inputs.json`. In addition to the **inputs** section, the `execs.json` file contains the **connections**, **workflows**, **calls**, **configs**, and **operator_pipelines** sections. Running the ***tools execs*** command generates a template for `execs.json` that contains the information from the `inputs.json` file, the default SeqsLab configuration settings, and the additional sections that you need to complete manually.

## Inputs section

### Sample-specific FQNs / static-reference FQNs
The inputs section includes the mapping of fully qualified names (FQNs) of a WDL workflow to their corresponding values. The FQNs can be categorized as either a static-reference group, which remains unchanged for all samples run on this workflow, or as a sample-specific group, which is assigned in a sample-by-sample manner.  

The following example uses the GATK4Fq2Gvcf workflow, where `GATK4Fq2Gvcf.refFasta`, `GATK4Fq2Gvcf.dbSNPVcf`, and `GATK4Fq2Gvcf.knownIndelsSitesVCFs` all belong to the
static-reference group since they are static reference genome files. Meanwhile, `GATK4Fq2Gvcf.fastqFiles` and `GATK4Fq2Gvcf.sampleName` both belong to the sample-specific group.  

```   
{
  # sample-specific FQNs
  "GATK4Fq2Gvcf.fastqFiles": [
    "/mnt/reads/NA12878_r1.fq.gz",
    "/mnt/reads/NA12878_r2.fq.gz"
  ],
  "GATK4Fq2Gvcf.sampleName": "NA12878",

  # static-reference FQNs
  "GATK4Fq2Gvcf.refFasta": "hg38/Homo_sapiens_assembly38.fasta",
  "GATK4Fq2Gvcf.refFastaIndex": "hg38/Homo_sapiens_assembly38.fasta.fai",
  "GATK4Fq2Gvcf.refBwt": "hg38/Homo_sapiens_assembly38.fasta.64.bwt",
  "GATK4Fq2Gvcf.refDict": "hg38/Homo_sapiens_assembly38.dict",
  "GATK4Fq2Gvcf.gatkPath": "/gatk/gatk-4.2.0.0/gatk",
  ...
}
```

It is important to differentiate the two types of FQNs because the distinction affects how a tool is registered on the SeqsLab TRS. For the static-reference group, the FQN values remain constant. For the sample-specific group, the FQN values change with each given sample file, which is presumably registered as a Data Repository Service (DRS) object.


### Sample-DRS-metadata template
The SeqsLab platform provides a sample-DRS-metadata template syntax, in the format `~{FQN:metadata.key}`, to render constant sample-specific FQN values based on the ***DRS metadata of a specific FQN***. DRS metadata are registered during the [sample files upload and registration process](cli:run-sheet-routine) based on the [Run Sheet](cli:run-sheet) information.
 
In the following example, the FQN `GATK4Fq2Gvcf.sampleName` is set to `"~{GATK4Fq2Gvcf.fastqFiles:sample.Sample_ID}"`, indicating that SeqsLab should render `GATK4Fq2Gvcf.sampleName` from the `sample.Sample_ID` metadata of the DRS object assigned to the FQN `GATK4Fq2Gvcf.fastqFiles`. The DRS metadata is attached during the sample data upload and registration process, which is managed with the [***datahub upload-runsheet***](cli:run-sheet-routine) command. As such, the value of the constant FQN `GATK4Fq2Gvcf.sampleName` changes along with the FQN `GATK4Fq2Gvcf.fastqFiles`, and a TRS tool configured this way can be used with many different samples.

```   
{
  # sample-specific FQNs
  "GATK4Fq2Gvcf.fastqFiles": [
    "/mnt/reads/NA12878_r1.fq.gz",
    "/mnt/reads/NA12878_r2.fq.gz"
  ],
  "GATK4Fq2Gvcf.sampleName": "~{GATK4Fq2Gvcf.fastqFiles:sample.Sample_ID}",

  # static-reference FQNs
  "GATK4Fq2Gvcf.refFasta": "hg38/Homo_sapiens_assembly38.fasta",
  "GATK4Fq2Gvcf.refFastaIndex": "hg38/Homo_sapiens_assembly38.fasta.fai",
  "GATK4Fq2Gvcf.refBwt": "hg38/Homo_sapiens_assembly38.fasta.64.bwt",
  "GATK4Fq2Gvcf.refDict": "hg38/Homo_sapiens_assembly38.dict",
  "GATK4Fq2Gvcf.gatkPath": "/gatk/gatk-4.2.0.0/gatk",
  ...
}
```
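To make the template syntax concrete, the following Python sketch shows one way a `~{FQN:metadata.key}` value could be resolved. It is an illustration only, not the SeqsLab implementation: the `render_inputs` function and the shape of the `drs_metadata` lookup (FQN mapped to a dictionary of metadata keys) are assumptions made for this example.

```python
import re

# Matches a value that consists entirely of one template,
# e.g. "~{GATK4Fq2Gvcf.fastqFiles:sample.Sample_ID}".
TEMPLATE = re.compile(r"^~\{(?P<fqn>[^:{}]+):(?P<key>[^{}]+)\}$")


def render_inputs(inputs, drs_metadata):
    """Return a copy of `inputs` with template values replaced by metadata.

    `drs_metadata` is a hypothetical lookup: FQN -> {metadata key -> value}.
    """
    rendered = {}
    for fqn, value in inputs.items():
        match = TEMPLATE.match(value) if isinstance(value, str) else None
        if match is None:
            rendered[fqn] = value  # not a template: keep the value unchanged
        else:
            rendered[fqn] = drs_metadata[match["fqn"]][match["key"]]
    return rendered


inputs = {
    "GATK4Fq2Gvcf.fastqFiles": [
        "/mnt/reads/NA12878_r1.fq.gz",
        "/mnt/reads/NA12878_r2.fq.gz",
    ],
    "GATK4Fq2Gvcf.sampleName": "~{GATK4Fq2Gvcf.fastqFiles:sample.Sample_ID}",
    "GATK4Fq2Gvcf.refFasta": "hg38/Homo_sapiens_assembly38.fasta",
}
# Hypothetical metadata attached to the DRS object behind fastqFiles.
metadata = {"GATK4Fq2Gvcf.fastqFiles": {"sample.Sample_ID": "NA12878"}}

rendered = render_inputs(inputs, metadata)
print(rendered["GATK4Fq2Gvcf.sampleName"])  # NA12878
```

Only the templated value changes; file lists and static-reference values pass through untouched, which is why the same `execs.json` can serve many samples.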

### Hard code sample-specific information 
For cases where the DRS objects are not registered with metadata from the [Run Sheet](cli:run-sheet), the sample-specific FQNs should be treated as static-reference FQNs, meaning that all values should be specified explicitly, without the sample-DRS-metadata template mechanism.

## Connections section 

### DRS_ID for file FQNs
The **connections** section provides a mapping of each WDL file-typed FQN to its local and cloud paths. The file-typed FQNs can also be separated into a sample-specific FQN group and a static-reference FQN group. The `execs.json` template has the FQNs and the local paths filled based on the `inputs.json` file, but the cloud paths are left blank, as shown in the following example. 

```
"connections": [
    # sample-specific FQNs
    {
        "fqn": "GATK4Fq2Gvcf.fastqFiles",
        "local": [
            "/mnt/reads/NA12878_r1.fq.gz",
            "/mnt/reads/NA12878_r2.fq.gz"
        ],
        "cloud": []
    },
    
    # static-reference FQNs
    {
        "fqn": "GATK4Fq2Gvcf.refFasta",
        "local": [
            "hg38/Homo_sapiens_assembly38.fasta"
        ],
        "cloud": []
    },
    {
        "fqn": "GATK4Fq2Gvcf.refFastaIndex",
        "local": [
            "hg38/Homo_sapiens_assembly38.fasta.fai"
        ],
        "cloud": []
    },
    {
        "fqn": "GATK4Fq2Gvcf.refSa",
        "local": [
            "hg38/Homo_sapiens_assembly38.fasta.64.sa"
        ],
        "cloud": []
    },
...
]
```

#### Static-reference FQNs
For each static-reference FQN, you need to fill in the DRS URI in the cloud section to identify which DRS object is used when the tool is executed on the SeqsLab platform.

Atgenomix recommends two methods for retrieving the DRS URI. The first method uses the ***datahub search*** command that can take either the DRS object tag or DRS object name as a query parameter to find the corresponding DRS URI.

```
seqslab datahub search --name Homo_sapiens_assembly38.fasta
seqslab datahub search --tag hg38/Homo_sapiens_assembly38.fasta
```

Below is an example of the output of the commands above:

```
{
    "objects": [
        {
            "self_uri": "drs://api.seqslab.net/drs_010MAvDKw23Y5yb",
            "name": "Homo_sapiens_assembly38.fasta",
            "id": "hg38_Homo_sapiens_assembly38-fasta",
            "tags": [
                "hg38/Homo_sapiens_assembly38.fasta"
            ]
        }
    ]
}
```
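If you script this lookup, the DRS URI can be read from the `self_uri` field of the search result. The following Python sketch assumes that the command output is exactly the JSON shown above; the `find_self_uri` helper is hypothetical and written only for this example.

```python
import json

# Search result copied from the example output above.
search_output = """
{
    "objects": [
        {
            "self_uri": "drs://api.seqslab.net/drs_010MAvDKw23Y5yb",
            "name": "Homo_sapiens_assembly38.fasta",
            "id": "hg38_Homo_sapiens_assembly38-fasta",
            "tags": [
                "hg38/Homo_sapiens_assembly38.fasta"
            ]
        }
    ]
}
"""


def find_self_uri(search_result, name):
    """Return the self_uri of the first object with the given name, or None."""
    for obj in search_result.get("objects", []):
        if obj.get("name") == name:
            return obj["self_uri"]
    return None


result = json.loads(search_output)
uri = find_self_uri(result, "Homo_sapiens_assembly38.fasta")
print(uri)  # drs://api.seqslab.net/drs_010MAvDKw23Y5yb
```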

The second method makes use of custom DRS IDs and tags to simplify the query process. As described in the [Customizing the DRS metadata](cli:tutorial-drs) section, you can create the customized DRS ID **hg38_Homo_sapiens_assembly38-fasta** based on a simple conversion rule using the local path information (**hg38/Homo_sapiens_assembly38.fasta**). As such, the DRS URI of the corresponding DRS object can be directly inferred from the hostname (`drs://api.seqslab.net/`) and the local path.
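As a rough illustration of such a conversion rule, the Python sketch below derives the DRS ID from the local path. The rule used here (replace `/` with `_` and `.` with `-`) is an assumption inferred from the single example above, and the way the URI is assembled from the hostname and the ID is likewise assumed; refer to the [Customizing the DRS metadata](cli:tutorial-drs) section for the actual rule.

```python
DRS_HOST = "drs://api.seqslab.net/"


def local_path_to_drs_id(local_path):
    """Assumed rule: '/' becomes '_' and '.' becomes '-'."""
    return local_path.replace("/", "_").replace(".", "-")


def infer_drs_uri(local_path, host=DRS_HOST):
    """Assumed URI layout: hostname followed by the customized DRS ID."""
    return host + local_path_to_drs_id(local_path)


drs_id = local_path_to_drs_id("hg38/Homo_sapiens_assembly38.fasta")
drs_uri = infer_drs_uri("hg38/Homo_sapiens_assembly38.fasta")
print(drs_id)   # hg38_Homo_sapiens_assembly38-fasta
print(drs_uri)  # drs://api.seqslab.net/hg38_Homo_sapiens_assembly38-fasta
```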

#### Sample-specific FQNs
For the sample-specific FQNs, on the other hand, Atgenomix recommends leaving the cloud section empty so that the runtime DRS object resolving mechanism of the SeqsLab Workflow Execution Service (WES) takes effect. The mechanism resolves the DRS object at runtime based on the DRS object tags, which are specified in the *Run_Name*, *Read1_Tag*, and *Read2_Tag* columns of the [Run Sheet](cli:run-sheet).

```
"connections": [
    # sample-specific FQNs, leave it blank for WES runtime DRS to resolve
    {
        "fqn": "GATK4Fq2Gvcf.fastqFiles",
        "local": [
            "/mnt/reads/NA12878_r1.fq.gz",
            "/mnt/reads/NA12878_r2.fq.gz"
        ],
        "cloud": []
    },
    
    # static-reference FQNs, fill DRS ID for the cloud section of each FQN
    {
        "fqn": "GATK4Fq2Gvcf.refFasta",
        "local": [
            "hg38/Homo_sapiens_assembly38.fasta"
        ],
        "cloud": ["drs://api.seqslab.net/drs_010MAvDKw23Y5yb"]
    },
    {
        "fqn": "GATK4Fq2Gvcf.refFastaIndex",
        "local": [
            "hg38/Homo_sapiens_assembly38.fasta.fai"
        ],
        "cloud": ["drs://api.seqslab.net/drs_bIW4jKMob4tEijO"]
    },
    {
        "fqn": "GATK4Fq2Gvcf.refSa",
        "local": [
            "hg38/Homo_sapiens_assembly38.fasta.64.sa"
        ],
        "cloud": ["drs://api.seqslab.net/drs_A25LXPutuqxiYHt"]
    },
...
]
```
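When a workflow has many reference files, filling in the cloud section by hand is error-prone. The following Python sketch shows one possible way to automate it. The `fill_cloud` function, the `drs_uris` lookup keyed by local path, and the `sample_specific` set are all hypothetical names introduced for this example; they are not part of the SeqsLab CLI.

```python
def fill_cloud(connections, drs_uris, sample_specific):
    """Return a copy of `connections` with the cloud lists filled in.

    Static-reference FQNs get one DRS URI per local path; sample-specific
    FQNs keep an empty cloud list so that WES resolves them at runtime.
    A local path missing from `drs_uris` raises KeyError on purpose.
    """
    filled = []
    for conn in connections:
        entry = dict(conn)
        if entry["fqn"] in sample_specific:
            entry["cloud"] = []
        else:
            entry["cloud"] = [drs_uris[path] for path in entry["local"]]
        filled.append(entry)
    return filled


connections = [
    {
        "fqn": "GATK4Fq2Gvcf.fastqFiles",
        "local": ["/mnt/reads/NA12878_r1.fq.gz", "/mnt/reads/NA12878_r2.fq.gz"],
        "cloud": [],
    },
    {
        "fqn": "GATK4Fq2Gvcf.refFasta",
        "local": ["hg38/Homo_sapiens_assembly38.fasta"],
        "cloud": [],
    },
]
# DRS URI taken from the example above, keyed by local path.
drs_uris = {
    "hg38/Homo_sapiens_assembly38.fasta": "drs://api.seqslab.net/drs_010MAvDKw23Y5yb",
}

filled = fill_cloud(connections, drs_uris, {"GATK4Fq2Gvcf.fastqFiles"})
for entry in filled:
    print(entry["fqn"], entry["cloud"])
```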

### Hard code sample-specific information
For cases where the DRS objects are not registered with metadata from the [Run Sheet](cli:run-sheet), the sample-specific FQNs should be treated as static-reference FQNs: fill in the DRS URI of each sample file in the cloud section explicitly instead of leaving it empty for runtime resolution.


## Workflow section
The **workflow** section provides a list of files that will be registered in the TRS object, where all WDL files, the `inputs.json` file, and the `execs.json` file should be included. For each of the WDL files, the `execs.json` template has the *file_type*, *path*, and *name* properties filled out.  

You need to provide the Docker runtime image for each WDL file in the *image_name* property. You also need to replace the placeholder paths of the `inputs.json` and `execs.json` entries in the template with the relative paths of those files in the working directory.
   
```
"workflow": [
    {
        "name": "e2e-gatk4-germline-snp-indels.wdl",
        "path": "e2e-workflows/atgx/e2e-gatk4-germline-snp-indels.wdl",
        "file_type": "PRIMARY_DESCRIPTOR",
        "image_name": ""
    },
    {
        "name": "processing-for-variant-discovery-gatk4.wdl",
        "path": "gatk4-data-processing/processing-for-variant-discovery-gatk4.wdl",
        "file_type": "SECONDARY_DESCRIPTOR",
        "image_name": ""
    },
    {
        "name": "haplotypecaller-gvcf-gatk4.wdl",
        "path": "gatk4-germline-snps-indels/haplotypecaller-gvcf-gatk4.wdl",
        "file_type": "SECONDARY_DESCRIPTOR",
        "image_name": ""
    },
    {
        "path": "inputs.json",
        "file_type": "TEST_FILE"
    },
    {
        "path": "exec.json",
        "file_type": "EXECUTION_FILE"
    }
],
```

The following is an example of a completed workflow section:

```
"workflow": [
    {
        "name": "e2e-gatk4-germline-snp-indels.wdl",
        "path": "e2e-workflows/atgx/e2e-gatk4-germline-snp-indels.wdl",
        "file_type": "PRIMARY_DESCRIPTOR",
        "image_name": "germline-gatk4-snpindel-1.0_ubuntu-20.04:2022-03-01-01-03"
    },
    {
        "name": "processing-for-variant-discovery-gatk4.wdl",
        "path": "gatk4-data-processing/processing-for-variant-discovery-gatk4.wdl",
        "file_type": "SECONDARY_DESCRIPTOR",
        "image_name": "germline-gatk4-snpindel-1.0_ubuntu-20.04:2022-03-01-01-03"
    },
    {
        "name": "haplotypecaller-gvcf-gatk4.wdl",
        "path": "gatk4-germline-snps-indels/haplotypecaller-gvcf-gatk4.wdl",
        "file_type": "SECONDARY_DESCRIPTOR",
        "image_name": "germline-gatk4-snpindel-1.0_ubuntu-20.04:2022-03-01-01-03"
    },
    {
        "path": "inputs/hg38-e2e-gatk4-germline-snp-indels.json",
        "file_type": "TEST_FILE"
    },
    {
        "path": "execs/hg38-e2e-gatk4-germline-snp-indels.json",
        "file_type": "EXECUTION_FILE"
    }
],
```
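The manual completion step can also be scripted. The following Python sketch is one possible approach, assuming that every WDL file uses the same Docker image, as in the example above. The `complete_workflow` function is a hypothetical helper, not a SeqsLab CLI feature.

```python
DESCRIPTOR_TYPES = ("PRIMARY_DESCRIPTOR", "SECONDARY_DESCRIPTOR")


def complete_workflow(workflow, image_name, inputs_path, execs_path):
    """Return a completed copy of the workflow section.

    WDL entries receive `image_name`; the TEST_FILE and EXECUTION_FILE
    entries receive their paths relative to the working directory.
    """
    completed = []
    for item in workflow:
        entry = dict(item)
        if entry["file_type"] in DESCRIPTOR_TYPES:
            entry["image_name"] = image_name
        elif entry["file_type"] == "TEST_FILE":
            entry["path"] = inputs_path
        elif entry["file_type"] == "EXECUTION_FILE":
            entry["path"] = execs_path
        completed.append(entry)
    return completed


template = [
    {
        "name": "e2e-gatk4-germline-snp-indels.wdl",
        "path": "e2e-workflows/atgx/e2e-gatk4-germline-snp-indels.wdl",
        "file_type": "PRIMARY_DESCRIPTOR",
        "image_name": "",
    },
    {"path": "inputs.json", "file_type": "TEST_FILE"},
    {"path": "exec.json", "file_type": "EXECUTION_FILE"},
]

completed = complete_workflow(
    template,
    image_name="germline-gatk4-snpindel-1.0_ubuntu-20.04:2022-03-01-01-03",
    inputs_path="inputs/hg38-e2e-gatk4-germline-snp-indels.json",
    execs_path="execs/hg38-e2e-gatk4-germline-snp-indels.json",
)
for entry in completed:
    print(entry["file_type"], entry["path"])
```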

## Config section
The **config** section provides a full list of file-typed internal FQNs of the workflows, and their corresponding *operator_pipeline* settings. By default, all file-typed internal FQNs will be assigned to the default *operator_pipeline* setting ***opp_generic-singular_auto***, which does not apply any data parallelization scheme. As such, for a tool designed to be run without parallel execution enhancement, the config section can be left as is. However, for tools that require data parallelization, additional steps are required. For details, see [Pipeline operators](operators:pipeline-operators).

## Call section
The **call** section lists all call names, that is, the nodes in the WDL workflow DAG, each of which may be a task or a sub-workflow. The SeqsLab platform supports assigning runtime options by call name in the [Run Sheet](cli:run-sheet-routine) for execution optimization.