Create a SeqsLab Run Sheet for sequencing experiments#
The SeqsLab Run Sheet is a CSV file that directly extends the Sample Sheet,
a file format used by sequencer providers to store biological sample information and metadata associated with a given experiment.
Objective#
This tutorial will help you create a SeqsLab Run Sheet.
Prerequisites#
Before you begin, you will need the following:
A sample dataset
Sample Sheet example#
| [Data] | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Sample_ID | Sample_Name | Sample_Plate | Sample_Well | I7_Index_ID | index | I5_Index_ID | index2 | Sample_Project | Description |
| 21120276 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | WGS |
| 21120287 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | WES |
| 21070477 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | RNASeq |
| 21120248 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | RNASeq |
| 21120275 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | RNASeq |
| 21120249-t | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | somatic |
| 21120249-n | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | somatic |
How the Run Sheet is different#
In addition to the Sample Sheet columns, the Run Sheet defines six columns for each row of data: DRS_ID, Read1_Tag, Read2_Tag, Run_Name, Workflow_URL, and Runtimes. The Run Sheet serves as an overview plan for the dry lab: it specifies the mapping among sequencing sample files, DRS objects, TRS workflows, and WES executions for all samples submitted in a sequencer run. The Run Sheet is also a critical input for the SeqsLab CLI in the daily dry lab routine, linking every step of the process: uploading sample FASTQ files, registering DRS objects, mapping DRS objects to TRS workflows, and finally executing WES runs.
To create your own Run Sheet, modify the Sample Sheet template to include the following fields:
| Column | Description |
|---|---|
| DRS_ID | Associates physical sample files to the data-virtualized DRS object by assigning a DRS_ID rule using the existing Sample Sheet metadata. |
| Read1_Tag | Associates a read1 DRS object to a specific TRS workflow by assigning a WDL FQN as a DRS object tag to a read 1 DRS object, e.g., WGS.read/1. |
| Read2_Tag | Associates a read2 DRS object to a specific TRS workflow by assigning a WDL FQN as a DRS object tag to a read 2 DRS object, e.g., WGS.read/2. |
| Run_Name | Associates DRS objects to a specific WES run by assigning a unique and indicative run name, generally based on the Sample Sheet information, e.g., 2022-02-23_WGS_NA12878. The Run_Name is also used as the Base_Tag for the DRS object. |
| Workflow_URL | TRS workflow_url, which specifies the TRS to be used for the WES run. |
| Runtimes | Specifies the WES execution runtimes configuration in the format of key-value pairs of WDL call-name and WES runtime options, e.g., WGS_main_workflow=SeqsLab.Accelerate.GCH1:BWA_mapping_workflow=SeqsLab.Accelerate.GCS1. The default value is an empty string. |
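The following is a minimal sketch of how a Sample Sheet could be turned into a Run Sheet skeleton by appending the six columns to its [Data] section. It is not part of the SeqsLab CLI: the function name extend_data_section and the defaults argument are hypothetical, and the column order follows the Run Sheet example later in this tutorial.

```python
import csv
import io

# Column order taken from the Run Sheet example in this tutorial.
RUN_SHEET_COLUMNS = [
    "DRS_ID", "Run_Name", "Read1_Tag", "Read2_Tag", "Workflow_URL", "Runtimes",
]


def extend_data_section(sample_sheet_text, defaults):
    """Append the six Run Sheet columns to the [Data] section (hypothetical helper).

    `defaults` maps a column name to the value written on every data row;
    columns without a default are left empty.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    in_data = False
    header_seen = False
    for row in csv.reader(io.StringIO(sample_sheet_text)):
        first = row[0].strip() if row else ""
        if first.startswith("["):
            # A new section starts; only [Data] rows are extended.
            in_data = first == "[Data]"
            header_seen = False
            writer.writerow(row)
        elif in_data and any(cell.strip() for cell in row):
            if not header_seen:
                # First non-empty row of [Data] holds the column names.
                header_seen = True
                writer.writerow(row + RUN_SHEET_COLUMNS)
            else:
                writer.writerow(row + [defaults.get(c, "") for c in RUN_SHEET_COLUMNS])
        else:
            writer.writerow(row)
    return out.getvalue()


sheet = "[Header]\nDate,2022_02_24\n[Data]\nSample_ID,Description\n21120276,WGS\n"
print(extend_data_section(sheet, {"DRS_ID": "{header.Date}-{sample.Sample_ID}-{file.Pair}"}))
```

The per-sample values (Run_Name, tags, Workflow_URL, Runtimes) still need to be filled in for each row afterwards.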
DRS_ID rule and supported Sample Sheet metadata list#
The SeqsLab CLI uses the sample-sheet package to parse the Run Sheet, and generates the Sample Sheet metadata for each sequenced FASTQ file during sample upload and registration. By default, the Sample Sheet metadata is categorized into header, sample, and file, as shown in the following example.
"metadata": {
    "header": {
        "IEMFileVersion": "5",
        "Date": "2022_02_24",
        "Workflow": "FASTQ",
        "Application": "NextSeq FASTQ Only",
        "Instrument_Type": "NextSeq",
        "Assay": "QIASeq FX and cfDNA",
        "Index_Adapters": "QIASeq FX and cfDNA (Plate)",
        "Description": "",
        "Chemistry": "Amplicon"
    },
    "sample": {
        "Sample_ID": "NA12878",
        "Sample_Name": "",
        "Sample_Plate": "",
        "Sample_Well": "",
        "Index_Plate_Well": "C10",
        "I7_Index_ID": "N000",
        "index": "TGACCAGC",
        "I5_Index_ID": "S000",
        "index2": "TCTTCCAT",
        "Sample_Project": "",
        "Description": "WGS"
    },
    "file": {
        "Pair": "1"
    }
}
A DRS_ID rule can reference the nested metadata dictionary with placeholders of the form {category.key}, chained with a hyphen (-) as the separator character. For example, the rule {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} renders the DRS_ID as 20220224-WGS-NA12878-1 for the sample FASTQ file NA12878_r1.fastq.gz.
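To illustrate how such a rule resolves against the metadata, here is a minimal sketch that performs plain placeholder substitution. The helper render_rule is hypothetical and is not the SeqsLab CLI implementation. Because it inserts each value exactly as stored, the Date value 2022_02_24 is kept verbatim; any date normalization that the CLI applies to produce the form shown above is not reproduced here.

```python
import re


def render_rule(rule, metadata):
    """Replace each {category.key} placeholder with the matching metadata value."""
    def lookup(match):
        category, key = match.group(1), match.group(2)
        # A missing category or key raises KeyError, which surfaces typos in the rule.
        return str(metadata[category][key])

    return re.sub(r"\{(\w+)\.(\w+)\}", lookup, rule)


# Subset of the metadata example above.
metadata = {
    "header": {"Date": "2022_02_24"},
    "sample": {"Sample_ID": "NA12878", "Description": "WGS"},
    "file": {"Pair": "1"},
}
rule = "{header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair}"
drs_id = render_rule(rule, metadata)
print(drs_id)
```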
Run_Name#
Specify a unique name for each WES run. Atgenomix recommends creating a Run_Name based on the Sample Sheet metadata to make it both unique and meaningful.
For a multi-sample WES run use case, we recommend using the template *{header.Date}_{sample.Description}* for the Run_Name. For example, you can use 2022-02-23_WGS for a WES run that will be shared by multiple WGS samples in a sequencing run from 2022-02-23.
For a single-sample WES run use case, we recommend using the template *{header.Date}_{sample.Description}_{sample.Sample_ID}* for a sample-specific Run_Name. For example, you can use 2022-02-23_WGS_NA12878 for a specific WGS NA12878 sample in a sequencing run from 2022-02-23.
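The two recommended templates differ only in whether the sample ID is appended. A short sketch of that choice follows; make_run_name is a hypothetical helper for illustration, and the date is passed in already formatted.

```python
def make_run_name(date, description, sample_id=None):
    """Build a Run_Name: shared by a group of samples, or specific to one sample."""
    parts = [date, description]
    if sample_id is not None:
        # Single-sample WES run: append the sample ID to keep the name unique.
        parts.append(sample_id)
    return "_".join(parts)


shared_name = make_run_name("2022-02-23", "WGS")
single_name = make_run_name("2022-02-23", "WGS", "NA12878")
print(shared_name)
print(single_name)
```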
Workflow_URL#
Specify the TRS object to apply to the given sample. Use the SeqsLab CLI tools list command to get the URL of each TRS tool version. The workflow_url is obtained by appending the string {descriptor_type}/files/ (for example, WDL/files/) to the TRS tool version URL.
seqslab tools list | grep \"url\"
"url": "https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/1.0/"
"url": "https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/2.0/"
"url": "https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/3.0/"
...
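As a small sketch of the appending step, the function below builds a workflow_url from a tool version URL returned by the command above. The function name is hypothetical; the descriptor type WDL is the one used in this tutorial.

```python
def build_workflow_url(tool_version_url, descriptor_type="WDL"):
    """Append {descriptor_type}/files/ to a TRS tool version URL."""
    # Normalize the trailing slash so the result never contains "//".
    base = tool_version_url.rstrip("/")
    return f"{base}/{descriptor_type}/files/"


url = build_workflow_url("https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/1.0/")
print(url)
```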
Runtimes#
Specify the runtime computation configuration for the WES run. This parameter is a list of key-value pairs separated by colons, where each key is a call name in the workflow object model (WOM) graph of the TRS object and each value is the cluster specification to use for that call. This field can be left blank, in which case the main workflow of the TRS object is executed using the SeqsLab default cluster acu-m8.
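To make the format concrete, here is a minimal sketch that splits a Runtimes value into a mapping from call name to cluster specification. The parser is hypothetical and only illustrates the syntax described above; it is not the SeqsLab CLI parser.

```python
def parse_runtimes(value):
    """Parse 'call=cluster:call=cluster' into a dict; a blank value gives {}."""
    runtimes = {}
    for pair in value.split(":"):
        pair = pair.strip()
        if not pair:
            continue  # blank field or stray separator
        call_name, sep, cluster = pair.partition("=")
        if not sep or not call_name or not cluster:
            raise ValueError(f"expected call_name=cluster, got {pair!r}")
        runtimes[call_name] = cluster
    return runtimes


print(parse_runtimes("WES=acu-m8:bamPartition=acu-m16"))
print(parse_runtimes(""))  # blank: the default cluster applies
```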
Customization fields#
Apart from the six additional columns, the Run Sheet can take extra custom columns to support other dry lab integrations, as long as the extended Run Sheet remains a valid CSV file. For example, you can add a Download_FQNs column to specify a list of WDL output FQNs. Custom scripts that read the Run Sheet can then parse the Download_FQNs of each sample and download the corresponding WDL outputs by using the datahub download command.
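A custom script of this kind could start from a sketch like the one below, which reads the [Data] section and extracts the Download_FQNs of each row. Several details are assumptions made for illustration: the helper names, the semicolon used to separate multiple FQNs within one cell, and the FQN values in the sample data. The actual download call is left out because its options are not covered in this tutorial.

```python
import csv


def data_rows(run_sheet_text):
    """Return the rows of the [Data] section as dicts keyed by column name."""
    section, in_data = [], False
    for line in run_sheet_text.splitlines():
        if line.lstrip().startswith("["):
            in_data = line.split(",")[0].strip() == "[Data]"
        elif in_data and line.strip(", \t"):
            section.append(line)
    return list(csv.DictReader(section))


def download_fqns(row, separator=";"):
    """Split the Download_FQNs cell; the separator is an assumed convention."""
    cell = row.get("Download_FQNs") or ""
    return [fqn.strip() for fqn in cell.split(separator) if fqn.strip()]


# Hypothetical Run Sheet fragment with the custom column.
sheet = (
    "[Header]\nDate,2022_02_24\n\n[Data]\n"
    "Sample_ID,Run_Name,Download_FQNs\n"
    "21120276,2021-07-29_WGS_21120276,WGS.outputBam;WGS.outputVcf\n"
    "21120287,2021-07-29_WES_21120287,\n"
)
for row in data_rows(sheet):
    print(row["Sample_ID"], download_fqns(row))
```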
Run Sheet example#
| [Data] | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample_ID | Sample_Name | Sample_Plate | Sample_Well | I7_Index_ID | index | I5_Index_ID | index2 | Sample_Project | Description | DRS_ID | Run_Name | Read1_Tag | Read2_Tag | Workflow_URL | Runtimes |
| 21120276 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | WGS | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_WGS_21120276 | WGS/inputRead/1 | WGS/inputRead/2 | https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/1.0/WDL/files/ | |
| 21120287 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | WES | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_WES_21120287 | WES/inputRead/1 | WES/inputRead/2 | https://api.seqslab.net/trs/v2/tools/trs_wes/versions/1.0/WDL/files/ | WES=acu-m8:bamPartition=acu-m16 |
| 21070477 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | RNASeq | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_RNASeq | RNASeq/inputRead/1/1 | RNASeq/inputRead/1/2 | https://api.seqslab.net/trs/v2/tools/trs_rnaseq/versions/1.0/WDL/files/ | |
| 21120248 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | RNASeq | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_RNASeq | RNASeq/inputRead/2/1 | RNASeq/inputRead/2/2 | https://api.seqslab.net/trs/v2/tools/trs_rnaseq/versions/1.0/WDL/files/ | |
| 21120275 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | RNASeq | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_RNASeq | RNASeq/inputRead/3/1 | RNASeq/inputRead/3/2 | https://api.seqslab.net/trs/v2/tools/trs_rnaseq/versions/1.0/WDL/files/ | |
| 21120249-t | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | somatic | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_somatic | somatic/inputReadTumor/1 | somatic/inputReadTumor/2 | https://api.seqslab.net/trs/v2/tools/trs_somatic/versions/1.0/WDL/files/ | somatic=acu-m64:Calling=acu-m8 |
| 21120249-n | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | somatic | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_somatic | somatic/inputReadNormal/1 | somatic/inputReadNormal/2 | https://api.seqslab.net/trs/v2/tools/trs_somatic/versions/1.0/WDL/files/ | somatic=acu-m64:Calling=acu-m8 |
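In this example, rows that share a Run_Name belong to the same WES run: the three RNASeq samples form one run, and the tumor and normal samples form another. The sketch below groups rows by Run_Name to show that relationship. It uses a hand-copied subset of the columns above and a hypothetical helper; it does not call the SeqsLab CLI.

```python
from collections import defaultdict

# Sample_ID and Run_Name copied from the Run Sheet example; other columns omitted.
rows = [
    {"Sample_ID": "21120276", "Run_Name": "2021-07-29_WGS_21120276"},
    {"Sample_ID": "21120287", "Run_Name": "2021-07-29_WES_21120287"},
    {"Sample_ID": "21070477", "Run_Name": "2021-07-29_RNASeq"},
    {"Sample_ID": "21120248", "Run_Name": "2021-07-29_RNASeq"},
    {"Sample_ID": "21120275", "Run_Name": "2021-07-29_RNASeq"},
    {"Sample_ID": "21120249-t", "Run_Name": "2021-07-29_somatic"},
    {"Sample_ID": "21120249-n", "Run_Name": "2021-07-29_somatic"},
]


def group_by_run(rows):
    """Map each Run_Name to the sample IDs that share it, in sheet order."""
    runs = defaultdict(list)
    for row in rows:
        runs[row["Run_Name"]].append(row["Sample_ID"])
    return dict(runs)


runs = group_by_run(rows)
for run_name, sample_ids in runs.items():
    print(run_name, sample_ids)
```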