Create a SeqsLab Run Sheet for sequencing experiments
The SeqsLab Run Sheet is a CSV file that directly extends the Sample Sheet, a file format that sequencer vendors use to store biological sample information and the metadata associated with a given experiment.
Objective
This tutorial will help you create a SeqsLab Run Sheet.
Sample Sheet example
[Data] | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|
Sample_ID | Sample_Name | Sample_Plate | Sample_Well | I7_Index_ID | index | I5_Index_ID | index2 | Sample_Project | Description |
21120276 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | WGS |
21120287 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | WES |
21070477 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | RNASeq |
21120248 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | RNASeq |
21120275 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | RNASeq |
21120249-t | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | somatic |
21120249-n | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | somatic |
How the Run Sheet is different
Compared with the Sample Sheet, the Run Sheet defines six additional columns for each row of data: DRS_ID, Read1_Label, Read2_Label, Run_Name, Workflow_URL, and Runtimes. The Run Sheet serves as a dry-lab plan that specifies, for all samples submitted in a sequencer run, how the sequencing sample files map to DRS objects, TRS workflows, and WES executions. It is also a key input for the SeqsLab CLI in the daily dry-lab routine, linking the entire process: uploading sample FASTQ files, registering them as DRS objects, mapping DRS objects to TRS workflows, and finally executing WES runs.
To create your own Run Sheet, add the following columns to the Sample Sheet template:
Column | Description |
---|---|
DRS_ID | Associates physical sample files with data-virtualized DRS objects through a DRS_ID rule built from the existing Sample Sheet metadata. |
Read1_Label | Associates a read 1 DRS object with a specific TRS workflow by assigning a WDL FQN (with `.` replaced by `/`) as the DRS object label, e.g., WGS/read/1. |
Read2_Label | Associates a read 2 DRS object with a specific TRS workflow by assigning a WDL FQN (with `.` replaced by `/`) as the DRS object label, e.g., WGS/read/2. |
Run_Name | Associates DRS objects with a specific WES run through a unique, descriptive run name, generally derived from the Sample Sheet information, e.g., 2022-02-23_WGS_NA12878. The Run_Name is also used as the base label of the DRS objects. |
Workflow_URL | The TRS workflow_url, which specifies the TRS tool version to use for the WES run. |
Runtimes | Specifies the WES execution runtime configuration as colon-separated key-value pairs of WDL call name and WES runtime option, e.g., WGS_main_workflow=SeqsLab.Accelerate.GCH1:BWA_mapping_workflow=SeqsLab.Accelerate.GCS1. The default value is an empty string. |
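Because the Run Sheet is still a CSV file, a dry-lab script can read it with an ordinary CSV parser. The following minimal Python sketch is not the SeqsLab CLI's parser (the CLI uses the sample-sheet package); it only reads the rows under `[Data]` and reports which of the six Run Sheet columns are missing. The helper names `read_data_section` and `missing_run_sheet_columns` are hypothetical.

```python
import csv
import io

# The six columns a Run Sheet adds on top of a Sample Sheet.
RUN_SHEET_COLUMNS = ("DRS_ID", "Read1_Label", "Read2_Label",
                     "Run_Name", "Workflow_URL", "Runtimes")


def read_data_section(text: str) -> tuple[list[str], list[dict[str, str]]]:
    """Return (column names, rows) from the [Data] section of a Sample/Run Sheet."""
    lines = text.splitlines()
    # The [Data] marker may be padded with commas, e.g. "[Data],,,,".
    # next() raises StopIteration if there is no [Data] section at all.
    start = next(i for i, line in enumerate(lines)
                 if line.split(",")[0].strip() == "[Data]")
    body = []
    for line in lines[start + 1:]:
        if line.startswith("["):  # the next [Section] header ends the data block
            break
        if line.replace(",", "").strip():  # skip blank or comma-only lines
            body.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(body)))
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def missing_run_sheet_columns(columns: list[str]) -> list[str]:
    """Run Sheet columns that are absent from the given header."""
    return [c for c in RUN_SHEET_COLUMNS if c not in columns]


example = """[Data],,,,,,,
Sample_ID,Description,DRS_ID,Run_Name,Read1_Label,Read2_Label,Workflow_URL,Runtimes
NA12878,WGS,{header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair},2022-02-23_WGS_NA12878,WGS_HaplotypeCallerGvcf_GATK4/fastq_files/1,WGS_HaplotypeCallerGvcf_GATK4/fastq_files/2,https://api.seqslab.net/trs/v2/tools/trs_WGS/versions/1.0/WDL/files/,
"""

columns, rows = read_data_section(example)
print(missing_run_sheet_columns(columns))  # []
print(rows[0]["Run_Name"])                 # 2022-02-23_WGS_NA12878
```

Scanning for the `[Data]` marker rather than assuming it is the first line lets the same function handle sheets that also carry other sections, such as `[Header]`.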
DRS_ID rule and supported Sample Sheet metadata list
The SeqsLab CLI uses the sample-sheet package to parse the Run Sheet and, during sample upload and registration, generates Sample Sheet metadata for each sequenced FASTQ file, for example:

```json
{
  "metadata": {
    "dates": [
      {
        "date": "20230803",
        "type": { "value": "sequencing" }
      }
    ],
    "types": [
      {
        "method": { "value": "NextSeq FASTQ Only" },
        "platform": { "value": "Illumina" }
      }
    ],
    "privacy": "",
    "licenses": [],
    "contributors": [],
    "extra_properties": [
      { "values": "NA12878", "category": "Sample_ID" },
      { "values": "WGSPanCancerTest", "category": "Description" },
      {
        "values": "{$.extra_properties[?category=Date][values]}-{$.extra_properties[?category=Description][values]}-{$.extra_properties[?category=Sample_ID][values]}-{$.extra_properties[?category=Pair][values]}",
        "category": "DRS_ID"
      },
      { "values": "2023-08-03_WGS_01", "category": "Run_Name" },
      { "values": "WGSPanCancerTest/inputRead/1", "category": "Read1_Label" },
      { "values": "WGSPanCancerTest/inputRead/2", "category": "Read2_Label" },
      {
        "values": "https://api.seqslab.net/trs/v2/tools/WGSPanCancerTest/versions/0.1.0/WDL/files/",
        "category": "Workflow_URL"
      },
      { "values": "phenopacket_9k55CH8eowf6PLA", "category": "phenopacketID" },
      { "values": "biosample_34sjeicjekqk3ji4", "category": "BiosampleID" },
      {
        "values": "https://api.seqslab.net/trs/v2/tools/WGSPanCancerTest/versions/0.1.0/",
        "category": "DiseaseID"
      },
      { "values": "1", "category": "Order_Overall" },
      { "values": "1", "category": "Pair" },
      { "values": "5", "category": "IEMFileVersion" },
      { "values": "2023/08/03", "category": "Date" }
    ],
    "primary_publication": [],
    "alternate_identifiers": []
  }
}
```
A DRS_ID rule can be given as JSONPath expressions chained with a hyphen (`-`) as the separator character. For example, the rule `{$.extra_properties[?category=Date][values]}-{$.extra_properties[?category=Description][values]}-{$.extra_properties[?category=Sample_ID][values]}-{$.extra_properties[?category=Pair][values]}` renders the DRS_ID as `20230803-WGSPanCancerTest-NA12878-1` for the sample FASTQ file `NA12878_r1.fastq.gz`.
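The rule is essentially a string template over `extra_properties`. The following Python sketch resolves only the `{$.extra_properties[?category=...][values]}` placeholder form with a regular expression instead of a full JSONPath engine, so it illustrates the idea rather than reproducing the SeqsLab CLI. One assumption is flagged in the code: the text renders the Date value `2023/08/03` as `20230803`, so the sketch drops `/` characters; how the CLI actually normalizes values is not described here. `render_drs_id` is a hypothetical name.

```python
import re

# Matches one placeholder such as {$.extra_properties[?category=Date][values]}
# and captures the category name.
PLACEHOLDER = re.compile(r"\{\$\.extra_properties\[\?category=([^\]]+)\]\[values\]\}")


def render_drs_id(rule: str, extra_properties: list[dict[str, str]]) -> str:
    """Replace each placeholder in a DRS_ID rule with its extra_properties value."""
    values = {p["category"]: p["values"] for p in extra_properties}

    def substitute(match: re.Match) -> str:
        value = values[match.group(1)]  # KeyError if the category is missing
        # Assumption: '/' is dropped so that 2023/08/03 becomes 20230803, matching
        # the rendered example in the text; the CLI's real normalization may differ.
        return value.replace("/", "")

    # Text outside the placeholders (the '-' separators) is kept as-is.
    return PLACEHOLDER.sub(substitute, rule)


rule = ("{$.extra_properties[?category=Date][values]}-"
        "{$.extra_properties[?category=Description][values]}-"
        "{$.extra_properties[?category=Sample_ID][values]}-"
        "{$.extra_properties[?category=Pair][values]}")
props = [
    {"values": "NA12878", "category": "Sample_ID"},
    {"values": "WGSPanCancerTest", "category": "Description"},
    {"values": "1", "category": "Pair"},
    {"values": "2023/08/03", "category": "Date"},
]
print(render_drs_id(rule, props))  # 20230803-WGSPanCancerTest-NA12878-1
```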
Run_Name
Specify a unique name for each WES run. Atgenomix recommends deriving the Run_Name from the Sample Sheet metadata so that it is both unique and meaningful.
For a multi-sample WES run, we recommend the template *{$.extra_properties[?category=Date][values]}-{$.extra_properties[?category=Description][values]}* for the Run_Name. For example, you can use `20230803-WGSPanCancerTest` for a WES run shared by multiple WGS samples from the sequencing run on 2023-08-03.
For a single-sample WES run, we recommend the template *{$.extra_properties[?category=Date][values]}-{$.extra_properties[?category=Description][values]}-{$.extra_properties[?category=Sample_ID][values]}* for a sample-specific Run_Name. For example, you can use `20230803-WGSPanCancerTest-NA12878` for the WGS sample NA12878 from the sequencing run on 2023-08-03.
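Both templates join metadata fields with hyphens, and rows that share a Run_Name are associated with the same WES run. A small Python sketch of the two choices, using a hypothetical `run_name` helper and assuming the date has already been formatted as `20230803`:

```python
from typing import Optional


def run_name(date: str, description: str, sample_id: Optional[str] = None) -> str:
    """Hyphen-join the template fields.

    Date-Description            -> one WES run shared by several samples
    Date-Description-Sample_ID  -> one WES run per sample
    """
    fields = [date, description] if sample_id is None else [date, description, sample_id]
    return "-".join(fields)


samples = ["NA12878", "NA12879"]
shared = {s: run_name("20230803", "WGSPanCancerTest") for s in samples}
per_sample = {s: run_name("20230803", "WGSPanCancerTest", s) for s in samples}

# Rows that share a Run_Name end up in the same WES run.
print(len(set(shared.values())))      # 1
print(len(set(per_sample.values())))  # 2
print(per_sample["NA12878"])          # 20230803-WGSPanCancerTest-NA12878
```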
DRS Labels
The SeqsLab DRS service supports labeling, and the Run Sheet uses the labels `{Run_Name}/{Read1_Label}` and `{Run_Name}/{Read2_Label}` to associate DRS objects, TRS objects, and WES runs. To relate DRS objects to a WES run, the Run Sheet uses the Run_Name as the root label of all DRS objects that belong to that run.
To relate DRS objects to a TRS object, the Run Sheet uses Read1_Label and Read2_Label to associate the read 1 and read 2 FASTQ files of each sequenced sample with a WDL FQN (fully qualified name) of the TRS object. For example, for a TRS object wrapping a WGS GATK4 SNP/INDEL calling workflow, the WDL defines the input FASTQ files as `WGS_HaplotypeCallerGvcf_GATK4.fastq_files`:
```json
{
  "WGS_HaplotypeCallerGvcf_GATK4.fastq_files": [
    "NA12878_r1.fq.gz",
    "NA12878_r2.fq.gz"
  ],
  ...
}
```
Because SeqsLab DRS labels support directory-like, hierarchical queries, the FQN separator `.` is replaced with `/` in Read1_Label and Read2_Label, which makes the data easier to access later. Assigning Read1_Label and Read2_Label as follows associates the DRS objects of the `NA12878_r1.fq.gz` and `NA12878_r2.fq.gz` files with the WDL FQN `WGS_HaplotypeCallerGvcf_GATK4.fastq_files`:
[Data] | | | | | | | |
---|---|---|---|---|---|---|---|
Sample_ID | Description | DRS_ID | Run_Name | Read1_Label | Read2_Label | Workflow_URL | Runtimes |
NA12878 | WGS | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2022-02-23_WGS_NA12878 | WGS_HaplotypeCallerGvcf_GATK4/fastq_files/1 | WGS_HaplotypeCallerGvcf_GATK4/fastq_files/2 | https://api.seqslab.net/trs/v2/tools/trs_WGS/versions/1.0/WDL/files/ | |
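The label construction is a plain string transformation. This Python sketch, with hypothetical helper names, builds Read1_Label and Read2_Label values from a WDL FQN and prefixes the Run_Name to form the full DRS label described above; it assumes single-sample (one-dimensional) inputs.

```python
def read_label(fqn: str, read: int) -> str:
    """Read1_Label/Read2_Label value: the FQN with '.' replaced by '/', then the read number."""
    if read not in (1, 2):
        raise ValueError("read must be 1 or 2")
    return f"{fqn.replace('.', '/')}/{read}"


def drs_label(run_name: str, label: str) -> str:
    """Full DRS label: the Run_Name is the root, followed by the read label."""
    return f"{run_name}/{label}"


fqn = "WGS_HaplotypeCallerGvcf_GATK4.fastq_files"
print(read_label(fqn, 1))  # WGS_HaplotypeCallerGvcf_GATK4/fastq_files/1
print(drs_label("2022-02-23_WGS_NA12878", read_label(fqn, 2)))
# 2022-02-23_WGS_NA12878/WGS_HaplotypeCallerGvcf_GATK4/fastq_files/2
```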
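This mechanism extends to multi-sample WDL inputs by adding one more level of labeling that matches the two-dimensional array in the following WDL inputs example. In the corresponding labels, the first number selects the sample (the position in the outer array) and the second selects the read.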
```json
{
  "WGS_HaplotypeCallerGvcf_GATK4.fastq_files": [
    [
      "NA12878_r1.fq.gz",
      "NA12878_r2.fq.gz"
    ],
    [
      "NA12879_r1.fq.gz",
      "NA12879_r2.fq.gz"
    ]
  ],
  ...
}
```
[Data] | | | | | | | |
---|---|---|---|---|---|---|---|
Sample_ID | Description | DRS_ID | Run_Name | Read1_Label | Read2_Label | Workflow_URL | Runtimes |
NA12878 | WGS | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2022-02-23_WGS | WGS_HaplotypeCallerGvcf_GATK4/fastq_files/1/1 | WGS_HaplotypeCallerGvcf_GATK4/fastq_files/1/2 | https://api.seqslab.net/trs/v2/tools/trs_WGS/versions/1.0/WDL/files/ | |
NA12879 | WGS | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2022-02-23_WGS | WGS_HaplotypeCallerGvcf_GATK4/fastq_files/2/1 | WGS_HaplotypeCallerGvcf_GATK4/fastq_files/2/2 | https://api.seqslab.net/trs/v2/tools/trs_WGS/versions/1.0/WDL/files/ | |
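The labels also carry enough information to go the other way, from labeled files back to the WDL inputs structure. The following Python sketch assumes each label is the FQN with `.` replaced by `/`, followed by 1-based numeric indexes (sample, then read), with the Run_Name prefix already removed. It illustrates the idea and is not how SeqsLab resolves labels; `labels_to_inputs` is a hypothetical name.

```python
import json


def labels_to_inputs(labelled_files: dict[str, str]) -> dict:
    """Rebuild a WDL inputs object from {read label: file name} pairs."""
    inputs: dict = {}
    for label, filename in labelled_files.items():
        parts = label.split("/")
        # Trailing numeric segments are 1-based indexes (sample, then read).
        # WDL identifiers cannot start with a digit, so they never clash with the FQN.
        indexes = []
        while parts and parts[-1].isdigit():
            indexes.insert(0, int(parts.pop()) - 1)
        if not indexes or min(indexes) < 0:
            raise ValueError(f"label needs 1-based numeric indexes: {label}")
        node = inputs.setdefault(".".join(parts), [])
        # Descend (creating inner lists as needed) for every index except the last.
        for i in indexes[:-1]:
            while len(node) <= i:
                node.append([])
            node = node[i]
        # Place the file at the final index, padding with None until it is filled.
        while len(node) <= indexes[-1]:
            node.append(None)
        node[indexes[-1]] = filename
    return inputs


labelled = {
    "WGS_HaplotypeCallerGvcf_GATK4/fastq_files/1/1": "NA12878_r1.fq.gz",
    "WGS_HaplotypeCallerGvcf_GATK4/fastq_files/1/2": "NA12878_r2.fq.gz",
    "WGS_HaplotypeCallerGvcf_GATK4/fastq_files/2/1": "NA12879_r1.fq.gz",
    "WGS_HaplotypeCallerGvcf_GATK4/fastq_files/2/2": "NA12879_r2.fq.gz",
}
print(json.dumps(labels_to_inputs(labelled), indent=2))
```

Because the number of numeric suffixes determines the nesting depth, the same function yields a flat list for single-sample labels such as `WGS_HaplotypeCallerGvcf_GATK4/fastq_files/1`, and the result does not depend on the order in which labels are processed.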
Workflow_URL
Specify the TRS object to apply to the given sample. The SeqsLab CLI `tools list` command returns the URL of each TRS tool version, and the workflow_url is obtained by appending the string `{descriptor_type}/files/` to that URL, for example `WDL/files/` for a WDL workflow. The following command lists the available tool version URLs:
```
seqslab tools list | grep \"url\"
"url": "https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/1.0/"
"url": "https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/2.0/"
"url": "https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/3.0/"
...
```
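Deriving the workflow_url is a string append. The Python sketch below extracts the version URLs from text shaped like the `grep` output above and appends `{descriptor_type}/files/`; the regular expression and the `workflow_url` helper are hypothetical and assume each line looks exactly like the sample output.

```python
import re

# Matches lines such as: "url": "https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/1.0/"
URL_LINE = re.compile(r'"url":\s*"([^"]+)"')


def workflow_url(version_url: str, descriptor_type: str = "WDL") -> str:
    """Append {descriptor_type}/files/ to a TRS tool version URL."""
    if not version_url.endswith("/"):
        version_url += "/"
    return f"{version_url}{descriptor_type}/files/"


# Text shaped like the grep output above.
listing = '''"url": "https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/1.0/"
"url": "https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/2.0/"
'''

for version_url in URL_LINE.findall(listing):
    print(workflow_url(version_url))
# https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/1.0/WDL/files/
# https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/2.0/WDL/files/
```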
Runtimes
Specify the runtime computation configuration for the WES run. The value is a colon-separated list of `key=value` pairs, where each key is a call name in the workflow object model (WOM) graph of the TRS object and each value is the cluster specification to run that call on, e.g., `WES=acu-m8:bamPartition=acu-m16`. This field can be left blank, in which case the main workflow of the TRS object is executed on the SeqsLab default cluster, acu-m8.
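The Runtimes value can be turned into a call-name-to-cluster mapping with two string splits. A minimal Python sketch, assuming call names and cluster names contain neither `:` nor `=`; `parse_runtimes` is a hypothetical helper, not part of the SeqsLab CLI:

```python
def parse_runtimes(value: str) -> dict[str, str]:
    """Split 'call=cluster:call=cluster' into {call: cluster}.

    An empty string yields {}, i.e. the main workflow runs on the default cluster.
    """
    runtimes: dict[str, str] = {}
    for pair in filter(None, value.split(":")):  # filter() drops empty segments
        call, sep, cluster = pair.partition("=")
        if not sep or not call or not cluster:
            raise ValueError(f"malformed Runtimes entry: {pair!r}")
        runtimes[call.strip()] = cluster.strip()
    return runtimes


print(parse_runtimes("WES=acu-m8:bamPartition=acu-m16"))
# {'WES': 'acu-m8', 'bamPartition': 'acu-m16'}
print(parse_runtimes(""))  # {}
```

An empty mapping corresponds to leaving the field blank.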
Customization fields
Apart from the six additional columns, the Run Sheet can include extra custom columns to support other dry-lab integrations, as long as the extended Run Sheet remains a valid CSV file. For example, you can add a Download_FQNs column to specify a list of WDL output FQNs. Custom scripts that read the Run Sheet can then parse Download_FQNs for each sample and download the corresponding WDL outputs with the datahub download command.
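A custom script for the Download_FQNs column could start like the following Python sketch. The Run Sheet does not define the format of this column's value, so the sketch assumes FQNs separated by `;`, and the output FQN names are hypothetical. It stops at building the list of FQNs per sample; passing them to the datahub download command is left out because that command's arguments are not covered on this page.

```python
import csv
import io

# Hypothetical Run Sheet excerpt ([Data] marker omitted) with a custom
# Download_FQNs column. The ';' separator and the output FQNs are assumptions.
RUN_SHEET = """Sample_ID,Run_Name,Download_FQNs
NA12878,2022-02-23_WGS_NA12878,WGS_HaplotypeCallerGvcf_GATK4.output_vcf;WGS_HaplotypeCallerGvcf_GATK4.output_bam
NA12879,2022-02-23_WGS_NA12879,
"""


def download_plan(text: str) -> list[tuple[str, str, list[str]]]:
    """Return (Sample_ID, Run_Name, output FQNs) for rows that request downloads."""
    plan = []
    for row in csv.DictReader(io.StringIO(text)):
        raw = row.get("Download_FQNs") or ""
        fqns = [fqn.strip() for fqn in raw.split(";") if fqn.strip()]
        if fqns:  # rows with an empty Download_FQNs cell are skipped
            plan.append((row["Sample_ID"], row["Run_Name"], fqns))
    return plan


for sample_id, run, fqns in download_plan(RUN_SHEET):
    # A real script would hand each FQN of this run to the datahub download
    # command here; this sketch only prints the plan.
    print(sample_id, run, fqns)
```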
As another example, GA4GH Phenopacket information, such as PhenopacketID, BiosampleID, and DiseaseID, can be added as custom fields to associate phenotypic or clinical information with the DRS objects and the WES run.
Column | Description |
---|---|
PhenopacketID | Associates DRS objects with a specific GA4GH Phenopacket object that describes phenotypic information. |
BiosampleID | Associates DRS objects with a specific GA4GH Phenopacket biosample object that describes the biological sample. |
DiseaseID | Associates DRS objects with a specific disease or test, usually identified by an ontology ID. |
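In the metadata example earlier on this page, these IDs appear as `extra_properties` entries with `values` and `category` keys. The Python sketch below assumes custom columns are carried over the same way and turns a Run Sheet row into such entries, skipping empty cells. The metadata example spells one category `phenopacketID`, so the exact category names the CLI emits may differ from the column names used here; `custom_extra_properties` is a hypothetical helper.

```python
import json

# Custom columns assumed to be carried into extra_properties unchanged.
CUSTOM_COLUMNS = ("PhenopacketID", "BiosampleID", "DiseaseID")


def custom_extra_properties(row: dict[str, str]) -> list[dict[str, str]]:
    """Turn the non-empty custom columns of one row into extra_properties entries."""
    return [
        {"values": row[column], "category": column}
        for column in CUSTOM_COLUMNS
        if row.get(column)
    ]


row = {
    "Sample_ID": "21120276",
    "PhenopacketID": "PH_Me71Y8tCewj2Z",
    "BiosampleID": "BS_Me71Y8tCewj2Z",
    "DiseaseID": "DIS_493LaFKEkOf8I",
}
print(json.dumps(custom_extra_properties(row), indent=2))
```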
Run Sheet example
[Data] | | | | | | | | | | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sample_ID | Sample_Name | Sample_Plate | Sample_Well | I7_Index_ID | index | I5_Index_ID | index2 | Sample_Project | Description | DRS_ID | Run_Name | Read1_Label | Read2_Label | Workflow_URL | Runtimes | PhenopacketID | BiosampleID | DiseaseID |
21120276 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | WGS | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_WGS_21120276 | WGS/inputRead/1 | WGS/inputRead/2 | https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/1.0/WDL/files/ | | PH_Me71Y8tCewj2Z | BS_Me71Y8tCewj2Z | DIS_493LaFKEkOf8I |
21120287 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | WES | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_WES_21120287 | WES/inputRead/1 | WES/inputRead/2 | https://api.seqslab.net/trs/v2/tools/trs_wes/versions/1.0/WDL/files/ | WES=acu-m8:bamPartition=acu-m16 | PH_7kQK62zwrxPkc | BS_7kQK62zwrxPkc | DIS_493LaFKEkOf8I |
21070477 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | RNASeq | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_RNASeq | RNASeq/inputRead/1/1 | RNASeq/inputRead/1/2 | https://api.seqslab.net/trs/v2/tools/trs_rnaseq/versions/1.0/WDL/files/ | | PH_NvIhjULtvZol5 | BS_NvIhjULtvZol5 | DIS_FI9n2VwXWRzBd |
21120248 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | RNASeq | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_RNASeq | RNASeq/inputRead/2/1 | RNASeq/inputRead/2/2 | https://api.seqslab.net/trs/v2/tools/trs_rnaseq/versions/1.0/WDL/files/ | | PH_Q9tkQyWcGZ30x | BS_Q9tkQyWcGZ30x | DIS_FI9n2VwXWRzBd |
21120275 | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | RNASeq | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_RNASeq | RNASeq/inputRead/3/1 | RNASeq/inputRead/3/2 | https://api.seqslab.net/trs/v2/tools/trs_rnaseq/versions/1.0/WDL/files/ | | PH_nNiKxmXxX2rYU | BS_nNiKxmXxX2rYU | DIS_FI9n2VwXWRzBd |
21120249-t | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | somatic | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_somatic | somatic/inputReadTumor/1 | somatic/inputReadTumor/2 | https://api.seqslab.net/trs/v2/tools/trs_somatic/versions/1.0/WDL/files/ | somatic=acu-m64:Calling=acu-m8 | PH_fkhYMoRRyT05T | BS_fkhYMoRRyT05T | DIS_V7wjmISIvC7xD |
21120249-n | | | | A701 | ATCACGAC | A501 | AAGGTTCA | | somatic | {header.Date}-{sample.Description}-{sample.Sample_ID}-{file.Pair} | 2021-07-29_somatic | somatic/inputReadNormal/1 | somatic/inputReadNormal/2 | https://api.seqslab.net/trs/v2/tools/trs_somatic/versions/1.0/WDL/files/ | somatic=acu-m64:Calling=acu-m8 | PH_im731GhGK86n5 | BS_im731GhGK86n5 | DIS_V7wjmISIvC7xD |
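Rows that share a Run_Name belong to the same WES run, so grouping this example by Run_Name shows that it describes four runs: two single-sample runs (2021-07-29_WGS_21120276 and 2021-07-29_WES_21120287), one RNASeq run with three samples, and one somatic run with a tumor/normal pair. The following Python sketch uses a subset of the example's columns to do that grouping and adds a hypothetical sanity check. Assuming Workflow_URL and Runtimes apply per run, rows with the same Run_Name should agree on them, and a check like this would flag, for example, a mistyped tool name in one row. The function names are illustrative only.

```python
import csv
import io
from collections import defaultdict

# A subset of the columns of the Run Sheet example above.
RUN_SHEET = """Sample_ID,Run_Name,Workflow_URL,Runtimes
21120276,2021-07-29_WGS_21120276,https://api.seqslab.net/trs/v2/tools/trs_wgs/versions/1.0/WDL/files/,
21120287,2021-07-29_WES_21120287,https://api.seqslab.net/trs/v2/tools/trs_wes/versions/1.0/WDL/files/,WES=acu-m8:bamPartition=acu-m16
21070477,2021-07-29_RNASeq,https://api.seqslab.net/trs/v2/tools/trs_rnaseq/versions/1.0/WDL/files/,
21120248,2021-07-29_RNASeq,https://api.seqslab.net/trs/v2/tools/trs_rnaseq/versions/1.0/WDL/files/,
21120275,2021-07-29_RNASeq,https://api.seqslab.net/trs/v2/tools/trs_rnaseq/versions/1.0/WDL/files/,
21120249-t,2021-07-29_somatic,https://api.seqslab.net/trs/v2/tools/trs_somatic/versions/1.0/WDL/files/,somatic=acu-m64:Calling=acu-m8
21120249-n,2021-07-29_somatic,https://api.seqslab.net/trs/v2/tools/trs_somatic/versions/1.0/WDL/files/,somatic=acu-m64:Calling=acu-m8
"""


def runs_by_name(text: str) -> dict[str, list[str]]:
    """Group Sample_IDs by Run_Name; each key corresponds to one WES run."""
    runs: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        runs[row["Run_Name"]].append(row["Sample_ID"])
    return dict(runs)


def inconsistent_columns(text: str, column: str) -> list[str]:
    """Run_Names whose rows disagree on a column assumed to be per-run."""
    values: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        values[row["Run_Name"]].add(row[column])
    return [name for name, seen in values.items() if len(seen) > 1]


for name, samples in runs_by_name(RUN_SHEET).items():
    print(f"{name}: {samples}")
print(inconsistent_columns(RUN_SHEET, "Workflow_URL"))  # []
print(inconsistent_columns(RUN_SHEET, "Runtimes"))      # []
```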