Operator pipelines
Atgenomix currently provides the following operator pipelines on SeqsLab.

| Name | File type | Argument | Description |
|---|---|---|---|
| | All | None | Automatic workload pipeline that localizes either a file or a directory for the lifespan of the cluster |
| | All | None | Automatic workload pipeline that localizes one file per partition to each executor of the WDL task, for an FQN typed as an Array of Files |
| | | None | File-based BGEN workload pipeline parallelized by its original index within the file |
| | All | None | Automatic workload pipeline that localizes either a file or a directory in a single-node cluster |
| | | None | File-based BAM workload pipeline with all BAM records in a single partition |
| | | None | File-based BAM workload pipeline with unmapped BAM records in a single partition |
| | | | File-based BAM workload pipeline with reads partitioned based on given |
| | | | File-based BAM workload pipeline with reads partitioned based on given |
| | | | File-based VCF workload pipeline with records partitioned based on the given partBed |
| | | | File-based BED workload pipeline with BED records partitioned based on the given partBed |
| | | | File-based non-imputation BGEN workload pipeline parallelized by provided |
| | | | File-based FASTQ workload parallelization pipeline with |
| | | None | SparkSQL-based Delta Lake workload pipeline |
| | | None | SparkSQL-based VCF workload pipeline that loads a VCF file into a Delta table using Glow |
| | | None | File-based CSV/TSV workload pipeline for SQL purposes |
| | | None | File-based CSV/TSV workload pipeline with index for SQL purposes |
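
Several of the pipelines above split records according to a partition BED file (partBed). The following Python sketch illustrates the general idea only; it is not SeqsLab's implementation. The helper names (`parse_bed`, `assign_partition`, `partition_records`) are hypothetical, and the sketch assumes that the fourth BED column carries a partition label so that several regions can share one partition.

```python
from collections import defaultdict

def parse_bed(text):
    """Parse BED text into (chrom, start, end, label) tuples.

    Assumption: the 4th column holds a partition label. BED coordinates
    are 0-based and half-open, i.e. [start, end).
    """
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, label = line.split("\t")[:4]
        regions.append((chrom, int(start), int(end), label))
    return regions

def assign_partition(regions, chrom, pos):
    """Return the label of the first region containing 0-based pos, or None."""
    for r_chrom, start, end, label in regions:
        if r_chrom == chrom and start <= pos < end:
            return label
    return None

def partition_records(regions, records):
    """Group (chrom, pos) records by partition label; unmatched go under None."""
    groups = defaultdict(list)
    for chrom, pos in records:
        groups[assign_partition(regions, chrom, pos)].append((chrom, pos))
    return dict(groups)

# Toy data: chrX and chrY share one partition label.
bed_text = "chr1\t0\t1000\tpart-0\nchrX\t0\t500\tpart-1\nchrY\t0\t500\tpart-1\n"
regions = parse_bed(bed_text)
groups = partition_records(
    regions, [("chr1", 10), ("chrX", 499), ("chrY", 0), ("chrM", 5)]
)
```

The linear scan in `assign_partition` keeps the sketch short; for large BED files an interval index would be the more realistic choice.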

| Name | File type | Argument | Description |
|---|---|---|---|
| | | None | File-based Regenie master file output pipeline. It collects each partition’s master file created by the input operator pipeline |
| | | None | File-based Delta Lake output pipeline with partitionBy settings for SQL purposes |
| | | None | File-based CSV/TSV output pipeline with partition settings for SQL purposes |
| | | None | File-based JSON output pipeline with partition settings for SQL purposes |

| Description | URI |
|---|---|
| HG38 primary contigs in a single partition | https://seqslabbundles.blob.core.windows.net/static/system/bed/38/single_node_workflow |
| HG38 autosomes parallelized into 22 partitions, one autosome per partition | https://seqslabbundles.blob.core.windows.net/static/system/bed/38/autosomes |
| HG38 primary contigs parallelized into 23 partitions: one autosome per partition, with chrX, chrY, and chrM merged into a single partition | https://seqslabbundles.blob.core.windows.net/static/system/bed/38/chromosomes |
| HG38 primary contigs parallelized into 23 partitions, plus an extra partition containing soft-clipped and discordant alignments. Recommended for structural variation discovery analysis | https://seqslabbundles.blob.core.windows.net/static/system/bed/38/chromosomes+softclip_or_discordant_reads |
| HG38 primary contigs parallelized into 50 contiguous unmasked regions | https://seqslabbundles.blob.core.windows.net/static/system/bed/38/contiguous_unmasked_regions_50_parts |
| HG38 primary contigs parallelized into 50 contiguous unmasked regions, plus an extra partition containing soft-clipped and discordant alignments. Recommended for structural variation discovery analysis | https://seqslabbundles.blob.core.windows.net/static/system/bed/38/contiguous_unmasked_regions_50_parts+softclip_or_discordant_reads |
| HG38 primary contigs parallelized into 155 contiguous unmasked regions | https://seqslabbundles.blob.core.windows.net/static/system/bed/38/contiguous_unmasked_regions_155_parts |
| HG38 primary contigs parallelized into 323 contiguous unmasked regions, without padding | https://seqslabbundles.blob.core.windows.net/static/system/bed/38/contiguous_unmasked_regions_323_parts_unpadded |
| HG38 primary contigs parallelized into 323 contiguous unmasked regions, with 1 kbp padding on both sides of each region | https://seqslabbundles.blob.core.windows.net/static/system/bed/38/contiguous_unmasked_regions_323_parts |
| HG38 primary contigs parallelized into 3101 contiguous unmasked regions, without padding | https://seqslabbundles.blob.core.windows.net/static/system/bed/38/contiguous_unmasked_regions_3101_parts_unpadded |
| HG38 primary contigs parallelized into 3101 contiguous unmasked regions, with 1 kbp padding on both sides of each region | https://seqslabbundles.blob.core.windows.net/static/system/bed/38/contiguous_unmasked_regions_3101_parts |
| HG38 primary contigs parallelized into 20361 contiguous unmasked regions | https://seqslabbundles.blob.core.windows.net/static/system/bed/38/contiguous_unmasked_regions_20361_parts |
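
Some of the partition BED files above come in a padded variant with 1 kbp added on both sides of each region. The Python sketch below shows what such padding amounts to, assuming 0-based half-open BED intervals and clamping at the contig boundaries. The `pad_region` helper is hypothetical; how the published padded files were actually generated is not stated here.

```python
def pad_region(start, end, contig_length, padding=1000):
    """Extend [start, end) by `padding` bases on each side, clamped to the contig."""
    return max(0, start - padding), min(contig_length, end + padding)

# A region in the middle of a 10 kbp contig grows by 1 kbp on each side.
middle = pad_region(5000, 8000, 10000)   # (4000, 9000)
# A region near both contig ends is clamped to [0, contig_length).
clamped = pad_region(200, 9500, 10000)   # (0, 10000)
```

Padding makes neighbouring partitions overlap, so a downstream step would typically need to handle records that fall into two padded regions.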

| Description | URI |
|---|---|
| HG19 primary contigs in a single partition | https://seqslabbundles.blob.core.windows.net/static/system/bed/19/single_node_workflow |
| HG19 autosomes parallelized into 22 partitions, one autosome per partition | https://seqslabbundles.blob.core.windows.net/static/system/bed/19/autosomes |
| HG19 primary contigs parallelized into 23 partitions: one autosome per partition, with chrX, chrY, and chrM merged into a single partition | https://seqslabbundles.blob.core.windows.net/static/system/bed/19/chromosomes |
| HG19 primary contigs parallelized into 23 partitions, plus an extra partition containing soft-clipped and discordant alignments. Recommended for structural variation discovery analysis | https://seqslabbundles.blob.core.windows.net/static/system/bed/19/chromosomes+softclip_or_discordant_reads |
| HG19 primary contigs parallelized into 77 contiguous unmasked regions | https://seqslabbundles.blob.core.windows.net/static/system/bed/19/contiguous_unmasked_regions_77_parts |
| HG19 primary contigs parallelized into 77 contiguous unmasked regions, plus an extra partition containing soft-clipped and discordant alignments. Recommended for structural variation discovery analysis | https://seqslabbundles.blob.core.windows.net/static/system/bed/19/contiguous_unmasked_regions_77_parts+softclip_or_discordant_reads |
| HG19 primary contigs parallelized into 155 contiguous unmasked regions | https://seqslabbundles.blob.core.windows.net/static/system/bed/19/contiguous_unmasked_regions_155_parts |
| HG19 primary contigs parallelized into 323 contiguous unmasked regions, without padding | https://seqslabbundles.blob.core.windows.net/static/system/bed/19/contiguous_unmasked_regions_323_parts_unpadded |
| HG19 primary contigs parallelized into 323 contiguous unmasked regions, with 1 kbp padding on both sides of each region | https://seqslabbundles.blob.core.windows.net/static/system/bed/19/contiguous_unmasked_regions_323_parts |
| HG19 primary contigs parallelized into 3109 contiguous unmasked regions, without padding | https://seqslabbundles.blob.core.windows.net/static/system/bed/19/contiguous_unmasked_regions_3109_parts_unpadded |
| HG19 primary contigs parallelized into 3109 contiguous unmasked regions, with 1 kbp padding on both sides of each region | https://seqslabbundles.blob.core.windows.net/static/system/bed/19/contiguous_unmasked_regions_3109_parts |
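
The HG38 and HG19 partition BED files share one base location and differ only in the build directory (`38` or `19`) and the file name. The Python sketch below composes such a URI, assuming the full URI is simply the base path followed by the name listed in the tables. The `partition_bed_uri` helper is hypothetical and does not check that the file exists for a given build.

```python
BASE = "https://seqslabbundles.blob.core.windows.net/static/system/bed"

def partition_bed_uri(build, name):
    """Return the URI of a partition BED for genome build "19" or "38"."""
    if build not in ("19", "38"):
        raise ValueError(f"unsupported build: {build!r}")
    return f"{BASE}/{build}/{name}"

uri = partition_bed_uri("38", "autosomes")
```

Note that not every partition scheme is listed for both builds (for example, the 50-part scheme appears only for HG38 and the 77-part scheme only for HG19), so the name still has to be taken from the matching table.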

| Description | URI |
|---|---|
| HG38 reference genome | https://seqslabbundles.blob.core.windows.net/static/reference/38/BROAD-PUB-REF/Homo_sapiens_assembly38.dict |
| HG38 reference genome with primary contigs only | https://seqslabbundles.blob.core.windows.net/static/reference/38/PRIMARY/Homo_sapiens_assembly38.dict |
| HG19 reference genome | https://seqslabbundles.blob.core.windows.net/static/reference/19/HG/ref.dict |
| HG19 reference genome with primary contigs only | https://seqslabbundles.blob.core.windows.net/static/reference/19/HG-primary/ref.dict |
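
The reference entries above point to `.dict` files. Assuming these follow the usual sequence dictionary layout (a SAM-style header with one tab-separated `@SQ` line per contig, carrying `SN` for the name and `LN` for the length), the contig lengths can be read with a few lines of Python. The `parse_sequence_dict` helper and the sample text are illustrative only and were not taken from the files listed here.

```python
def parse_sequence_dict(text):
    """Map contig name -> length from SAM-header-style @SQ lines."""
    contigs = {}
    for line in text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field is TAG:VALUE; split on the first colon only, since
        # values such as UR: may themselves contain colons.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs

# Illustrative sample, not the content of the files above.
sample = (
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chr1\tLN:248956422\tUR:file:/ref.fasta\n"
    "@SQ\tSN:chrM\tLN:16569\n"
)
lengths = parse_sequence_dict(sample)
```

Contig lengths obtained this way are what a padding or partitioning step would need in order to clamp regions to valid coordinates.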