Operator pipelines

Atgenomix currently provides the following operator pipelines on SeqsLab.

Table 10. Pipeline operators

| File type | Partitions | Operator pipeline |
|---|---|---|
| All | None | Automatic workload pipeline for localizing either a file or a directory in a single-node cluster |
| .fastq, .fastq.gz, .fq.gz | Depends on the data | File-based FASTQ workload parallelization pipeline with 1,048,576 read records per partition |
| .bam | 1 | File-based BAM workload pipeline with all BAM records in a single partition |
| .bam | 1 | File-based BAM workload pipeline with unmapped BAM records in a single partition |
| .bam | 23 | File-based BAM workload pipeline with reads on HG19 primary chromosomes parallelized into 23 partitions (one autosome per partition, with chrX, chrY, and chrM merged into a single partition) |
| .bam | 77 | File-based BAM workload pipeline with reads on HG19 primary chromosomes parallelized into 77 contiguous unmasked regions |
| .bam | 155 | File-based BAM workload pipeline with reads on HG19 primary chromosomes parallelized into 155 contiguous unmasked regions |
| .bam | 3,109 | File-based BAM workload pipeline with reads on HG19 primary chromosomes parallelized into 3,109 contiguous unmasked regions |
| .bam | 155 | File-based BAM workload pipeline with reads on HG19 primary chromosomes parallelized into 155 contiguous unmasked regions, where both reads of a read pair are present in the same partition for pair-aware analyses (e.g., read consensus) |
| .bam | 45 | File-based BAM workload pipeline with HG19 reference genome chr20 parallelized into 45 contiguous unmasked regions |
| .bam | 23 | File-based BAM workload pipeline with reads on GRCh38 primary chromosomes parallelized into 23 partitions (one autosome per partition, with chrX, chrY, and chrM merged into a single partition) |
| .bam | 50 | File-based BAM workload pipeline with reads on GRCh38 primary chromosomes parallelized into 50 contiguous unmasked regions |
| .bam | 50 | File-based BAM workload pipeline with reads on GRCh38 primary chromosomes parallelized into 50 contiguous unmasked regions, where both reads of a read pair are present in the same partition for pair-aware analyses (e.g., read consensus) |
| .bam | 155 | File-based BAM workload pipeline with GRCh38 reference genome parallelized into 155 contiguous unmasked regions |
| .bam | 3,101 | File-based BAM workload pipeline with GRCh38 reference genome parallelized into 3,101 contiguous unmasked regions |
| .bam | None | File-based unmapped BAM workload pipeline without data parallelization |
| .bam | None | File-based BAM workload pipeline without data parallelization |
| .vcf, .gvcf, .vcf.gz, .gvcf.gz | 3,109 | File-based VCF workload pipeline with HG19 reference genome parallelized into 3,109 contiguous unmasked regions |
| .vcf, .gvcf, .vcf.gz, .gvcf.gz | 3,101 | File-based VCF workload pipeline with GRCh38 reference genome parallelized into 3,101 contiguous unmasked regions |
| .vcf, .gvcf, .vcf.gz, .gvcf.gz | None | File-based VCF workload pipeline using Glow and Delta Lake |
| Delta Lake | None | File-based Delta Lake workload pipeline |
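Two of the partitioning rules in the table can be expressed as simple arithmetic and lookups. The Python sketch below illustrates them under stated assumptions; it is not SeqsLab's implementation, and the helper names `fastq_partition_count` and `chromosome_partition` are hypothetical. It assumes one FASTQ record per read, UCSC-style contig names (`chr1` to `chr22`, `chrX`, `chrY`, `chrM`) as used by HG19, and that reads on non-primary contigs fall outside the 23 partitions, which the table does not specify.

```python
from typing import Optional

# Read records per partition in the FASTQ pipeline, taken from the table above.
FASTQ_RECORDS_PER_PARTITION = 1_048_576

# Assumption: UCSC-style contig names, as used by HG19.
AUTOSOMES = [f"chr{i}" for i in range(1, 23)]
MERGED_CONTIGS = {"chrX", "chrY", "chrM"}


def fastq_partition_count(num_records: int) -> int:
    """Hypothetical helper: partitions needed for num_records FASTQ reads.

    Every partition holds 1,048,576 records except possibly the last one,
    so the count is the ceiling of num_records / 1,048,576 (integer math).
    """
    if num_records < 0:
        raise ValueError("num_records must be non-negative")
    return (num_records + FASTQ_RECORDS_PER_PARTITION - 1) // FASTQ_RECORDS_PER_PARTITION


def chromosome_partition(contig: str) -> Optional[int]:
    """Hypothetical helper for the 23-partition scheme.

    chr1..chr22 map to partitions 0..21, and chrX, chrY, and chrM share
    partition 22. Other contigs (alt, unplaced, decoy) return None because
    the table does not say how they are handled.
    """
    if contig in MERGED_CONTIGS:
        return len(AUTOSOMES)  # index 22, the shared sex/mitochondrial partition
    try:
        return AUTOSOMES.index(contig)
    except ValueError:
        return None


if __name__ == "__main__":
    print(fastq_partition_count(10_000_000))  # 10
    print(chromosome_partition("chr7"))       # 6
    print(chromosome_partition("chrY"))       # 22
```

For example, under this rule a FASTQ file with 10,000,000 reads would yield 10 partitions: nine full partitions of 1,048,576 reads and a final partition holding the remaining 562,816 reads.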