(operators:operator-types)=
# Operator types

SeqsLab currently provides two main operator categories: localization and delocalization. Each category is further divided into the following operator types:

```{list-table}
:header-rows: 1

* - Localization                        
  - Delocalization
* - [Loader](Loader)
  - [Collector](Collector)
* - [Transformer](Transformer)
  - [Writer](Writer)
* - [Formatter](Formatter)
  -
* - [Executor](Executor)
  -
```

## Localization

Localization loads datasets from a source (such as blob storage) and optionally transforms the dataset to meet the requirements of distributed task commands.

Computation passes DataFrame partitions to the task command as inputs and executes the task command (such as a shell script or SQL).

(Loader)=
### Loader

Loaders are responsible for loading a dataset into an in-memory DataFrame or for copying a dataset from a specific data source, such as blob storage, to the local host file system.

Loaders also tell SeqsLab how to process the data, because SeqsLab supports multiple data processing options that manage and optimize workloads. For example, the `CopyToLocal` loader operator can copy or localize genome reference files to all available computing nodes.
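
The copy-to-local idea described above can be illustrated with a minimal sketch. This is not the SeqsLab loader API: the function `copy_to_local`, the temporary directories standing in for shared storage and for node-local file systems, and the file name `ref.fa` are all assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_local(source: Path, local_dir: Path) -> Path:
    """Copy a source file into a node-local directory and return the local path."""
    local_dir.mkdir(parents=True, exist_ok=True)
    target = local_dir / source.name
    shutil.copy2(source, target)
    return target


# Stand-in for shared storage: a temporary directory holding a tiny reference file.
shared_dir = Path(tempfile.mkdtemp(prefix="shared_"))
reference = shared_dir / "ref.fa"
reference.write_text(">chr20\nACGT\n")

# Stand-ins for the local file systems of two computing nodes (hypothetical).
node_dirs = [Path(tempfile.mkdtemp(prefix=f"node{i}_")) for i in range(2)]
local_copies = [copy_to_local(reference, d) for d in node_dirs]
```

After the copy, each simulated node reads the reference from its own local path instead of the shared source.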


```{list-table}
:header-rows: 1

* - Operator                        
  - Description
* - [RefLoader](RefLoader)
  - Automatic workload pipeline for localizing either a file or a directory shared within the cluster in a single-node cluster
* - [CopyToLocalLoader](CopyToLocalLoader)
  - <add description for CopyToLocalLoader>
```

(RefLoader)=
#### RefLoader

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(CopyToLocalLoader)=
#### CopyToLocalLoader

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```


(Transformer)=
### Transformer

Transformers are responsible for repartitioning and sorting DataFrames to optimize downstream data processing. For example, in genome sequencing analysis, a transformer can repartition BAM or VCF datasets based on non-overlapping target regions.
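
As a rough illustration of repartitioning by non-overlapping target regions, the sketch below assigns position-keyed records to region partitions and sorts each partition. It uses plain Python rather than SeqsLab or Spark, and the region boundaries, record layout, and function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical non-overlapping target regions on one contig: (start, end), half-open, sorted.
REGIONS = [(0, 1000), (1000, 5000), (5000, 20000)]
STARTS = [start for start, _ in REGIONS]


def region_index(position: int) -> int:
    """Return the index of the region containing position, or -1 if none does."""
    i = bisect_right(STARTS, position) - 1
    if i >= 0 and position < REGIONS[i][1]:
        return i
    return -1


def repartition(records):
    """Group (position, payload) records by region and sort each group by position."""
    partitions = defaultdict(list)
    for position, payload in records:
        partitions[region_index(position)].append((position, payload))
    return {idx: sorted(rows) for idx, rows in partitions.items()}


records = [(4200, "varB"), (150, "varA"), (1000, "varC"), (25000, "varD")]
partitions = repartition(records)
```

Because the regions do not overlap, each record lands in exactly one partition, so downstream tasks can process partitions independently. Records outside every region are grouped under the key `-1` in this sketch.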

```{list-table}
:header-rows: 1

* - Operator                        
  - Description
* - [FastqPartitioner](FastqPartitioner)
  - 
* - [BamPartitionerPart1](BamPartitionerPart1)
  - 
* - [BamPartitionerPart1Unmap](BamPartitionerPart1Unmap)
  - 
* - [BamPartitionerHg19Part23](BamPartitionerHg19Part23)
  - 
* - [BamPartitionerHg19Chr20Part45](BamPartitionerHg19Chr20Part45)
  - 
* - [BamPartitionerHg19Part155](BamPartitionerHg19Part155)
  - 
* - [BamPartitionerHg19Part3109](BamPartitionerHg19Part3109)
  - 
* - [BamPartitionerHg19Part155Consensus](BamPartitionerHg19Part155Consensus)
  - 
* - [BamPartitionerGRCh38Part23](BamPartitionerGRCh38Part23)
  - 
* - [BamPartitionerGRCh38Part50Consensus](BamPartitionerGRCh38Part50Consensus)
  - 
* - [BamPartitionerGRCh38Part50](BamPartitionerGRCh38Part50)
  - 
* - [BamPartitionerGRCh38Part3101](BamPartitionerGRCh38Part3101)
  - 
* - [VcfPartitionerHg19Part1](VcfPartitionerHg19Part1)
  - 
* - [VcfPartitionerHg19Part23](VcfPartitionerHg19Part23)
  - 
* - [VcfPartitionerHg19Part155](VcfPartitionerHg19Part155)
  - 
* - [VcfPartitionerHg19Part3109](VcfPartitionerHg19Part3109)
  - 
* - [VcfPartitionerHg19Part3109Unpadded](VcfPartitionerHg19Part3109Unpadded)
  - 
* - [VcfPartitionerGRCh38Part1](VcfPartitionerGRCh38Part1)
  - 
* - [VcfPartitionerGRCh38Part23](VcfPartitionerGRCh38Part23)
  - 
* - [VcfPartitionerGRCh38Part155](VcfPartitionerGRCh38Part155)
  - 
* - [VcfPartitionerGRCh38Part3101](VcfPartitionerGRCh38Part3101)
  - 
* - [VcfDataFrameTransformer](VcfDataFrameTransformer)
  - 
* - [VcfGlowTransformer](VcfGlowTransformer)
  - 
```

(FastqPartitioner)=
#### FastqPartitioner

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(BamPartitionerPart1)=
#### BamPartitionerPart1

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case

:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```


(BamPartitionerPart1Unmap)=
#### BamPartitionerPart1Unmap

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(BamPartitionerHg19Part23)=
#### BamPartitionerHg19Part23

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case

:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(BamPartitionerHg19Chr20Part45)=
#### BamPartitionerHg19Chr20Part45

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(BamPartitionerHg19Part155)=
#### BamPartitionerHg19Part155

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(BamPartitionerHg19Part3109)=
#### BamPartitionerHg19Part3109

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(BamPartitionerHg19Part155Consensus)=
#### BamPartitionerHg19Part155Consensus

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(BamPartitionerGRCh38Part23)=
#### BamPartitionerGRCh38Part23

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(BamPartitionerGRCh38Part50Consensus)=
#### BamPartitionerGRCh38Part50Consensus

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(BamPartitionerGRCh38Part50)=
#### BamPartitionerGRCh38Part50

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(BamPartitionerGRCh38Part3101)=
#### BamPartitionerGRCh38Part3101

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(VcfPartitionerHg19Part1)=
#### VcfPartitionerHg19Part1

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(VcfPartitionerHg19Part23)=
#### VcfPartitionerHg19Part23

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(VcfPartitionerHg19Part155)=
#### VcfPartitionerHg19Part155

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(VcfPartitionerHg19Part3109)=
#### VcfPartitionerHg19Part3109

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(VcfPartitionerHg19Part3109Unpadded)=
#### VcfPartitionerHg19Part3109Unpadded

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(VcfPartitionerGRCh38Part1)=
#### VcfPartitionerGRCh38Part1

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(VcfPartitionerGRCh38Part23)=
#### VcfPartitionerGRCh38Part23

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(VcfPartitionerGRCh38Part155)=
#### VcfPartitionerGRCh38Part155

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(VcfPartitionerGRCh38Part3101)=
#### VcfPartitionerGRCh38Part3101

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(VcfDataFrameTransformer)=
#### VcfDataFrameTransformer

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(VcfGlowTransformer)=
#### VcfGlowTransformer

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(Formatter)=
### Formatter

Formatters are responsible for formatting input datasets by converting the schema, adding or deleting columns, or encoding domain-specific objects.
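
The three formatting actions named above (schema conversion, adding or deleting columns, and encoding domain-specific values) might look like the following sketch. The row layout, the `note` column, and the `is_snv` flag are assumptions for illustration and do not reflect an actual SeqsLab formatter.

```python
def format_rows(rows):
    """Convert types, add a derived column, and drop a column from each row."""
    formatted = []
    for row in rows:
        out = {
            "chrom": row["chrom"],
            "pos": int(row["pos"]),        # schema conversion: string to int
            "ref": row["ref"],
            "alt": row["alt"].split(","),  # encode a comma-separated field as a list
        }
        # Added column: true when the reference and every alternate allele are one base long.
        out["is_snv"] = len(out["ref"]) == 1 and all(len(a) == 1 for a in out["alt"])
        # The hypothetical "note" column is deleted by not copying it.
        formatted.append(out)
    return formatted


raw = [
    {"chrom": "chr20", "pos": "100", "ref": "A", "alt": "G,T", "note": "x"},
    {"chrom": "chr20", "pos": "250", "ref": "AT", "alt": "A", "note": "y"},
]
formatted = format_rows(raw)
```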

(Executor)=
### Executor

Executors are responsible for preprocessing or localizing an input DataFrame as a managed table for a Spark SQL command, or for saving data to local files for shell script execution. An executor is required for each input DataFrame of workflow tasks before a pipeline can execute task commands.
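
A minimal sketch of the "save data to local files for command execution" path follows. It is not the SeqsLab executor interface: `run_on_partition` is a hypothetical helper, and the task command is a Python one-liner that stands in for a shell script so that the example does not depend on a particular shell.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_on_partition(rows, command):
    """Write one partition to a local file, run a command on it, and return its stdout."""
    with tempfile.TemporaryDirectory() as workdir:
        local_file = Path(workdir) / "partition.csv"
        local_file.write_text("".join(f"{name},{value}\n" for name, value in rows))
        # The local file path is appended as the last argument of the command.
        result = subprocess.run(
            command + [str(local_file)], capture_output=True, text=True, check=True
        )
        return result.stdout


# Stand-in task command: count the lines of the file passed as its argument.
COUNT_LINES = [sys.executable, "-c", "import sys; print(sum(1 for _ in open(sys.argv[1])))"]

partitions = [[("s1", 10), ("s2", 20)], [("s3", 30)]]
outputs = [run_on_partition(p, COUNT_LINES) for p in partitions]
```

Each partition is written to its own temporary file before the command runs, which mirrors the requirement that every input is localized before task commands execute.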

```{list-table}
:header-rows: 1

* - Operator                        
  - Description
* - [BamExecutor](BamExecutor)
  - 
* - [CsvExecutor](CsvExecutor)
  - 
* - [FastqExecutor](FastqExecutor)
  - 
* - [VcfExecutor](VcfExecutor)
  - 
* - [TableLocalizationExecutor](TableLocalizationExecutor)
  - 
```

(BamExecutor)=
#### BamExecutor

```{grid}
:gutter: 3

:::{grid-item-card} Data types
`bam`
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```


(CsvExecutor)=
#### CsvExecutor

```{grid}
:gutter: 3

:::{grid-item-card} Data types
`csv`
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```


(FastqExecutor)=
#### FastqExecutor

```{grid}
:gutter: 3

:::{grid-item-card} Data types
`fq.gz`, `fq.bgz`, `fastq.gz`, `fastq.bgz`
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics

:::

```

(VcfExecutor)=
#### VcfExecutor

```{grid}
:gutter: 3

:::{grid-item-card} Data types
`vcf.gz`, `vcf.bgz`
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(TableLocalizationExecutor)=
#### TableLocalizationExecutor

```{grid}
:gutter: 3

:::{grid-item-card} Data types
`delta`
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

## Delocalization

Delocalization collects a file or dataset output by a task command and saves it to a destination (such as blob storage).

(Collector)=
### Collector

Collectors are responsible for retrieving the outputs of an executed command from the local file system and returning them to a DataFrame. They can also compute aggregates of command outputs. A collector is required for each output file of a workflow task after the command execution has completed successfully.
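
The two collector roles described above, gathering output files and aggregating them, can be sketched as follows. The directory layout, the `part-*.txt` file names, and both function names are assumptions; a list of tuples stands in for a DataFrame.

```python
import tempfile
from pathlib import Path


def collect_outputs(output_dir: Path, pattern: str = "*.txt"):
    """Read every matching output file and return (file name, line) rows."""
    rows = []
    for path in sorted(output_dir.glob(pattern)):
        for line in path.read_text().splitlines():
            rows.append((path.name, line))
    return rows


def aggregate_counts(rows):
    """Example aggregate: sum the integer value on each collected line."""
    return sum(int(line) for _, line in rows)


# Stand-in for command outputs written to the local file system.
output_dir = Path(tempfile.mkdtemp(prefix="task_out_"))
(output_dir / "part-0.txt").write_text("3\n4\n")
(output_dir / "part-1.txt").write_text("5\n")

rows = collect_outputs(output_dir)
total = aggregate_counts(rows)
```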

```{list-table}
:header-rows: 1

* - Operator                        
  - Description
* - [BamCollector](BamCollector)
  - 
* - [TexCollector](TexCollector)
  - 
```

(BamCollector)=
#### BamCollector

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(TexCollector)=
#### TexCollector

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```

(Writer)=
### Writer

Writers are responsible for delocalizing or saving the command execution output DataFrame to the specified storage or repository, such as a cloud file system, HTTPS repository, or JDBC database.
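
As an illustration of saving command output to a database destination, the sketch below writes rows to an in-memory SQLite database, which stands in for a JDBC database. The table name, column names, and `write_rows` helper are hypothetical and are not part of SeqsLab.

```python
import sqlite3


def write_rows(connection, table, rows):
    """Save (sample, read_count) rows into a database table, creating it if needed."""
    # The table name is interpolated directly, so it must come from trusted configuration.
    connection.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (sample TEXT, read_count INTEGER)"
    )
    connection.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
write_rows(connection, "results", [("s1", 10), ("s2", 20)])
saved = connection.execute(
    "SELECT sample, read_count FROM results ORDER BY sample"
).fetchall()
```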

```{list-table}
:header-rows: 1

* - Operator                        
  - Description
* - [GeneralWriter](GeneralWriter)
  - 
```

(GeneralWriter)=
#### GeneralWriter

```{grid}
:gutter: 3

:::{grid-item-card} Data types
<add supported file types>
:::

:::{grid-item-card} Configuration
<add configuration options>
:::
```

```{grid}
:gutter: 3

:::{grid-item-card} Sample use case
<add sample use case>
:::

:::{grid-item-card} Performance metrics
<add example performance metrics>
:::

```
