Task-level data pipelines

What is a pipeline?

A pipeline describes how data is processed on the SeqsLab platform. It consists of an ordered sequence of operators.

Preprocess pipeline

source partition? mpipe? format.write
  • Every input preprocess pipeline should include a data partition description

  • source: Read the input data into memory

  • partition: Partition the input data

  • mpipe: Specify the preprocess job

  • format.write: Write the preprocessed result for use by a call pipeline
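As a minimal sketch, assuming a pipeline can be represented as a plain list of operator names (a hypothetical encoding, not the SeqsLab API) and reading `?` in the pattern as marking an optional operator, the preprocess ordering can be assembled as follows. The function name `build_preprocess_pipeline` and its flags are illustrative only.

```python
def build_preprocess_pipeline(use_partition=True, use_mpipe=False):
    """Assemble operator names in the order "source partition? mpipe? format.write".

    Hypothetical helper: operators are plain strings, and "?" is read as "optional".
    """
    ops = ["source"]              # always read the input data into memory first
    if use_partition:
        ops.append("partition")   # optional in the pattern, but expected for input preprocessing
    if use_mpipe:
        ops.append("mpipe")       # optional preprocess job
    ops.append("format.write")    # always write the result for the call pipeline
    return ops
```

For example, `build_preprocess_pipeline(use_mpipe=True)` yields `["source", "partition", "mpipe", "format.write"]`.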

Call pipeline

source mpipe cpipe
source cpipe mpipe
format.read mpipe cpipe
format.read cpipe mpipe?
format.read ppipe mpipe?
  • Used to execute the command in a WDL task

  • format.read: Read the preprocessed result, either into memory or onto disk

  • mpipe: If the input is read into memory, mpipe can be applied before the command is executed

  • cpipe: Used if inputs are read into memory

  • ppipe: Used if inputs are read into files on disk
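Because the call pipeline admits several alternative operator orders, a small matcher can check a proposed order against them. This is a sketch under stated assumptions: operators are plain strings, a trailing `?` marks an optional operator, and the helper names (`matches`, `is_valid_call_pipeline`, `command_operator`) are hypothetical rather than part of SeqsLab.

```python
# The five call-pipeline patterns listed above, one token per operator.
CALL_PATTERNS = [
    ["source", "mpipe", "cpipe"],
    ["source", "cpipe", "mpipe"],
    ["format.read", "mpipe", "cpipe"],
    ["format.read", "cpipe", "mpipe?"],
    ["format.read", "ppipe", "mpipe?"],
]


def matches(pattern, pipeline):
    """Return True if the operator list matches the pattern tokens exactly."""
    if not pattern:
        return not pipeline  # both exhausted means a full match
    head, rest = pattern[0], pattern[1:]
    name = head.rstrip("?")
    # An optional token may be skipped entirely.
    if head.endswith("?") and matches(rest, pipeline):
        return True
    # Otherwise the next operator must be exactly this one.
    return bool(pipeline) and pipeline[0] == name and matches(rest, pipeline[1:])


def is_valid_call_pipeline(pipeline):
    """Accept the pipeline if it matches any of the allowed call patterns."""
    return any(matches(p, pipeline) for p in CALL_PATTERNS)


def command_operator(read_into_memory):
    """Pick cpipe for in-memory inputs and ppipe for inputs read into files."""
    return "cpipe" if read_into_memory else "ppipe"
```

With these definitions, `["format.read", "ppipe"]` is accepted because the trailing `mpipe` is optional, while `["source", "cpipe"]` is rejected because that pattern requires `mpipe` after `cpipe`.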

Postprocess pipeline

collect mpipe? sink
  • Collect the execution results and upload them

  • collect: Collect the results (from either a local directory or the cloud)

  • sink: Upload the result to a specific location
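The postprocess flow can be sketched using local directories only. Here `collect`, `sink`, and `run_postprocess` are hypothetical helpers, and copying files to a local destination stands in for the real upload to cloud storage; this is an illustration of the "collect mpipe? sink" order, not how SeqsLab implements these operators.

```python
import shutil
from pathlib import Path


def collect(output_dir):
    """Hypothetical collect step: gather every result file under a local output directory."""
    return sorted(p for p in Path(output_dir).rglob("*") if p.is_file())


def sink(files, output_dir, destination):
    """Hypothetical sink step: copy files to a destination, keeping relative paths.

    A local directory stands in for the real upload target (e.g. cloud storage).
    """
    uploaded = []
    for src in files:
        dst = Path(destination) / src.relative_to(Path(output_dir))
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        uploaded.append(dst)
    return uploaded


def run_postprocess(output_dir, destination, mpipe=None):
    """Run collect, then the optional mpipe, then sink, mirroring "collect mpipe? sink"."""
    files = collect(output_dir)
    if mpipe is not None:
        # In this sketch, mpipe receives and returns the list of collected files.
        files = mpipe(files)
    return sink(files, output_dir, destination)
```

Passing, for example, `mpipe=lambda fs: [f for f in fs if f.suffix == ".vcf"]` would upload only the VCF files among the collected results.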