Data registration

Objective

This tutorial will help you register sequencing sample files and the reference files needed for the WGS-Germline-Snps-Indels workflow.

Prerequisites

Before you begin, you will need the following:

  • SeqsLab managed application on Azure. For details, see Test drive SeqsLab.

  • A running instance of the SeqsLab CLI tool. For details, see Pull and run the SeqsLab CLI.

  • A command-line interface (CLI) tool, such as the Windows Command Prompt or the macOS Terminal.

Register the sample files

The datahub register command registers files from a JSON payload that describes each file: its name, mime_type, file_type, size, checksum, metadata, and access_url. For example, take a look at this sample.json file, which describes a paired-end FASTQ sample hosted in publicly accessible storage. The sample was converted from an HG003 BAM file using the Picard SamToFastq tool.
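If you later describe your own files in a payload of this kind, the size and checksum values have to be computed from the files themselves. The following is a minimal sketch of one way to obtain them in a POSIX shell. The helper names file_size and file_sha256 are hypothetical, and the use of SHA-256 is an assumption made for illustration; check sample.json for the checksum type that SeqsLab actually expects.

```shell
# Hypothetical helpers (not part of the SeqsLab CLI).

# Print the size of a file in bytes.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Print the SHA-256 digest of a file as lowercase hex.
# Assumption: SHA-256 is used only as an example checksum type.
file_sha256() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | cut -d ' ' -f 1      # GNU coreutils (Linux)
    else
        shasum -a 256 "$1" | cut -d ' ' -f 1  # Fallback commonly found on macOS
    fi
}
```

For example, file_size sample_R1.fastq.gz prints the byte count of a file named sample_R1.fastq.gz (a placeholder name), and file_sha256 prints its digest.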

You can register the FASTQ sample files on SeqsLab using the following command:

wget https://seqslabbundles.blob.core.windows.net/static/data/sample.json
cat sample.json | seqslab datahub register-blob \
    --stdin file --workspace "${WORKSPACE}"
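Before piping a payload to the CLI, it can help to confirm that the WORKSPACE variable is set and that the downloaded file parses as JSON. The check below is only a sketch: the function name preflight is hypothetical, and it assumes python3 is installed so that its standard json.tool module can do the parsing.

```shell
# Hypothetical pre-flight check (not part of the SeqsLab CLI).
# Returns 0 when WORKSPACE is non-empty and the payload is well-formed JSON,
# and 1 otherwise, with a short message on stderr.
preflight() {
    if [ -z "${WORKSPACE:-}" ]; then
        echo "WORKSPACE is not set" >&2
        return 1
    fi
    # Assumption: python3 is available; json.tool exits non-zero on invalid JSON.
    if ! python3 -m json.tool "$1" >/dev/null 2>&1; then
        echo "$1 is not valid JSON" >&2
        return 1
    fi
    return 0
}
```

One way to use it is to gate the registration on the check, for example by running preflight sample.json and proceeding with the register-blob command shown above only if it succeeds.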

Register the reference files

The WGS-Germline-Snps-Indels workflow requires the following reference files:

atgenomix@genomics:/mnt/references$ tree
.
├── https://seqslabbundles.blob.core.windows.net/static/reference/19/HG/
│   ├── ref.fa
│   ├── ref.fa.fai
│   ├── ref.dict
│   ├── ref.fa.sa
│   ├── ref.fa.ann
│   ├── ref.fa.bwt
│   ├── ref.fa.pac
│   ├── ref.fa.amb
│   ├── DbSNP.vcf.gz
│   ├── DbSNP.vcf.gz.tbi
│   ├── Homo_sapiens_known_indels.vcf.gz
│   ├── Homo_sapiens_known_indels.vcf.gz.tbi
│   ├── Mills_and_1000G_gold_standard.indels.vcf.gz
│   ├── Mills_and_1000G_gold_standard.indels.vcf.gz.tbi
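If you keep a local copy of these references, a quick completeness check can catch a missing index file before a workflow run fails on it. The sketch below takes the file names from the listing above; the function name missing_references and the idea of checking a local directory are assumptions for illustration and are not part of the SeqsLab CLI.

```shell
# File names taken from the reference listing above.
REFERENCE_FILES="ref.fa ref.fa.fai ref.dict ref.fa.sa ref.fa.ann ref.fa.bwt
ref.fa.pac ref.fa.amb DbSNP.vcf.gz DbSNP.vcf.gz.tbi
Homo_sapiens_known_indels.vcf.gz Homo_sapiens_known_indels.vcf.gz.tbi
Mills_and_1000G_gold_standard.indels.vcf.gz
Mills_and_1000G_gold_standard.indels.vcf.gz.tbi"

# Hypothetical helper: print each expected file that is absent from
# the directory given as the first argument. Prints nothing if all exist.
missing_references() {
    for name in $REFERENCE_FILES; do
        [ -e "$1/$name" ] || echo "$name"
    done
}
```

Running missing_references /mnt/references would list any of the 14 files that are not present in that directory.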

As with the sample files, the reference files are described in a JSON payload, in this case a static.json file.

Use the following command to register the reference files:

wget https://seqslabbundles.blob.core.windows.net/static/data/static.json
cat static.json | seqslab datahub register-blob \
    --stdin file --workspace "${WORKSPACE}"

Test your own sample files

This example covers only the steps for registering the publicly available sample files used in the test drive. You can also upload and register your own sample files. For details, see Using the SeqsLab Run Sheet with the CLI.