Data registration

This section explains, step by step, how to register the sequencing sample files and the reference files that the WGS-Germline-Snps-Indels workflow needs.

Sample files

The seqslab datahub register-blob command registers files from a JSON payload that describes each file: its name, mime_type, file_type, size, checksum, metadata, and access_url. For example, the sample.json file describes a paired-end FASTQ sample stored in a publicly accessible location. The sample was converted from an HG003 BAM file with the Picard SamToFastq tool.
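If you later prepare a payload for your own files, the size and checksum fields have to be computed locally. The helper below is a minimal sketch and is not part of the SeqsLab CLI: the describe_file function name is hypothetical, SHA-256 is an assumed checksum algorithm, and the flat JSON object it prints only illustrates the name, size, and checksum fields. Compare its output with sample.json for the layout and checksum format that register-blob actually expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a SeqsLab command): prints name, size, and checksum for one file.
# Assumptions: SHA-256 as the checksum algorithm and a flat JSON object as the layout;
# check sample.json for the structure register-blob really expects.

describe_file() {
  local path="$1"
  local name size checksum
  if [[ ! -f "$path" ]]; then
    echo "error: $path not found" >&2
    return 1
  fi
  name="$(basename "$path")"
  # wc -c counts bytes; tr strips the padding some wc implementations add.
  size="$(wc -c < "$path" | tr -d ' ')"
  # Reading from stdin makes sha256sum print "<digest>  -"; keep only the digest.
  checksum="$(sha256sum < "$path" | cut -d ' ' -f 1)"
  printf '{"name": "%s", "size": %s, "checksum": "%s"}\n' "$name" "$size" "$checksum"
}
```

For example, describe_file reads_R1.fastq.gz (a hypothetical file name) prints one object for that file; the remaining fields, such as mime_type, file_type, metadata, and access_url, still have to be filled in by hand.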

To register the FASTQ sample files in SeqsLab, set WORKSPACE to the SeqsLab workspace you want to register them in, then run the following command:

cat sample.json | seqslab datahub register-blob \
    --stdin file-blob --workspace "${WORKSPACE}"
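A malformed payload or an unset WORKSPACE variable only surfaces as an error from the CLI, so it can help to check both before piping the file. The sketch below is a hypothetical helper (preflight is not a SeqsLab command). It assumes python3 is installed and uses its json.tool module purely to confirm that the file parses as JSON, not that it matches the register-blob schema.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check run before piping a payload to register-blob.
# Assumes python3 is available; json.tool is used only to confirm the file parses.

preflight() {
  local payload="$1"
  # The register command passes "${WORKSPACE}" to --workspace, so it must be non-empty.
  if [[ -z "${WORKSPACE:-}" ]]; then
    echo "error: WORKSPACE is not set" >&2
    return 1
  fi
  if [[ ! -f "$payload" ]]; then
    echo "error: $payload not found" >&2
    return 1
  fi
  # json.tool exits non-zero on malformed JSON; its pretty-printed output is discarded.
  if ! python3 -m json.tool "$payload" > /dev/null; then
    echo "error: $payload is not valid JSON" >&2
    return 1
  fi
}

# Example:
#   preflight sample.json && cat sample.json | seqslab datahub register-blob \
#       --stdin file-blob --workspace "${WORKSPACE}"
```

Chaining with && means the registration command runs only when both checks pass; the same check applies to any other payload file.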

Reference files

The WGS-Germline-Snps-Indels workflow requires the following reference files:

atgenomix@genomics:/mnt/references$ tree
│   ├── ref.fa
│   ├── ref.fa.fai
│   ├── ref.dict
│   ├──
│   ├── ref.fa.ann
│   ├── ref.fa.bwt
│   ├── ref.fa.pac
│   ├── ref.fa.amb
│   ├── DbSNP.vcf.gz
│   ├── DbSNP.vcf.gz.tbi
│   ├── Homo_sapiens_known_indels.vcf.gz
│   ├── Homo_sapiens_known_indels.vcf.gz.tbi
│   ├── Mills_and_1000G_gold_standard.indels.vcf.gz
│   ├── Mills_and_1000G_gold_standard.indels.vcf.gz.tbi
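Before registering, you may want to confirm that the reference directory contains every file named in the listing above. The helper below is a sketch with a hypothetical function name, check_references. It takes the directory as an argument because the listing does not show which subdirectory of /mnt/references holds the files, and it checks only the files named there.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports which reference files named in the listing are
# missing (or empty) in the given directory. Not part of the SeqsLab CLI.

REFERENCE_FILES=(
  ref.fa ref.fa.fai ref.dict
  ref.fa.ann ref.fa.bwt ref.fa.pac ref.fa.amb
  DbSNP.vcf.gz DbSNP.vcf.gz.tbi
  Homo_sapiens_known_indels.vcf.gz Homo_sapiens_known_indels.vcf.gz.tbi
  Mills_and_1000G_gold_standard.indels.vcf.gz
  Mills_and_1000G_gold_standard.indels.vcf.gz.tbi
)

check_references() {
  local dir="$1" missing=0 f
  for f in "${REFERENCE_FILES[@]}"; do
    # -s fails for absent files and for empty ones (e.g. an interrupted download).
    if [[ ! -s "$dir/$f" ]]; then
      echo "missing or empty: $dir/$f" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only when every listed file is present and non-empty.
  [[ "$missing" -eq 0 ]]
}
```

For example, check_references /mnt/references/<subdirectory> prints one line per problem file and returns a non-zero status if anything is missing.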

As with the sample files, the reference files are described in a JSON payload, the static.json file.

Use the following command to register the reference files:

cat static.json | seqslab datahub register-blob \
    --stdin file-blob --workspace "${WORKSPACE}"

Test your own sample files

This user guide covers only the registration of the publicly available sample files used for the test drive. You can also upload and register your own sample files; for details, see Using the SeqsLab Run Sheet with the CLI.