Power Tools

DeepSomatic
DeepTrio
DeepVariant
Download Public Data Files
Extract Sequences from Genome
FastQC
Kraken 2
Sequence Similarity Search
Sequencer Raw Data to FastQ

‣

DeepSomatic

This workflow can be used to identify single-nucleotide variants, indels, and structural variants in genomic resequencing projects of diploid species by comparison to a reference genome.

Version 1.7.0

Use Cases

Determine variants in DNA samples compared to a reference genome including single nucleotide variants (SNVs), insertions and deletions

Somatic Variant Calling

Supported sequencing platforms include Illumina, PacBio, and Oxford Nanopore (ONT)

Summary

This workflow is designed to run DeepSomatic with BAM files.

Methods

Variants are detected with joint calling using DeepSomatic to produce VCF files. Variant effects are determined using SnpEff [1].
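To make the variant types named above concrete, here is a minimal sketch of how a REF/ALT pair in a VCF data line maps to an SNV, insertion, or deletion. The function names `classify_variant` and `classify_vcf_line` are hypothetical helpers for illustration, not part of DeepSomatic or this workflow, and the sketch assumes plain sequence alleles (symbolic alleles such as `<DEL>` or `*` are not handled).

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    # Same length but longer than one base: a multi-nucleotide substitution.
    return "MNV"


def classify_vcf_line(line: str) -> list[str]:
    """Return one label per ALT allele for a tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    ref = fields[3]              # column 4 of a VCF record is REF
    alts = fields[4].split(",")  # column 5 is a comma-separated ALT list
    return [classify_variant(ref, alt) for alt in alts]


# Illustrative record (made-up coordinates and alleles).
example = "chr1\t12345\t.\tA\tAT,G\t50\tPASS\t."
print(classify_vcf_line(example))  # ['insertion', 'SNV']
```

Classifying by allele length alone is a simplification; it is enough to tell the three small-variant classes apart in a normalized VCF.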

‣

Inputs

‣

Outputs

‣

Workflow Walkthrough

‣

Results Walkthrough

‣

Citations

Built with

‣

DeepTrio

This workflow can be used to identify single-nucleotide variants, insertions, and deletions in genomic resequencing projects of diploid species by comparison to a reference genome, for probands and their parents.

Version 1.7.0

Use Cases

Determine variants in DNA samples compared to a reference genome including single nucleotide variants (SNVs), insertions and deletions

Germline Variant Calling

Supported sequencing platforms include Illumina, PacBio, and Oxford Nanopore (ONT)

Summary

This workflow is designed to run DeepTrio with BAM files from probands and their parents.

Methods

Variants are detected with joint calling using DeepTrio to produce VCF files. Variant effects are determined using SnpEff [1].

‣

Inputs

‣

Outputs

‣

Workflow Walkthrough

‣

Results Walkthrough

‣

Citations

Built with

‣

DeepVariant

This workflow runs DeepVariant on BAM files.

Version 1.7.0

Use Cases

Determine variants in DNA samples compared to a reference genome including single nucleotide variants (SNVs), insertions, deletions and structural variants

Germline Variant Calling

Determine variants in DNA samples compared to a custom reference genome for small or synthetic genomes

Plasmid
Virus
Bacteria
Synthetic Genome

Supported sequencing platforms include Illumina, PacBio, and Oxford Nanopore (ONT)

Summary

This workflow is designed to run DeepVariant with BAM files. It can be run either with Parabricks or with native open-source tools (NOST).

Methods

Variants are detected with joint calling using DeepVariant [1] to produce gVCF files. The gVCF files are jointly genotyped using GLnexus [2]. Variant effects are determined using SnpEff [3].

‣

Inputs

‣

Outputs

‣

Workflow Walkthrough

‣

Results Walkthrough

‣

Citations

Built with

‣

Download Public Data Files

Download publicly available Sequence Read Archive (SRA), Gene Expression Omnibus (GEO), or Recount3 data from their respective databases, files from a URL, or gene sequences.

Version 1.0.1

Use Cases

Access and download data from a variety of sources

Sequence Read Archive (SRA) data from the NCBI and EMBL
Gene Expression Omnibus (GEO) data from the NCBI
Recount3 data: a public RNA-seq project of human and mouse samples
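The sources listed above use recognizable accession formats, which can be handy when deciding which source an identifier belongs to. The sketch below is a hypothetical helper (`classify_accession` and `ACCESSION_PATTERNS` are illustrative names, not part of this workflow) based on the standard prefixes: SRR/ERR/DRR for sequencing runs, SRP/ERP/DRP for studies, and GSE/GSM for GEO series and samples. How the workflow itself validates identifiers is not described here.

```python
import re

# Standard accession prefixes; the dictionary itself is an illustrative assumption.
ACCESSION_PATTERNS = {
    "SRA run": re.compile(r"^[SED]RR\d+$"),
    "SRA study": re.compile(r"^[SED]RP\d+$"),
    "GEO series": re.compile(r"^GSE\d+$"),
    "GEO sample": re.compile(r"^GSM\d+$"),
}


def classify_accession(accession: str) -> str:
    """Return the kind of accession, or 'unknown' if no pattern matches."""
    cleaned = accession.strip().upper()
    for label, pattern in ACCESSION_PATTERNS.items():
        if pattern.match(cleaned):
            return label
    return "unknown"


# Format examples only; these are not identifiers taken from this page.
for acc in ["SRR000001", "ERP000001", "GSE1", "GSM1", "chr1"]:
    print(acc, "->", classify_accession(acc))
```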

Summary and Methods

This workflow is designed to help the user download data from a variety of sources, including Sequence Read Archive (SRA) data, gene sequences from supported genomes, the Gene Expression Omnibus (GEO), and Recount3. Click the toggles below to learn more about how the workflow accesses data from each source.

‣

Sequence Read Archive (SRA) Data

‣

Gene Expression Omnibus (GEO)

‣

Recount3: Gene Expression Data in Mouse/Human

‣

Inputs

‣

Outputs

‣

Workflow Walkthrough

‣

Results Walkthrough

Built with

‣

Extract Sequences from Genome

‣

FastQC

‣

Kraken 2

‣

Sequence Similarity Search

Find gene sequences that are significantly similar to known query sequences.

Version 1.1.1

Use Case

Identify similar sequences in a database, where similarity is determined by the statistical significance of the alignment score

Summary

This workflow uses BLAST [1, 2], Diamond [3], SSEARCH, FASTA36 [4], or Miniprot [5] for sequence comparison and similarity searches in biological databases.

BLAST (Basic Local Alignment Search Tool): BLAST is a widely used algorithm for comparing biological sequences, such as DNA, RNA, or protein sequences, against a large database. It employs a heuristic approach to find regions of local similarity between sequences. BLAST provides a measure of sequence similarity, identifies regions of conservation, and helps infer functional and evolutionary relationships between sequences.
Diamond: Diamond is a sequence alignment tool specifically designed for comparing protein sequences against protein sequence databases. It utilizes a fast and sensitive algorithm based on the concept of seed-and-extend alignment. Diamond is known for its high speed and is often used for large-scale protein sequence analysis.
SSEARCH: SSEARCH is a sequence comparison tool that performs rigorous local sequence alignment using the Smith-Waterman algorithm. Because it computes optimal local alignments rather than relying on heuristics, it is sensitive in detecting distant homologs. SSEARCH is commonly used in protein sequence analysis and is particularly effective when comparing sequences with low similarity.
FASTA36: FASTA36 is a versatile and widely used program for comparing protein and nucleotide sequences against sequence databases. It employs the FASTA algorithm, which is based on local alignment. FASTA36 is known for its sensitivity in identifying distant homologs and can be used for both database searches and pairwise alignments.
Miniprot: Miniprot is an extremely fast protein-to-genome aligner developed by Heng Li, the developer of minimap2. It outputs alignments in PAF (Pairwise mApping Format) and GTF (Gene Transfer Format).
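For readers unfamiliar with the Smith-Waterman algorithm that SSEARCH is built on, the following is a minimal sketch of the local-alignment score computation. It assumes a simple match/mismatch score and a linear gap penalty; SSEARCH itself uses substitution matrices and affine gap penalties, so this illustrates the technique rather than reproducing the tool. The function name and default scores are illustrative choices.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere, which is what
            # makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))         # 8: four matches
print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # 6: shared "ACG"
print(smith_waterman_score("AAAA", "TTTT"))         # 0: no similarity
```

The table is filled exhaustively, so the cost grows with the product of the two sequence lengths. This is why heuristic tools such as BLAST and Diamond are faster on large databases, at some cost in sensitivity.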

Methods

This analysis was performed using the Sequence Similarity Search workflow on the Form Bio platform. This workflow takes an input FASTA file and performs a sequence similarity search with BLAST [1, 2], Diamond [3], SSEARCH, FASTA36 [4], or Miniprot [5].

‣

Inputs

‣

Outputs

Runtime Estimates

Average = 1 hour 19 minutes

‣

Workflow Walkthrough

‣

Citations

‣

Sequencer Raw Data to FastQ

Converts Sequencer Data to FastQ; includes DeepConsensus for PacBio.

Version 0.0.4

Use Cases

The user has completed PacBio HiFi sequencing
The user has completed Illumina sequencing
The user has completed Oxford Nanopore sequencing

Summary

This workflow converts raw data from a sequencer into FastQ format.

Methods

If the input data is PacBio subread uBAMs or the sequencer run folder, consensus reads are created using circular consensus sequencing (CCS) [1]. Optionally, DeepConsensus can be used to improve base calls [2]. If the input data is ONT fast5 data, basecalling is performed using Dorado [3]. If the input data is an Illumina sequencer run folder, bcl2fastq is run to demultiplex reads by sample barcode and create FastQ files [4].
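The branching described above can be summarized as a small dispatch table. This is a sketch of the decision logic only: `InputType`, `plan_steps`, and the step labels are hypothetical names introduced for illustration, and no tool is actually invoked.

```python
from enum import Enum


class InputType(Enum):
    """Hypothetical labels for the three kinds of sequencer input."""
    PACBIO_SUBREADS = "pacbio_subreads"
    ONT_FAST5 = "ont_fast5"
    ILLUMINA_RUN_FOLDER = "illumina_run_folder"


def plan_steps(input_type: InputType, use_deepconsensus: bool = False) -> list[str]:
    """Return the ordered tool steps for a given input type."""
    if input_type is InputType.PACBIO_SUBREADS:
        steps = ["ccs"]
        if use_deepconsensus:
            # DeepConsensus is optional and runs after CCS.
            steps.append("deepconsensus")
        return steps
    if input_type is InputType.ONT_FAST5:
        return ["dorado"]
    if input_type is InputType.ILLUMINA_RUN_FOLDER:
        return ["bcl2fastq"]
    raise ValueError(f"Unsupported input type: {input_type!r}")


print(plan_steps(InputType.PACBIO_SUBREADS, use_deepconsensus=True))
print(plan_steps(InputType.ONT_FAST5))
print(plan_steps(InputType.ILLUMINA_RUN_FOLDER))
```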

‣

Inputs

‣

Outputs

‣

Workflow Walkthrough

‣

Results Walkthrough

‣

Citations

Built with