🧬

Genomics

16S Sequencing

Perform taxonomic classification on 16S sequencing data.

Workflow Walkthrough

Genome Assembly

This workflow can take short-read Illumina data and long-read ONT or PacBio data, along with Omni-C or Hi-C reads for scaffolding, to create genome assemblies. It is enhanced with Google DeepOmics tools such as DeepConsensus and DeepPolisher.

Version 1.0.5

Use Cases

  • Assemble genomes from short and/or long-read sequencing files

Summary and Methods

This workflow creates draft genome assemblies from short and/or long-read FastQ sequencing files. It can also polish, purge, filter, and evaluate these assemblies as they are built. If supplied with Fast5 files, the workflow can perform ONT basecalling before assembly. The user provides short and/or long-read FastQ files as input and receives a draft genome assembly as output. Click the toggles below to learn more about how this workflow processes short reads and long/mixed reads.

Short Read

Long Read or Mixed Read

Inputs

Outputs

Workflow Walkthrough

Results Walkthrough

Citations

Genome Coordinate Conversion

Convert the location of a set of genomic features, such as genes, transcription factor binding sites, or promoters, from one genome to another.

Version 1.0.1

Use Cases

  • Create a Genome Coordinate Conversion File between two genomes
  • Map the location of a set of genomic features (e.g. genes, transcription factor binding sites, promoters) from the target genome to the query genome
  • Filter the converted genomic features for overlap with a second set of genomic features whose locations are given in the query genome

Summary

This workflow is designed to help the user convert the location of a set of genomic features, such as genes, transcription factor binding sites, or promoters, from one genome to another. The user will provide as input a query genome to convert to and a target genome to convert from. If a Genome Coordinate Conversion File is not provided, one may be generated from two input FastA files. The user will receive as output the location of the genomic features on the query genome, and a Genome Coordinate Conversion File if indicated.

Methods

This workflow was performed using the Genome Coordinate Conversion workflow on the Form Bio platform. The Genome Coordinate Conversion File is a chain file that describes a whole-genome alignment between the target genome and the query genome. Only features that lie in regions of homology between the two genomes are mapped. If the Genome Coordinate Conversion File does not yet exist, this workflow can generate one from a pair of genome FastA files using either LastZ [1] or SegAlign [2] (a GPU-optimized version of LastZ). CrossMap [3] uses this chain file to convert the coordinates of genomic features from the target genome to the query genome. The file of genomic features in the target/reference genome can be in BED, VCF, BAM, or MAF format. CrossMap outputs a BED file with the location of these genomic features in the query genome. If provided with a second BED file of genomic features in the query genome, the workflow will filter the converted features for overlap with this second BED file [4].
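The conversion and overlap-filtering logic described above can be illustrated with a minimal Python sketch. It is not CrossMap's implementation: the `Block` and `Feature` types, the `lift` and `overlaps` helpers, and all coordinates are hypothetical, and the sketch assumes forward-strand, ungapped blocks and drops any feature that is not fully contained in a single block (real chain-based tools also handle strand and features that span several blocks).

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """One ungapped aligned block (hypothetical, simplified stand-in for a chain entry)."""
    t_chrom: str  # sequence name in the target genome (convert from)
    t_start: int  # 0-based start of the block in the target genome
    q_chrom: str  # sequence name in the query genome (convert to)
    q_start: int  # 0-based start of the block in the query genome
    size: int     # block length in bases


class Feature(NamedTuple):
    """A BED-like interval: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int
    name: str


def lift(feature: Feature, blocks: List[Block]) -> Optional[Feature]:
    """Return the feature in query coordinates, or None if it is unmapped."""
    for b in blocks:
        inside = (
            b.t_chrom == feature.chrom
            and b.t_start <= feature.start
            and feature.end <= b.t_start + b.size
        )
        if inside:
            shift = b.q_start - b.t_start
            return Feature(b.q_chrom, feature.start + shift, feature.end + shift, feature.name)
    return None  # no homologous block fully contains the feature


def overlaps(a: Feature, b: Feature) -> bool:
    """True when two half-open intervals on the same sequence share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


# Made-up example data (assumption: not taken from any real genome).
blocks = [Block("chr1", 100, "chrA", 1000, 50), Block("chr1", 200, "chrA", 1200, 100)]
features = [
    Feature("chr1", 110, 140, "geneA"),
    Feature("chr1", 160, 180, "geneB"),  # falls between blocks, so it is dropped
    Feature("chr1", 210, 250, "geneC"),
]
query_regions = [Feature("chrA", 1030, 1100, "peak1")]  # second feature set, query coordinates

lifted = [f for f in (lift(x, blocks) for x in features) if f is not None]
filtered = [f for f in lifted if any(overlaps(f, r) for r in query_regions)]

for f in filtered:
    print(f.chrom, f.start, f.end, f.name, sep="\t")
```

With this toy data, geneB has no homologous block and is left unmapped, and only geneA passes the overlap filter against the second feature set.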

Inputs

Outputs

Workflow Walkthrough

Results Walkthrough

Citations

FLAG: Eukaryote Genome Annotation

Annotate eukaryote genomes from an input FastA file.

Version 2.1.0

Use Cases

  • Annotate eukaryote genomes

Summary

Genome annotation uses computational algorithms to predict the locations of potential genes and tRNAs, a process known as structural annotation. Once locations are found, they are functionally annotated by labeling them with commonly used gene names, such as KRT8 and KRAS.

The longest part of this process is completed in several parallel steps, including RNA transcript to genome alignment, protein to genome alignment, and gene prediction. Once these steps are completed, the predicted genes and alignments are combined to form a consensus structural annotation. This structural annotation is then formatted uniformly to be similar to that of the NCBI and functionally annotated with EnTAP.

Methods

This analysis was performed using the FLAG: Eukaryote Genome Annotation workflow on the Form Bio platform. First, if the input genome is unmasked, it is masked with WindowMasker [1], RepeatMasker [2], or RepeatModeler [3] in conjunction with RepeatMasker. Protein and transcript data are then aligned to the genome in parallel. Extra protein or transcript data can also be pulled from databases with BLAST. Depending on the predictors selected, gene prediction runs in parallel or in series with the protein and transcript alignments. After all alignments and gene predictions are finished, they are combined and filtered to produce more complete consensus gene predictions and to remove unlikely predictions. The protein-coding annotations are also combined with tRNA annotations from tRNAscan-SE. Once all annotations are filtered and combined, functional annotation (labeling genes with names such as KRAS or BRCA2) is done with EnTAP. The structural and functional annotations are then combined into a single file in a GTF format similar to that of the NCBI. Finally, annotation statistics are calculated with AGAT [4] and BUSCO [5]. Further methodology can be found in the FLAG paper.
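To give a feel for what "combined and filtered" into consensus predictions can mean, here is a deliberately simple Python sketch. It is an illustration only and not FLAG's actual combining logic: the `Prediction` type, the `consensus` function, the `min_sources` threshold, the source labels, and the coordinates are all hypothetical. The sketch merges overlapping predictions on the same sequence into loci and keeps a locus only when at least two distinct evidence sources support it.

```python
from typing import Dict, List, NamedTuple


class Prediction(NamedTuple):
    """A gene-level interval from one evidence source (0-based, exclusive end)."""
    source: str
    chrom: str
    start: int
    end: int


def consensus(predictions: List[Prediction], min_sources: int = 2) -> List[Dict]:
    """Merge overlapping predictions into loci and keep well-supported ones."""
    loci: List[Dict] = []
    for p in sorted(predictions, key=lambda p: (p.chrom, p.start, p.end)):
        last = loci[-1] if loci else None
        if last and last["chrom"] == p.chrom and p.start < last["end"]:
            # Overlaps the current locus: extend it and record the source.
            last["end"] = max(last["end"], p.end)
            last["sources"].add(p.source)
        else:
            loci.append({"chrom": p.chrom, "start": p.start, "end": p.end,
                         "sources": {p.source}})
    # Filter out loci backed by too few independent sources.
    return [locus for locus in loci if len(locus["sources"]) >= min_sources]


# Made-up evidence (assumption: labels and coordinates are illustrative only).
evidence = [
    Prediction("gene_predictor", "chr1", 100, 500),
    Prediction("protein_alignment", "chr1", 120, 480),
    Prediction("gene_predictor", "chr1", 900, 1200),  # single source, filtered out
    Prediction("transcript_alignment", "chr2", 50, 300),
    Prediction("gene_predictor", "chr2", 60, 310),
]

kept = consensus(evidence)
for locus in kept:
    print(locus["chrom"], locus["start"], locus["end"], sorted(locus["sources"]))
```

A real annotation pipeline works at the level of exons and coding sequences and weighs evidence types differently; interval merging with a support threshold is just the simplest form of the idea.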

Workflow Walkthrough

Results Walkthrough

Citations

Prokaryotic (Meta)Genome Analysis

This workflow can analyze prokaryotic genome sequences. There are 3 modes: Cultured Genome Assembly (Single Prokaryotic Genome) and Gene Finding and Annotation; 16S rRNA Taxonomic Profiling; and Whole Shotgun Metagenomic Analysis.

Version 1.0.2

Use Cases

  • Analyze 16S rRNA sequences for use in taxonomic profiling
  • Assemble single-cultured prokaryote genomes, and find and annotate genes within the genome
  • Assemble and annotate metagenomes with metagenome-assembled genomes (MAGs)

Summary and Methods

This workflow has been designed to help the user analyze prokaryotic genomes. There are currently 3 supported modes: Cultured Genome Assembly (Single Prokaryotic Genome) and Gene Finding and Annotation, 16S rRNA Taxonomic Profiling, and Whole Shotgun Metagenomic Analysis.

Cultured Genome Assembly

16S rRNA Taxonomic Profiling

Whole Shotgun Metagenomic Analysis

Inputs

Outputs

Workflow Walkthrough

Results Walkthrough

Citations
