Gene Therapy

AAV PacBio Quality Control

Assess the completeness and contamination of PacBio-sequenced adeno-associated virus (AAV) constructs by examining alignment coverage across entire sequences and within specific regions, such as the promoter and the coding sequence (CDS).

Version 2.3.0

Use Cases

  • The user has completed PacBio HiFi sequencing for AAV constructs and wishes to characterize the quality of the sequencing results
  • The user has completed Illumina sequencing and wishes to detect variants in the output data

Summary

This is a quality control workflow for characterizing PacBio adeno-associated virus (AAV) products by examining alignment coverage across sequence regions of interest. The user provides either BAM files from a PacBio sequencer run in AAV mode or the raw PacBio run data folder as a .tar.gz archive that includes the subreads and XML data. The user may optionally provide Illumina sequencing data for variant detection. For each run analyzed, the user receives a report of the alignment statistics.

Methods

If masking is selected, the vector sequence and the packaging plasmid sequences are used to mask the human genome with MUMmer [1]. If the input data are PacBio unaligned BAMs (uBAMs) that have not been run in AAV mode, consensus reads are generated with circular consensus sequencing (CCS) [2]. Reads are aligned to the reference sequences with Minimap2 [3]. A custom report of alignment statistics is generated using a workflow developed at PacBio. The resulting alignments are filtered for quality, keeping only primary alignments with a mapping quality score greater than 10. Counts and lengths of alignments to regions of interest are determined from the alignment files with Bedtools [4].

If Illumina data are provided, reads are trimmed with TrimGalore [5] to remove low-quality ends (quality < 25) and to discard reads shorter than 35 bp. Trimmed reads are aligned to a reference genome with Minimap2, and duplicate reads can optionally be marked with Picard MarkDuplicates [6]. BAMs generated for the same sample across multiple runs are merged with Samtools [7]. Variants arising from replication errors can be detected with MuTect2 [8] and FreeBayes [9]. Finally, a report is generated with the relevant quality metrics.
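The alignment filtering and per-region counting steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the workflow's own code: the `Alignment` record and the `passes_filter` and `region_stats` functions are hypothetical, alignments are supplied as plain records instead of being read from a BAM file, "primary" is assumed to mean not flagged as unmapped (0x4), secondary (0x100), or supplementary (0x800), and an alignment's "length" in a region is assumed to be the number of reference bases it overlaps there. The workflow itself performs these steps on Minimap2 output with Bedtools.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# SAM FLAG bits used to decide whether an alignment is a mapped primary record.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

@dataclass(frozen=True)
class Alignment:
    """Hypothetical, simplified alignment record (a real pipeline reads BAM records)."""
    ref: str    # reference sequence name
    start: int  # 0-based start on the reference
    end: int    # end on the reference (exclusive)
    mapq: int   # mapping quality
    flag: int   # SAM FLAG field

def passes_filter(aln: Alignment, min_mapq: int = 10) -> bool:
    """Keep mapped, primary alignments with MAPQ strictly greater than min_mapq."""
    if aln.flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    return aln.mapq > min_mapq

Region = Tuple[str, int, int, str]  # (chrom, start, end, name), 0-based half-open

def region_stats(alignments: Iterable[Alignment],
                 regions: List[Region]) -> Dict[str, Tuple[int, int]]:
    """Per region: (number of overlapping alignments, aligned bases inside the region)."""
    kept = [a for a in alignments if passes_filter(a)]
    stats: Dict[str, Tuple[int, int]] = {}
    for chrom, start, end, name in regions:
        count = bases = 0
        for a in kept:
            if a.ref != chrom:
                continue
            overlap = min(a.end, end) - max(a.start, start)
            if overlap > 0:  # intervals that merely touch do not overlap
                count += 1
                bases += overlap
        stats[name] = (count, bases)
    return stats

# Hypothetical example data: reference names and coordinates are illustrative only.
alns = [
    Alignment("vector", 0, 500, 60, 0),        # primary, MAPQ 60: kept
    Alignment("vector", 400, 900, 5, 0),       # MAPQ 5: dropped
    Alignment("vector", 100, 300, 60, 0x100),  # secondary: dropped
    Alignment("helper", 0, 1000, 60, 0),       # kept, but on another reference
]
regions = [("vector", 0, 200, "promoter"), ("vector", 200, 800, "CDS")]
print(region_stats(alns, regions))  # {'promoter': (1, 200), 'CDS': (1, 300)}
```

The nested loop compares every alignment with every region, which is adequate for a handful of regions on a toy input; for real BAM files an interval-aware tool such as Bedtools is the better fit.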
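The Illumina trimming step (trim 3' ends below quality 25, then discard reads shorter than 35 bp) can be illustrated with the BWA-style quality-trimming algorithm described in the Cutadapt documentation; TrimGalore is a wrapper around Cutadapt. The sketch below is a simplification and an assumption about the details, not TrimGalore's code: it handles one single-end read given as a sequence plus integer Phred scores, trims only the 3' end, and ignores the adapter trimming TrimGalore also performs. The names `quality_trim_index` and `trim_read` are hypothetical.

```python
from typing import List, Optional, Tuple

def quality_trim_index(quals: List[int], cutoff: int) -> int:
    """Return how many 5' bases to keep after BWA-style 3' quality trimming.

    Scanning from the 3' end, accumulate (cutoff - quality). Stop as soon as the
    running sum turns negative; the read is cut where the running sum peaked.
    """
    running = 0
    best = 0
    cut = len(quals)  # default: keep the whole read
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut

def trim_read(seq: str, quals: List[int], cutoff: int = 25,
              min_len: int = 35) -> Optional[Tuple[str, List[int]]]:
    """Quality-trim the 3' end, then discard the read (None) if shorter than min_len."""
    keep = quality_trim_index(quals, cutoff)
    seq, quals = seq[:keep], quals[:keep]
    if len(seq) < min_len:
        return None
    return seq, quals

# With cutoff 10, the low-quality 3' tail is removed and 4 bases remain.
print(quality_trim_index([42, 40, 26, 27, 8, 7, 11, 4, 2, 3], cutoff=10))  # 4
```

The defaults `cutoff=25` and `min_len=35` mirror the thresholds stated in the Methods; a read whose bases all exceed the cutoff is returned unchanged as long as it is long enough.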

Tips and Tricks

The genomic regions file must be a BED file with no header and three mandatory columns: chrom (the name of the chromosome or sequence), chromStart, and chromEnd (the start and end positions of the feature on that sequence). BED coordinates are 0-based and half-open, so chromStart is included in the feature and chromEnd is not. Up to nine additional optional columns may follow, including strand and the exon (block) count and sizes. More information can be found in the BED format documentation.
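Before running the workflow, it can help to check that a regions file meets the requirements above. The following is a minimal sketch, assuming tab-separated columns; `BedRecord` and `parse_bed` are hypothetical names, and the checks cover only the rules stated here (no header, 3 mandatory integer columns, at most 9 optional columns, 0-based half-open intervals), not every detail of the BED format.

```python
from typing import List, NamedTuple, Tuple

class BedRecord(NamedTuple):
    chrom: str
    start: int               # chromStart: 0-based, included
    end: int                 # chromEnd: excluded
    extra: Tuple[str, ...]   # up to 9 optional columns (name, score, strand, ...)

def parse_bed(text: str) -> List[BedRecord]:
    """Parse a headerless, tab-separated BED file with 3 to 12 columns."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if not 3 <= len(fields) <= 12:
            raise ValueError(
                f"line {lineno}: expected 3-12 tab-separated columns, got {len(fields)}")
        chrom, start_s, end_s = fields[:3]
        try:
            start, end = int(start_s), int(end_s)
        except ValueError:
            # A header row typically fails here because its coordinates are not integers.
            raise ValueError(
                f"line {lineno}: chromStart/chromEnd must be integers") from None
        if start < 0 or end < start:
            raise ValueError(f"line {lineno}: invalid interval {start}-{end}")
        records.append(BedRecord(chrom, start, end, tuple(fields[3:])))
    return records

# Hypothetical regions: a BED4 line and a BED6 line (name, score, strand).
bed = "vector\t0\t200\tpromoter\nvector\t200\t800\tCDS\t0\t+\n"
for rec in parse_bed(bed):
    # Half-open coordinates make the feature length simply end - start.
    print(rec.chrom, rec.start, rec.end, rec.end - rec.start, rec.extra)
```

Because the intervals are half-open, adjacent features such as the promoter (0 to 200) and CDS (200 to 800) above share a boundary coordinate without overlapping.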

Inputs

Outputs

Runtime Estimates

Average: 2 hours 18 minutes based on 15 test runs

Workflow Walkthrough

Results Walkthrough

Citations

Built with
