Protein Engineering

Protein Structure and Function

Get information about protein sequences, including their functional domains, predicted gene ontology terms and EC numbers, and predicted structure.

Version 2.1.0

Use Cases

Predict the 3-D structure and functional annotation of proteins, DNA, RNA, and their complexes from FastA files, text sequences, or protein accession numbers.

Summary

This workflow helps the user determine the structure and function of protein sequences. It can identify protein functional domains, predict gene ontology terms and EC numbers, and predict protein structure. The user provides the sequences of interest as a FastA file, text sequence, or protein accession number, and receives functional and structural information on those sequences as output.

Protein Functional Domains

Protein domains are functional and structural units within proteins that can be conserved across different proteins and species. Identifying protein domains is important for understanding protein function and evolution. One method for finding protein domains is to use Reverse PSI-BLAST (RPS-BLAST) to search the Conserved Domain Database (CDD), a collection of well-characterized protein domains. RPS-BLAST takes a profile-based approach: it compares the query protein sequence to a database of domain profiles and reports the regions of the query that match a particular profile. Each profile is built from an alignment of sequences that share a domain or motif, so it captures which positions of the domain are conserved and which tolerate substitution.

When RPS-BLAST searches the CDD, it returns a list of domains that match the query sequence along with statistical measures of the significance of each match, such as an E-value and a bit score. Researchers can then use the identified domains to make predictions about the protein’s function, interactions, and evolutionary history. For example, if a protein contains a domain that is commonly found in enzymes, the protein is likely an enzyme as well, and the domain can suggest its specific enzymatic activity. Similarly, a domain found in many different species has probably been conserved throughout evolution because it is important for the protein’s function.
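As a rough illustration of the search described above, the sketch below builds an `rpsblast` command line (BLAST+) and parses its tabular output (`-outfmt 6`). The file names, the database name, the E-value cutoff of 0.01, and the `DomainHit` type are assumptions made for the example; they are not taken from the workflow itself.

```python
import csv
import io
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


@dataclass
class DomainHit:
    """One query-to-profile match (hypothetical container for this sketch)."""
    query: str
    domain: str
    q_start: int
    q_end: int
    evalue: float
    bitscore: float


def build_rpsblast_command(query_fasta, cdd_db, out_path, evalue=0.01):
    """Return an argv list for rpsblast; all paths and the cutoff are placeholders."""
    return [
        "rpsblast",
        "-query", query_fasta,
        "-db", cdd_db,
        "-evalue", str(evalue),
        "-outfmt", "6",
        "-out", out_path,
    ]


def parse_hits(tabular_text, max_evalue=0.01):
    """Parse -outfmt 6 text and keep hits at or below max_evalue, best first."""
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=OUTFMT6_FIELDS, delimiter="\t"
    )
    hits = []
    for row in reader:
        hit = DomainHit(
            query=row["qseqid"],
            domain=row["sseqid"],
            q_start=int(row["qstart"]),
            q_end=int(row["qend"]),
            evalue=float(row["evalue"]),
            bitscore=float(row["bitscore"]),
        )
        if hit.evalue <= max_evalue:
            hits.append(hit)
    # A lower E-value means a more significant match.
    return sorted(hits, key=lambda h: h.evalue)
```

The command list can be passed to `subprocess.run` on a machine where BLAST+ and a formatted CDD database are installed; the parser only needs the resulting text.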

Predicting Protein Function

Annotating a gene for function involves identifying the specific biological activity of the product that the gene encodes. This information can be used to better understand the role of the gene in biological systems and to identify potential targets for therapeutic interventions. One tool that can be used to annotate a gene for function is DeepFRI (Deep Functional Residue Identification), a deep learning-based tool that predicts gene ontology (GO) terms and enzyme commission (EC) numbers from a protein’s sequence or structure. To annotate a gene with DeepFRI, one first provides the sequence of the protein that the gene encodes. DeepFRI then uses a deep neural network to predict the protein’s functions across the three GO categories: biological process, molecular function, and cellular component.

In addition to GO terms, DeepFRI can predict EC numbers, a hierarchical classification system for enzyme activities. An EC prediction indicates the likely catalytic activity of the protein encoded by the gene, which can help to identify potential drug targets and inform drug development efforts. With predicted GO terms and EC numbers in hand, researchers can better understand the gene’s role in biological systems, identify other genes with similar functions, and investigate potential drug targets based on the predicted enzyme activity. This makes DeepFRI useful in a wide range of fields, from basic biology to drug discovery.
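To make the two annotation types concrete, here is a minimal sketch that filters a list of predictions by score and maps an EC number to its top-level enzyme class. The `(ontology, term, score)` tuple layout, the example scores, and the 0.5 cutoff are invented for illustration and do not represent DeepFRI's actual output format.

```python
# The seven top-level EC classes (first field of an EC number).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def ec_top_level_class(ec_number):
    """Return the enzyme class name for an EC number such as '3.4.21.-'."""
    first_field = ec_number.split(".")[0]
    return EC_CLASSES[int(first_field)]


def filter_predictions(predictions, min_score=0.5):
    """Keep (ontology, term, score) tuples scoring at least min_score, best first.

    The 0.5 default is an arbitrary placeholder, not a recommended cutoff.
    """
    kept = [p for p in predictions if p[2] >= min_score]
    return sorted(kept, key=lambda p: p[2], reverse=True)


# Hypothetical predictions with made-up scores.
example = [
    ("MF", "GO:0004252", 0.84),
    ("EC", "3.4.21.-", 0.91),
    ("BP", "GO:0006508", 0.32),
]

for ontology, term, score in filter_predictions(example):
    label = ec_top_level_class(term) if ontology == "EC" else ontology
    print(f"{term}\t{label}\t{score:.2f}")
```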

Protein Structure

Predicting structure is an important step in a thorough analysis of a protein sequence. RaptorX, RoseTTAFoldNA, and AlphaFold are all powerful structure prediction tools.

  • RaptorX is a protein structure prediction server that uses a variety of methods to predict the structure of a given protein sequence. It employs advanced machine learning techniques, such as deep neural networks, to accurately predict protein structures even in the absence of experimental data. RaptorX uses homology modeling, threading, and ab initio modeling to generate protein structure predictions and has been shown to be highly accurate in blind tests.
  • RoseTTAFoldNA is a deep learning-based structure prediction tool developed in the Baker lab at the University of Washington. It extends the RoseTTAFold three-track neural network to nucleic acids. Rather than exploring conformational space with an energy function and a Monte Carlo search, as classical “de novo” Rosetta methods do, it predicts 3-D coordinates directly from sequence and alignment information. In addition to proteins, it can predict the 3-D structures of DNA, RNA, and mixed nucleotide/amino acid structures.
  • AlphaFold is a deep learning-based protein structure prediction system developed by the artificial intelligence research company DeepMind. Its neural network was trained on a large database of experimentally determined protein structures. AlphaFold has outperformed other methods in the CASP protein structure prediction challenges. It relies on “attention” mechanisms, which let the network relate different parts of a protein to one another and so predict their relative positions with a high degree of accuracy.

These tools are all very powerful and can help researchers to better understand the structure and function of proteins, which is critical for developing new drugs and therapies.
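Predicted models are typically delivered as PDB files, and AlphaFold stores its per-residue confidence score (pLDDT, on a 0 to 100 scale) in the B-factor column of each atom record. The sketch below reads that column for the alpha carbon of every residue, using the fixed column positions of the PDB format. The function names are hypothetical, and the cutoff of 70 follows the commonly used "confident" band rather than anything specified by this workflow.

```python
from statistics import mean


def plddt_per_residue(pdb_text):
    """Map (chain, residue number) to the B-factor value of its CA atom.

    For AlphaFold models the B-factor field holds pLDDT; for other
    predictors or experimental structures it may mean something else.
    """
    scores = {}
    for line in pdb_text.splitlines():
        # ATOM records only; the B-factor occupies columns 61-66.
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[12:16].strip() != "CA":  # atom name, columns 13-16
            continue
        chain = line[21]                 # chain identifier, column 22
        res_num = int(line[22:26])       # residue number, columns 23-26
        scores[(chain, res_num)] = float(line[60:66])
    return scores


def summarize(scores, confident=70.0):
    """Return the mean pLDDT and the fraction of residues at or above the cutoff."""
    values = list(scores.values())
    return {
        "mean_plddt": mean(values),
        "fraction_confident": sum(v >= confident for v in values) / len(values),
    }
```

Calling `summarize(plddt_per_residue(open(path).read()))` on a ranked model gives a quick sense of how much of the chain was predicted with confidence.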

Methods

This analysis was completed using the Protein Structure and Function workflow on the Form Bio platform. If the input is a protein sequence accession number, the relevant databases are searched to retrieve the sequence as a FastA file. Otherwise, the input is a FastA file or an amino acid sequence entered as text, which is converted into a FastA file. The FastA file is then formatted and split into individual protein sequences, which run in parallel through the rest of the workflow.

If the sequences given are monomers, the workflow splits into two main routes. In the first route, the sequence is searched with RPS-BLAST against the Conserved Domain Database [1] (CDD) and then run through DeepFRI [2] for structure-based protein function prediction. In the second route, the sequence is run through a selection of structure prediction algorithms (AlphaFold [3] or RaptorX [4]), in which multiple 2-D or 3-D structures are predicted and ranked. After both routes finish, the results are consolidated into a final report in HTML format.

If the sequences given are multimers, the workflow follows a single route in which AlphaFold-Multimer [5] predicts the 3-D multimer structure. Multiple structures are predicted, ranked, and relaxed using Amber force fields. The result is multiple ranked PDB files as well as a final HTML report containing a fully interactive plot of the top-ranked model’s 3-D structure, predicted aligned error plots, predicted LDDT per position for the top 5 predictions, and a sequence coverage plot. Details on the run and a random hash are also added to ensure reproducibility and ease of tracking the predictions.

Lastly, if Nucleic Acid Structures is selected, RoseTTAFoldNA [6] is run to predict the 3-D structures of RNA, DNA, or mixed protein/nucleic acid structures. In this mode, a design file specifies the corresponding FastA files, their order in the chain structure being predicted, and whether each one is a DNA, RNA, protein, or paired protein/RNA chain.
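The input handling described in this section can be sketched as follows: wrap a bare text sequence in a FastA header, split a FastA file into individual records, and look up which tools apply to each mode. This is an illustration only, not the platform's code; the function names, the default `query` identifier, and the mode keys are all hypothetical.

```python
def text_to_fasta(text, default_id="query"):
    """Wrap a bare sequence in a FastA header; pass existing FastA text through."""
    stripped = text.strip()
    if stripped.startswith(">"):
        return stripped + "\n"
    sequence = "".join(stripped.split())  # drop spaces and line breaks
    return f">{default_id}\n{sequence}\n"


def split_fasta(fasta_text):
    """Return a list of (header, sequence) pairs, one per FastA record."""
    records = []
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def plan_steps(mode):
    """Map a mode to the tools named in this section (keys are made up here)."""
    plans = {
        "monomer": ["RPS-BLAST vs CDD", "DeepFRI", "AlphaFold or RaptorX"],
        "multimer": ["AlphaFold-Multimer"],
        "nucleic_acid": ["RoseTTAFoldNA"],
    }
    return plans[mode]
```

Each `(header, sequence)` pair returned by `split_fasta` is an independent unit of work, which is what allows the sequences to be processed in parallel.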

‣

Inputs

‣

Outputs

‣

Workflow Walkthrough

‣

Results Walkthrough

‣

Citations

Built with

image

Protein Design

Predict a new protein sequence that adopts a structure similar to that of a known protein.

Version 1.0.0

Use Cases

This workflow can be used to redesign a protein sequence while maintaining its original structure.

Summary

This workflow takes a protein PDB file as input and uses ProteinMPNN to redesign its sequence while maintaining the same structure. Parameters can be set to add amino acid biases, avoid certain amino acids, or redesign only certain parts of the protein. It accepts both monomers and multimers [1].
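The three kinds of design constraints mentioned above (biases, avoided amino acids, and partial redesign) can be illustrated with a toy sampler. This is not ProteinMPNN: the uniform `uniform_logits` function stands in for the per-position scores a real model would compute from the structure, and every name and parameter here is hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def uniform_logits(position):
    """Stand-in for a model: every amino acid gets the same score."""
    return {aa: 0.0 for aa in AMINO_ACIDS}


def redesign(native, designable, logits_fn=uniform_logits,
             bias=None, omit="", temperature=1.0, seed=0):
    """Resample only the designable positions (0-based) of a native sequence.

    bias: dict adding a constant to an amino acid's score (favor or disfavor it).
    omit: amino acids that must never be sampled.
    Positions outside `designable` keep their native residue.
    """
    rng = random.Random(seed)
    bias = bias or {}
    allowed = [aa for aa in AMINO_ACIDS if aa not in omit]
    designed = list(native)
    for pos in sorted(designable):
        logits = logits_fn(pos)
        # Softmax weights over allowed residues; choices() normalizes them.
        weights = [
            math.exp((logits[aa] + bias.get(aa, 0.0)) / temperature)
            for aa in allowed
        ]
        designed[pos] = rng.choices(allowed, weights=weights, k=1)[0]
    return "".join(designed)


native = "MKTAYIAKQR"
new_seq = redesign(native, designable={2, 3, 4}, bias={"A": 2.0}, omit="C")
print(native)
print(new_seq)
```

Lowering `temperature` concentrates sampling on the highest-scoring residues, while raising it produces more diverse sequences.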

‣

Inputs

‣

Outputs

‣

Workflow Walkthrough

‣

Citations

Built with

image