Genomics: ProfileView, an innovative computational approach for the functional classification of protein families, presented by Sorbonne University

Proteins are present in all living cells, where they perform a multitude of functions. Some play essential roles in fields such as human health, biology and biotechnology. To identify them, a research team from the Computational and Quantitative Biology Laboratory (Sorbonne University, CNRS), in collaboration with the Chloroplast Biology and Light Perception in Microalgae Laboratory (Sorbonne University, CNRS), has developed an innovative computational approach for the functional classification of protein families. The team presented this research, entitled “Multiple Profile Models Extract Features from Protein Sequence Data and Resolve Functional Diversity of Very Different Protein Families”, in Molecular Biology and Evolution.

Functional classification of biological sequences is necessary to understand genomic and metagenomic sequence data. However, a single protein family can contain thousands of sequences that descend from the same ancestor, have diverged through mutation, and are involved in interactions with nucleic acids, amino acids and small molecules.

ProfileView, a computational approach for the functional classification of protein families

The team at the Computational and Quantitative Biology Laboratory developed ProfileView to classify, by function, these thousands of sequences that share a common ancestor.

This innovative approach is based on two concepts:

  • Use multiple probabilistic profile models to explore and extract evolutionary information from sequence databases;
  • Define a new sequence representation space where sequences are analyzed from the point of view of the functional motifs encoded in the profiles.
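
As a rough illustration of these two ideas, the sketch below scores each sequence against several profiles and uses the resulting score vector as its representation. It is not ProfileView's implementation: simple ungapped position-specific scoring matrices stand in for the probabilistic profile models, the background distribution is assumed uniform, the function names (`build_profile`, `best_match_score`, `embed`, `closest_profile`) are hypothetical, and the motifs and sequences are invented toy data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_motifs, pseudocount=1.0):
    """Turn equal-length, gap-free motif instances into a log-odds scoring matrix."""
    background = 1.0 / len(AMINO_ACIDS)  # uniform background, a simplification
    total = len(aligned_motifs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for position in range(len(aligned_motifs[0])):
        counts = Counter(motif[position] for motif in aligned_motifs)
        profile.append({
            aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_match_score(profile, sequence):
    """Highest ungapped score of the profile over all windows of the sequence."""
    width = len(profile)
    if len(sequence) < width:
        return float("-inf")
    return max(
        # Residues outside the 20-letter alphabet contribute a neutral score.
        sum(profile[i].get(sequence[start + i], 0.0) for i in range(width))
        for start in range(len(sequence) - width + 1)
    )


def embed(sequence, profiles):
    """Represent a sequence by its best score against each profile."""
    return [best_match_score(profile, sequence) for profile in profiles]


def closest_profile(vector):
    """Index of the best-matching profile: a crude stand-in for clustering."""
    return max(range(len(vector)), key=lambda i: vector[i])


# Toy data, invented for illustration only.
profiles = [
    build_profile(["WHGDL", "WHGEL", "WHGDI"]),
    build_profile(["KPCCT", "KPCST", "RPCCT"]),
]
sequences = {
    "seq1": "MAAWHGDLKK",
    "seq2": "GGKPCCTAAA",
    "seq3": "TTWHGELSSA",
}
vectors = {name: embed(seq, profiles) for name, seq in sequences.items()}
groups = {name: closest_profile(vec) for name, vec in vectors.items()}

for name in sequences:
    print(name, [round(score, 2) for score in vectors[name]], "group", groups[name])
```

In this toy setting, sequences that carry the same motif end up close together in the score space even when the rest of the sequence differs. A real analysis would use many more profiles and a proper clustering or tree-building step on the vectors.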

ProfileView has been validated on seven protein families that are widespread in the environment and show both a wide variety of functions and significant sequence divergence.

One of them is the Cryptochrome-Photolyases family, which plays a role in various light-activated biological mechanisms and is studied in the Chloroplast Biology and Light Perception in Microalgae Laboratory (Sorbonne University, CNRS). Some members of this protein family are extremely important in medicine and biology because of their role in genome stability, cancer biology, regulation of circadian rhythms (the biological clock) and optogenetic methodologies. Developed over the last ten years, optogenetics is a technique used in neuroscience that consists of genetically modifying neurons to make them sensitive to light through the expression of a protein: opsin.

Results of the ProfileView approach

Experiments carried out over previous decades produced a large body of functional information, which the team used to validate the ProfileView approach. The functional organization obtained for the seven families considered is consistent with the experimental evidence. In addition, ProfileView makes it possible to propose functional classifications that have not yet been defined.

ProfileView increases understanding of the mechanisms developed by nature to harness light for functional purposes.

While ProfileView has been trained to classify whole protein sequences, it can also handle metagenomic sequences. Metagenomics consists of sequencing the genomes of many individuals of different species present in a given environment, including organisms that may never be isolated. Developing new approaches to explore their biology in complex ecosystems is paramount.

ProfileView allows us to increase our knowledge of the biology of organisms whose ecological role is recognized (such as marine microbes) but which are not yet accessible to functional investigations, thus opening up a new avenue for functional exploration.

This innovative computational approach to evolutionary processes and the complex space of natural sequences makes possible a general and accurate classification of protein family members, while highlighting functional patterns of interaction with other proteins, DNA, and small molecules, and thus paves the way for large-scale analyses.

Article source:

Multiple Profile Models Extract Features from Protein Sequence Data and Resolve Functional Diversity of Very Different Protein Families.
R. Vicedomini, J-P. Bouly, E. Laine, A. Falciatore, A. Carbone.
https://doi.org/10.1093/molbev/msac070

Translated from Génomique : ProfileView, approche computationnelle innovante pour la classification fonctionnelle de familles protéiques, présentée par Sorbonne Université