Project information
Using Deep Learning to understand RNA Binding Protein binding characteristics
(DEEPLEARNRBP)
- Project Identification: 867414
- Project Period: 7/2019 - 5/2022
- Investor / Programme / Project type: European Union / Horizon 2020 / MSCA Marie Skłodowska-Curie Actions (Excellent Science)
- MU Faculty or unit: Central European Institute of Technology
New high-throughput technologies have revolutionized our understanding of RNA-binding protein (RBP) function. Global screens for RBPs have pulled down hundreds of proteins that have no discernible RNA-binding domain. These proteins, termed enigmRBPs because of their enigmatic nature, nonetheless bind RNA in unknown and variable ways. An ever-increasing number of such RBPs are having their target sites identified by crosslinking and immunoprecipitation followed by sequencing (CLIP-Seq). This torrent of data can be harnessed by novel deep learning techniques to identify higher-order characteristics of RBP function.
The aim of this proposal is to develop a machine learning model that can explore the functional implications of RBP binding characteristics. Given an enigmatic RBP, the model should identify other known RBPs that show similar binding characteristics, such as sequence motifs, conservation motifs, secondary structure motifs, and higher-order combinations of these.
We will focus on methods for translating the machine learning model into practical biological knowledge, especially by interpreting higher-order filters that can learn the interplay among different types of input, such as sequence, secondary structure and conservation. Beyond theory, we will release our methods as easy-to-use standalone and web applications to increase the practical impact of our research.
We are transplanting expertise from the bioinformatics and machine learning fields into the fertile substrate of RNA biology and CLIP-Seq experimentation. This interdisciplinary project will involve close collaboration and a two-way transfer of knowledge in a dynamic research environment.
Publications
Total number of publications: 4
2023
- Genomic benchmarks: a collection of datasets for genomic sequence classification
  BMC Genomic Data, year: 2023, volume: 24, edition: 1, DOI
2022
- ENNGene: an Easy Neural Network model building tool for Genomics
  BMC Genomics, year: 2022, volume: 23, edition: 1, DOI
2021
- Bioinformatics and Machine Learning Approaches to Understand the Regulation of Mobile Genetic Elements
  Biology (Basel), year: 2021, volume: 10, edition: 9, DOI
2020
- PENGUINN: Precise Exploration of Nuclear G-Quadruplexes Using Interpretable Neural Networks
  Frontiers in Genetics, year: 2020, volume: 11, edition: OCT, DOI