STL, Thesis
Speaker: Clément BERNARD
Like proteins, RNAs are biological molecules that play essential roles at various stages in the life of an organism and are involved in various diseases. Determining their structure, especially in 3D, is essential to better understand their function. Recently, Google DeepMind proposed AlphaFold, a deep learning method for predicting the 3D structure of proteins, which revolutionized the field by substantially outperforming the state of the art. However, RNA and protein molecules differ significantly in structure and dynamics, making it non-trivial to apply protein-based methods directly to RNA. AlphaFold, AlphaFold 2, and AlphaFold 3, the latest version, which also predicts RNA 3D structure, rely heavily on multiple sequence alignments (MSAs) as input, which are expensive to compute and not always available, especially for RNAs.
In this thesis, we aim to remove the dependence on MSA information in the prediction of RNA 3D structures, and seek to develop methods that predict RNA 3D structures from sequence information only. To this end, we leverage deep learning methods, and in particular language-based models, to map sequences to structural features. By using language-based models pretrained on a large set of RNA sequences, we can learn RNA structural features and then predict the 3D structure.
The work in this thesis is organized into three main contributions. The first, called RNAdvisor, is a tool that wraps the state-of-the-art RNA 3D structure assessment tools to comprehensively evaluate RNA 3D structures, both with and without experimental references. The second contribution, State-of-the-RNArt, is a benchmark of the state-of-the-art RNA 3D structure prediction methods, highlighting the limitations and challenges of current methods. It is followed by a more detailed analysis of the limitations of AlphaFold 3. The third contribution, RNA-TorsionBERT, is a deep learning method that predicts, from the sequence alone, the torsion angles of an RNA, which are an important feature of its 3D structure. It leverages a language-based model to map sequences to structural features. It is extended to a new scoring function, TorsionBERT-MCQ, that can assess the quality of RNA 3D structures in torsional space. This work is a step towards the development of deep learning methods for RNA 3D structure prediction that use only sequence information and do not rely on costly multiple sequence alignments.
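To make the idea of scoring in torsional space more concrete, here is a minimal sketch. It assumes that MCQ stands for the mean of circular quantities taken over per-angle differences, which the abstract does not spell out. It also assumes the score compares two equal-length lists of torsion angles in radians, for example angles predicted from the sequence against angles measured on a candidate structure. The function names `angular_difference` and `mcq` are hypothetical, and this is not the implementation used in the thesis.

```python
import math


def angular_difference(a, b):
    """Smallest absolute difference between two angles in radians, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def mcq(predicted, reference):
    """Circular mean of per-angle differences (assumed MCQ definition).

    Both arguments are equal-length sequences of torsion angles in radians.
    Returns a value in [0, pi]; lower means the two sets of angles agree better.
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [angular_difference(p, r) for p, r in zip(predicted, reference)]
    mean_sin = sum(math.sin(d) for d in diffs) / len(diffs)
    mean_cos = sum(math.cos(d) for d in diffs) / len(diffs)
    # atan2 of the mean sine and mean cosine gives the circular mean.
    return math.atan2(mean_sin, mean_cos)
```

Taking the circular mean rather than the arithmetic mean keeps the score well behaved around the ±180° wrap-around: angles of 179° and -179° are treated as 2° apart, not 358°.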