Developing Advanced Representation Learning Techniques for mRNA Sequence and Structure Modeling

Authors

Nahali, Sepideh

Abstract

Recent studies in bioinformatics and genomics focus on analyzing RNA sequences, which are complex due to diverse nucleotide compositions, varying lengths, and multiple isoforms.

Accurately modeling these sequences is essential for predicting mRNA degradation, a key factor in designing effective RNA-based therapies. However, many existing models struggle to capture the intricate relationships between sequence and structure, limiting their predictive power.

We introduce StructmRNA, a BERT-based model that uses dual-level and conditional masking to embed RNA sequences and structures. This approach enables accurate prediction of mRNA sequences and structures without explicit structural data and effectively captures sequence-structure dependencies. Evaluations show that StructmRNA outperforms existing models in predicting mRNA degradation and secondary structure.

Experiments with GAN-generated RNA sequences showed no performance improvement. Nonetheless, StructmRNA’s consistent convergence over 30 epochs highlights its robustness and accuracy. This work advances RNA representation learning and demonstrates deep learning’s potential in RNA-based therapeutic design and bioinformatics.
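
The abstract names dual-level and conditional masking but does not define them. The following is a minimal, illustrative sketch of one possible reading, not the thesis implementation. It assumes that "dual-level" means masking nucleotide tokens and dot-bracket structure tokens independently, and that "conditional" means hiding one track entirely so it must be predicted from the other. The function names, the `[MASK]` token, and the 15% masking rates are all hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, following BERT convention


def dual_level_mask(sequence, structure, p_seq=0.15, p_struct=0.15, rng=None):
    """Independently mask nucleotide tokens and structure tokens.

    Assumed interpretation of "dual-level" masking: each track gets
    its own masking probability. Not taken from the thesis.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    if rng is None:
        rng = random.Random()
    seq_tokens = [MASK if rng.random() < p_seq else nt for nt in sequence]
    struct_tokens = [MASK if rng.random() < p_struct else s for s in structure]
    return seq_tokens, struct_tokens


def conditional_mask(sequence, structure, hide="structure"):
    """Hide one track entirely so it must be predicted from the other.

    Assumed interpretation of "conditional" masking. Not taken from the thesis.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    if hide == "structure":
        return list(sequence), [MASK] * len(structure)
    if hide == "sequence":
        return [MASK] * len(sequence), list(structure)
    raise ValueError("hide must be 'structure' or 'sequence'")


if __name__ == "__main__":
    seq = "GGGAAACCC"        # toy hairpin sequence
    struct = "(((...)))"     # matching dot-bracket secondary structure
    print(dual_level_mask(seq, struct, rng=random.Random(0)))
    print(conditional_mask(seq, struct))
```

Under this reading, hiding the whole structure track during training is what would let a model predict structure from sequence alone at inference time, which matches the abstract's claim of working without explicit structural data.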

Keywords

Bioinformatics, Artificial intelligence, Genetics
