Developing Advanced Representation Learning Techniques for mRNA Sequence and Structure Modeling

dc.contributor.advisor: Huang, Jimmy
dc.contributor.author: Nahali, Sepideh
dc.date.accessioned: 2025-11-11T19:58:31Z
dc.date.available: 2025-11-11T19:58:31Z
dc.date.copyright: 2025-06-26
dc.date.issued: 2025-11-11
dc.date.updated: 2025-11-11T19:58:30Z
dc.degree.discipline: Information Systems and Technology
dc.degree.level: Master's
dc.degree.name: MA - Master of Arts
dc.description.abstract: Recent work in bioinformatics and genomics has focused on analyzing RNA sequences, which are complex because of their diverse nucleotide compositions, varying lengths, and multiple isoforms. Accurately modeling these sequences is essential for predicting mRNA degradation, a key factor in designing effective RNA-based therapies. However, many existing models struggle to capture the intricate relationships between sequence and structure, which limits their predictive power. We introduce StructmRNA, a BERT-based model that uses dual-level and conditional masking to embed RNA sequences and structures. This design captures sequence-structure dependencies and enables accurate prediction of mRNA sequences and structures without explicit structural data. Evaluations show that StructmRNA outperforms existing models in predicting mRNA degradation and secondary structure. Experiments with GAN-generated RNA sequences yielded no performance improvement. Nonetheless, StructmRNA converged consistently over 30 training epochs, which highlights its robustness and accuracy. This work advances RNA representation learning and demonstrates the potential of deep learning for RNA-based therapeutic design and bioinformatics.
dc.identifier.uri: https://hdl.handle.net/10315/43254
dc.language: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Bioinformatics
dc.subject: Artificial intelligence
dc.subject: Genetics
dc.subject.keywords: Bioinformatics
dc.subject.keywords: mRNA degradation prediction
dc.subject.keywords: mRNA sequences
dc.subject.keywords: Secondary structures
dc.subject.keywords: StructmRNA model
dc.subject.keywords: Machine learning
dc.subject.keywords: Two-level masking
dc.subject.keywords: Conditional masking
dc.subject.keywords: Synthetic RNA data
dc.subject.keywords: BERT model
dc.subject.keywords: Sequence-structure relationship
dc.title: Developing Advanced Representation Learning Techniques for mRNA Sequence and Structure Modeling
dc.type: Electronic Thesis or Dissertation

Files

Original bundle

Name: Nahali_Sepideh_2025_MA.pdf
Size: 7.28 MB
Format: Adobe Portable Document Format