Magierowski, Sebastian
2019-07-02
2019-07-02
2019-02-12
2019-07-02
http://hdl.handle.net/10315/36268
Nanopore DNA sequencing is a method in which DNA bases are determined (basecalled) from electric current signals generated by passing DNA through nanopore sensors. The raw measured signals can be aggregated into event data representing new bases entering the nanopore. This thesis makes two contributions. First, we implemented RNN-based single- and double-strand basecallers for simulated event data to analyze the effect of signal noise. As the SNR decreased from 20 dB to 5 dB, the accuracy of the single-strand basecaller dropped by 9%, while the accuracy of the double-strand basecaller dropped by only 0.5%. Second, we implemented an end-to-end single-strand basecaller that processes the raw signal directly, using an encoder-decoder model with attention instead of the CTC-style approach used in available basecallers. We achieved an accuracy of 81.9% for a viral sample and 90.9% for a bacterial sample. Our accuracy is comparable to that of state-of-the-art basecallers, with a considerably smaller model.
en
Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
Computer science
An Encoder-Decoder Based Basecaller for Nanopore DNA Sequencing
Electronic Thesis or Dissertation
2019-07-02
DNA Sequencing; Nanopore Sequencing; Deep Learning; Recurrent Neural Networks; Seq2seq; Attention Mechanism