Genetic algorithm for dimer-led and error-restricted spaced motif discovery
Document Type
Book chapter
Source Publication
Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013
Publication Date
9-12-2013
First Page
198
Last Page
205
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
DNA motif discovery is an important problem for deciphering protein-DNA binding in gene regulation. To discover generic spaced motifs, which have multiple conserved patterns separated by wild-cards called spacers, the genetic algorithm (GA) based method GASMEN has been proposed and shown to outperform related methods. However, the over-generic modeling of any number of spacers increases the optimization difficulty in practice. In protein-DNA binding case studies, complicated spaced motifs are rare, while dimers with a single spacer are the more common spaced motifs. Moreover, errors (mismatches) in a conserved pattern are not arbitrarily distributed, as certain highly conserved nucleotides are essential to maintain binding. Motivated by better optimization in real applications, we have developed a new method, GA for Dimer-led and Error-restricted Spaced Motifs (GADESM). It pays special attention to common spaced motifs by using dimer-led initialization when initializing the population. The results on real datasets show that the dimer-led initialization in GADESM achieves better fitness than GASMEN with statistical significance. With additional error-restricted motif occurrence retrieval, GADESM has shown better performance than GASMEN on both comprehensive simulation data and a real ChIP-seq case study.
DOI
10.1109/CIBCB.2013.6595409
Publisher Statement
Copyright © 2013 IEEE. Access to external full text or publisher's version may require subscription.
Additional Information
Paper presented at the 10th Annual IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Apr 16-19, 2013, Singapore.
ISBN of the source publication: 9781467358750
Full-text Version
Publisher’s Version
Language
English
Recommended Citation
Chan, T.-M., Lo, L.-Y., Wong, M.-L., Liang, Y., & Leung, K.-S. (2013). Genetic algorithm for dimer-led and error-restricted spaced motif discovery. In Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013 (pp. 198-205). Singapore: Institute of Electrical and Electronics Engineers Inc. doi: 10.1109/CIBCB.2013.6595409