Genetic algorithm for dimer-led and error-restricted spaced motif discovery

Document Type

Book chapter

Source Publication

Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013

Publication Date

9-12-2013

First Page

198

Last Page

205

Publisher

Institute of Electrical and Electronics Engineers Inc.

Abstract

DNA motif discovery is an important problem for deciphering protein-DNA binding in gene regulation. To discover generic spaced motifs, which have multiple conserved patterns separated by wild-cards called spacers, the genetic algorithm (GA) based GASMEN has been proposed and shown to outperform related methods. However, its over-generic modeling of any number of spacers increases the optimization difficulty in practice. In protein-DNA binding case studies, complicated spaced motifs are rare, while dimers with a single spacer are the more common spaced motifs. Moreover, errors (mismatches) in a conserved pattern are not arbitrarily distributed, as certain highly conserved nucleotides are essential to maintain binding. Motivated by better optimization in real applications, we have developed a new method, GA for Dimer-led and Error-restricted Spaced Motifs (GADESM). It gives special attention to common spaced motifs by using dimer-led initialization of the population. The results on real datasets show that the dimer-led initialization in GADESM achieves better fitness than GASMEN with statistical significance. With additional error-restricted motif occurrence retrieval, GADESM has shown better performance than GASMEN on both comprehensive simulation data and a real ChIP-seq case study.

DOI

10.1109/CIBCB.2013.6595409

Publisher Statement

Copyright © 2013 IEEE. Access to external full text or publisher's version may require subscription.

Additional Information

Paper presented at the 10th Annual IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Apr 16-19, 2013, Singapore.

ISBN of the source publication: 9781467358750

Full-text Version

Publisher’s Version

Language

English

Recommended Citation

Chan, T.-M., Lo, L.-Y., Wong, M.-L., Liang, Y., & Leung, K.-S. (2013). Genetic algorithm for dimer-led and error-restricted spaced motif discovery. In Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013 (pp. 198-205). Singapore: Institute of Electrical and Electronics Engineers Inc. doi: 10.1109/CIBCB.2013.6595409
