A Genetic Algorithm Guided Formation of Spaced Dyads Coupled with an EM Algorithm for Motif Discovery

Previously, we reported GADEM, an efficient de novo motif discovery tool for large-scale genomic sequence data. We present an updated version, v1.3.1, with several improvements and additions. We added a 'seeded' analysis in which a user-specified position weight matrix (PWM) serves as the starting PWM model. Seeded analyses are at least 10x faster and perhaps more accurate than the already scalable 'unseeded' analyses, and can identify short and less abundant motifs as well as variants of dominant motifs. We propose an approach for estimating the number of binding sites in the data and include non-uniform motif priors that take advantage of the high spatial resolution of ChIP-seq data. Finally, runs now report each motif's fold enrichment in the input data vs. background/random sequence data. These changes substantially enhance GADEM's functionality and efficiency for motif discovery in large-scale genomic data.

This program was developed by Leping Li at the National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709.

License

This work is made available under the GPL v3.

Download

Download the GADEM source distribution, which includes usage documentation and examples: gadem_v1.3.1.tar.gz (482KB, last updated 05/16/2011).

Building

In the main directory of the distribution, type

  • ./configure

then

  • make
  • make install

By default, the configure program installs the executable files in /usr/local/bin, which in most cases requires the user to "su" to root before the "make install" step. The target directory can be overridden by specifying the --prefix option during the configure phase. For example,

  • ./configure --prefix=/home/GADEM_user

will direct the executables into the /home/GADEM_user/bin directory.
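
The build steps above can be collected into a small shell function. This is only a sketch under stated assumptions, not part of the distribution: the function name install_gadem and the default prefix $HOME/gadem are hypothetical, and it assumes the archive unpacks into a single top-level directory that contains the configure script.

```shell
#!/bin/sh
# Sketch: unpack, configure, build, and install under a user-owned prefix.
# Assumptions (not confirmed by the distribution): the archive unpacks into
# one top-level directory, and that directory holds the configure script.
install_gadem() {
    tarball=$1                    # path to the archive, e.g. gadem_v1.3.1.tar.gz
    prefix=${2:-$HOME/gadem}      # hypothetical default install prefix
    workdir=$(mktemp -d) || return 1

    # Unpack into a scratch directory so the top-level name need not be guessed.
    tar -xzf "$tarball" -C "$workdir" || return 1
    srcdir=$(find "$workdir" -mindepth 1 -maxdepth 1 -type d | head -n 1)
    [ -n "$srcdir" ] || return 1

    # Run the three build steps in a subshell so the caller's directory is unchanged.
    # With a prefix the user owns, "make install" does not need root.
    (cd "$srcdir" && ./configure --prefix="$prefix" && make && make install) || return 1

    echo "Installed under $prefix/bin"
}
```

For example, calling install_gadem gadem_v1.3.1.tar.gz /home/GADEM_user would place the executables in /home/GADEM_user/bin. Adding that directory to PATH then makes them available from any shell.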

The configure script accepts several arguments to tailor the build and installation process. See the INSTALL file in the root directory of the distribution for further details.

The source code and package were developed using Windows and tested on Linux (Fedora). Although the intent was to make the code portable to most U*IX variants, you may encounter minor build issues on other platforms. Feedback regarding any difficulties you may experience will be very helpful in improving the distribution package.

Contact

Leping Li, Ph.D.
Principal Investigator
Tel 984-287-3836
Fax 919-541-4311
[email protected]