A Method for Identifying Transcription Co-regulator Binding Sites in ChIP-seq Data
A typical ChIP-seq experiment profiles the genome-wide binding of a single transcription factor. However, multiple transcription factors often work together to regulate gene expression, and most existing methods for motif discovery consider only one motif at a time. Here, we present a three-component mixture framework that models the joint distribution of two motifs while allowing for sequences that contain only one or neither of the motifs. We use the expectation-maximization (EM) algorithm to numerically maximize the observed-data likelihood with respect to the position weight matrices (PWMs) of the two motifs and the proportions of sequences containing no binding site (pure "noise"), one motif binding site, or both. Based on the parameter estimates from the EM procedure, we compute the posterior probabilities that any given sequence contains either motif alone, both motifs, or neither (pure statistical "noise").
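The posterior step described above can be illustrated with a minimal Python sketch. It is not the coMOTIF implementation: the function name `posterior_memberships`, the component ordering, and all numbers are hypothetical. It only applies Bayes' rule to a generic mixture, assuming the per-component log-likelihoods of a sequence and the (strictly positive) mixing proportions have already been estimated.

```python
import math


def posterior_memberships(log_likelihoods, proportions):
    """Posterior probability of each mixture component for one sequence.

    log_likelihoods[k] is log P(sequence | component k) and proportions[k]
    is the mixing proportion of component k (must be > 0). Both inputs are
    hypothetical stand-ins for quantities an EM fit would provide.
    """
    if len(log_likelihoods) != len(proportions):
        raise ValueError("one log-likelihood per mixing proportion is required")
    # Joint log-probability of the sequence and each component.
    joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, proportions)]
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    peak = max(joint)
    weights = [math.exp(j - peak) for j in joint]
    total = sum(weights)
    # Normalize so the posteriors sum to one (Bayes' rule).
    return [w / total for w in weights]


# Hypothetical values; components ordered as (noise, one motif, both motifs).
example = posterior_memberships(
    log_likelihoods=[-140.0, -132.0, -135.0],
    proportions=[0.5, 0.3, 0.2],
)
```

Working in log space is a design choice for this sketch: likelihoods of long sequences are products of many small probabilities and would underflow if multiplied directly.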
This program was developed by Tracy Xu, Clare Weinberg, David Umbach and Leping Li at the National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709.
- This work is made available under the GPL v3.
- Download the coMOTIF source distribution, which includes usage documentation and examples: coMOTIF_v1.0.tar.gz (13 MB)
In the main directory of the distribution, run the configure program, then build and install the package (the "make install" step).
By default, the configure program will direct the executable files to /usr/local/bin, which in most cases requires the user to "su" to root prior to the "make install" step. The target directory for the executable files can be overridden by specifying the --prefix option during the configure phase. For example, setting --prefix=/home/coMOTIF_user will direct the executables into the /home/coMOTIF_user/bin directory.
The configure application accepts several arguments to tailor the build and installation process. Please see the INSTALL file contained in the root directory of the distribution for further details.
The source code and package were developed using Windows and tested on Linux (Fedora). Although the intent was to make the code portable to most U*IX variants, you may encounter minor build issues on other platforms. Feedback regarding any difficulties you may experience will be very helpful in improving the distribution package.
Leping Li, Ph.D.
Deputy Chief, Biostatistics & Computational Biology Branch and Principal Investigator