\name{module2TF}
\alias{module2TF}
\title{Find transcription factors that bind more genes in a module than expected by chance}
\description{
  Given the modules found by \code{module.genes} and transcription factor binding data, find the transcription factors that bind more genes in each module than expected by chance.
}
\usage{
module2TF(moduleGenes, bindData, bindExp, eInfo, matchCol, bindData.colOff = 0, p.cut = 0.001, p.hyper = 0.01)
}
\arguments{
  \item{moduleGenes}{ result of the function \code{\link{module.genes}} }
  \item{bindData}{ transcription factor binding data. Each row is one gene. The first \code{bindData.colOff} columns store information about each gene, and each remaining column corresponds to one transcription factor. Each entry is the p-value for the transcription factor binding the gene; a smaller p-value means higher confidence of binding. }
  \item{bindExp}{ transcription factor information }
  \item{eInfo}{ data frame with information on all genes. It should include at least five variables: ID, geneSym, chr (chromosome), start, and end. }
  \item{matchCol}{ name of the column used to match gene information in \code{eInfo} with binding data in \code{bindData}. For example, \code{matchCol} can be the ORF name for yeast. }
  \item{bindData.colOff}{ number of leading columns of \code{bindData} that store gene information rather than binding p-values }
  \item{p.cut}{ cut-off for the binding p-value: a transcription factor is considered to bind a gene if p < \code{p.cut} }
  \item{p.hyper}{ cut-off for the hypergeometric p-value }
}
\details{
  The hypergeometric distribution is used to quantify whether a transcription factor (TF) binds more genes in a module than expected by chance: \code{p = 1 - phyper(q-1, m, n, k)}, where \code{m} is the number of genes that the TF binds, \code{n} is the number of genes that the TF does not bind, \code{k} is the number of genes in the module, and \code{q} is the number of genes that are both in the module and bound by the TF.

  The correlation between log(binding p-value) and log(linkage p-value) is also calculated, together with its p-value. A strong correlation between log(binding p-value) and log(linkage p-value) implies that the TF is responsible for the linkage.
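
  As a sketch of what the formula above computes, the p-value is the standard upper-tail probability of a hypergeometric variable X with the same q, m, n and k. The symbol X and the summation index i are introduced here only for illustration:
  \deqn{p = P(X \ge q) = \sum_{i=q}^{\min(k,m)} \frac{{m \choose i}{n \choose k-i}}{{m+n \choose k}}}{p = P(X >= q) = sum_{i=q}^{min(k,m)} choose(m,i) * choose(n,k-i) / choose(m+n,k)}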
}
\value{
A data.frame including the following variables:
    \item{\code{modSym}}{the module symbol}
    \item{\code{TF}}{the transcription factor}
    \item{\code{q}}{the number of genes that are in the module and bound by the TF}
    \item{\code{m}}{the number of genes that the TF binds}
    \item{\code{n}}{the number of genes that the TF does not bind}
    \item{\code{k}}{the number of genes in the module}
    \item{\code{p}}{the hypergeometric p-value: \code{p = 1 - phyper(q-1, m, n, k)}}
    \item{\code{cor}}{the correlation between log(binding p-value) and log(linkage p-value)}
    \item{\code{p.cor}}{p-value of the correlation}
    \item{\code{bind.gene}}{genes bound by the transcription factor}
}

\author{ Wei Sun \email{sunwei@stat.ucla.edu} }

\seealso{\code{\link{eqtl.module}}, \code{\link{module.genes}}}
\examples{

data(moduleGenes.y112)
data(yeastBind.data)
data(yeastBind.exp)
data(eInfo.y112)

moduleTF = module2TF(moduleGenes.y112, yeastBind.data, yeastBind.exp,
                     eInfo.y112, matchCol="ORF", bindData.colOff = 3)
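
## A minimal sketch of the hypergeometric test described in Details.
## The counts below are made up for illustration; they are not taken from
## the yeast data sets used above or from the output of module2TF.
q <- 5    # genes in the module that are bound by the TF (hypothetical)
m <- 20   # genes that the TF binds (hypothetical)
n <- 80   # genes that the TF does not bind (hypothetical)
k <- 10   # genes in the module (hypothetical)

## upper-tail probability P(X >= q), written as in the Details formula
p.upper <- 1 - phyper(q - 1, m, n, k)

## the same quantity computed two other ways with base R functions
p.tail <- phyper(q - 1, m, n, k, lower.tail = FALSE)
p.sum  <- sum(dhyper(q:min(k, m), m, n, k))
stopifnot(isTRUE(all.equal(p.upper, p.tail)),
          isTRUE(all.equal(p.upper, p.sum)))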

}
\keyword{ methods }
