\name{GetGeneGtf}
\alias{GetGeneGtf}
\title{
Generation of a GTF (Gene Transfer Format) File with Gene Symbols
}
\description{
\code{GetGeneGtf} creates a new gene annotation GTF file from an input gene annotation GTF file and a gene-transcript reference file.
}
\usage{
GetGeneGtf(gene.file.name, transcript.gtf.name,	
           out.gtf.name = "modified.gtf")
}
\arguments{
  \item{gene.file.name}{
	The path (either relative or full) of the input gene-transcript reference file. See ``Details'' for the requirements of the gene-transcript reference file.
}
  \item{transcript.gtf.name}{
	The path (either relative or full) of the input gene annotation GTF file. See ``Details'' for the requirements of the GTF file.
}
  \item{out.gtf.name}{
	The path (either relative or full) of the output gene annotation GTF file.
}
}
\details{
\code{GetGeneGtf} is useful for GTF files in which the "gene_id" attribute in the last (9-th) column is missing or inaccurate (for example, GTF files downloaded from the UCSC Genome Browser).

The \code{gene.file.name} argument is the path of the input gene-transcript reference file, which should be a tab-delimited text file without a header.
The first two columns of the file should be the transcript name (column 1) and the gene symbol (column 2).
The file may contain other columns, but they are not used by \code{GetGeneGtf}.

The \code{transcript.gtf.name} argument is the path of the input gene annotation GTF file.
The last (9-th) column of each record must contain the "transcript_id" attribute, which is mandatory in the GTF format.

The \code{out.gtf.name} argument is the path of the output gene annotation GTF file.
The new GTF file is the same as the input GTF file, except for the last (9-th) column.
The last column of the new GTF file has the form "gene_id XXX; transcript_id YYY;",
where "XXX" is the gene symbol (inferred from \code{gene.file.name}) and "YYY" is the transcript name.
Such a GTF file is needed for \code{\link{IUTA}}.
}
\value{
No value is returned by \code{GetGeneGtf}.
}
\references{
See \url{http://mblab.wustl.edu/GTF22.html} for the details of GTF format.
}
\author{
Liang Niu
}
\note{
If the gene-transcript reference file (\code{gene.file.name}) does not provide a valid gene symbol for
a transcript in the input gene annotation GTF file (\code{transcript.gtf.name}),
\code{GetGeneGtf} excludes all records of that transcript from the output GTF file.
At the end of the run, one or both of the following warnings are then reported:
if at least one transcript is assigned to more than one gene symbol in the reference file,
the warning message is "Found transcript(s) belong to different genes in reference! Such transcript(s) are removed from the gene annotation!";
if at least one transcript is in the input GTF file but not in the gene-transcript reference file,
the warning message is "found transcript(s) in annotation but not in reference! such transcript(s) are removed from the gene annotation!".
}
\examples{
## get the paths of sample GTF file and sample reference file
transcript.gtf<-system.file("gtf","mm10_kg_sample.gtf",package="IUTA")
gene.transcript.ref<-system.file("gtf","gene_id.txt",package="IUTA")

## check the last (9-th) column of the first line of transcript.gtf
## notice that the gene_id and transcript_id attributes are the same
print(read.delim(transcript.gtf,header=FALSE)[1,])
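
## a quick look at the gene-transcript reference file (an illustrative
## addition): column 1 should hold transcript names and column 2 gene symbols
print(head(read.delim(gene.transcript.ref,header=FALSE,quote="")))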

## run GetGeneGtf
GetGeneGtf(gene.transcript.ref,transcript.gtf,"modified.gtf")

## read in the new GTF file and check the gene_id attribute
print(read.delim("modified.gtf",header=FALSE)[1,])
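
## a minimal sketch (not part of GetGeneGtf itself) of how one might count
## the records dropped from the output, assuming that neither file contains
## header or comment lines
n.in<-length(readLines(transcript.gtf))
n.out<-length(readLines("modified.gtf"))
cat("records removed:",n.in-n.out,"\n")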

## remove "modified.gtf"
file.remove("modified.gtf")
}
