\name{IUTA}
\alias{IUTA}
\title{
Isoform Usage Two-step Analysis
}
\description{
\code{IUTA} takes RNA-Seq alignment files (in BAM format) from two groups of samples, together with a gene annotation file (in GTF format) for the related species,
to test for differential isoform usage (set of relative abundances of isoforms) for each of the inquired genes.
It outputs two tab-delimited files (with header): ``estimates.txt'' and ``p_values.txt''.
See ``Details'' for a description of the method and of the two output files.
}
\usage{
IUTA(bam.list.1, bam.list.2, transcript.info,
     rep.info.1 = rep(1, length(bam.list.1)),
     rep.info.2 = rep(1, length(bam.list.2)), 
     output.dir = paste(getwd(), "/IUTA", sep = ""),
     output.na = FALSE,
     genes.interested = "all", 
     strand.specific = rep("1.5",
                    length(rep.info.1)+length(rep.info.2)),
     gene.filter.chr = c("_", "M", "Un"),	
     mapq.cutoff = NA, alignment.per.kb.cutoff = 10,
     IU.for.NA.estimate = "even",	
     sample.FLD = FALSE, FLD = "empirical", 
     mean.FL.normal = NA, sd.FL.normal = NA, 
     number.samples.EFLD = 1e+06, 
     isoform.weight.cutoff = 1e-4,
     adjust.weight = 1e-4, epsilon = 1e-05, 
     test.type = "SKK", log.p = FALSE, fwer = 1e-2,
     mc.cores.user = NA)
}
\arguments{
  \item{bam.list.1}{
	A character vector of paths (either relative or full) of the BAM files for the replicates of samples in group one.
	It has \eqn{r_1+r_2+\cdots+r_{n_1}} elements, where \eqn{r_i} is the number of replicates of sample \eqn{i} (\eqn{i=1, 2, \cdots, n_1}) in group one
	and \eqn{n_1} is the number of samples in group one.
	The paths for the replicates of the same sample should be placed together, i.e.,
	the first \eqn{r_1} elements of \code{bam.list.1} should be the paths of the \eqn{r_1} replicates of sample 1,
	the next \eqn{r_2} elements of \code{bam.list.1} should be the paths of the \eqn{r_2} replicates of sample 2, etc.
	See ``References'' for a reference for BAM format.
}
  \item{bam.list.2}{
	A character vector of paths (either relative or full) of the BAM files for the samples in group two.
	The format of \code{bam.list.2} is the same as the format of \code{bam.list.1}.
	See ``References'' for a reference for BAM format.
}
  \item{transcript.info}{
	The path (either relative or full) of the GTF file for the related species. 
	See ``References'' for a reference for GTF format.
	See ``Details'' for the requirements on the input GTF file.
}
  \item{rep.info.1}{
	The technical replicate information for the samples in group one.
	It is an \eqn{n_1}-dimensional vector whose \eqn{i}-th entry is the number of technical replicates of the \eqn{i}-th sample in group one,
	where \eqn{n_1} is the number of samples in group one.
	The default is set in such a way that each sample has only one technical replicate.
}
  \item{rep.info.2}{
	The technical replicate information for the samples in group two.
	It is an \eqn{n_2}-dimensional vector whose \eqn{i}-th entry is the number of technical replicates of the \eqn{i}-th sample in group two,
	where \eqn{n_2} is the number of samples in group two.
	The default is set in such a way that each sample has only one technical replicate.
}
  \item{output.dir}{
	The path (either relative or full) of the directory in which the two output files are stored.
	If the directory does not exist, \code{IUTA} will create it.
	The default is the subdirectory ``IUTA'' under the working directory.
}
  \item{output.na}{
	Whether to include genes with \code{NA} results in the two output text files or not. 
	If it is \code{TRUE}, all inquired genes are included in the two output text files.
	If it is \code{FALSE} (default), genes with \code{NA} as the estimated isoform usages in ALL samples of the two groups are excluded from ``estimates.txt''
	and genes with \code{NA} as the p-values of ALL tests (including all user-specified tests in \code{test.type}) are excluded from ``p_values.txt''.
}
  \item{genes.interested}{
	A character vector of the inquired gene names.
	The default is \code{"all"}, i.e., all genes with at least two isoforms in the filtered gene annotation GTF file.
	See ``Details'' for the filtering process applied to the gene annotation GTF file.
}
  \item{strand.specific}{
	A character vector of length \eqn{n_1+n_2}, where \eqn{n_1} is the number of samples in group one and \eqn{n_2} is the number of samples in group two.
	The \eqn{i}-th element (one of \code{"1"}, \code{"2"} or \code{"1.5"}) of \code{strand.specific}
	indicates, for the replicates of sample \eqn{i}, which read of a read pair has the same orientation as the mRNA molecule from which the pair was sequenced.
	Specifically, if all replicates of sample \eqn{i} were sequenced by a strand-specific protocol in which the first read of each pair has the same orientation as the mRNA molecule,
	set the \eqn{i}-th element of \code{strand.specific} to \code{"1"};
	if they were sequenced by a strand-specific protocol in which the second read of each pair has the same orientation as the mRNA molecule,
	set it to \code{"2"};
	if they were sequenced by a non-strand-specific protocol (such as the standard Illumina protocol),
	set it to \code{"1.5"}.
}
  \item{gene.filter.chr}{
	A character vector of symbols that are used to filter the genes on ``irregular'' chromosomes.
	Specifically, all genes with at least one transcript on chromosomes with these symbols are filtered from the GTF file with path \code{transcript.info}
	and are not considered in the IUTA analysis.
	The default is \code{c("_","M","Un")}, which corresponds to "chrN_random" (\eqn{N} is a chromosome number), "chrM" and "chrUn".
	To keep all the ``irregular'' chromosomes in the IUTA analysis,
	set \code{gene.filter.chr} to \code{NA}.
}
  \item{mapq.cutoff}{
	The mapping quality cut-off used to filter the RNA-Seq read pairs for \code{IUTA}.
	If it is \code{NA} (default), ALL read pairs are used.
	Otherwise, only read pairs in which both reads have mapping quality greater than \code{mapq.cutoff} are used.
}
  \item{alignment.per.kb.cutoff}{
	The cut-off, per kilobase, on the number of ``valid'' alignments (read pairs) that \code{IUTA} needs in order to estimate the isoform usage of a gene.
	That is, \code{IUTA} estimates the isoform usage of a gene in a sample only when the number of ``valid'' read pairs
	is greater than \code{alignment.per.kb.cutoff} times the length of the union of exons (in kilobases). The default of \code{alignment.per.kb.cutoff} is 10.
	See ``Details'' for the definition of a ``valid'' read pair.
}
  \item{IU.for.NA.estimate}{
	How the isoform usage of a gene is filled in for a sample when it cannot be estimated from the data.
	The ``artificial'' estimates obtained in this way are used only for testing differential isoform usage.
	\code{IU.for.NA.estimate} only takes effect when both groups have at least two samples for which the isoform usage of the gene can be estimated from the data (otherwise no test can be performed).
	\code{IU.for.NA.estimate} can be \code{"even"} (default), \code{"average"} or \code{"none"}.
	See ``Details'' for more details.
}
  \item{sample.FLD}{
	Whether the fragment length distribution (FLD) for each sample is sample-specific (\code{TRUE}) or group-specific (\code{FALSE}).
	The default is \code{FALSE}, i.e., use the group-specific FLD for each sample in the group.
	The group-specific FLD is the FLD determined for the first sample of the group.
}
  \item{FLD}{
	Whether to use the empirical FLD (EFLD, \code{"empirical"}) or a normal FLD (\code{"normal"}).
	If it is \code{"empirical"}, the EFLD is used and it is estimated from the data.
	If it is \code{"normal"}, a discrete normal distribution is used as the FLD.
	In the latter case, the user can specify the mean and the standard deviation (sd) via \code{mean.FL.normal} and \code{sd.FL.normal};
	if the user does not specify the mean and/or the sd of the normal FLD, the corresponding estimate(s) from the raw EFLD (i.e., before smoothing) are used.
}
  \item{mean.FL.normal}{
	The mean of the normal FLD. Only used if \code{FLD="normal"}.
	If it is \code{NA} (default), the mean of the raw EFLD (i.e., before smoothing) is used.
}
  \item{sd.FL.normal}{
	The standard deviation of the normal FLD. Only used if \code{FLD="normal"}.
	If it is \code{NA} (default), the sd of the raw EFLD (i.e., before smoothing) is used.
}
  \item{number.samples.EFLD}{
	The maximum number of sample fragments used to estimate EFLD.
	The default is \eqn{10^6}.
}
  \item{isoform.weight.cutoff}{
        A small non-negative value (less than 1) that is used to determine whether to keep an isoform in consideration when testing for differential isoform usage.
	Specifically, those isoforms whose estimated relative abundances are no more than \code{isoform.weight.cutoff} in all samples (excluding those in which isoform usage cannot be estimated from the data)
	are not considered when testing for differential isoform usage. The default is \eqn{10^{-4}}.
	See ``Details'' for more details.
}
  \item{adjust.weight}{
	A small positive value that is used to adjust the estimated isoform usage for testing purpose.
	Specifically, for the isoforms that are considered in the test(s) of differential isoform usage, 
	all estimated relative abundances that are smaller than \code{adjust.weight} (including zeros) are replaced by \code{adjust.weight}
	and the tests of differential isoform usage are based on the adjusted estimates.
	The main purpose of such adjustment is to make the isometric logratio transformation (ilr) applicable for the estimated isoform usages,
	since ilr is not applicable when an isoform usage has zero entries.
	The default is \eqn{10^{-4}}.
}
  \item{epsilon}{
	A small positive value used in the stopping criterion of the EM algorithm for estimating isoform usages.
	The EM algorithm stops when the (Euclidean) distance between two consecutive estimates is smaller than \code{epsilon}.
	The default is \eqn{10^{-5}}. 
}
  \item{test.type}{
	A character vector consisting of the test types to use for testing differential isoform usage in \code{IUTA}.
	Three types of test are available: \code{"SKK"} (default), \code{"CQ"} and \code{"KY"}.
	Any combination of the three can be given, e.g., \code{c("SKK","CQ")} or \code{c("CQ","SKK","KY")}; the first element is treated as the main test (see \code{fwer}).
	See ``Details'' and ``References''. 
}
  \item{log.p}{
	Whether to output the logarithms of the p-values instead of the p-values themselves.
	The default is \code{FALSE}, i.e., to output p-values.
}
  \item{fwer}{
	The family-wise error rate (FWER) that the user wants to control for the main test (the first test in \code{test.type}).
	In \code{IUTA}, the FWER is controlled by Bonferroni correction.
	Specifically, all genes with p-values less than \code{fwer}\eqn{/n_t} are claimed as genes with differential isoform usage,
	where \eqn{n_t} is the total number of valid tests.
	The default is \code{0.01}.
}
  \item{mc.cores.user}{
	The number of cores to use, i.e., the maximum number of child processes that are run simultaneously.
	The default (\code{NA}) is to use all cores that R detects on the machine.
	Note that on Windows \code{mc.cores.user} has to be set to 1,
	since the function \code{\link[parallel]{mclapply}} used in \code{IUTA} does not support more than one core there.
}
}
\details{
\code{IUTA} first checks the input gene annotation GTF file with path \code{transcript.info} and removes the records of the following three types of genes:
those with isoforms on ``irregular'' chromosomes
(according to \code{gene.filter.chr}; e.g., with the default \code{gene.filter.chr=c("_","M","Un")}, the ``irregular'' chromosomes are chrN_random (\eqn{N} is a chromosome number), chrM and chrUn),
those with isoforms on different chromosomes, and those with isoforms on different strands.
The new GTF file, consisting of the remaining records, is used for the further analysis.
If \code{genes.interested} is \code{"all"}, \code{IUTA} then estimates the isoform usage and tests for differential isoform usage for all the genes with at least two isoforms in the new GTF file;
otherwise, \code{IUTA} removes from \code{genes.interested} the genes that are not in the new GTF file, reports the number of removed genes, and performs the further analysis on the remaining genes.

After the genes for the further analysis are selected,
\code{IUTA} combines the BAM files of the technical replicates into a single BAM file for each sample,
and then iterates over these BAM files one by one to estimate the isoform usage of each selected gene in each sample, using the fragment length distribution (FLD) for the sample.
The FLD can be either an empirical fragment length distribution (EFLD) or a discrete normal distribution, depending on \code{FLD}, and it can be either identical across the samples within a group
or sample-specific, depending on \code{sample.FLD}.

If \code{FLD="empirical"} and \code{sample.FLD=TRUE}, the FLD for the sample is a sample-specific EFLD obtained from the (possibly combined) BAM file for the sample.
To obtain the EFLD, \code{IUTA} makes use of the ``stand-alone'' exons in the (filtered) GTF file,
i.e., exons that do not overlap with any exon of any gene other than themselves, and proceeds in iterations.
In each iteration,
\code{IUTA} selects 1000 ``stand-alone'' exons in decreasing order of exon length,
and reads the (possibly combined) BAM file to select paired-end reads that satisfy all of the following three requirements:
(i) both reads in the pair fall into any of the ``stand-alone'' exons selected for the iteration;
(ii) both reads in the pair have mapping quality greater than \code{mapq.cutoff} (this requirement is ignored when \code{mapq.cutoff} is \code{NA});
(iii) both reads in the pair have flags consistent with \code{strand.specific} and the direction of the exon they fall in.
For each such read pair, a fragment is inferred and the fragment length is recorded.
The iterations stop when either the number of inferred fragments exceeds \code{number.samples.EFLD}
or the ``stand-alone'' exons are all used.
The raw EFLD is then the relative frequency distribution of the recorded lengths.
The mean and standard deviation (sd) of the raw EFLD, together with the number of recorded fragments, are reported.
The EFLD is obtained by smoothing the raw EFLD with a window of length \eqn{11},
i.e., the function value at length \eqn{l} is the average of the relative frequencies of fragments with lengths between \eqn{l-5} and \eqn{l+5},
and then standardizing the resulting function.
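
In symbols, writing \eqn{f(l)} for the raw relative frequency of fragment length \eqn{l} and \eqn{g(l)} for its smoothed version (both symbols are introduced here only for illustration),
and assuming that ``standardizing'' means rescaling so that the values sum to one, the smoothing and standardizing steps can be sketched as
\deqn{g(l) = \frac{1}{11}\sum_{j=l-5}^{l+5} f(j), \qquad \mathrm{EFLD}(l) = \frac{g(l)}{\sum_{m} g(m)}.}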

If \code{FLD="empirical"} and \code{sample.FLD=FALSE},
the FLD for the sample is the EFLD obtained by the above procedure from the (possibly combined) BAM file for the first sample in the group;
thus the FLD is group-specific.

If \code{FLD="normal"}, a discrete normal distribution is used as the FLD for the sample.
A warning "Please consider using EFLD estimates for Fragment Length Distribution if they are much different from the user specified ones!" is printed,
either for each sample (when \code{sample.FLD=TRUE}) or for the first sample of each group (when \code{sample.FLD=FALSE}).
If \code{sample.FLD=TRUE}, the normal FLD is sample-specific; otherwise the normal FLD is group-specific.
In fact, \code{sample.FLD} has no effect when the user specifies both the mean (via \code{mean.FL.normal}) and the sd (via \code{sd.FL.normal}),
since the same mean and sd are used for all samples.
However, \code{sample.FLD} does have an effect when the mean and/or the sd are not specified:
\code{IUTA} sets the unspecified parameter(s) to the mean and/or the sd of the raw EFLD of the sample when \code{sample.FLD=TRUE},
and to the mean and/or the sd of the raw EFLD of the first sample in the group when \code{sample.FLD=FALSE}.

Once the FLD is determined, \code{IUTA} starts to estimate the isoform usages gene by gene.
For each gene of interest, \code{IUTA} reads the (possibly combined) BAM file to get the reads that fall into the gene region (including both exons and introns)
and selects paired-end reads satisfying the following three requirements:
(i) both reads in a pair are consistent with at least one isoform of the gene (i.e., can come from a fragment of the isoform);
(ii) both reads in a pair have mapping quality greater than \code{mapq.cutoff} (this requirement is ignored when \code{mapq.cutoff} is \code{NA});
(iii) both reads in a pair have flags consistent with \code{strand.specific} and the direction of the gene.
Then for each such pair, \code{IUTA} calculates the length of its corresponding fragment on each compatible isoform and
calculates the probabilities of lengths based on FLD;
if all such probabilities are zero, the pair is then discarded.
All the remaining pairs are called ``valid'' pairs.
If there are enough ``valid'' pairs, i.e., more than the product of \code{alignment.per.kb.cutoff} and the length of the union of exons (in unit of kilobases),
\code{IUTA} then performs an EM algorithm to find the MLE of isoform usage based on the IUTA model (see ``References'' for IUTA model)
using the length information as observed data;
otherwise, \code{IUTA} records \code{NA} as the estimated isoform usage for the gene in the sample.
The estimated isoform usage is written into the tab-delimited text file ``estimates.txt'' at the end of \code{IUTA}.
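
As a worked illustration of the cut-off (the symbols are introduced here only for illustration): let \eqn{N} be the number of ``valid'' pairs of a gene in a sample, \eqn{L} the length in bases of the union of its exons, and \eqn{c} the value of \code{alignment.per.kb.cutoff}. The isoform usage is estimated only when
\deqn{N > c \times \frac{L}{1000},}
so with the default \eqn{c=10}, a gene whose exon union spans 2500 bases needs more than 25 ``valid'' pairs.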

After a sample is processed, \code{IUTA} reports a summary of the analysis for the sample.
In the summary, \code{IUTA} reports the number of genes with no reads after filtering,
the number of genes with no data that fit the annotation,
the number of genes without enough data that fit the FLD,
and the number of genes with isoform usages estimated.

After \code{IUTA} has processed all the samples and obtained the estimated isoform usages for the genes of interest in all samples,
it tests for differential isoform usage for each gene using the estimated isoform usages.

For each gene, \code{IUTA} requires at least two valid estimates, i.e., not \code{NA}, in each of the two groups;
otherwise, \code{IUTA} cannot perform any test and records \code{NA} as the p-value.
\code{IUTA} also assumes zero relative abundance (in both groups) for the isoforms with small (less than \code{isoform.weight.cutoff}) estimated relative abundances across all samples,
and performs tests based on the isoform usage formed by the relative abundances of the other, say \eqn{K}, isoforms.
If \eqn{K=0}, \code{IUTA} cannot perform any test and records \code{NA} as the p-value;
if \eqn{K=1}, \code{IUTA} assumes that only one isoform is produced in all samples and records the p-value as 1 (or 0 when \code{log.p=TRUE});
if \eqn{K>1}, \code{IUTA} replaces the small entries (less than \code{adjust.weight}, possibly zero) of the (\eqn{K}-dimensional) estimated isoform usages by \code{adjust.weight}.
This replacement has two advantages:
first, it makes the isometric logratio transformation (ilr, see ``References'') applicable to the estimated isoform usages,
as the ilr is not applicable to an estimated isoform usage with zero entries;
second, it makes the tests less sensitive to the estimation error in the isoform usages caused by noise in the alignment data,
as such error can affect the test result dramatically.
Note that the number of valid estimates in each group is recorded and later output as the ``test_sample_size'' of the gene in the text file ``p_values.txt'', for all genes of interest.

In addition to the above data preprocessing procedures,
\code{IUTA} also checks the argument \code{IU.for.NA.estimate}
to decide whether and how extra ``artificial'' estimated isoform usages should be created for the test(s) of differential isoform usage.
Specifically,
if \code{IU.for.NA.estimate} is \code{"even"} (default),
\code{IUTA} assumes, for each sample with no valid estimated isoform usage, that the estimated isoform usage is a \eqn{K}-dimensional vector with all entries equal to \eqn{\frac{1}{K}};
if \code{IU.for.NA.estimate} is \code{"average"},
\code{IUTA} assumes, for each such sample, that the estimated isoform usage is the average (in Aitchison geometry) of the valid estimated isoform usages of the corresponding group;
if \code{IU.for.NA.estimate} is \code{"none"},
\code{IUTA} does not create ``artificial'' estimated isoform usages for the samples with no valid estimated isoform usage.
In the first two cases (\code{IU.for.NA.estimate} is \code{"even"} or \code{"average"}),
both ``artificial'' estimates and valid (data-based) estimates are used to perform the tests.
In general, setting \code{IU.for.NA.estimate} to \code{"even"} makes the tests more conservative,
that is, they have lower type I error rates,
whereas setting it to \code{"average"} gives the tests higher power.

To perform the tests, \code{IUTA} applies the ilr to all the estimates, which may include the valid isoform usage estimates (possibly adjusted) and the ``artificial'' estimates,
to transform these \eqn{K}-dimensional vectors into \eqn{(K-1)}-dimensional vectors.
\code{IUTA} assumes that the transformed estimates follow group-specific multivariate normal distributions and performs the user-specified test(s) in \code{test.type} (whenever applicable).
Notice that when \eqn{K=2}, the ``KY'' test becomes Welch's t-test.
Since the ilr is an isometry between the \eqn{K}-dimensional open simplex with Aitchison geometry (see ``References'')
and \eqn{(K-1)}-dimensional real space, the test for equal group means of the transformed estimates is equivalent to the test for equal group means (in Aitchison geometry) of the original estimates.
All p-values (or their logarithms, if \code{log.p=TRUE}) are recorded and written to the tab-delimited text file ``p_values.txt''.
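
For reference, one commonly used form of the ilr (built on an orthonormal basis of the kind described in Egozcue et al., 2003; whether \code{IUTA} uses this particular basis internally is not stated here) maps an isoform usage \eqn{x=(x_1,\ldots,x_K)} with positive entries to \eqn{z=(z_1,\ldots,z_{K-1})} with
\deqn{z_i = \sqrt{\frac{i}{i+1}}\,\ln\frac{\left(\prod_{j=1}^{i} x_j\right)^{1/i}}{x_{i+1}}, \qquad i=1,\ldots,K-1.}
The logarithms in this formula are the reason why zero entries must first be replaced by \code{adjust.weight}.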

Finally, \code{IUTA} outputs two tab-delimited text files with header, ``estimates.txt'' and ``p_values.txt'', in the directory with path \code{output.dir}. 
The file ``p_values.txt'' contains a table with \eqn{3+1+1+(m-1)+1} columns, where \eqn{m} is the number of tests in \code{test.type}.
The first three columns are ``gene'' (gene name), ``number_of_isoform'' (number of isoforms of the gene), ``test_sample_size'' (number of samples of each group in which the isoform usage can be estimated, separated by comma).
The fourth column is ``test'', which is the type of test used to calculate the next column ``p_value'' (either the first test type in \code{test.type}, or \code{NA} when the test outputs \code{NA}).
The fifth column is ``p_value'', which is the output p-value for the gene by the test in column ``test''.
The next \eqn{m-1} columns contain the p-values of the remaining tests in \code{test.type}, i.e., all but the first.
If \code{log.p=TRUE}, the logarithms of the p-values are output instead of the p-values.
Notice that the ``KY'' test is only applicable to genes for which the number of samples in each group is greater than \eqn{K-1};
otherwise the output p-value for ``KY'' is \code{NA}.
The last column is ``significant'', which can be ``yes'', ``no'' or \code{NA}.
It is determined by the fifth column ``p_value'' and the family-wise error rate \code{fwer}, which is controlled by Bonferroni correction.
Specifically, all genes with p-value (the ``p_value'' when \code{log.p=FALSE}; the exponential of ``p_value'' when \code{log.p=TRUE})
less than \code{fwer}\eqn{/n_t} are claimed as genes with differential isoform usage, i.e., with ``significant'' as ``yes'';
all genes with p-value no less than \code{fwer}\eqn{/n_t} are claimed as genes with the same isoform usage, i.e., with ``significant'' as ``no'';
all genes with ``p_value'' as \code{NA} have ``significant'' as \code{NA}.
Here \eqn{n_t} is the number of valid tests, i.e., the number of genes whose ``p_value'' is not \code{NA}.
The table is sorted by the column ``p_value'' in increasing order.
The file ``estimates.txt'' contains a table with \eqn{2+n_1+n_2} columns:
the first two columns are ``gene'' (gene name) and ``isoform'' (isoform of the gene);
the next \eqn{n_1} columns are the estimates of relative isoform abundance of the isoform from samples in group one;
the last \eqn{n_2} columns are the estimates of relative isoform abundance of the isoform from samples in group two.
The name of each of the last \eqn{n_1+n_2} columns is the file name of the BAM file of the first replicate of the corresponding sample, with the extension ``.bam'' omitted.
The gene order of the table is the same as in the table in ``p_values.txt'', and the isoforms of each gene are ordered alphabetically.
There are two comment lines at the top of the table,
which give the number of genes analyzed, the sample sizes and which FLD (``normal'' or ``empirical'') is used.
}
\value{
	No value is returned by \code{IUTA}.
}
\references{
Pawlowsky-Glahn, V. and Egozcue, J. J. (2001). 
Geometric approach to statistical analysis on the simplex. 
\emph{Stochastic Environmental Research and Risk Assessment}, \bold{15(5)}, 384--398.

Egozcue, J. J., Pawlowsky-Glahn, V., Mateu-Figueras, G., and Barcel\'{o}-Vidal, C. (2003).
Isometric logratio transformations for compositional data analysis.
\emph{Mathematical Geology}, \bold{35(3)}, 279--300.

Liang Niu, Weichun Huang, David M. Umbach and Leping Li (2013).
IUTA: a tool for effectively detecting differential isoform usage from RNA-Seq data,
in preparation.

See \url{http://mblab.wustl.edu/GTF22.html} for the details of Gene transfer format (GTF) and \url{http://samtools.sourceforge.net/SAMv1.pdf} for the details of Sequence Alignment/Map (SAM) format. The BAM format is the compressed binary version of SAM format.
}
\author{
Liang Niu
}
\note{
\code{mc.cores.user} has to be set to 1 in Windows.	
}
\examples{
## set the paths for the BAM files and the GTF file
## notice that the GTF file contains correct gene_id information
bam.list.1<-system.file("bamdata",paste("sample_",1:3,".bam",sep=""),
                       package="IUTA") 
bam.list.2<-system.file("bamdata",paste("sample_",4:6,".bam",sep=""),
                       package="IUTA")
transcript.info<-system.file("gtf","mm10_kg_sample_IUTA.gtf",
                             package="IUTA")
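
## Illustration only (hypothetical file names, not shipped with IUTA):
## how a BAM list and its rep.info vector line up when samples have
## technical replicates. Sample 1 has two replicates, sample 2 has one.
toy.bam.list <- c("s1_rep1.bam", "s1_rep2.bam", "s2_rep1.bam")
toy.rep.info <- c(2, 1)
## the replicate counts must add up to the number of BAM paths
stopifnot(sum(toy.rep.info) == length(toy.bam.list))
## recover the per-sample grouping implied by the ordering of the paths
toy.samples <- split(toy.bam.list,
                     rep(seq_along(toy.rep.info), toy.rep.info))
print(toy.samples)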

## run IUTA in Unix or MacOS (not for Windows!)
IUTA(bam.list.1,bam.list.2,transcript.info,output.dir=getwd(),
     FLD="normal",mean.FL.normal=250,sd.FL.normal=10,
     test.type=c("SKK","CQ","KY"))
## or run IUTA in Windows
IUTA(bam.list.1,bam.list.2,transcript.info,output.dir=getwd(),
     FLD="normal",mean.FL.normal=250,sd.FL.normal=10,
     test.type=c("SKK","CQ","KY"),mc.cores.user=1)


## check the results in file 
print(read.delim("estimates.txt",comment.char="#")[1:3,]) 
print(read.delim("p_values.txt")[1,])
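
## Illustration only (made-up p-values, not IUTA output): the Bonferroni
## rule from "Details" applied by hand, assuming log.p=FALSE and fwer=0.01.
toy.p <- c(0.0004, 0.03, NA, 0.2)
toy.fwer <- 0.01
toy.n.t <- sum(!is.na(toy.p))   # number of valid (non-NA) tests
toy.significant <- ifelse(is.na(toy.p), NA,
                          ifelse(toy.p < toy.fwer / toy.n.t, "yes", "no"))
print(toy.significant)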

## remove the output text files and BAM index files
file.remove(c("estimates.txt","p_values.txt"))
file.remove(system.file("bamdata",paste("sample_",1:6,".bam.bai",sep=""),
            package="IUTA"))
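
## Illustration only (toy numbers, not IUTA's internal code): replacing
## small entries by adjust.weight and then applying one common form of
## the ilr; the basis IUTA uses internally is an assumption here.
toy.usage <- c(0.7, 0.3, 0)
toy.adjusted <- pmax(toy.usage, 1e-4)   # the zero entry becomes 1e-4
toy.ilr <- sapply(seq_len(length(toy.adjusted) - 1), function(i)
  sqrt(i / (i + 1)) *
    (mean(log(toy.adjusted[1:i])) - log(toy.adjusted[i + 1])))
print(toy.ilr)   # a (K-1)-dimensional vector, here of length 2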
}
