PHMM (Poisson hidden Markov model) for detecting shortened 3' UTRs due to alternative polyadenylation

Jun Lu and Pierre R. Bushel. Dynamic expression of 3' UTRs revealed by Poisson hidden Markov modeling of RNA-Seq: implications in gene expression profiling. Gene, 2013.

Report bugs, corrections, and suggestions to Pierre Bushel (bushel@niehs.nih.gov, http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/bushel/index.cfm)
and Jun Lu (jlu276@gmail.com).


Requirements:
- R v2.15.0 or higher
- R packages to install:
  - depmixS4 (from CRAN)
  - GenomicFeatures, Rsamtools, GenomicRanges (from Bioconductor; installing with biocLite works best)
- Sorted BAM file (and accompanying index file) with reads aligned to a reference genome
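
If the alignments are not yet sorted and indexed, something like the following sketch may help. It assumes samtools 1.x is installed and on the PATH (samtools is not part of this package), and the function name prepare_bam and the file name myfile.bam are placeholders for illustration. By default it only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch only: samtools 1.x is an assumption, not a stated PHMM requirement.
# DRY_RUN defaults to 1 (print the commands); set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# prepare_bam (hypothetical helper): coordinate-sort a BAM file and index it.
prepare_bam() {
    in=$1
    out="${in%.bam}.sorted.bam"    # e.g. myfile.bam -> myfile.sorted.bam
    if [ "$DRY_RUN" = "1" ]; then
        echo "samtools sort -o $out $in"
        echo "samtools index $out"
    else
        # index only if sorting succeeded; the index is written next to the BAM
        samtools sort -o "$out" "$in" && samtools index "$out"
    fi
}

prepare_bam myfile.bam
```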

Steps (in the following order) to run PHMM:
# build transcript database, e.g. RefSeq
  Command: R CMD BATCH "--args refGene txdb.mm9.refGene.sqlite mm9" buildTranscriptDB_fixed.R &
  output:  txdb.mm9.refGene.sqlite

# select transcripts whose 3' UTR is longer than 600 bp; remove duplicates
  Command: R CMD BATCH "--args txdb.mm9.refGene.sqlite 19 long3utr.txt" apa3utr_fixed.R &
  output:  long3utr.txt

# given a sorted BAM file containing aligned reads, compute read tag counts in sliding windows
  Command: R CMD BATCH "--args long3utr.txt 19 myfile.sorted.bam" apaCount_fixed.R &
  output:  myfile.sorted.cts.rda [the .bam suffix is replaced with .cts.rda]
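
The naming rule for the count file can be reproduced in the shell, which is handy when scripting the next step for several BAM files. This is a small sketch; the function name cts_name is a placeholder and not part of the PHMM scripts.

```shell
#!/bin/sh
# cts_name (hypothetical helper): derive the count-file name from a BAM
# file name by replacing the trailing .bam with .cts.rda.
cts_name() {
    printf '%s\n' "${1%.bam}.cts.rda"
}

cts_name myfile.sorted.bam
```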

# fit Poisson HMM
  Commands: R CMD BATCH "--args long3utr.txt myfile.sorted.cts.rda res.myfile.csv" poissonHMM_fixed.R &
  res.myfile.csv > log 2>&1  &
  output:  res.myfile.csv
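
Because each step reads the output of the previous one, the four steps can be chained in a small wrapper that runs them one after another in the foreground. The following is a minimal sketch, not part of the distributed scripts: the function names run and phmm_pipeline are placeholders, and the arguments are copied from the commands above. By default it only prints the commands (without their original quoting); set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: chain the four PHMM steps in the order given above.
# DRY_RUN defaults to 1 (print the commands); set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# run (hypothetical helper): print a command, or execute it and stop on failure.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"    # quoting of the --args string is not shown here
    else
        "$@" || { echo "step failed: $*" >&2; exit 1; }
    fi
}

# phmm_pipeline (hypothetical helper): $1 = sorted BAM file, $2 = result CSV.
phmm_pipeline() {
    bam=$1
    res=${2:-res.csv}
    cts="${bam%.bam}.cts.rda"    # count file produced by the third step

    run R CMD BATCH "--args refGene txdb.mm9.refGene.sqlite mm9" buildTranscriptDB_fixed.R
    run R CMD BATCH "--args txdb.mm9.refGene.sqlite 19 long3utr.txt" apa3utr_fixed.R
    run R CMD BATCH "--args long3utr.txt 19 $bam" apaCount_fixed.R
    run R CMD BATCH "--args long3utr.txt $cts $res" poissonHMM_fixed.R
}

phmm_pipeline myfile.sorted.bam res.myfile.csv
```

Unlike the commands above, the wrapper does not append & to each call, so a step starts only after the previous one has finished.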

