PCA-based Gene Filtering

Overview

Array experiments measure the expression of tens of thousands of genes (or probesets) simultaneously, so the number of null hypotheses to be tested is large. Multiple testing correction is therefore often necessary to control the number of false positives. However, such correction can reduce the statistical power to detect genes that are truly differentially expressed. Filtering out non-informative genes reduces the number of hypotheses and can thereby lessen the impact of multiple testing correction. Several filtering methods have been suggested, but best practice for filtering is still under debate. We propose a new filtering statistic for Affymetrix GeneChips, based on principal component analysis (PCA) of the probe-level gene expression data. Using a wholly defined spike-in dataset, we show that filtering by the proportion of variation accounted for by the first principal component (PVAC) increases sensitivity in detecting truly differentially expressed genes while controlling false discoveries. We also propose a data-driven approach to guide the selection of the filtering threshold.
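To make the PVAC statistic concrete, the following is a minimal sketch in base R of how the proportion of variation accounted for by the first principal component might be computed for a single probeset. It is not the pvac package's implementation: the helper name `pvac_score`, the samples-by-probes orientation of the matrix, and the simulated intensities are all assumptions made for illustration.

```r
# Hypothetical helper (NOT the pvac package API): PVAC for one probeset.
# Assumption: `probe_mat` is a samples x probes matrix of log2 intensities.
pvac_score <- function(probe_mat) {
  pc <- prcomp(probe_mat, center = TRUE, scale. = FALSE)
  variances <- pc$sdev^2            # variance captured by each component
  variances[1] / sum(variances)     # share captured by the first component
}

set.seed(1)
n_samples <- 12
n_probes <- 11
signal <- rep(c(0, 2), each = n_samples / 2)  # simulated two-group expression shift

# Simulated informative probeset: every probe tracks the same signal plus small noise.
informative <- sapply(seq_len(n_probes),
                      function(i) signal + rnorm(n_samples, sd = 0.2))

# Simulated non-informative probeset: probes carry independent noise only.
noise_only <- matrix(rnorm(n_samples * n_probes, sd = 0.2), nrow = n_samples)

pvac_informative <- pvac_score(informative)
pvac_noise <- pvac_score(noise_only)
print(c(informative = pvac_informative, noise = pvac_noise))
```

When the probes of a probeset vary together across samples, the first component captures most of the variation and the score approaches 1. When the probes carry only independent noise, the variation is spread across components and the score is much lower, which is the intuition behind filtering on this quantity.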

Downloads

Instructions

Installation

  • Install the Bioconductor package 'affy'
    # within R (R >= 3.5; older releases used biocLite("affy"))
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("affy")
  • Install the "pvac" and "pvacExampleData" source packages
    # on command line
    R CMD INSTALL pvac
    R CMD INSTALL pvacExampleData
  • [Optional] Install pbapply (from CRAN) to display a progress bar
    # within R
    install.packages("pbapply")

Example

  • # within R
    library(affy)
    library(pvac)
    library(pvacExampleData)
    data(pvacExampleData)                 # pvacExampleData is a raw AffyBatch object
    myeset = rma(pvacExampleData)         # eset summarized using RMA
    res = pvacFilter(pvacExampleData)     # perform pvac filtering
    res$aset                              # names of probesets that have passed the filter
    myeset.filtered = myeset[res$aset,]   # eset object after filtering
    # statistical tests within R ...,
    # or output a tab-delimited file of probeset level data
    write.exprs(myeset.filtered, 'out.txt')
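The example leaves the statistical testing step open. As a rough illustration of why filtering before testing can help, the sketch below applies a Benjamini-Hochberg correction with and without a filter. It uses base R only and entirely simulated p-values: the counts, the beta distribution for the truly differential probesets, and the idealized filter (one that keeps every truly differential probeset and a random 20% of the rest) are assumptions for illustration, not results from the pvac method.

```r
set.seed(42)
n_null <- 9800   # simulated non-differential probesets
n_true <- 200    # simulated truly differential probesets

# Simulated p-values: uniform under the null, concentrated near 0 otherwise.
p <- c(runif(n_null), rbeta(n_true, 1, 2000))
is_true <- rep(c(FALSE, TRUE), times = c(n_null, n_true))

# Idealized filter (assumption): keeps all true probesets and ~20% of nulls,
# chosen independently of the p-values.
keep <- is_true | (runif(length(p)) < 0.2)

# True discoveries at a 5% BH threshold, correcting over all probesets
# versus correcting over the filtered subset only.
hits_all <- sum(p.adjust(p, method = "BH") < 0.05 & is_true)
hits_filtered <- sum(p.adjust(p[keep], method = "BH") < 0.05 & is_true[keep])

print(c(all = hits_all, filtered = hits_filtered))
```

With real data the same comparison could be made by running the chosen test on `myeset` and on `myeset.filtered` and adjusting each set of p-values separately. How much is gained depends on how well the filter separates informative from non-informative probesets.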

References

Lu J, Kerns RT, Peddada S, Bushel PR (2010). PCA-based filtering improves detection for Affymetrix gene expression arrays (in preparation).

Contact

Pierre R. Bushel, Ph.D.
Tel 919-618-1945
bushel@niehs.nih.gov