Because array experiments examine the expression of tens of thousands of genes (or probesets) simultaneously, the number of null hypotheses to be tested is large. Multiple testing correction is therefore often necessary to control the number of false positives. However, multiple testing correction can lead to low statistical power for detecting genes that are truly differentially expressed. Filtering out non-informative genes reduces the number of hypotheses, which can potentially reduce the impact of multiple testing correction. While several filtering methods have been suggested, the best practice for filtering is still under debate. We propose a new filtering statistic for Affymetrix GeneChips, based on principal component analysis (PCA) of the probe-level gene expression data. Using a wholly defined spike-in dataset, we show that filtering by the proportion of variation accounted for by the first principal component (PVAC) provides increased sensitivity in detecting truly differentially expressed genes while controlling false discoveries. A data-driven approach to guide the selection of the filtering threshold value is also proposed.
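To make the PVAC idea concrete, here is a minimal sketch in base R of how the proportion of variation accounted for by the first principal component might be computed for a single probeset. It is not the code of the pvac package: the helper name `pvac_score`, the data layout (arrays as rows, probes as columns), the choice of centered but unscaled PCA, and the simulated data are all assumptions made for illustration.

```r
# Hypothetical helper, NOT part of the pvac package.
# x: numeric matrix for ONE probeset, rows = arrays, columns = probes
#    (assumed to hold log2-scale probe-level intensities).
pvac_score <- function(x) {
  pc <- prcomp(x, center = TRUE, scale. = FALSE)
  # Variance of the first principal component over the total variance.
  pc$sdev[1]^2 / sum(pc$sdev^2)
}

set.seed(1)
n_arrays <- 8
signal <- rnorm(n_arrays, sd = 2)  # expression signal shared by all probes

# Concordant probeset: 11 probes track the same signal plus small noise.
concordant <- sapply(1:11, function(i) signal + rnorm(n_arrays, sd = 0.2))

# Non-informative probeset: 11 probes carrying independent noise only.
noise_only <- matrix(rnorm(n_arrays * 11, sd = 0.2), nrow = n_arrays)

score_concordant <- pvac_score(concordant)
score_noise <- pvac_score(noise_only)
print(c(concordant = score_concordant, noise_only = score_noise))
```

In this simulation the probeset whose probes move together across arrays should score close to 1, while the noise-only probeset scores noticeably lower, which is the intuition behind filtering on PVAC.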
- Download the R package (1KB) and example data (5MB)
- Package Downloads for Bioconductor
- Report bugs, corrections and suggestions to email@example.com
- Install the Bioconductor package 'affy'
# within R
- Install the "pvac" and "pvacExampleData" source packages
# on command line
R CMD INSTALL pvac
R CMD INSTALL pvacExampleData
- [Optional] Install pbapply (from CRAN) to display a progress bar
# within R
library(affy)                          # provides rma()
library(pvac)                          # provides pvacFilter()
library(pvacExampleData)               # pvacExampleData is a raw AffyBatch object
myeset = rma(pvacExampleData)          # eset summarized using RMA
res = pvacFilter(pvacExampleData)      # perform pvac filtering
res$aset                               # names of probesets that have passed the filter
myeset.filtered = myeset[res$aset,]    # eset object after filtering
# statistical tests within R ...,
# or output a tab-delimited file of probeset level data
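The last two steps above (statistical tests, or writing a tab-delimited file) are left open. One possible way to carry them out in base R is sketched below. To stay self-contained it uses a simulated matrix as a stand-in for the expression values of the filtered eset; with real data that matrix would come from something like `exprs(myeset.filtered)`. The two-group design, the Welch t-test, the Benjamini-Hochberg adjustment, and the output file name are all illustrative assumptions, not choices prescribed by the package.

```r
# Simulated stand-in for the filtered expression matrix:
# rows = probesets, columns = arrays.
set.seed(42)
expr <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
               dimnames = list(paste0("probeset_", 1:200),
                               paste0("array_", 1:6)))
group <- factor(c("A", "A", "A", "B", "B", "B"))  # hypothetical design

# Spike in 10 truly changed probesets for illustration.
expr[1:10, group == "B"] <- expr[1:10, group == "B"] + 3

# Per-probeset Welch t-test, then Benjamini-Hochberg adjustment.
pvals <- apply(expr, 1, function(y) {
  t.test(y[group == "A"], y[group == "B"])$p.value
})
padj <- p.adjust(pvals, method = "BH")

# Tab-delimited file of probeset-level data plus test results.
out <- data.frame(probeset = rownames(expr), expr,
                  p.value = pvals, p.adj = padj)
out_file <- file.path(tempdir(), "filtered_probesets.txt")
write.table(out, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Because the adjustment is applied only to probesets that passed the filter, fewer hypotheses enter `p.adjust`, which is where filtering can soften the multiple testing penalty.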
Lu J, Kerns RT, Peddada S, Bushel PR (2010). PCA-based filtering improves detection for Affymetrix gene expression arrays (in preparation).