ORIOGEN (Order Restricted Inference for Ordered Gene ExpressioN)
and multidimensional pairwise comparisons
Inputs Screen:
Total number of dose groups or time points
Vector of sample sizes per dose group or time point
Number of initial bootstrap samples
Number of final bootstrap samples
FDR level for performing the actual test
Analysis type:
For pattern analysis of ordered experimental groups (e.g. time course or dose response), select "Time Course/Dose Response Pattern Analysis". To perform pairwise comparisons among two or more groups while controlling mdFDR, select "Multiple Pairwise Comparisons".
For more information on these methods, see the Help - About screen.
Input file name:
This field specifies the path and name of the file to be analyzed.
Users can directly enter the file path and name, or use the browse button for file selection.
The input matrix has one row per variable (e.g. gene), and all fields are tab-delimited. Note that ORIOGEN does not normalize the input data.
The format of the input file should be as follows:
Header row (optional): Row 1 can be a header row (if present, it is ignored) or a data row.
Column 1: Contains the gene ID. The gene ID is an alpha-numeric character string used to identify the gene.
Column 2 (optional): May contain a gene description string. If present, this string will be saved into the output file if the gene gets selected and will appear in the popup window if the user clicks on this gene in the Results graph.
All Remaining Columns: Contain tab-delimited numeric gene expression data. Missing values in the input data should be represented by a single period (i.e. ".").
For repeated measurements data, the order in which the subjects appear in the columns should be the same across all time points. For example, suppose we have 3 subjects (A, B, C) and 4 time points; then the columns containing the expression data should be arranged as follows:
A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4
where the numerals 1, 2, 3 and 4 represent the four time points.
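The input layout described above can be illustrated with a short Python sketch. This is not part of ORIOGEN; the function names `parse_row` and `repeated_measures_labels` are hypothetical, and the sketch assumes the caller knows whether the optional description column is present.

```python
import math

def parse_row(line, has_description=True):
    """Split one tab-delimited input row into (gene_id, description, values).

    Hypothetical helper: ORIOGEN's own parser is not documented here.
    """
    fields = line.rstrip("\n").split("\t")
    gene_id = fields[0]
    if has_description:
        description, raw = fields[1], fields[2:]
    else:
        description, raw = "", fields[1:]
    # A single period marks a missing value; it is represented as NaN here.
    values = [math.nan if f.strip() == "." else float(f) for f in raw]
    return gene_id, description, values

def repeated_measures_labels(subjects, n_time_points):
    """Column labels for repeated measures: all subjects at time 1, then time 2, ..."""
    return [f"{s}{t}" for t in range(1, n_time_points + 1) for s in subjects]

if __name__ == "__main__":
    print(parse_row("gene1\tsome kinase\t1.5\t.\t2.0"))
    print(" ".join(repeated_measures_labels(["A", "B", "C"], 4)))
```

The second function reproduces the A1 B1 C1 ... ordering from the example, which can be handy for checking that the columns of a longitudinal data set are arranged as expected before running the analysis.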
Output file name (Fitted means/Raw means):
This field specifies the path and name of the file containing the genes selected by the ORIOGEN software.
Users can directly enter the file path and name, or use the browse button for file selection.
The fields in the output file are tab-delimited.
The format of the file is as follows:
Column 1: Contains the counter number of the gene selected, starting at number one.
Column 2: Contains the row number from the input file of the selected gene.
Column 3: Contains the gene ID.
Column 4: Contains the user-provided gene description from the input file if present, blank otherwise.
Column 5: Contains the profile number of the selected gene.
Column 6: Contains the computed P-value.
Column 7: Contains the computed Q-value.
Columns 8 and Higher: Contains the following:
NOTE: In addition to the output file specified above, ORIOGEN will also create two additional files. The first is a raw output file that contains the input data for the genes that were selected. This file will have the same name as the primary output file, with "(Raw)" appended to the end of the filename. The second additional file will have a similar structure to the output file, but will contain the rejected genes and their corresponding results. This file will have the same name as the primary output file, with "(RejectedGenes)" appended to the end of the filename.
Vector of sample sizes per dose/time point:
This field specifies the sample sizes that are associated with each dose/time point.
Sample sizes can be individually entered using the "Enter Sample Sizes" button, or as a string with values separated by commas.
The ORIOGEN software performs a check to ensure that the input data is correct with respect to dose/time points. For example, if a user specifies 4 dose/time points with each sample size being 4, then the input file must contain 16 data values for each gene (including a period "." for missing values). If the sum of the sample sizes for each dose/time point does not equal the number of data values, an error message is displayed.
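The consistency check described above can be sketched in Python as follows. This is an illustration only, not ORIOGEN's code; `parse_sample_sizes` and `check_row_length` are hypothetical names, and the error message text is an assumption.

```python
def parse_sample_sizes(text):
    """Parse a comma-separated sample-size string such as "4,4,4,4"."""
    return [int(part) for part in text.split(",")]

def check_row_length(values, sample_sizes):
    """Verify that a gene's data values match the declared sample sizes.

    `values` must include placeholders for missing entries (the "." fields),
    since missing values still count toward the expected total.
    """
    expected = sum(sample_sizes)
    if len(values) != expected:
        raise ValueError(
            f"expected {expected} data values, found {len(values)}"
        )
    return expected

if __name__ == "__main__":
    sizes = parse_sample_sizes("4,4,4,4")
    # 4 dose/time points with 4 samples each require 16 values per gene.
    print(check_row_length([0.0] * 16, sizes))
```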
Number of initial bootstrap samples:
This field specifies the number of initial bootstrap samples to be used in the analysis. This is the starting point for the adaptive bootstrapping routine. The program will evaluate each gene using this number of bootstraps and will accept or reject some of them based on a confidence interval around the calculated p-value. Genes that are not accepted or rejected are evaluated again after the number of bootstraps has been doubled.
Number of final bootstrap samples (only for time course/dose response pattern analysis):
This field specifies the number of final bootstrap samples to be used in the analysis. As described above, the number of bootstraps is doubled on each round of the adaptive routine to evaluate genes that have not yet been accepted or rejected. When doubling the number of bootstraps will cause it to exceed the final number specified, each undecided gene is accepted or rejected based on its latest calculated p-value.
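To see how the initial and final bootstrap counts interact, the following Python sketch lists the number of bootstraps used in each round of the doubling scheme described above. The function name `bootstrap_schedule` is hypothetical, and rejecting a final count smaller than the initial count is an assumption; the confidence-interval rule that accepts or rejects genes within each round is not specified here and is not modeled.

```python
def bootstrap_schedule(initial, final):
    """Return the bootstrap counts used in successive adaptive rounds.

    The count starts at `initial` and doubles each round. Doubling stops
    once the next count would exceed `final`; any genes still undecided
    are then decided on their latest p-value.
    """
    if initial < 1 or final < initial:
        # Assumption: such settings are treated as invalid in this sketch.
        raise ValueError("require 1 <= initial <= final")
    counts = [initial]
    while counts[-1] * 2 <= final:
        counts.append(counts[-1] * 2)
    return counts

if __name__ == "__main__":
    print(bootstrap_schedule(1000, 10000))  # [1000, 2000, 4000, 8000]
```

With an initial value of 1000 and a final value of 10000, the last round uses 8000 bootstraps, because doubling again to 16000 would exceed the final number.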
Nominal FDR:
This field specifies the false discovery rate level to be used for selecting significant genes.
Bootstrap random seed:
Automatic random seed: The bootstrap is implemented using a random seed generated by the program.
Manual random seed: The bootstrap is implemented using a seed provided by the user. This is the preferred choice if the results need to be reproduced at a later time.
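The effect of a manual seed on reproducibility can be demonstrated with a generic resampling sketch in Python. This is plain resampling with replacement using the standard library, not ORIOGEN's bootstrap procedure, and `bootstrap_resample` is a hypothetical name.

```python
import random

def bootstrap_resample(values, seed=None):
    """Draw one resample (with replacement) of the same size as `values`.

    With seed=None the generator is seeded automatically, so repeated
    calls generally differ; with a fixed seed the draw is repeatable.
    """
    rng = random.Random(seed)
    return [rng.choice(values) for _ in values]

if __name__ == "__main__":
    data = [2.1, 3.4, 1.9, 5.0, 4.2]
    first = bootstrap_resample(data, seed=12345)
    second = bootstrap_resample(data, seed=12345)
    print(first == second)  # True: same seed, same resample
```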
Are the data log transformed?:
Select the appropriate option from the list.
Longitudinal sampling:
If this option is checked, ORIOGEN will assume that the data were derived from a longitudinal study (e.g. repeated measures). The data will be analyzed using the methodology described in Peddada et al. (2010). For the required input format, see the column ordering for repeated measurements data described under "Input file name".
Transpose Input File:
Select this option if the rows and columns are transposed in the input data file.
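For reference, transposing a tab-delimited table amounts to swapping rows and columns, as in the Python sketch below. It is only an illustration of the operation, not of how ORIOGEN handles the option internally; `transpose_tsv` is a hypothetical name, and the sketch assumes every row has the same number of fields.

```python
def transpose_tsv(text):
    """Swap rows and columns of tab-delimited text.

    Assumes a rectangular table: every row has the same number of fields.
    """
    rows = [line.split("\t") for line in text.splitlines() if line]
    columns = zip(*rows)  # the i-th column becomes the i-th row
    return "\n".join("\t".join(col) for col in columns)

if __name__ == "__main__":
    transposed_input = "gene1\tgene2\n1.0\t3.0\n2.0\t4.0"
    print(transpose_tsv(transposed_input))
```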