ORIOGEN (Order Restricted Inference for Ordered Gene ExpressioN)
Total number of dose groups or time points
Vector of sample sizes per dose group or time point
Number of initial bootstrap samples
Number of final bootstrap samples
FDR level for performing the actual test
Percentile to use for s0 adjustment
This field specifies the path and name of the file to be
analyzed.
Users can directly enter the file path and name, or use the
browse button for file selection.
All fields in the input file must be tab-delimited.
ORIOGEN does not normalize the input data. Users should pre-process the
data with a suitable normalization method before submitting it to ORIOGEN.
ORIOGEN selects and clusters genes on the basis of the means of the
expression values provided.
The format of the input file should be as follows:
Header row (optional): Row 1 can be either a header row (ignored if
present) or a data row.
Column 1: Contains the gene ID. The gene ID is an alpha-numeric
character string used to identify the gene and can also be used as the key when
performing a gene look-up on selected genes. The format of the gene ID used for
gene look-up can be any of the following:
Probe ID (example: A_42_P453131)
Systematic name (example: 216994_Rn)
GenBank Accession Number (example: BE109018)
UniGene ID (example: Rn.19577)
Column 2 (optional): May contain a gene description string. If present,
this string is saved to the output file if the gene is selected, and it
appears in the popup window when the user clicks on the gene in the
Results graph.
All Remaining Columns: Contain tab-delimited numeric gene expression data.
Missing values in the input data should be represented by a single period
(i.e., ".").
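The input layout described above can be illustrated with a small parser. This is only a sketch of the documented file format, not ORIOGEN's own reader: the function name `parse_expression_rows` and the flags `has_header` and `has_description` are hypothetical, and the caller is assumed to know whether the optional header row and description column are present.

```python
import math


def parse_expression_rows(text, has_header=False, has_description=False):
    """Parse tab-delimited rows into (gene_id, description, values) tuples."""
    rows = []
    for index, line in enumerate(text.splitlines()):
        # Skip blank lines and, if the caller says so, the optional header row.
        if not line.strip() or (has_header and index == 0):
            continue
        fields = line.split("\t")
        gene_id = fields[0]
        description = fields[1] if has_description else ""
        raw_values = fields[2:] if has_description else fields[1:]
        # A single period marks a missing value; it is stored as NaN here.
        values = [math.nan if v.strip() == "." else float(v) for v in raw_values]
        rows.append((gene_id, description, values))
    return rows


sample = "ID\tDescription\tD1\tD2\nA_42_P453131\texample gene\t1.5\t.\n"
parsed = parse_expression_rows(sample, has_header=True, has_description=True)
```

Storing missing values as NaN is a choice made for this sketch; the help text does not say how ORIOGEN treats them internally.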
This field specifies the
path and name of the file containing the genes selected by the ORIOGEN
software.
Users can directly enter
the file path and name, or use the browse button for file selection.
The fields in the output file are tab-delimited.
The format of the file is as follows:
Column 1: Contains the
counter number of the gene selected, starting at number one.
Column 2: Contains the
row number from the input file of the selected gene.
Column 3: Contains the
gene ID.
Column 4: Contains the
user-provided gene description from the input file if present; blank otherwise.
Column 5: Contains the
profile number of the selected gene.
Column 6: Contains the
computed P-value.
Column 7: Contains the
computed Q-value.
Columns 8 and Higher: Contains
the following:
Last 3 Columns: The
last three columns of the output file may contain the following fields depending
on the results of the ontology look-up procedure. If the ontology file is not
specified, or the look-up procedure finds no data, these fields will not be
present.
NOTE: In addition to the output file specified above, ORIOGEN creates two
more files. The first is a raw output file that contains the input data
for the genes that were selected. This file has the same name as the
primary output file, with "(Raw)" appended to the end of the filename. The
second file has a structure similar to the output file, but contains the
rejected genes and their corresponding results. This file has the same
name as the primary output file, with "(RejectedGenes)" appended to the
end of the filename.
This field specifies the path and name of the file
used for the ontology look-up for a selected gene. Ontology files are available
for download from ftp://ftp.tigr.org/pub/data/tgi/Resourcerer/.
ORIOGEN uses the following procedure to find the
ontology data for a particular gene:
This field specifies the
total number of dose groups or time points present in the input file. The
maximum number of dose groups or time points supported in this release is 30.
This field specifies the sample sizes that are associated
with each dose/time point.
Sample sizes can be
individually entered using the "Enter Sample Sizes" button, or as a
string with values separated by commas.
The ORIOGEN software checks that the input data is consistent with the
specified dose/time points. For example, if a user specifies 4 dose/time
points with each sample size being 4, then the input file must contain 16
data values for each gene (including a period "." for each missing value).
If the sum of the sample sizes across all dose/time points does not equal
the number of data values, an error message is displayed.
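The consistency check described above amounts to comparing a sum against a count. The following is a minimal sketch under that reading; the function names `parse_sample_sizes` and `check_value_count` are hypothetical and the error text is illustrative, not the message ORIOGEN displays.

```python
def parse_sample_sizes(text):
    """Parse a comma-separated string such as "4,4,4,4" into a list of ints."""
    return [int(part) for part in text.split(",")]


def check_value_count(sample_sizes, values):
    """Raise ValueError unless len(values) equals the summed sample sizes."""
    expected = sum(sample_sizes)
    if len(values) != expected:
        raise ValueError(
            f"expected {expected} data values per gene, found {len(values)}"
        )
    return expected


sizes = parse_sample_sizes("4,4,4,4")
# Missing values written as "." still count toward the total.
gene_values = ["1.2", ".", "0.8", "1.1"] * 4
total = check_value_count(sizes, gene_values)
```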
This field specifies the
number of initial bootstrap samples to be used in the analysis. This is the
starting point for the adaptive bootstrapping routine. The program will
evaluate each gene using this number of bootstraps and will accept or reject
some of them based on a confidence interval around the calculated p-value.
Genes that are not accepted or rejected are evaluated again after the number of
bootstraps has been doubled.
This field specifies the
number of final bootstrap samples to be used in the analysis. As described
above, the number of bootstraps is doubled on each round of the adaptive
routine to evaluate genes that have not yet been accepted or rejected. When
doubling the number of bootstraps will cause it to exceed the final number
specified, each undecided gene is accepted or rejected based on its latest
calculated p-value.
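The doubling rule for the initial and final bootstrap counts can be sketched as follows. This shows only the sequence of bootstrap counts, under the assumption that a round is run only if its doubled count does not exceed the final number; the accept/reject decision based on the confidence interval is not reproduced because its details are not given here. The function name `bootstrap_schedule` is hypothetical.

```python
def bootstrap_schedule(initial, final):
    """Return the bootstrap count used in each round of the adaptive routine."""
    if initial < 1 or final < initial:
        raise ValueError("need 1 <= initial <= final")
    counts = [initial]
    # Keep doubling while the doubled count would not exceed the final count.
    while counts[-1] * 2 <= final:
        counts.append(counts[-1] * 2)
    return counts


rounds = bootstrap_schedule(1000, 10000)
```

With an initial count of 1000 and a final count of 10000, this reading gives rounds of 1000, 2000, 4000, and 8000 bootstraps; any gene still undecided after the last round would then be accepted or rejected on its latest p-value.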
This field specifies the
false discovery rate level to be used for selecting significant genes. Genes
that meet this criterion are selected and written to
the output files specified above.
This field specifies the
percentile to use for the s0 adjustment used in the calculation of the
goodness-of-fit statistic. A value of 0 disables this adjustment. Higher values
reduce the effects of low sample variances in the data. Recommended values are
0 for sampling with replacement and 10 for longitudinal sampling.
Automatic random seed
uses a constantly changing seed value for the random number generator. Manual
random seed uses a user-provided seed value for the random number generator.
If this option is
checked, ORIOGEN will take the log of all signal values in the Input file
before calculating means or performing any processing.
If this option is
checked, ORIOGEN will assume that the data was derived from a longitudinal
sampling (repeated measures) study and will adjust the statistic accordingly.
For example, for a time-course study with 3 replicates, the data file should be
set up so that every time point contains data from the same list of subjects.
See the "About ORIOGEN" screen for a detailed description of how
ORIOGEN handles longitudinal sampling.
If this option is
checked, ORIOGEN will assume that the input file is arranged with each gene
represented by a single column and each replicate represented by a single row.