Frequently Asked Questions
Agilent Single Color: Percentile Shift
Agilent Two Color: No Normalization in GeneSpring
Affymetrix Expression: RMA (summarization method)
Affymetrix Exon Expression: RMA-16 (summarization method)
Illumina Single Color: Percentile Shift
Agilent miRNA: Percentile Shift
RMA is the default summarization method in the guided workflow because it is the option most widely used by the microarray community.
RMA considers only Perfect Match (PM) probes and therefore works only with positive signal intensities during probe-level normalization.
Ignoring the Mismatch (MM) probes reduces noise.
The 75th percentile is a more robust intensity value for normalizing the data. In any tissue you analyze, a certain percentage of genes are not expressed. Even when a gene is not expressed, a probe matching that gene still reports an intensity value. These intensity values most likely fall into the lower percentiles on the array; they are considered noise and are less reliable. Taking a higher percentile such as the 75th therefore approximates taking the median of only the probes with reliable intensity values (that is, the median of only the expressed genes).
GeneSpring performs percentile shift normalization in the following order:
1. Transforms the signal values to log base 2.
2. Arranges the log-transformed signal values in increasing order.
3. Computes the rank R of the required percentile (the Pth percentile).
4. If the rank R is an integer, the Pth percentile is the value with rank R. If the rank is not an integer, the tool computes the value through additional steps.
Once the value corresponding to the Pth percentile is obtained, it is subtracted from each log-transformed signal value. The result is the normalized intensity value.
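The procedure above can be sketched in a few lines of Python. This is only an illustration, not GeneSpring's implementation: the function name `percentile_shift`, the rank formula `P/100 * (N + 1)`, and the linear interpolation used for non-integer ranks are all assumptions, since the exact steps for the non-integer case are not spelled out here.

```python
import math

def percentile_shift(signals, percentile=75.0):
    """Log2-transform the signals and shift them so the Pth percentile becomes 0."""
    logged = [math.log2(v) for v in signals]   # step 1: log base 2 transform
    ordered = sorted(logged)                   # step 2: increasing order
    n = len(ordered)
    # Step 3: 1-based rank of the Pth percentile (assumed formula), kept within 1..n.
    rank = min(max(percentile / 100.0 * (n + 1), 1.0), float(n))
    lower = math.floor(rank)
    frac = rank - lower
    if frac == 0:
        # Step 4: integer rank, so the percentile is the value at that rank.
        pth = ordered[lower - 1]
    else:
        # Non-integer rank: interpolate between neighbouring ranks (assumption).
        pth = ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])
    # Subtract the percentile value from every log-transformed signal.
    return [v - pth for v in logged]

normalized = percentile_shift([1, 2, 4, 8, 16, 32, 64])
```

With seven values and P = 75 the assumed rank is 6, so the sixth smallest log value (5) is subtracted from every signal.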
Normalization can be performed only while creating an experiment. To change the normalization settings, you have to create a new experiment.
IterPLIER differs from PLIER in that it does not use all the probes for summarization. It keeps only the good probes and iteratively discards the bad ones.
RMA16 refers to RMA summarization followed by the addition of the value 16 to the expression values. This is done to achieve variance stabilization.
In order to establish a hierarchy of gene confidence levels, the sources of input transcript annotations are partitioned into three types. From the highest to the lowest confidence, the types are labeled as Core, Extended, and Full.
Core: The Core list comprises 17,800 transcript clusters from RefSeq and full-length GenBank mRNAs.
Extended: The Extended list comprises 129k transcript clusters including cDNA transcripts, syntenic rat and mouse mRNA, and Ensembl, microRNA, Mitomap, Vegagene and VegaPseudogene annotations.
Full: The Full list comprises 262k transcript clusters including ab-initio predictions from Geneid, Genscan, GENSCAN Suboptimal, Exoniphy, RNAgene, SgpGene and TWINSCAN.
As part of the pre-processing step of experiment creation, thresholding is performed: raw values less than 1 are thresholded to 1. This is the default setup in GeneSpring, but the user can choose a different threshold value.
The option to transform the values to 0.01 is unavailable in the current GeneSpring version.
In a standard experiment the threshold cannot be set to a value less than 1, but it can be changed in a Generic experiment. To do so, create a custom technology and then create a Generic experiment with it.
While creating an experiment for two color data, GeneSpring allows you to load the data files and select the dye swapped arrays.
Ratio computation for two color data is done as follows:
Samples without dye swap:
Cy5(test) / Cy3(control)
Samples with dye swap:
Cy3(test) / Cy5(control)
If you have multiple endogenous controls, their Ct values are averaged (arithmetic mean). That average is then subtracted from the target Ct values for normalization.
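As a small illustration of this calculation, the sketch below averages the control Ct values and subtracts the result from a target Ct. The function name `normalize_ct` and the example numbers are hypothetical.

```python
def normalize_ct(target_ct, endogenous_control_cts):
    """Subtract the arithmetic mean of the endogenous control Ct values from a target Ct."""
    control_mean = sum(endogenous_control_cts) / len(endogenous_control_cts)
    return target_ct - control_mean

# Two hypothetical endogenous controls with Ct 18 and 20 average to 19.
delta_ct = normalize_ct(25.0, [18.0, 20.0])
```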
The term raw signal values, used in the context of Agilent Two Color data, refers to the linear data after thresholding and summarization for the individual channels (Cy3 and Cy5).
Summarization is performed by computing the geometric mean.
The term normalized signal value refers to the data after ratio computation, log transformation and baseline transformation.
In GeneSpring, the sequence of events involved in the processing of the Agilent two color text data files is: Thresholding → Summarization → dye swap → ratio computation → log transformation → Baseline Transformation.
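The sequence can be sketched for a single entity as follows. This is a simplified illustration rather than GeneSpring's code: the function names, the dictionary layout of each sample, and the choice of baseline transformation to the median of all samples are assumptions made for the example.

```python
import math
from statistics import median

def geometric_mean(values):
    """Geometric mean of positive values, used here for summarization."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def process_entity(samples, threshold=1.0):
    """Return baseline-transformed log2 ratios of one entity, one value per sample.

    Each sample is a dict with replicate raw values under 'cy5' and 'cy3'
    and a 'dye_swap' flag (hypothetical layout).
    """
    log_ratios = []
    for s in samples:
        # Thresholding, then summarization of each channel.
        cy5 = geometric_mean([max(v, threshold) for v in s["cy5"]])
        cy3 = geometric_mean([max(v, threshold) for v in s["cy3"]])
        # Dye swap: the test channel is Cy3 instead of Cy5.
        test, control = (cy3, cy5) if s["dye_swap"] else (cy5, cy3)
        # Ratio computation followed by log base 2 transformation.
        log_ratios.append(math.log2(test / control))
    # Baseline transformation to the median of all samples (assumed option).
    baseline = median(log_ratios)
    return [r - baseline for r in log_ratios]

samples = [
    {"cy5": [400, 400], "cy3": [100, 100], "dye_swap": False},
    {"cy5": [100], "cy3": [200], "dye_swap": True},
    {"cy5": [0.5], "cy3": [1], "dye_swap": False},
]
normalized = process_entity(samples)
```

In the third sample the raw Cy5 value of 0.5 is raised to 1 by thresholding before the ratio is taken.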
In GeneSpring, the intensity values obtained after the pre-processing steps are displayed on the log base 2 scale. These values are then used for the analysis.
For Exon arrays, background subtraction is done on a pool of probes having similar GC content (which is not the case with expression arrays). This typically results in probe sets with small expression values, which leads to an unreliable estimate of the variance. To offset this, an ad hoc value of 16 is added to the expression values of all the probe sets.
The value 16 is chosen because it is generally considered low enough to provide the required stabilization effect without suppressing true signal values. 8 and 32 are other commonly used options. Values smaller than 16 are usually due to noise, and without the offset, values of 8 and 16 would produce a fold change of 2 purely due to noise.
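The effect of the offset is easy to check with a little arithmetic. The sketch below assumes the offset is added on the linear scale; the function name `fold_change` and the signal values 1000 and 2000 are illustrative only.

```python
def fold_change(a, b, offset=0.0):
    """Fold change of b over a after adding a constant offset to both values."""
    return (b + offset) / (a + offset)

# Two noise-level values: a 2-fold difference shrinks once 16 is added.
noise_raw = fold_change(8, 16)               # 16 / 8
noise_offset = fold_change(8, 16, 16)        # 32 / 24

# Two strong signals: the same offset barely changes the 2-fold difference.
signal_offset = fold_change(1000, 2000, 16)  # 2016 / 1016
```

The noise-driven fold change drops from 2 to about 1.33, while the fold change between the strong signals stays close to 2.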
Statistical analysis results do not change when the data are baseline transformed to the median of all samples. The actual deviation between the conditions for a given entity stays the same, so the p-values do not change in any experiment. Baseline transformation gives the user a better visualization when comparing two groups, without affecting the downstream analysis.
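A quick numerical check illustrates why. Subtracting one constant (the median across all samples) from every value of an entity leaves both the difference between group means and the test statistic unchanged. The Welch t statistic is used here only as an example test, and the function names and log2 intensities are made up for the illustration.

```python
from statistics import mean, median, stdev

def baseline_to_median(values):
    """Subtract the median across all samples from each value of one entity."""
    m = median(values)
    return [v - m for v in values]

def welch_t(a, b):
    """Welch t statistic for two groups of values."""
    var_a = stdev(a) ** 2 / len(a)
    var_b = stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / (var_a + var_b) ** 0.5

control = [7.1, 7.3, 6.9]   # hypothetical log2 intensities of one entity
treated = [8.0, 8.4, 8.2]

shifted = baseline_to_median(control + treated)
t_before = welch_t(treated, control)
t_after = welch_t(shifted[3:], shifted[:3])
```

Because the statistic is the same before and after the shift, any p-value derived from it is the same as well.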
Follow these steps to disable quantile normalization and select another normalization method.
1. Disable the "Perform Quantile Normalization" option under Tools → Options → Affymetrix Exon Summarization Algorithms → Exon PLIER/Iter PLIER → uncheck 'Perform Quantile Normalization'.
2. Create the Exon Expression experiment in GeneSpring.
3. After getting the data in, export 'All Entities' using the right-click → Export entity list option.
4. Import it back in as a Generic Experiment (i.e. create a custom technology using the exported data).
Note: When you import the data back into GX11, it is already in log scale, so while creating the generic experiment you should explicitly select the option "Please select if your data is in log scale" so that log transformation is not performed on the data again.
5. Once you have your data as a custom experiment, you can perform any of the normalization methods available for Generic single color data.
Thresholding the data to 1 converts values that are less than 1 to 1. This is done because a value less than 1 would give a large negative value after log transformation.
An entity with the value 1 gives a value of 0 after log transformation in GeneSpring. However, when the threshold is set to 0, a value of 0 has no defined log, so the entity gets no value after log transformation and empty boxes are observed.
So, if you would like to filter entities with missing values, you can threshold to 0 and then go to Utilities and select “Remove entities with missing value”.
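The difference between the two thresholds can be sketched as follows. The function name `threshold_then_log2` is hypothetical, and returning `None` stands in for the empty box that GeneSpring displays for a missing value.

```python
import math

def threshold_then_log2(raw, threshold=1.0):
    """Raise values below the threshold to the threshold, then take log base 2.

    Returns None when the thresholded value is not positive, because its
    logarithm is undefined (shown as a missing value).
    """
    value = max(raw, threshold)
    if value <= 0:
        return None
    return math.log2(value)

with_threshold_1 = threshold_then_log2(0.25)                  # 0.25 becomes 1, log2 gives 0
with_threshold_0 = threshold_then_log2(0.0, threshold=0.0)    # stays 0, no log value
small_value = threshold_then_log2(0.01, threshold=0.0)        # large negative value
```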
Please follow the steps below to import the pre-normalized data:
Create a custom technology with the data file from:
GeneSpring Menu Bar → Annotations → Create Technology → Custom From File
Once the technology is created, please create a new experiment with those files to import them into GeneSpring.
While creating an experiment, please choose the Experiment type as 'Generic Single color'.
In step 2 of 4, please check the option 'Please select if your data is in log scale' and the Normalization algorithm as 'None'.
In step 4 of 4, choose the option 'Do not perform baseline transformation'.
Now, the experiment is created with the pre-normalized data.