Introduction

GeneSpring provides powerful, accessible statistical tools for intuitive data analysis and visualization. Designed specifically for the needs of biologists, GeneSpring offers an interactive environment that promotes investigation and enables understanding of Transcriptomics, Metabolomics, Proteomics and NGS data within a biological context. Regarded as the gold standard in expression analysis, GeneSpring allows you to quickly and reliably identify targets of interest that are both statistically and biologically meaningful. GeneSpring has over 20000 references in Google Scholar, including over 2000 in peer-reviewed publications. GeneSpring is an expanding suite of integrated software applications for systems-level research, handling genomic, transcriptomic, proteomic and metabolomic data in one unified application.

Multi-omic analysis with Agilent's GeneSpring Bioinformatics Suite

A key component of systems biology research involves producing heterogeneous data that measure various biological entities and events such as mRNA and microRNA expression, exon splicing, DNA structural variation, proteins and metabolites. GeneSpring 13, a part of Agilent's GeneSpring Analysis Platform, allows researchers to perform integrated analysis of such heterogeneous data, enabling them to identify linkages and data concordance that contribute to a more comprehensive understanding of the underlying mechanism of a condition. New in GeneSpring 13, metadata analysis and visualization tools will allow researchers to analyze phenotypic parameters such as administrative or physiological attributes of the subjects alongside their gene or metabolite expression profiles. Complementing more traditional bioinformatics techniques, correlation tools and visualizations will allow investigators to identify co-regulated genes, metabolites, and proteins in an intuitive and easy-to-use manner. The use of prior information is also critical in designing follow-up experiments. GeneSpring includes the ability to design next-phase experiments from pathway information, enabling hypothesis-driven experimental design by incorporating prior biological knowledge from multiple measurement technologies.

Platform for integrated data analysis and biological contextualization

GeneSpring addresses the challenges in multi-omic data analysis by providing comprehensive analytical and visualization tools for multiple data types. Heterogeneous data such as gene expression, miRNA, exon splicing, genomic copy number, genotyping, proteins, and metabolites can be combined into one project, allowing researchers to analyze, compare, and view results in a single user interface (Figure 3).

Correlation Framework

Correlation analysis identifies co-regulated molecules, such as genes and metabolites, as well as relationships between the samples in a study. Introduced in GeneSpring 13, the correlation framework is supported on most datasets generated using high-throughput omic platforms such as Microarray, Mass Spectrometry and Next Generation Sequencing. The framework supports pair-wise correlations between entities measured on a single technology platform as well as cross-technology correlations between measurements from two different platforms.

Entity-Entity Correlation

Correlation analysis is performed in a pair-wise manner to identify and visualize dependency between abundance levels of any pair of biological entities. An ‘entity’ in GeneSpring is a gene, metabolite, protein or a probe in an expression array.

The option to perform correlation analysis is included in the Workflow Browser of the experiment. Correlation analysis can be performed on entities within a single experiment or across two different experiments. For cross-experiment correlation analysis, a Multi-Omic Analysis (MOA) experiment should first be created in GeneSpring from the two experiments whose entities are to be correlated. Table 1 summarizes the types of GeneSpring experiments which support correlation analysis.
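
The pair-wise entity correlation described above can be sketched in a few lines of Python. This is an illustration only: it assumes Pearson's coefficient as the correlation measure (the text does not name one), the function names are hypothetical, and the abundance values are invented. It does not use or reproduce GeneSpring's own implementation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)

def entity_correlations(experiment_a, experiment_b):
    """Correlate every entity in experiment_a with every entity in experiment_b.

    Both arguments map an entity name to its abundance values, listed in the
    same sample order. Pass the same dict twice for a single-experiment run.
    """
    return {
        (name_a, name_b): pearson(profile_a, profile_b)
        for name_a, profile_a in experiment_a.items()
        for name_b, profile_b in experiment_b.items()
    }

# Hypothetical data: two genes and two metabolites measured on four samples.
genes = {"geneA": [1.0, 2.0, 3.0, 4.0], "geneB": [4.0, 3.0, 2.0, 1.0]}
metabolites = {"met1": [2.0, 4.0, 6.0, 8.0], "met2": [1.0, 3.0, 2.0, 4.0]}

within = entity_correlations(genes, genes)       # single experiment
cross = entity_correlations(genes, metabolites)  # two experiments

for pair, r in sorted(cross.items()):
    print(pair, round(r, 3))
```

Each value in the resulting dictionary corresponds to one cell of a correlation heatmap, provided both experiments list the same samples in the same order.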


Figure 3: Heatmap with correlation coefficients between entities of a single experiment.


Figure 4: Heatmap with pair-wise correlation calculation between entities of two different experiments.

Sample-Sample Correlation

In addition to entity-entity correlation, the GeneSpring correlation framework supports pair-wise correlation analysis between biological samples within a given experiment. Sample correlation allows identification of the condition-wise relationships which may exist between the samples in a study.
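
Sample-sample correlation applies the same idea to the columns of the abundance table rather than its rows. The sketch below assumes Pearson's coefficient, hypothetical sample names, and invented values. It builds the full sample-by-sample matrix and reports each sample's most strongly correlated partner.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def sample_correlation_matrix(abundance, sample_names):
    """abundance is a list of entity rows, each with one value per sample."""
    columns = list(zip(*abundance))  # one tuple of entity values per sample
    return {
        (s, t): pearson(columns[i], columns[j])
        for i, s in enumerate(sample_names)
        for j, t in enumerate(sample_names)
    }

def closest_partner(matrix, sample_names):
    """For each sample, the other sample with the highest correlation."""
    return {
        s: max((t for t in sample_names if t != s), key=lambda t: matrix[(s, t)])
        for s in sample_names
    }

# Hypothetical table: five entities (rows) measured on four samples (columns).
sample_names = ["S1", "S2", "S3", "S4"]
abundance = [
    [1.0, 1.1, 5.0, 5.2],
    [2.0, 2.1, 4.0, 3.9],
    [3.0, 2.9, 3.0, 3.1],
    [4.0, 4.2, 2.0, 2.0],
    [5.0, 5.0, 1.0, 0.9],
]

matrix = sample_correlation_matrix(abundance, sample_names)
partners = closest_partner(matrix, sample_names)
```

Clustering the rows and columns of such a matrix is what produces the grouped blocks seen in a sample-sample correlation heatmap.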


Figure 5: Sample-sample correlation heatmap for a metabolomics study. Clustering on correlation coefficients clearly demonstrates that samples group based on their pH values rather than the infection status (NRBC: non-infected RBCs; IRBC: infected RBCs).

Multi-omic Pathway Analysis

Viewing results in the context of pathways and interaction networks can facilitate insights into the underlying biology. GeneSpring 13 allows researchers to import and view pathways from KEGG, Wikipathways (GPML format) and BioCyc, as well as pathways in the BioPAX exchange format. These pathways can be used in single-experiment analysis and in the Multi-omic pathway analysis tool, where researchers can quickly determine whether the entities of interest are enriched in any pathway.
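
Pathway enrichment of an entity list is commonly assessed with a one-sided hypergeometric test. The sketch below assumes that test for illustration. The text above does not state which statistic GeneSpring uses, and the function name and counts are hypothetical.

```python
from math import comb

def enrichment_p_value(universe_size, pathway_size, list_size, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    universe_size: number of entities that could have been selected
    pathway_size:  number of those entities that belong to the pathway
    list_size:     number of entities in the list of interest
    overlap:       number of list entities that fall in the pathway
    """
    max_overlap = min(pathway_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical counts: 10 of 200 entities of interest map to a pathway that
# contains 100 of the 20000 entities in the universe.
p = enrichment_p_value(universe_size=20000, pathway_size=100,
                       list_size=200, overlap=10)
print(f"enrichment p-value: {p:.3g}")
```

When many pathways are tested at once, the resulting p-values would normally be adjusted for multiple testing before pathways are reported as significant.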


Figure 6: Pathway and network diagrams help place statistical results in a biological context. Direct navigation between biological pathways and their associated genes provides systems-level insight.

Pathway Analysis and Visualization Capabilities:

  • Identification of significant pathways from multi-omic data
  • Canonical biochemical, metabolomic, & signaling pathways from KEGG, Wikipathways and BioCyc
  • Curated pathway rendering
  • Intuitive data overlay
  • Support for GPML / OWL pathway import
  • Custom pathway creation
  • Pathway browsing, searching and navigation
  • Automatic translation between annotation types, pathways, and organisms

NLP Network Analysis

Genes and proteins interact in a biochemical network to orchestrate the biological processes involved in a condition. GeneSpring 13 provides modeling capabilities to allow researchers to quickly generate and dynamically explore these networks. Using a set of algorithms and provided organism-specific interaction databases, researchers can build a range of network types, including targets and regulators, transcription regulators, biological processes, and shortest connect. Natural language processing-based (NLP) algorithms can be applied to a body of text, HTML, a PDF, or Medline XML to extract and add interactions to an existing interaction database.


Figure 7: NLP (Natural Language Processing) Discovery Network

In addition to providing built-in pathway and network analysis tools, GeneSpring 13 extends biological contextualization capabilities through its integration with Ingenuity Pathway Analysis (IPA), MetaCore and Cytoscape: lists of genes and experimental data can be transferred seamlessly between GeneSpring and each of these applications for iterative analysis.

Meta-data framework

Clustering analysis is an efficient way to group the samples and conditions in a dataset into subsets based on the similarity of their abundance profiles. Sample clustering has been broadly used for inferring condition subtypes and for subject stratification. Used in this context, hierarchical clustering is an important tool for revealing the molecular mechanisms underlying a biological function. New in GeneSpring/MPP 13, the metadata analysis framework allows researchers to visualize the abundance profiles of samples alongside metadata such as administrative, physiological, or technology-related information. The metadata visualization framework allows researchers to reveal tacit dependencies between characteristics of the subjects or samples and their gene, metabolite, or protein expression profiles.
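
To illustrate how sample clustering and metadata can be aligned, the sketch below clusters four hypothetical samples and then lists their metadata in dendrogram leaf order. The distance measure (1 minus Pearson correlation), the average-linkage rule, the sample names, and the metadata labels are all assumptions made for this example, not a description of GeneSpring's defaults.

```python
from itertools import combinations
from math import sqrt

def correlation_distance(x, y):
    """1 - Pearson correlation: 0 for identical shapes, 2 for opposite ones."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)

def cluster_leaf_order(profiles):
    """Average-linkage agglomerative clustering; returns samples in leaf order."""
    clusters = [[name] for name in profiles]

    def linkage(first, second):
        distances = [
            correlation_distance(profiles[a], profiles[b])
            for a in first
            for b in second
        ]
        return sum(distances) / len(distances)

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average pair-wise distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]

# Hypothetical abundance profiles and one metadata attribute per sample.
profiles = {
    "S1": [1.0, 2.0, 3.0, 4.0],
    "S2": [4.0, 3.0, 2.0, 1.0],
    "S3": [1.2, 2.1, 3.3, 3.9],
    "S4": [4.1, 2.8, 2.2, 0.9],
}
metadata = {"S1": "treated", "S2": "control", "S3": "treated", "S4": "control"}

order = cluster_leaf_order(profiles)
aligned = [(sample, metadata[sample]) for sample in order]
```

Reading the metadata column in leaf order shows whether samples that cluster together also share an attribute, which is the dependency a metadata track next to a dendrogram is meant to expose.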


Figure 8. Hierarchical clustering of 220 samples from TCGA with the visual alignment of metadata. (Labels in panels E and F: LPS – Lymph node pathologic spread, LE – Lymph nodes examined, HM – Hypermutated, MSI – Microsatellite Instability Status, ES – Expression subtype, MS – Methylation subtype).


Figure 9. HER4 shows up-regulation in subjects with lesion regression (Sinn score 3 and 4). Normalized expression values for HER4 (probe A_32_P183765) are shown before and after treatment.

Downstream analysis of processed NGS data / Variant analysis using VCF files

GeneSpring GX supports downstream analysis of processed NGS data supplied as VCF files. The Variant Analysis workflow in GeneSpring allows users to import results in VCF format for visualization and biological contextualization of the identified variants, and it supports multi-omic integrated biology studies. Key features include:

  • Region list operations to filter and annotate the imported regions
  • Translation of regions to genes
  • Identification of genic regions
  • Visualization in the Genome Browser
  • Single or multi-omic Pathway Analysis
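
Two of the steps listed above, importing VCF records and translating regions to genes, can be sketched as follows. The parser reads only the first five fixed VCF columns (CHROM, POS, ID, REF, ALT). The gene models, gene names, and function names are hypothetical, and the code is not GeneSpring's importer.

```python
def parse_vcf(lines):
    """Yield (chrom, pos, ref, alt_alleles) for each data line of a VCF file."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information lines and the header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _variant_id, ref, alt = fields[:5]
        yield chrom, int(pos), ref, alt.split(",")

def genes_for_position(chrom, pos, gene_models):
    """Names of genes whose 1-based inclusive interval contains the position."""
    return [
        gene
        for gene, (gene_chrom, start, end) in gene_models.items()
        if gene_chrom == chrom and start <= pos <= end
    ]

# Hypothetical input: three variants and two made-up gene intervals.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t150\t.\tA\tG\t50\tPASS\t.",
    "chr1\t900\t.\tC\tT,G\t60\tPASS\t.",
    "chr2\t120\t.\tG\tA\t45\tPASS\t.",
]
gene_models = {"GENE_A": ("chr1", 100, 200), "GENE_B": ("chr2", 100, 300)}

annotated = [
    (chrom, pos, genes_for_position(chrom, pos, gene_models))
    for chrom, pos, _ref, _alts in parse_vcf(vcf_lines)
]
genic = [record for record in annotated if record[2]]  # variants inside a gene
```

The `genic` list corresponds to the "identification of genic regions" step, and the gene names it carries are what a downstream pathway analysis would consume.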



Figure 18: Suggested order of steps for downstream analysis of processed NGS data in GeneSpring GX

Transcriptomic analysis

GeneSpring 13 provides flexible and comprehensive workflows for a variety of transcriptomic applications. A broad spectrum of data pre-processing, linear and non-linear normalization methods is available for both one- and two-color gene expression data. Depending on the researcher's level of expertise, data analysis can be performed using either the Guided Workflow or Advanced Analysis mode. Quality control can be performed using platform-specific metrics, enabling researchers to optimize pre-processing steps before statistical analysis.
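
As a small example of a linear normalization on log-transformed data, the sketch below log2-transforms each array and shifts it so that its median is zero. This particular method, the function name, and the intensity values are assumptions chosen for illustration. The text above does not list which normalization methods GeneSpring offers.

```python
from math import log2
from statistics import median

def log2_median_center(samples):
    """Log2-transform each sample and subtract that sample's median.

    samples maps a sample name to a list of strictly positive intensities.
    """
    normalized = {}
    for name, intensities in samples.items():
        logged = [log2(value) for value in intensities]
        centre = median(logged)
        normalized[name] = [value - centre for value in logged]
    return normalized

# Hypothetical raw intensities: array2 is uniformly four times brighter.
raw = {"array1": [2.0, 4.0, 8.0], "array2": [8.0, 16.0, 32.0]}
normalized = log2_median_center(raw)
```

After centering, a uniform intensity difference between arrays disappears, so the two hypothetical arrays become directly comparable.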


Figure 10. Intuitive GUI allows users to toggle between windows.

Other key features for Transcriptomics applications include:

  • Probe- or gene-level expression analysis on all major microarray platforms, including Agilent, Affymetrix, and Illumina
  • microRNA analysis and identification of gene targets using integrated TargetScan information
  • The ability to do correlative analysis on mRNA expression and miRNA data (or splicing, QPCR, copy number, etc.)
  • Exon splicing analysis using t-tests or multivariate splicing ANOVA and filtering for transcripts on splicing index
  • Visualization of splicing analysis results
  • Real-time PCR QC and data analysis
  • NCBI Gene Expression Omnibus Importer tool for expression datasets

CGH analysis

GeneSpring supports the investigation of genomic variations and their implications in disease susceptibility and progression. GeneSpring enables scientists to examine the aberrant regions of interest that have been discovered, identify the genes overlapping those aberrations, and assess the impact of copy number changes in downstream GO Analysis or pathway analysis. Results from GeneSpring’s clustering, correlation, and biological contextualization analyses can be compared to results from other omic data.

Features of this workflow include:
  • Creation of experiments from reports generated by CytoGenomics and Agilent Genomic Workbench software
  • Translate regions to genes
  • Find common aberrations
  • Genome browser visualization
  • Single and multi-omic pathway analysis

Genomic copy number analysis

GeneSpring 13 also supports the interrogation of genomic structural variations and their implications in disorder susceptibility and progression. Providing workflows for paired and unpaired analysis of Affymetrix and Illumina genotyping array data, GeneSpring 13 enables scientists to quickly detect regions of genomic copy number variation (CNV) and loss of heterozygosity (LOH). Once regions of interest are discovered, genes overlapping those regions can be identified and the biological impact of copy number variation can be assessed in downstream GO Analysis or pathway analysis. Features of the genomic copy number workflow include:

  • Ability to create and use a custom reference in addition to packaged HapMap reference
  • Batch effect correction method
  • Circular Binary Segmentation
  • Filters to identify copy-neutral LOH events and regions of allelic imbalance
  • Ability to identify common variations across a set of samples
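
The last item in the list, finding variations shared across samples, can be sketched as an interval sweep. The example assumes each sample's copy number regions on one chromosome are already available as half-open (start, end) intervals. The function name, the `min_samples` parameter, and the coordinates are hypothetical.

```python
def common_regions(sample_regions, min_samples):
    """Intervals covered by regions from at least min_samples samples.

    sample_regions maps a sample name to a list of half-open (start, end)
    intervals on a single chromosome; intervals within one sample must not
    overlap each other.
    """
    events = []
    for regions in sample_regions.values():
        for start, end in regions:
            events.append((start, 1))
            events.append((end, -1))
    # At equal positions process starts before ends so touching intervals merge.
    events.sort(key=lambda event: (event[0], -event[1]))

    result, depth, open_start = [], 0, None
    for position, delta in events:
        previous = depth
        depth += delta
        if previous < min_samples <= depth:
            open_start = position  # coverage just reached the threshold
        elif depth < min_samples <= previous:
            if position > open_start:  # drop zero-length intervals
                result.append((open_start, position))
            open_start = None
    return result

# Hypothetical aberrant regions for three samples.
regions = {
    "S1": [(100, 200), (300, 400)],
    "S2": [(150, 250), (350, 450)],
    "S3": [(180, 220)],
}
shared_by_two = common_regions(regions, min_samples=2)
shared_by_all = common_regions(regions, min_samples=3)
```

Lowering `min_samples` relaxes the definition of "common" from regions present in every sample to regions recurring in a chosen fraction of them.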

Genome-wide association analysis

Genome-wide association studies (GWAS) utilize high-density genotyping microarrays to identify SNPs associated with qualitative or quantitative traits. GeneSpring 13 provides a comprehensive workflow for case-control GWAS using Affymetrix and Illumina genotyping microarray platforms. The flexible workflow supports case-control experimental designs, offering a suite of statistical tests applied under various genetic models, multiple testing correction, and correction methods for population stratification. After identifying genes harboring SNPs or haplotype blocks associated with a trait, researchers can perform GO Analysis and pathway analysis to determine which biological processes and pathways may be involved in the condition under study. Other key features for the genetic association workflow include:

  • EIGENSTRAT and Genomic Control population stratification correction
  • Tag SNP identification
  • Haplotype inference and Haplotype Trend Regression
  • Pearson's Chi-Square, Fisher's Exact, Cochran-Armitage, and Chi-Square correlation
  • Logistic and linear regression
  • LD plot
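
Of the tests listed above, Pearson's Chi-Square is the simplest to show. The sketch below applies it to a 2x2 table of allele counts for one SNP in a case-control design. The allelic (rather than genotypic) model, the function name, and the counts are assumptions made for this example. The p-value uses the closed form for one degree of freedom.

```python
from math import erfc, sqrt

def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Each argument is (minor_allele_count, major_allele_count); all four
    counts must be positive. Returns (chi_square, p_value, odds_ratio).
    """
    a, b = case_alleles
    c, d = control_alleles
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 df.
    p_value = erfc(sqrt(chi_square / 2.0))
    odds_ratio = (a * d) / (b * c)
    return chi_square, p_value, odds_ratio

# Hypothetical counts: 50 cases and 50 controls, two alleles per subject.
chi_square, p_value, odds_ratio = allelic_chi_square((30, 70), (10, 90))
print(f"chi-square={chi_square:.2f}, p={p_value:.2e}, OR={odds_ratio:.2f}")
```

In a genome-wide scan this test would be repeated for every SNP, which is why the workflow pairs it with multiple testing correction and population stratification correction.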

Statistical tools for testing differential expression

GeneSpring 13 provides a broad choice of tools to identify unique patterns in data. Clustering algorithms can be employed to group entities and/or samples based on the similarity of their expression profiles, revealing information regarding the biological function or the co-regulation of genes. GeneSpring 13 also provides robust classification algorithms that use training datasets to find predictive expression patterns. By offering multiple classifiers including Decision Tree, Support Vector Machine, Naive Bayesian, Neural Network, and Partial Least Squares Discrimination, GeneSpring 13 enables biomarker discovery for a variety of experimental designs.

Extensible functionality with Jython and R

GeneSpring 13 provides scientists the ability to write, execute, and save their own scripts to combine operations in GeneSpring 13 with a more general Jython (Python with Java class import capabilities) programming framework. Users can develop their own data transformation operations, automatically pull up data views, and run external algorithms within GeneSpring 13. An embedded R scripting editor will also allow R scripts to be written and run from within GeneSpring 13. Any R function can be given access to GeneSpring 13 data, with results being automatically incorporated back into the GeneSpring 13 environment.

GeneSpring R integration:

GeneSpring R package for:
  • Reading and downloading GeneSpring datasets into R
  • Creating new GeneSpring experiments from R
  • Accessing Samples, Attributes, and Sample Attachments
  • Accessing Experiment Grouping Information
  • Accessing and Creating GeneSpring Entity Lists
  • Searching on Projects, Experiments, Entity Lists, Technologies, and Samples

Report Generation Capability

In GeneSpring, you can export plot views and pathways including matched pathway lists as a Report. A report can be downloaded as a .pdf file or saved locally or on the cloud as a .gsreport file. Reports can be created to include:

  • Legend
  • Annotation columns to include experiment data
  • Customized paper size
  • Customized page orientation
  • Customized page margins

Multiple reports can be merged to create a single report. The maximum number of report pages displayed in GeneSpring can be modified from Tools -> Options -> Miscellaneous -> Report. If a report contains more than the specified number of pages, you can download it as a .pdf to the local system.


Figure 11. Report Generation Feature – it enables users to create pdf reports on the fly containing text and images.

Intuitive graphical displays

GeneSpring 13 displays data in ways that help users conceptualize and convey the information in their data. The various types of plots, graphs and diagrams highlight different aspects of the data, allowing visual information to be extracted in multiple ways. Virtually any graphical image can be exported as HTML or as an image in .tiff, .jpeg, .png or .bmp format compatible with publishing software applications. An interactive and powerful Genome Browser is a key component for visualizing genomic structural data and integrating results from multi-omic studies (Figure 16). Results from different experiments can be dragged and dropped into individual data tracks and viewed simultaneously. The image overlay feature permits the user to overlay any data or annotation tracks, thus allowing researchers to qualitatively assess correlation between different data types. Other visualization tools in GeneSpring 13 include scatter plot, MvA, profile plot, heatmap, histogram, and many more.

Quickly and easily discover differences between sample groups

  • 3D PCA Loading Plot
  • 4-way Venn Diagrams
  • Fold change and p-value cut-off sliders in visualization windows
  • Annotation Text Searching in the Genome Browser
  • Plot changing patterns of compound abundances over time
  • Develop useful multivariate models for class prediction
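
The effect of moving the fold change and p-value cut-off sliders mentioned above amounts to a simple filter over per-entity statistics. The sketch below assumes fold changes are expressed as ratios and treats up- and down-regulation symmetrically. The function name, default cut-offs, and values are illustrative only.

```python
from math import log2

def filter_entities(results, fold_change_cutoff=2.0, p_value_cutoff=0.05):
    """Names of entities passing both cut-offs.

    results maps an entity name to (fold_change, p_value), where fold_change
    is a positive ratio. A ratio of 0.5 counts as a 2-fold change downwards.
    """
    threshold = log2(fold_change_cutoff)
    return sorted(
        name
        for name, (fold_change, p_value) in results.items()
        if abs(log2(fold_change)) >= threshold and p_value <= p_value_cutoff
    )

# Hypothetical per-entity statistics.
results = {
    "entity1": (4.0, 0.001),   # strongly up, significant
    "entity2": (0.25, 0.010),  # strongly down, significant
    "entity3": (1.5, 0.001),   # significant but small change
    "entity4": (3.0, 0.200),   # large change but not significant
}
strict = filter_entities(results)
relaxed = filter_entities(results, fold_change_cutoff=1.2)
```

Comparing `strict` and `relaxed` shows how the entity list grows as the fold change cut-off is lowered while the p-value cut-off stays fixed.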

Figure 16. The interactive Genome Browser allows users to visually integrate heterogeneous data by simultaneously importing data tracks from multiple experiments and permitting overlay of data and annotation tracks.

Built-in ID browser automates database and spectral library searches in MPP

Converting entities into actual chemical compounds using Mass Profiler Professional is made easy using comparative techniques against public or private databases. The software automatically annotates the entity list and projects the compound names onto any of the various visualization and pathway analysis tools. Mass Profiler Professional includes an integrated ID browser that mirrors MassHunter's qualitative analysis functionality to allow identification using:

  • LC/MS Personal Compound Database (METLIN, pesticides)
  • GC/MS libraries (NIST and Fiehn library)
  • Empirical Formula Calculation using Agilent's Molecular Formula Generator (MFG) algorithm
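
One simple way to compare an entity against a compound database is to match its measured neutral mass to theoretical monoisotopic masses within a ppm tolerance. The sketch below assumes that approach for illustration. It is not the ID browser's or the MFG algorithm's implementation, the two-entry database and the 10 ppm tolerance are hypothetical, and only four elements are supported.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes of C, H, N and O.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

def formula_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return mass

def match_compounds(observed_mass, database, tolerance_ppm=10.0):
    """Database entries whose theoretical mass lies within the tolerance.

    Returns a list of (compound_name, mass_error_in_ppm) tuples.
    """
    hits = []
    for name, formula in database.items():
        theoretical = formula_mass(formula)
        error_ppm = (observed_mass - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append((name, error_ppm))
    return hits

# Hypothetical two-compound database and one measured neutral mass.
database = {"glucose": "C6H12O6", "caffeine": "C8H10N4O2"}
hits = match_compounds(180.0640, database)
```

A mass match alone rarely identifies a compound uniquely, so real workflows combine it with further evidence such as retention time or spectral library matching.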

Support for NGS data visualization

NGS data can be aligned and analyzed in Strand NGS (GeneSpring’s companion product/NGS solution). The processed data from Strand NGS can be easily imported into GeneSpring 13 to perform multi-omic correlation analysis and multi-omic pathway analysis. Analyzed SureSelect RNA-seq, SureSelect DNA-seq, SureSelect Methyl-seq, small RNA-seq and ChIP-seq data can be visualized in GeneSpring 13. The imported reads, regions and entities can be visualized in the Genome Browser.
