Gene chip analysis


The microarray is a powerful tool for genome analysis. It gives a global view of the genome in a single experiment. Data analysis is a vital part of a microarray study because it directly influences the final result. Each microarray study comprises multiple microarray experiments, and each study yields tens of thousands of data points. Because the volume of data is growing exponentially, the analysis becomes a challenging task. In general, the greater the volume of data, the greater the chance of erroneous results. Handling such large volumes of data requires high-end computational infrastructure and programs that can handle multiple data formats. Programs for microarray data analysis are already available on various platforms, but because of rapid development, the diversity of microarray technology, and differing data formats, there is a continuing need for comprehensive and complete microarray data analysis.

Data analysis

Data analysis is the critical part of the whole workflow, since any error introduced at this stage leads to biologically meaningless results. In data analysis, the information in the raw data file is processed further to yield meaningful biological results. This part includes data normalization, flagging of the data, averaging the ratios across replicates, clustering of similarly expressed genes, and so on. Each replicate must be normalized before further analysis. Normalization removes the non-biological variation between samples. After normalization, the ratio is calculated for each gene in each replicate, and differentially regulated genes are determined from that ratio. Various statistical analyses are also carried out to assess confidence. Each replicate is also examined for experimental artifacts and bias by computing parameters related to intensity, background, flags, spot details, and so on.


It is important to conduct microarray experiments in replicates. As with any other quantitative measurement, repeated experiments make it possible to carry out confidence analysis and to identify differentially expressed genes at a given level of confidence. More replicates give more confidence in determining differentially expressed genes. In practice, three to five replicates is ideal.


Normalization is required to standardize the data and to focus on biologically relevant changes. Many sources of systematic variation in microarray experiments affect the measured gene expression levels, such as dye bias, heat and light sensitivity, efficiency of dye incorporation, differences in labeled cDNA hybridization conditions, scanning conditions, and unequal quantities of starting RNA. Normalization is an important step for adjusting the data set for technical variation and for removing differences in relative abundance between gene expression profiles. This is the only point at which one-color and two-color data analysis differ. The normalization method depends on the data. The basic idea behind all normalization methods is that the expected mean intensity ratio between the two channels is one. If the observed mean intensity ratio deviates from one, the data are mathematically processed so that the final observed mean intensity ratio becomes one. Once the mean intensity ratio has been adjusted to one, the distribution of gene expression is centered, so that genuine differences can be identified.
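The centering idea described above can be sketched in a few lines of Python. This is a minimal illustration of global normalization only, not a method taken from any particular analysis package. It assumes two-color data, works on log2 ratios (so a mean log ratio of zero corresponds to a geometric mean ratio of one), and uses made-up intensity values; the names `global_normalize`, `red`, and `green` are hypothetical.

```python
import math

def global_normalize(red, green):
    """Return log2(red/green) ratios shifted so that their mean is zero.

    A mean log2 ratio of zero corresponds to a (geometric) mean
    intensity ratio of one between the two channels.
    """
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    shift = sum(log_ratios) / len(log_ratios)
    return [m - shift for m in log_ratios]

# Hypothetical spot intensities for four genes (illustrative values only).
red = [400.0, 800.0, 1600.0, 3200.0]
green = [200.0, 400.0, 200.0, 1600.0]

normalized = global_normalize(red, green)
print(normalized)
```

In this toy data set the red channel is systematically brighter. After centering, the three genes with the typical ratio sit just below zero and the third gene stands out as the candidate differential.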

Quality control

Before biological variation is analyzed, quality control (QC) steps must be performed to determine whether the data are fit for statistical testing. Statistical tests are very sensitive to the nature of the input data.

Filtering of flagged spots

Filtering out spots with unreliable intensity is an important part of quality control. For example, every scanner has a limit below which intensity values can no longer be trusted. Typically, the lowest reliable intensity value is about 100–200 for Affymetrix data and 100–1000 for cDNA microarray data. These cut-offs are likely to change as scanners become more precise. Values below the cut-off are usually removed (filtered) from the data, because they are likely to be artifacts.
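As a rough illustration of this step, the sketch below drops spots whose intensity falls below a chosen cut-off. The function name, the gene names, the intensity values, and the cut-off of 100 are all assumptions made for the example; an appropriate cut-off depends on the platform and scanner.

```python
def filter_low_intensity(spots, cutoff):
    """Keep only the spots whose intensity is at or above the cut-off."""
    return {gene: value for gene, value in spots.items() if value >= cutoff}

# Hypothetical raw intensities (illustrative values only).
spots = {"geneA": 85.0, "geneB": 150.0, "geneC": 2300.0, "geneD": 40.0}

# A cut-off of 100 is an assumed value, not a recommendation.
reliable = filter_low_intensity(spots, cutoff=100.0)
print(reliable)
```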

Filtering of noisy replicates

Filtering out noisy replicates is one of the crucial parts of quality control. Experimental replicates should behave in a similar pattern, and replicates with excessive noise should be eliminated before analysis. Noisy replicates can be identified with a statistical method such as ANOVA.

Filtering of non-significant genes

Non-significant genes are filtered out to reduce the number of genes, so that the analysis can be carried out on a selected set. Non-significant genes are removed by specifying a relative fold-change threshold with respect to the normal control, for example 2 for overexpressed genes and −2 for underexpressed genes. After this filtering, only a small number of genes is retained. The remaining genes are then subjected to statistical analysis.
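The fold-change filter can be sketched as follows. The sketch assumes the common signed convention in which a ratio below one is reported as the negative reciprocal (so a ratio of 0.25 becomes −4); the text does not state which convention is meant, and the function names and expression values are hypothetical.

```python
def signed_fold_change(treated, control):
    """Return the ratio if it is >= 1, otherwise its negative reciprocal."""
    ratio = treated / control
    return ratio if ratio >= 1 else -1.0 / ratio

def filter_by_fold_change(expression, threshold=2.0):
    """Keep genes whose signed fold change is >= threshold or <= -threshold."""
    kept = {}
    for gene, (treated, control) in expression.items():
        fold_change = signed_fold_change(treated, control)
        if fold_change >= threshold or fold_change <= -threshold:
            kept[gene] = fold_change
    return kept

# Hypothetical (treated, control) intensities per gene.
expression = {
    "geneA": (800.0, 200.0),  # overexpressed
    "geneB": (300.0, 250.0),  # roughly unchanged
    "geneC": (100.0, 400.0),  # underexpressed
}

selected = filter_by_fold_change(expression, threshold=2.0)
print(selected)
```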

Statistical analysis

Statistical analysis plays a vital role in identifying the genes whose expression changes are statistically significant.


Clustering is a data mining technique used to group genes that have similar expression patterns. Hierarchical clustering and k-means clustering are widely used techniques in microarray analysis.

Hierarchical clustering

Hierarchical clustering is a statistical method for finding relatively homogeneous clusters. It consists of two separate phases. First, a distance matrix containing all the pairwise distances between the genes is calculated. Measures based on Pearson's or Spearman's correlation are often used as dissimilarity estimates, but other measures, such as Manhattan distance or Euclidean distance, can also be applied. If the genes on a single chip need to be clustered, Euclidean distance is the correct choice, since at least two chips are needed to calculate any correlation measure. After the initial distance matrix has been calculated, the hierarchical clustering algorithm either iteratively joins the two closest clusters, starting from single-gene clusters (agglomerative, bottom-up approach), or iteratively partitions clusters, starting from the complete set (divisive, top-down approach). After each step, the distance matrix between the newly formed clusters and the other clusters is recalculated. If there are N cases, the agglomerative approach involves N − 1 merging steps. Hierarchical cluster analysis methods include:

• Single linkage (minimum method, nearest neighbor)
• Complete linkage (maximum method, furthest neighbor)
• Average linkage (UPGMA)
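The agglomerative procedure and the three linkage rules listed above can be sketched in plain Python. This is a deliberately simple illustration rather than an efficient implementation: it recomputes pairwise Euclidean distances at every step, and the names `agglomerate`, `LINKAGE`, and the expression profiles are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Each linkage rule reduces the list of member-to-member distances
# between two clusters to a single cluster-to-cluster distance.
LINKAGE = {
    "single": min,                            # nearest neighbor
    "complete": max,                          # furthest neighbor
    "average": lambda ds: sum(ds) / len(ds),  # UPGMA
}

def agglomerate(profiles, linkage="average"):
    """Merge the two closest clusters until one remains.

    Returns the list of merges as (cluster_a, cluster_b, distance).
    """
    combine = LINKAGE[linkage]
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                distances = [euclidean(profiles[a], profiles[b])
                             for a in clusters[i] for b in clusters[j]]
                d = combine(distances)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical expression profiles over two chips (illustrative values).
profiles = {
    "g1": [1.0, 2.0],
    "g2": [1.0, 3.0],
    "g3": [5.0, 6.0],
    "g4": [5.0, 8.0],
}

merges = agglomerate(profiles, linkage="average")
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

With four genes the loop performs three merges. Changing the `linkage` argument to `"single"` or `"complete"` changes only how the distance between two multi-gene clusters is summarized.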

K-means clustering

K-means clustering is an algorithm that classifies or groups genes into "K" groups based on their expression pattern, where "K" is a positive integer. The grouping is done by minimizing the sum of squared distances between the data points and the corresponding cluster centroid. Thus the purpose of K-means clustering is to classify the data based on similar expression.
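A minimal sketch of this procedure is given below, using the standard alternation between assigning each point to its nearest centroid and recomputing each centroid as the mean of its members. For reproducibility it takes the first K points as the initial centroids, which is an assumption of this example; practical implementations usually choose starting centroids at random or with a seeding heuristic. The function names and data values are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(column) / len(points) for column in zip(*points)]

def k_means(points, k, iterations=100):
    """Return (assignment, centroids) after iterating to convergence."""
    # Assumed initialization: the first k points serve as starting centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean_point(members)
    return assignment, centroids

# Hypothetical expression profiles over two conditions.
points = [[1.0, 1.0], [9.0, 9.0], [1.0, 2.0],
          [8.0, 9.0], [2.0, 1.0], [9.0, 8.0]]

labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```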

Gene ontology study

A gene ontology study gives biologically meaningful information, such as cellular location, molecular function, and biological process, about the genes that are differentially regulated in a disease or drug-treatment condition with respect to the normal control.

Pathway analysis

Pathway analysis gives specific information about the pathways affected in the disease condition with reference to the normal control. It also makes it possible to identify the gene network and how the genes in it are regulated.


"T. Hema Thanka Christlet, S. S. J. Shiek Fareeth Ahmed, A. Ahameethunisa, Janani Kannan. Dept of Biotechnology, SRM University"

Wikimedia Foundation. 2010.
