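Gene chip (microarray) technology provides a powerful tool for studying disease heterogeneity. Current methods based on conventional cluster analysis generally use a large number of the genes on the chip as features to discover disease subtypes, and so fail to account for the fact that the many irrelevant genes among those features can mask meaningful partitions of the disease samples. To avoid this shortcoming, we propose a heterogeneity analysis method based on coupled two-way clustering (Heterogeneous Analysis Based on Coupled Two-Way Clustering, HCTWC), which searches for meaningful gene clusters in order to uncover the intrinsic partitions of the samples. The method was applied to a diffuse large B-cell lymphoma (DLBCL) microarray data set. Clustering the DLBCL samples with the identified gene clusters as features revealed two DLBCL subtypes with survival of 55% and 25%, respectively (P < 0.05). HCTWC is therefore effective for addressing disease heterogeneity.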
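The central idea of the abstract above, that samples clustered on a selected gene cluster can reveal a partition hidden when all genes are used, can be illustrated with a minimal sketch. This is not the HCTWC procedure itself: the abstract does not specify the clustering algorithm, so a plain 2-means with a deterministic start stands in for it, and the function name `cluster_samples_on_genes` and the toy data are hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_samples_on_genes(
    expr: List[List[float]], gene_cluster: Sequence[int], n_iter: int = 20
) -> List[int]:
    """Split samples into two groups using only the genes in gene_cluster.

    expr[s][g] is the expression of gene g in sample s.
    Assumption: a simple 2-means stands in for the (unspecified) clustering step.
    """
    # Restrict every sample to the chosen gene cluster.
    points = [[row[g] for g in gene_cluster] for row in expr]
    # Deterministic start: the first sample and the sample farthest from it.
    far = max(range(len(points)), key=lambda i: squared_distance(points[i], points[0]))
    centers = [list(points[0]), list(points[far])]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each sample to its nearest center.
        labels = [
            0 if squared_distance(p, centers[0]) <= squared_distance(p, centers[1]) else 1
            for p in points
        ]
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data: genes 0-1 separate samples 0-2 from 3-5; genes 2-3 are large-variance noise.
toy_expr = [
    [0.1, 0.0, 5.0, 1.0],
    [0.0, 0.2, 1.0, 6.0],
    [0.2, 0.1, 6.0, 5.0],
    [1.0, 0.9, 5.5, 0.5],
    [0.9, 1.1, 0.5, 5.5],
    [1.1, 1.0, 6.5, 6.0],
]
print(cluster_samples_on_genes(toy_expr, [0, 1]))        # [0, 0, 0, 1, 1, 1]
print(cluster_samples_on_genes(toy_expr, [0, 1, 2, 3]))  # [0, 1, 0, 0, 1, 0]
```

On this toy data the two informative genes recover the intended split, whereas using all four genes groups the samples by the noisy genes instead.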
The advent of DNA microarray technology promises new insights into the secrets of life by monitoring the activities of thousands of genes simultaneously. Current analyses of microarray data focus on precise classification of biological types, for example, tumor versus normal tissues. A further scientific challenge is to extract disease-relevant genes from the bewildering amounts of raw data. This is one of the most critical themes of the post-genomic era, but it is generally ignored for lack of an efficient approach. In this paper, we present a novel ensemble method for gene extraction that can be tailored to fulfill multiple biological tasks, including (i) precise classification of biological types; (ii) disease gene mining; and (iii) target-driven gene networking. We also give a numerical application of (i) and (ii) using a public microarray data set, and set aside a separate paper to address (iii).
Reconstruction of genetic networks is one of the key scientific challenges in functional genomics. This paper describes a novel approach for addressing the regulatory dependencies between genes whose activities can be delayed by multiple units of time. The aim of the proposed approach, termed TdGRN (time-delayed gene regulatory networking), is to reverse-engineer the dynamic mechanisms of gene regulation. This is realized by identifying time-delayed gene regulations through supervised decision-tree analysis of a newly designed time-delayed gene expression matrix, derived from the original time-series microarray data. A permutation technique is used to determine the statistical classification threshold of a tree, from which one or more gene regulatory rules are extracted. The proposed TdGRN is a model-free approach that attempts to learn the underlying regulatory rules without relying on any model assumptions. Compared with model-based approaches, it has several significant advantages: it requires neither an arbitrary threshold for discretizing gene transcriptional values nor a predefined number of regulators (k). We have applied this novel method to the publicly available data for budding yeast cell cycling. The numerical results demonstrate that most of the identified time-delayed gene regulations are supported by current biological knowledge.
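The abstract above does not spell out the layout of the time-delayed gene expression matrix. One plausible construction is sketched below under stated assumptions: each row pairs the target gene's value at time t with the values of every gene at times t-1 through t-max_delay, so that a supervised learner such as a decision tree could pick out (regulator, delay) pairs. The function name `build_time_delayed_matrix`, the parameter `max_delay`, and the inclusion of the target gene among its own predictors are hypothetical choices, not details taken from the paper.

```python
from typing import List, Tuple


def build_time_delayed_matrix(
    expr: List[List[float]], target: int, max_delay: int
) -> Tuple[List[List[float]], List[float], List[Tuple[int, int]]]:
    """Build a lagged design matrix from a time series.

    expr[g][t] is the expression of gene g at time point t.
    Returns (X, y, columns), where columns[j] = (gene, delay) describes X[i][j],
    and y[i] is the target gene's expression at the row's time point.
    """
    n_genes = len(expr)
    n_times = len(expr[0])
    if not 1 <= max_delay < n_times:
        raise ValueError("max_delay must be at least 1 and less than the number of time points")
    # One predictor column per (gene, delay) pair; delays run from 1 to max_delay.
    columns = [(g, d) for g in range(n_genes) for d in range(1, max_delay + 1)]
    # Rows start at t = max_delay so that every lagged value exists.
    X = [[expr[g][t - d] for (g, d) in columns] for t in range(max_delay, n_times)]
    y = [expr[target][t] for t in range(max_delay, n_times)]
    return X, y, columns


# Usage on a toy series with two genes and four time points.
toy_series = [
    [1.0, 2.0, 3.0, 4.0],      # gene 0
    [10.0, 20.0, 30.0, 40.0],  # gene 1
]
X, y, columns = build_time_delayed_matrix(toy_series, target=0, max_delay=2)
print(columns)  # [(0, 1), (0, 2), (1, 1), (1, 2)]
print(X)        # [[2.0, 1.0, 20.0, 10.0], [3.0, 2.0, 30.0, 20.0]]
print(y)        # [3.0, 4.0]
```

With this layout, a time series of T points yields T - max_delay training rows, so a larger maximum delay trades sample size for the ability to detect longer lags.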
JIANG Wei1,2, LI Xia1,2,3,4, GUO Zheng1,2,3, LI Chuanxing1, WANG Lihong1 & RAO Shaoqi1,5 1. Department of Bioinformatics, Harbin Medical University, Harbin 150086, China