It is of great importance to identify new cancer genes from the data of large-scale genome screenings of gene mutations in cancers. Considering that alterations of some essential functions are indispensable for oncogenesis, we define these functions as cancer functions and, as their approximations, select a group of detailed functions in GO (Gene Ontology) that are highly enriched with known cancer genes. To evaluate the efficiency of using cancer functions as features to identify cancer genes, we define, among the screened genes, the known protein kinase cancer genes as gold standard positives and the other kinase genes as gold standard negatives. The results show that cancer-associated functions are more efficient in identifying cancer genes than the selection pressure feature. Furthermore, combining cancer functions with the number of non-silent mutations can generate more reliable positive predictions. Finally, with a precision of 0.42, we suggest a list of 46 kinase genes as candidate cancer genes; these genes are annotated to cancer functions and carry at least 3 non-silent mutations.
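The final selection rule described above (annotated to at least one cancer function and carrying at least 3 non-silent mutations) and the precision measured against the kinase gold standard can be sketched in Python. This is a minimal sketch under stated assumptions: the `KinaseGene` record, the function labels `func_a`/`func_b`/`func_x`, and the toy genes are hypothetical, and precision is assumed to be TP / (TP + FP) over the gold-standard genes that pass the rule, which may differ from the exact evaluation protocol used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KinaseGene:
    """A screened kinase gene (hypothetical record layout, for illustration only)."""
    name: str
    non_silent_mutations: int
    functions: frozenset[str]   # functional terms the gene is annotated to
    known_cancer_gene: bool     # True = gold-standard positive, False = gold-standard negative


def passes_rule(gene: KinaseGene, cancer_functions: set[str], min_mutations: int = 3) -> bool:
    """Annotated to at least one cancer function AND carrying >= min_mutations non-silent mutations."""
    return bool(gene.functions & cancer_functions) and gene.non_silent_mutations >= min_mutations


def precision(genes: list[KinaseGene], cancer_functions: set[str], min_mutations: int = 3) -> float:
    """Assumed definition: TP / (TP + FP) among gold-standard genes selected by the rule."""
    selected = [g for g in genes if passes_rule(g, cancer_functions, min_mutations)]
    if not selected:
        return 0.0
    true_positives = sum(g.known_cancer_gene for g in selected)
    return true_positives / len(selected)


def candidates(genes: list[KinaseGene], cancer_functions: set[str], min_mutations: int = 3) -> list[str]:
    """Genes selected by the rule that are not already known cancer genes."""
    return [
        g.name
        for g in genes
        if passes_rule(g, cancer_functions, min_mutations) and not g.known_cancer_gene
    ]


# Hypothetical toy data, not taken from the study.
CANCER_FUNCTIONS = {"func_a", "func_b"}
genes = [
    KinaseGene("K1", 5, frozenset({"func_a"}), True),
    KinaseGene("K2", 4, frozenset({"func_b", "func_x"}), False),
    KinaseGene("K3", 1, frozenset({"func_a"}), True),   # too few mutations
    KinaseGene("K4", 6, frozenset({"func_x"}), False),  # no cancer function
]

if __name__ == "__main__":
    print(precision(genes, CANCER_FUNCTIONS))   # 0.5 (K1 and K2 selected; K1 is a positive)
    print(candidates(genes, CANCER_FUNCTIONS))  # ['K2']
```

Raising `min_mutations` trades recall for precision, which is the kind of trade-off the abstract reports when mutation counts are combined with cancer functions.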
LI YanHui1, GUO Zheng1,2, PENG ChunFang2, LIU Qing2, MA WenCai2, WANG Jing2, YAO Chen2, ZHANG Min2 & ZHU Jing1
1 Bioinformatics Centre, School of Life Science, University of Electronic Science and Technology of China, Chengdu 610054, China
Identifying disease-relevant genes and functional modules, based on gene expression profiles and gene functional knowledge, is of high importance for studying disease mechanisms and subtyping disease phenotypes. Using gene categories of biological process and cellular component in Gene Ontology, we propose an approach to selecting functional modules enriched with differentially expressed genes and identifying the feature functional modules with high disease-discriminating ability. Using the differentially expressed genes in each feature module as the feature genes, we reveal the relevance of the modules to the studied diseases. Using three datasets for prostate cancer, gastric cancer, and leukemia, we have demonstrated that the proposed modular approach has high power in identifying functionally integrated feature gene subsets that are highly relevant to the disease mechanisms. Our analysis has also shown that the critical disease-relevant genes might be better recognized from the gene regulation network constructed from the characterized functional modules, giving important clues to the concerted mechanisms by which the modules respond to complex disease states. In addition, selecting disease-relevant genes by jointly considering gene functional knowledge suggests a new way to classify disease samples precisely with clear biological interpretations, which is critical for clinical diagnosis and for elucidating the pathogenic basis of complex diseases.
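Selecting modules enriched with differentially expressed genes is commonly done with a one-sided hypergeometric (over-representation) test. The sketch below assumes that test, a fixed cutoff `alpha`, and no multiple-testing correction; the abstract does not state which test or correction the authors used, and the module and gene names are hypothetical. The feature genes of a selected module are taken, as the abstract describes, to be its differentially expressed genes.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: N measured genes, K of them DEGs, n genes in the module.

    k is the number of DEGs observed inside the module.
    math.comb returns 0 when the second argument exceeds the first, so impossible terms vanish.
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def enriched_modules(
    modules: dict[str, set[str]], degs: set[str], universe: set[str], alpha: float = 0.05
) -> dict[str, float]:
    """Return {module: p-value} for modules significantly enriched with DEGs (no correction)."""
    N = len(universe)
    K = len(degs & universe)
    selected = {}
    for name, genes in modules.items():
        members = genes & universe          # only count measured genes
        k = len(members & degs)
        p = hypergeom_upper_tail(k, len(members), K, N)
        if p < alpha:
            selected[name] = p
    return selected


def feature_genes(modules: dict[str, set[str]], degs: set[str], selected) -> dict[str, set[str]]:
    """Feature genes of each selected module: its differentially expressed members."""
    return {name: modules[name] & degs for name in selected}


# Hypothetical toy data: 20 measured genes, the first 5 differentially expressed.
universe = {f"g{i}" for i in range(20)}
degs = {f"g{i}" for i in range(5)}
modules = {"m1": {"g0", "g1", "g2", "g3"}, "m2": {"g10", "g11", "g12", "g13"}}

if __name__ == "__main__":
    hits = enriched_modules(modules, degs, universe)
    print(hits)                                  # only m1, p = 5/4845
    print(feature_genes(modules, degs, hits))
```

In practice a correction such as Benjamini-Hochberg would usually be applied across the many GO categories tested; it is omitted here to keep the sketch short.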
GESTs (gene expression similarity and taxonomy similarity), a gene function prediction approach we proposed previously, is based on gene expression similarity and the concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods to filter the neighbors of a protein of unknown function(s) in protein interaction networks. Unlike conventional methods, the proposed approach automatically selects the most appropriate functional classes, as specific as possible, during the learning process, and calls on genes annotated to nearby classes to support predictions for some small, specific classes in GO. Based on the yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performance of our approach in predicting protein functions in the "biological process" ontology using three measures designed specifically for functional classes organized in GO. The results show that our method is powerful for predicting gene functions broadly with very specific functional terms. Based on the GO database published in December 2004, we predict functions for some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April 2006.
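To make the idea of filtering interaction neighbors concrete, the sketch below shows a simple neighbor-voting baseline, not GESTs itself: neighbors of the query protein are kept only if their expression profiles correlate with the query's (Pearson r >= `min_corr`, a hypothetical filter and threshold), and functional terms are ranked by how many kept neighbors carry them. GESTs additionally uses GO concept similarity and chooses classes as specific as possible, which this sketch does not model. All protein names, profiles, and term labels are made up.

```python
from collections import Counter
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def predict_functions(protein, interactions, expression, annotations, min_corr=0.5, top=3):
    """Rank terms for `protein` by votes from co-expressed, annotated interaction partners.

    Hypothetical filter: a neighbor is kept only if it has an expression profile,
    has annotations, and correlates with the query at r >= min_corr.
    """
    kept = [
        nb
        for nb in sorted(interactions.get(protein, set()))  # sorted for determinism
        if nb in expression
        and nb in annotations
        and pearson(expression[protein], expression[nb]) >= min_corr
    ]
    votes = Counter(term for nb in kept for term in annotations[nb])
    # Most votes first; ties broken alphabetically so the output is reproducible.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top]]


# Hypothetical toy network: P0 is the protein of unknown function.
expression = {
    "P0": [1, 2, 3, 4],
    "A": [2, 4, 6, 8],   # r = 1, kept
    "B": [1, 2, 3, 5],   # r ~ 0.98, kept
    "C": [4, 3, 2, 1],   # r = -1, filtered out
}
interactions = {"P0": {"A", "B", "C"}}
annotations = {"A": {"dna_repair", "cell_cycle"}, "B": {"dna_repair"}, "C": {"transport"}}

if __name__ == "__main__":
    print(predict_functions("P0", interactions, expression, annotations))
    # ['dna_repair', 'cell_cycle']
```

Without the correlation filter, the anti-correlated neighbor C would contribute a "transport" vote; the filter is what removes it.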
GAO Lei1, LI Xia1,2, GUO Zheng1,2, ZHU MingZhu1, LI YanHui1 & RAO ShaoQi1,3
1 Department of Bioinformatics, Harbin Medical University, Harbin 150086, China
Selecting differentially expressed genes (DEGs) is one of the most important tasks in microarray applications for studying multi-factor diseases, including cancers. However, the small samples typically used in current microarray studies may only partially reflect the widely altered gene expression in complex diseases, which leads to low reproducibility of the gene lists selected by statistical methods. Here, by analyzing seven cancer datasets, we showed that in each cancer a wide range of functional modules have altered gene expression and thus have high disease classification ability. The results also showed that seven modules are shared across diverse cancers, suggesting common mechanisms of cancer. Therefore, instead of relying on a few individual genes whose selection is hardly reproducible in current microarray experiments, we may use functional modules as functional signatures to study the core mechanisms of cancers and to build robust diagnostic classifiers.
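Reproducibility of gene lists across studies is often quantified by simple overlap. The sketch below assumes one such measure, the fraction of genes in one list that also appear in another (sometimes called the percentage of overlapping genes, POG); the abstract does not name a specific measure, and the gene IDs are placeholders. The same function can be applied to lists of functional modules instead of individual genes.

```python
def pog(list_a: list[str], list_b: list[str]) -> float:
    """Fraction of the (unique) items in list_a that also appear in list_b.

    Returns 0.0 for an empty list_a. Note the measure is asymmetric:
    pog(a, b) != pog(b, a) in general when the lists differ in length.
    """
    a = set(list_a)
    if not a:
        return 0.0
    return len(a & set(list_b)) / len(a)


if __name__ == "__main__":
    # Placeholder gene lists from two hypothetical studies of the same cancer.
    study_1 = ["G1", "G2", "G3", "G4"]
    study_2 = ["G2", "G5", "G6", "G7"]
    print(pog(study_1, study_2))  # 0.25
```

Because the measure is asymmetric when the lists differ in length, it can be worth computing it in both directions.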
YAO Chen, ZHANG Min, ZOU JinFeng, LI HongDong, WANG Dong, ZHU Jing & GUO Zheng