YI Le, LUO Dongmei, QIN Yuehai. Statistical Analysis of RPPA Data in Cancer Research[J]. Acta Scientiarum Naturalium Universitatis SunYatseni, 2015,54(2):36-42.
Abstract
Protein expression data from The Cancer Genome Atlas (TCGA), namely Reverse Phase Protein Array (RPPA) data, are analyzed statistically to mine the cancer-related information hidden in protein expression, with the aim of improving the efficiency of clinical diagnosis and reducing the cost of testing. Heat maps of the three data sets reveal the network structure of each data set and the expression levels of different genes across the samples. Principal component analysis identifies 5 genes whose protein expression levels play an important role in all three cancers. Finally, a discriminant model for the three cancers is built with the protein expression levels of these 5 genes as predictors, and its misclassification rate is estimated by both the back-substitution (resubstitution) method and cross-validation. The conclusions are: each of the three cancers forms its own network of relationships among protein expression levels; the three cancers share 5 genes whose protein expression levels play an important role; and the linear discriminant model for the three cancers is reliable.
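The abstract names the pipeline (PCA to pick 5 variables, a discriminant model on those 5, then resubstitution and cross-validation error estimates) without giving formulas. The following is a minimal NumPy sketch of that kind of pipeline, not the authors' code. Everything here is an assumption: PCA is run on standardized variables, `top_loading_variables` uses a hypothetical ranking rule (absolute loading weighted by the square root of the eigenvalue), the classifier is textbook Gaussian LDA with a pooled covariance and class-frequency priors, leave-one-out is used as the cross-validation scheme, and the data are synthetic rather than TCGA RPPA values.

```python
import numpy as np


def top_loading_variables(X, n_components=1, n_keep=5):
    """Hypothetical rule: rank variables by |loading| * sqrt(eigenvalue),
    summed over the first `n_components` principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # PCA on the correlation matrix
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)  # rows of Vt are the principal axes
    eigvals = s**2 / (X.shape[0] - 1)
    weight = np.abs(Vt[:n_components]) * np.sqrt(eigvals[:n_components])[:, None]
    return np.argsort(weight.sum(axis=0))[::-1][:n_keep]


def fit_lda(X, y):
    """Gaussian LDA: class means, class-frequency priors, pooled covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    resid = X - means[np.searchsorted(classes, y)]  # within-class residuals
    cov = resid.T @ resid / (len(y) - len(classes))
    return classes, means, priors, np.linalg.inv(cov)


def predict_lda(model, X):
    classes, means, priors, inv = model
    # linear score: x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log(pi_k)
    scores = X @ inv @ means.T - 0.5 * np.sum(means @ inv * means, axis=1) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]


def resubstitution_error(X, y):
    """Back-substitution: classify the same samples used for fitting."""
    return float(np.mean(predict_lda(fit_lda(X, y), X) != y))


def loocv_error(X, y):
    """Leave-one-out: refit without sample i, then classify sample i."""
    n, wrong = len(y), 0
    for i in range(n):
        keep = np.arange(n) != i
        model = fit_lda(X[keep], y[keep])
        wrong += int(predict_lda(model, X[i:i + 1])[0] != y[i])
    return wrong / n


rng = np.random.default_rng(0)
n_per_class, n_proteins = 40, 20
y = np.repeat([0, 1, 2], n_per_class)          # three hypothetical cancer groups
X = rng.normal(size=(y.size, n_proteins))      # synthetic stand-in for RPPA values
X[:, :5] += 2.0 * y[:, None]                   # only variables 0-4 differ by group

selected = np.sort(top_loading_variables(X, n_components=1, n_keep=5))
resub = resubstitution_error(X[:, selected], y)
cv = loocv_error(X[:, selected], y)
print("selected:", selected, "resubstitution:", resub, "LOOCV:", cv)
```

Resubstitution reuses the training samples and therefore tends to be optimistic, which is why a cross-validated estimate is reported alongside it.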
Keywords
cancer; RPPA data; heat map; principal component analysis; discriminant model