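This post was last edited by tuziduidui on 2009-8-10 08:59
[CASK EFFECT] 0910G Reading Ability Baseline Self-Test (speed, difficulty, depth, obstacle training, real exam questions, RAM)
https://bbs.gter.net/forum.php?mod=viewthread&tid=910464&highlight
[CASK EFFECT] 0910F All-Round Reading Training: Obstacle Training [SCI] Index Thread
https://bbs.gter.net/thread-982020-1-1.html
Rules:
Every day I will post one passage of about 1,000 words.
There are no other requirements; just keep at it and read each passage through to the end.
If you can keep it up for a month, you will find that your reading has evolved~
[Note]
1. Do this directly on the computer screen. Although the GRE reading section is taken on paper, reading on screen keeps you from taking notes and puts a visual obstacle in the way of your reading. That combines difficulty training with anti-distraction training and makes the practice more efficient. (It will be tiring at first, but since everyone here wants to become an expert, don't go too easy on yourself.)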
Today's Topic: Ranking differentially expressed genes from Affymetrix gene expression data: methods with reproducibility, sensitivity, and specificity
Abstract
Background: To identify differentially expressed genes (DEGs) from microarray data, users of the Affymetrix Gene Chip system need to select both a preprocessing algorithm to obtain expression level measurements and a way of ranking genes to obtain the most plausible candidates. We recently recommended suitable combinations of a preprocessing algorithm and gene ranking method that can be used to identify DEGs with a higher level of sensitivity and specificity. However, in addition to these recommendations, researchers also want to know which combinations enhance reproducibility.
Results: We compared eight conventional methods for ranking genes: weighted average difference (WAD), average difference (AD), fold change (FC), rank products (RP), moderated t statistic (modT), significance analysis of microarrays (samT), shrinkage t statistic (shrinkT), and intensity-based moderated t statistic (ibmT) with six preprocessing algorithms (PLIER, VSN, FARMS, multi-mgMOS (mmgMOS), MBEI, and GCRMA). A total of 36 real experimental data sets were evaluated on the basis of the area under the receiver operating characteristic curve (AUC) as a measure for both sensitivity and specificity. We found that the RP method performed well for VSN-, FARMS-, MBEI-, and GCRMA-preprocessed data, and the WAD method performed well for mmgMOS-preprocessed data. Our analysis of the MicroArray Quality Control (MAQC) project's data sets showed that the FC-based gene ranking methods (WAD, AD, FC, and RP) had a higher level of reproducibility: The percentages of overlapping genes (POGs) across different sites for the FC-based methods were higher overall than those for the t-statistic-based methods (modT, samT, shrinkT, and ibmT). In particular, POG values for WAD were the highest overall among the FC-based methods irrespective of the choice of preprocessing algorithm.
Conclusion: Our results demonstrate that to increase sensitivity, specificity, and reproducibility in microarray analyses, we need to select suitable combinations of preprocessing algorithms and gene ranking methods. We recommend the use of FC-based methods, in particular RP or WAD.
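For readers unfamiliar with AUC as a score for a gene ranking, here is a minimal Python sketch (not the authors' R code). It assumes each gene has one ranking statistic where larger means "more likely a DEG", plus a known true/false DEG label. The function name `auc_from_scores` and the toy numbers are hypothetical, made up for illustration only.

```python
def auc_from_scores(scores, is_deg):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: one ranking statistic per gene (larger = more likely a DEG).
    is_deg: parallel list of booleans marking the known true DEGs.
    """
    pos = [s for s, d in zip(scores, is_deg) if d]
    neg = [s for s, d in zip(scores, is_deg) if not d]
    if not pos or not neg:
        raise ValueError("need at least one DEG and one non-DEG")
    # Count (DEG, non-DEG) pairs ranked in the right order; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: absolute log fold changes for six genes.
toy_scores = [2.1, 0.3, 1.7, 0.2, 0.9, 0.1]
toy_truth = [True, False, True, False, False, True]
toy_auc = auc_from_scores(toy_scores, toy_truth)
```

An AUC of 1.0 means every true DEG outranks every non-DEG, and 0.5 is what a random ranking would give on average. The pairwise loop is quadratic, which is fine for a toy example but slow for full arrays.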
Background
Microarray analysis is often used to detect differentially expressed genes (DEGs) under different conditions. As there are considerable differences [1,2] in how well it performs, choosing the best method of ranking these genes is important. Furthermore, Affymetrix GeneChip users need to choose a preprocessing algorithm from a number of competitors in order to obtain expression-level measurements [3].
We recently reported with another group that there are suitable combinations of preprocessing algorithms and gene ranking methods [1,2]. We evaluated three preprocessing algorithms, MAS [4], RMA [5], and DFW [6], and eight gene ranking methods, WAD [1], AD, FC, RP [7], modT [8], samT [9], shrinkT [10], and ibmT [11], by using a total of 38 data sets (including 36 real experimental datasets) [1]. Meanwhile, Pearson [2] evaluated nine preprocessing algorithms, MAS [4], RMA [5], DFW [6], MBEI [12], CP [13], PLIER [14], GCRMA [15], mmgMOS [16], and FARMS [17], and five gene ranking methods, modT [8], FC, a standard t-test, cyberT [18], and PPLR [19], by using only one artificial 'spike-in' dataset, the Golden Spike dataset [13].
When we re-evaluated the two reports using the common algorithms and methods we found that suitable gene ranking methods for each of the three preprocessing algorithms, i.e., MAS, RMA, and DFW, converge to the same: Combinations of MAS and modT (MAS/modT), RMA/FC, and DFW/FC can thus be recommended. However, the final conclusions for the original reports are understandably different: Our recommendations [1] are MAS/WAD, RMA/FC, and DFW/RP, while Pearson [2] recommends mmgMOS/PPLR, GCRMA/FC, and so on. This difference is mainly because fewer preprocessing algorithms were evaluated in our previous study [1].
We investigated suitable gene ranking methods for each of six preprocessing algorithms: MBEI, VSN [20], PLIER, GCRMA, FARMS, and mmgMOS. We also investigated the best combination of a preprocessing algorithm and gene ranking method using another evaluation metric, i.e., the percentage of overlapping genes (POG), proposed by the MAQC study [21].
Most authors of methodological papers have made claims that their methods have a greater area under the receiver operating characteristic curve (AUC) values, i.e., both high sensitivity and specificity [1,2]. However, reproducibility is rarely mentioned [21]. A good method should produce high POG values, i.e., those indicating reproducibility as well as high AUC ones, i.e., those for sensitivity and specificity. We will discuss suitable combinations of preprocessing algorithms and gene ranking methods.
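To make the POG idea concrete, here is a small Python sketch under a simplifying assumption: POG is taken as the share of genes that two ranked lists have in common among their top n entries. The passage does not spell out the exact MAQC definition, which may also check that the direction of change agrees, so treat this as an illustration only. The function name `pog` and the gene IDs are hypothetical.

```python
def pog(ranking_a, ranking_b, top_n):
    """Percentage of overlapping genes between the top_n of two rankings.

    ranking_a, ranking_b: gene IDs ordered from most to least significant,
    e.g. the same comparison analysed at two different test sites.
    """
    if top_n <= 0 or top_n > min(len(ranking_a), len(ranking_b)):
        raise ValueError("top_n must be between 1 and the shorter list length")
    top_a = set(ranking_a[:top_n])
    top_b = set(ranking_b[:top_n])
    return 100.0 * len(top_a & top_b) / top_n


# Hypothetical rankings of five genes from two sites.
site1 = ["g1", "g2", "g3", "g4", "g5"]
site2 = ["g2", "g1", "g5", "g3", "g4"]
toy_pog = pog(site1, site2, 3)  # top 3 lists share g1 and g2
```

A ranking method that gives similar top lists at different sites scores a high POG, which is how the passage uses the word "reproducibility".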
Conclusion
We evaluated the performance of combinations between six preprocessing algorithms and eight gene ranking methods in terms of the AUC value, i.e., both sensitivity and specificity, and the POG one, i.e., reproducibility. Our comprehensive evaluation confirmed the importance of using suitable combinations of preprocessing algorithms and gene ranking methods.
Overall, two FC-based gene ranking methods (RP and WAD) can be recommended. Our current and previous results indicate that any of the following combinations, RMA/RP, DFW/RP, PLIER/RP, VSN/RP, FARMS/RP, MBEI/RP, GCRMA/RP, MAS/WAD, and mmgMOS/WAD, enhances both sensitivity and specificity, and also that using the WAD method enhances reproducibility.
Methods
The raw data (Affymetrix CEL files) for Datasets 3–38 were obtained from the Gene Expression Omnibus (GEO) website [32]. All analysis was performed using R (ver. 2.7.2) [33] and Bioconductor [34]. The versions of R libraries used in this study are as follows: plier (ver. 1.10.0), vsn (3.2.1), farms (1.3), puma (1.6.0), affy (1.16.0) [35], gcrma (2.10.0), RankProd (2.12.0) [36], st (1.0.3) [10], limma (2.14.7) [8], ROC (1.14.0). The main functions in the R libraries are as follows: justPlier for PLIER, vsnrma for VSN, q.farms for FARMS, mmgmos for mmgMOS, expresso for MBEI (PM only model), gcrma for GCRMA, mas5 for MAS, rma for RMA, expresso and the R codes available in [37] for DFW, RP for RP, modt.stat for modT, sam.stat for samT, shrinkt.stat for shrinkT, IBMT for ibmT [38], and pumaComb and pumaDE for PPLR [19].
Since the MBEI and MAS expression measures do not output logged values, signal intensities under 1 in those preprocessed data were set to 1 so that the logarithm of the data could be found. Logged values smaller than 0 in PLIER-, VSN-, FARMS-, mmgMOS-, and GCRMA-preprocessed data were set to 0. For reproducible research, we made the R code for analyzing Dataset 4 (GEO ID: GSM189708–189713) available as the additional file [see Additional file 3]. The R codes for the other datasets are available upon request.
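The two flooring steps in the paragraph above can be sketched in Python as follows. This is not the authors' R code. The passage does not say which logarithm base was used, so base 2 is an assumption here, and the function names and sample values are hypothetical.

```python
import math


def log2_with_floor(intensities, floor=1.0):
    """For unlogged output (MBEI, MAS): raise values below 1 to 1, then log.

    Base 2 is an assumption; the passage only says "the logarithm".
    """
    return [math.log2(max(x, floor)) for x in intensities]


def clamp_logged(values, floor=0.0):
    """For already-logged output (PLIER, VSN, ...): raise values below 0 to 0."""
    return [max(v, floor) for v in values]


# Hypothetical unlogged intensities; 0.25 is floored to 1 before the log.
raw = [0.25, 1.0, 8.0, 1024.0]
logged = log2_with_floor(raw)

# Hypothetical logged values; -1.5 is raised to 0.
already_logged = [-1.5, 0.0, 3.2]
clamped = clamp_logged(already_logged)
```

Both steps land in the same place: an intensity of 1 has a logarithm of 0, so after either step no logged value is below 0.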
The raw data for the MAQC datasets were obtained from the MAQC website [39]. The evaluation based on POG was done with 12 datasets produced by the MAQC project [21] in which two RNA sample types and two mixtures of the original samples were used: Sample A, a universal human reference RNA; Sample B, a human brain reference RNA; Sample C, which consisted of 75 and 25% of Sample A and B respectively; and Sample D, which consisted of 25 and 75% of Sample A and B respectively. Five replicate experiments for each of the four sample types at six independent test sites (Sites 1–6) were conducted, and, thus there are 20 files at each site. The data preprocessing was performed at each site. The application of the gene ranking methods was independently performed for comparisons of "Sample A versus B" and "Sample C versus D".
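A side calculation that may help with the mixture design: if we assume (my assumption, not something the passage states) that a gene's signal in a mixture is simply the weighted average of its signals in Sample A and Sample B, then the C-versus-D ratio follows from the A-versus-B ratio. The Python sketch below works that out; the function name is hypothetical.

```python
import math


def expected_c_vs_d_log2_ratio(a_vs_b_log2_ratio):
    """Expected log2(C/D) for a gene, given its log2(A/B).

    Assumes linear mixing of signal: C = 0.75*A + 0.25*B, D = 0.25*A + 0.75*B.
    Real data can deviate from this idealised model.
    """
    r = 2.0 ** a_vs_b_log2_ratio  # A/B on the unlogged scale
    c_over_d = (0.75 * r + 0.25) / (0.25 * r + 0.75)
    return math.log2(c_over_d)


# A gene 4-fold higher in A than in B (log2 ratio of 2).
example = expected_c_vs_d_log2_ratio(2.0)
```

Under this model the C-versus-D difference is always smaller than the A-versus-B difference and can never exceed 3-fold (log2 of 3, about 1.58), so "Sample C versus D" is the harder of the two comparisons.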