A traditional case–control genome-wide association study (GWAS) typically analyzes each single nucleotide polymorphism (SNP) individually. Because this one-SNP-at-a-time analysis does not consider SNPs jointly, it is of great interest to find a more complex variable selection procedure that is both effective and computationally efficient. The aim of this article is to develop an empirical Bayes variable selection approach to logistic regression modeling of large-scale genomic data. The proposed method (ICM/M) incorporates gene–gene network information through a Bayesian formulation and implementation. Simulation studies were carried out to assess the performance of ICM/M, with lasso and adaptive lasso as benchmarks. Overall, the simulations show that ICM/M produces fewer false positives than the benchmark methods and has competitive predictive ability. Parkinson’s disease (PD) data collected from NINDS Repository samples and acquired from dbGaP were used to further validate ICM/M. The authors conclude that the simulation studies and the empirical data analysis show considerable improvement of ICM/M over lasso.