TY - JOUR
T1 - Whole genome prediction of bladder cancer risk with the Bayesian LASSO
AU - De Maturana, Evangelina López
AU - Chanok, Stephen J.
AU - Picornell, Antoni C.
AU - Rothman, Nathaniel
AU - Herranz, Jesús
AU - Calle, M. Luz
AU - García-Closas, Montserrat
AU - Marenne, Gaëlle
AU - Brand, Angela
AU - Tardón, Adonina
AU - Carrato, Alfredo
AU - Silverman, Debra T.
AU - Kogevinas, Manolis
AU - Gianola, Daniel
AU - Real, Francisco X.
AU - Malats, Núria
PY - 2014
Y1 - 2014
N2 - To build a predictive model for urothelial carcinoma of the bladder (UCB) risk combining both genomic and nongenomic data, 1,127 cases and 1,090 controls from the Spanish Bladder Cancer/EPICURO study were genotyped using the HumanHap 1M SNP array. After quality control filters, genotypes from 475,290 variants were available. Nongenomic information comprised age, gender, region, and smoking status. Three Bayesian threshold models were implemented, including: (1) only genomic information, (2) only nongenomic data, and (3) both sources of information. The three models were applied to the whole population, to only nonsmokers, to male smokers, and to extreme phenotypes to potentiate the UCB genetic component. The area under the ROC curve (AUC) was used to evaluate the predictive ability of each model under 10-fold cross-validation. Smoking status showed the highest predictive ability of UCB risk (AUCtest = 0.62). On the other hand, the AUC of all genetic variants was poorer (0.53). When the extreme phenotype approach was applied, the predictive ability of the genomic model improved by 15%. This study represents a first attempt to build a predictive model for UCB risk combining both genomic and nongenomic data and applying state-of-the-art statistical approaches. However, the lack of genetic relatedness among individuals, the complexity of UCB etiology, as well as relatively low statistical power, may explain the low predictive ability for UCB risk. The study confirms the difficulty of predicting complex diseases using genetic data, and suggests the limited translational potential of findings from this type of data into public health interventions.
UR - http://www.scopus.com/inward/record.url?scp=84902986027&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84902986027&partnerID=8YFLogxK
U2 - 10.1002/gepi.21809
DO - 10.1002/gepi.21809
M3 - Article
C2 - 24796258
AN - SCOPUS:84902986027
SN - 0741-0395
VL - 38
SP - 467
EP - 476
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 5
ER -