Deep Learning for Genome-wide Association Studies and the Impact of SNP Locations

Author: Songyuan Ji
Publisher:
Total Pages:
Release: 2019
Genre:
ISBN:


The study of single nucleotide polymorphisms (SNPs) associated with human diseases is important for identifying pathogenic genetic variants and illuminating the genetic architecture of complex diseases. A genome-wide association study (GWAS) examines genetic variation across individuals and detects disease-related SNPs. Traditional machine learning methods typically treat SNP data as a sequence and may therefore overlook the complex interactions among multiple genetic factors. In this thesis, we propose a new hybrid deep learning approach to identify susceptibility SNPs associated with colorectal cancer. A set of SNP variants was first selected by a hybrid feature selection algorithm and then organized as 3D images using a selection of space-filling curve models. A multi-layer deep convolutional neural network was constructed and trained on those images. We found that images generated using the space-filling curve model that preserves the original SNP locations in the genome yield the best classification performance. We also report a set of high-risk SNPs associated with colorectal cancer, identified by the deep neural network model.
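To make the idea of arranging SNPs along a space-filling curve concrete, here is a minimal sketch. It is not the thesis's method: the abstract does not name the curves used, so the Hilbert curve is an assumption, the 2D grid is a simplification of the 3D images mentioned above, and the names `hilbert_d2xy` and `snps_to_grid` are hypothetical.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current sub-square so consecutive cells stay adjacent.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def snps_to_grid(genotypes, order, fill=-1):
    """Place genotype codes (assumed 0/1/2, in genome order) along the curve.

    Cells beyond the last SNP keep the `fill` value. Returns grid[y][x].
    """
    n = 1 << order
    if len(genotypes) > n * n:
        raise ValueError("grid too small for the number of SNPs")
    grid = [[fill] * n for _ in range(n)]
    for d, g in enumerate(genotypes):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = g
    return grid


if __name__ == "__main__":
    for row in snps_to_grid([0, 1, 2, 1, 0, 0, 2], order=2):
        print(row)
```

Because neighbouring positions on the curve are neighbouring cells in the grid, SNPs that are close in the input ordering stay close in the image, which is the property a convolutional network can exploit.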

Machine Learning in Genome-Wide Association Studies

Author: Ting Hu
Publisher: Frontiers Media SA
Total Pages: 74
Release: 2020-12-15
Genre: Science
ISBN: 2889662292


This eBook is a collection of articles from a Frontiers Research Topic. Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: frontiersin.org/about/contact.

Integration and Development of Machine Learning Methodologies to Improve the Power of Genome-wide Association Studies

Author: Jing Li
Publisher:
Total Pages: 250
Release: 2016
Genre:
ISBN:


Genome-wide association studies (GWAS) have led to a great number of new findings in human genetics and genetic epidemiology. GWAS identifies DNA sequence variations using human genome data and identifies the genetic risk factors for common diseases. There are many challenges that remain when mapping the complex underlying relationships between genotypes and phenotypes in GWAS. Here, we attempt to improve the power to detect correct mapping in GWAS for disease prevention and treatment. We examine a number of assumptions in GWAS that have been made over the past decade, which need to be updated and discussed in light of recent GWAS algorithm development. To achieve this goal, we discuss some of the current assumptions of GWAS and all possible factors that could affect predictive power. Using simulation studies, we show statistical evidence of how different factors, including sample size, heritability, model misspecification, and measurement error, affect the power to detect correct genetic associations. These data have the potential to improve the design of GWAS. As epistasis is the key to studying GWAS, we specifically studied epistasis, which is believed to account for part of the missing heritability. To detect interactions, we developed permuted Random Forest (pRF), a scale-free method, which is based on the traditional machine learning method Random Forest (RF). This method accurately detects single nucleotide polymorphism (SNP)-SNP interactions and top interacting SNP pairs by estimating how much the power of a random forest classification model is influenced by removing pairwise interactions. We systematically tested this approach on a simulation study with datasets possessing various genetic constraints including heritability, number of SNPs, and sample size. Our methodology shows high success rates for detecting interacting SNP pairs. 
We also applied our approach to two bladder cancer datasets, where it produced results consistent with well-studied methodologies, and we built permuted Random Forest networks (PRFN), in which nodes represent SNPs and edges indicate interactions. The data suggest that the pRF method could improve detection of pure gene-gene interactions. Classic methods for detecting genetic association in GWAS separate biological knowledge from genetic information, thus wasting useful biological information when modeling associations between genotypes and phenotypes. We therefore further developed a biological-information-guided machine learning methodology based on the Encyclopedia of DNA Elements (ENCODE), called ENCODE information guided synthetic feature Random Forest (E-SFRF). Instead of studying biological associations at the SNP level, we separated SNPs based on ENCODE information and grouped them into a particular gene or enhancer to calculate a synthetic feature (SF) at a higher level. In our study, we focused on genes and enhancers from the AHR pathway, which is involved in cancer development. This work showed that the E-SFRF method could identify consistent main effect models based on SFs from two independent bladder cancer studies. We further studied the SNP-SNP interactions inside the top main effect SFs and discovered interesting SNP-SNP interactions that may lead to strong main effects. We believe our method could increase the possibility of replicating results across different GWAS datasets by increasing both consistency and accuracy in genetic studies. Overall, we have found that studying interactions among SNPs is essential to increasing the power to uncover genetic architectures.
By developing the machine learning method pRF and then incorporating biological information to develop E-SFRF, we were able to detect pure gene-gene interactions in a scale-free and non-parametric way, helping to increase the repeatability and reliability of GWAS using biological knowledge.
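The abstract's point that sample size affects the power to detect a true association can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the thesis's simulation design: it uses a simple allelic 2x2 chi-square test for a single SNP, the allele frequencies are made up, and the function names `chi2_2x2` and `estimate_power` are hypothetical.

```python
import random

# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CHI2_CRIT_DF1_ALPHA05 = 3.841


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def estimate_power(n_per_group, freq_cases, freq_controls, n_reps=200, seed=0):
    """Fraction of simulated studies in which the allelic test is significant."""
    rng = random.Random(seed)
    alleles = 2 * n_per_group  # two alleles per diploid individual
    hits = 0
    for _ in range(n_reps):
        case_minor = sum(rng.random() < freq_cases for _ in range(alleles))
        ctrl_minor = sum(rng.random() < freq_controls for _ in range(alleles))
        stat = chi2_2x2(case_minor, alleles - case_minor,
                        ctrl_minor, alleles - ctrl_minor)
        if stat > CHI2_CRIT_DF1_ALPHA05:
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    for n in (50, 200, 1000):
        print(n, estimate_power(n, freq_cases=0.4, freq_controls=0.3))
```

With the same assumed effect size, the estimated power rises as the number of cases and controls grows. Heritability, model misspecification, and measurement error, which the abstract also lists, are not modeled here.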

Deep Learning for Genome-wide Association Studies

Author: Deepak Sharma
Publisher:
Total Pages:
Release: 2022
Genre:
ISBN:


"Genome-Wide Association Studies (GWAS) are a popular tool in statistical genomics that are used to identify genetic variants associated with various dis- eases. However, their success has been limited, in part because they typically do not incorporate interactions between variants to model target traits. Since Deep neural networks have been successful across domains abundant with com- plex signals, like speech, language, and vision, they are also popular candidates for modelling interactions between genetic variants. However, their black-box nature is a hindrance to their application for GWAS. In this thesis, we present a pipeline to train and interpret feedforward neu- ral networks to conduct a genome-wide association study (GWAS). We show that trained deep neural networks can be interpreted using feature-importance techniques to accurately distinguish and rank simulated causal genetic variants. We improve its accuracy by extending the pipeline to the multi-task setting, wherein we simultaneously model two related, simulated traits. We demon- strate the accuracy, reliability, and scalability of our approach by identifying most known Diabetes genetic risk factors found using a conventional GWAS on the UK Biobank"--

Genetic Dissection of Complex Traits

Author: D.C. Rao
Publisher: Academic Press
Total Pages: 788
Release: 2008-04-23
Genre: Medical
ISBN: 0080569110


The field of genetics is rapidly evolving, and new medical breakthroughs are occurring as a result of advances in knowledge of genetics. This series continually publishes important reviews of the broadest interest to geneticists and their colleagues in affiliated disciplines. The volume offers five sections on the latest advances in complex traits and covers methods for testing with ethical, legal, and social implications. Hot topics include a systems biology approach to drug discovery, the use of comparative genomics for detecting human disease genes, computationally intensive challenges, and more.

Introduction to Modern Information Retrieval

Author: Gerard Salton
Publisher: New York ; Montreal : McGraw-Hill
Total Pages: 470
Release: 1983
Genre: Computers
ISBN:


Examines Concepts, Functions & Processes of Information Retrieval Systems

Exploration of SNP-set Interactions in Genome-wide Association Studies

Author: Shomoita Alam
Publisher:
Total Pages: 91
Release: 2014
Genre: Biotechnology
ISBN:


Advances in biotechnology have enabled genome-wide association studies (GWAS) that scan the entire human genome to understand genetic contributions to the risk of a given disease as well as to variation in treatment efficacy and side effects. In GWAS, the association between each single-nucleotide polymorphism (SNP) and a phenotype is assessed statistically, typically by analyzing one SNP at a time and ignoring potential SNP-SNP interactions. Such individual-SNP analyses have recovered only small fractions of the expected genetic contributions to disease risk; this has been recognized as the "missing heritability" problem of GWAS. Biologically, it is highly unlikely that a single SNP alone would determine disease risk, especially for complex chronic diseases. We therefore tested whether biological interactions among multiple SNPs determine disease risk and whether they can explain the missing heritability problem. The methodologies proposed in this work take into account the interaction between selected SNP-sets using two methods: (1) a method based on logic regression that incorporates two specific forms of interaction; and (2) a method based on SNP-pair analysis, which explores genotypes that are observed in cases with sufficient frequency and in no control. Both methods identified many previously reported and novel susceptibility genes in the datasets we tested, although validation studies are required to avoid spurious findings. While our results do not provide a satisfactory solution to the "missing heritability" problem, they show the importance of considering and exploring SNP interactions when assessing genetic contributions to disease etiology, prevention and treatment.
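The SNP-pair analysis described above, which looks for joint genotypes seen often enough in cases and never in controls, can be sketched as follows. This is an illustrative reading of the abstract rather than the thesis's implementation: the genotype coding (0/1/2 minor-allele counts), the `min_case_count` threshold, and the function name `case_only_genotype_pairs` are assumptions.

```python
from collections import Counter
from itertools import combinations


def case_only_genotype_pairs(cases, controls, min_case_count=2):
    """Find two-SNP genotype combinations present in cases but absent in controls.

    `cases` and `controls` are lists of rows; each row holds one individual's
    genotype codes, with the same SNP order in every row.
    Returns tuples (snp_i, snp_j, (genotype_i, genotype_j), case_count).
    """
    n_snps = len(cases[0])
    results = []
    for i, j in combinations(range(n_snps), 2):
        case_counts = Counter((row[i], row[j]) for row in cases)
        control_seen = {(row[i], row[j]) for row in controls}
        for genotype, count in sorted(case_counts.items()):
            if count >= min_case_count and genotype not in control_seen:
                results.append((i, j, genotype, count))
    return results


if __name__ == "__main__":
    cases = [[2, 2, 0], [2, 2, 1], [0, 1, 0]]
    controls = [[2, 0, 0], [0, 2, 1], [0, 1, 0]]
    print(case_only_genotype_pairs(cases, controls))
```

The scan is quadratic in the number of SNPs, which is one reason such analyses are usually run on pre-selected SNP-sets rather than the whole genome. As the abstract notes, hits from this kind of search need validation to rule out spurious findings.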

Agricultural Bioinformatics

Author: Kavi Kishor P.B.
Publisher: Springer
Total Pages: 296
Release: 2014-07-14
Genre: Science
ISBN: 8132218809


A common approach to understanding the functional repertoire of a genome is functional genomics. With systems biology burgeoning, bioinformatics has grown considerably for plant genomes, where several applications in the form of protein-protein interactions (PPI) are used to predict the function of proteins. With plant genes evolutionarily conserved, bioinformatics in agriculture has attracted interest, with a myriad of applications taken from the bench to in silico studies. A multitude of technologies in the form of gene analysis, biochemical pathways and molecular techniques have been exploited to the extent that they are less time-consuming and more cost-effective to use. As genomes are sequenced, an increasing amount of expression data is being generated, matching the need to link expression profiles and phenotypic variation to the underlying genomic variation. This would allow us to identify candidate genes and understand the molecular basis of phenotypic variation in traits. While many bioinformatics resources, such as expression and whole-genome sequence data of organisms in biological databases, have been used in plants, we felt that a common reference showcasing reviews of such analyses was wanting. We envisage that this gap will be filled by this Springer book on Agricultural Bioinformatics. We thank all the authors and the publisher, Springer, Germany, for providing us an opportunity to review the bioinformatics work that the authors have carried out in the recent past, and we hope readers will find this book engaging.

Biostatistical Genetics and Genetic Epidemiology

Author: Robert C. Elston
Publisher: John Wiley & Sons
Total Pages: 860
Release: 2002-04-22
Genre: Medical
ISBN: 9780471486312


"Human Genetics and Genetic Epidemiology" ist der 3. Band aus der sehr erfolgreichen Reihe 'Wiley Biostatistics Reference Series', die auf Artikeln der "Encyclopedia of Biostatistics" basiert. Dieser Band gibt einen topaktuellen und umfassenden Überblick über ein Forschungsgebiet, das insbesondere im Zuge des Human-Genomprojekts eine regelrechte Explosion an Forschungsaktivitäten erlebt hat. Enthalten sind komplett aktualisierte Artikel aus der "Encyclopedia of Biostatistics" sowie über 25% neue Artikel. Mit einem komplexen System an Querverweisen, die das Auffinden der gewünschten Information erheblich erleichtern. Eine interessante Lektüre für Genetiker, Epidemiologen, Biostatistiker und Forscher in diesen Bereichen.