deepBreaks identifies and prioritizes genotype-phenotype associations using machine learning
Document Type
Journal Article
Publication Date
11-7-2025
Journal
Scientific reports
Volume
15
Issue
1
DOI
10.1038/s41598-025-25580-6
Keywords
Machine learning algorithm; Phenotype-genotype association; SNP prioritization; Sequence analysis
Abstract
Sequence data, such as nucleotide or amino acid sequences, are crucial to advancing our understanding of biology. However, analyzing sequencing data and genotype-phenotype associations presents several challenges, including noise introduced during sequencing, nonlinear genotype-phenotype associations, collinearity between input features, and the high dimensionality of the input data. Machine learning (ML) algorithms have proven effective at detecting intricate and unstructured patterns, making them a valuable tool for studies of genotype-phenotype associations. Yet there is still a need for more user-friendly ML implementations that leverage the unique features of high-volume DNA sequence data. Here, we introduce deepBreaks, a generic approach that detects important positions (genotypes) in sequence data that are associated with phenotypic traits. deepBreaks compares the performance of multiple ML algorithms and prioritizes positions based on the best-fit models. deepBreaks is open-source software, with online documentation and examples available at https://github.com/omicsEye/deepBreaks.
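The abstract's last sentences describe a workflow: fit several ML models, keep the best-fitting one, and rank sequence positions by how much that model relies on them. The following is a minimal, standard-library-only Python sketch of that general idea. It is not deepBreaks' implementation or API. Assumptions for illustration: the input is a toy alignment of equal-length sequences with categorical phenotypes. The two candidate models (a majority-class baseline and a Hamming-distance 1-nearest-neighbour classifier), leave-one-out accuracy as the fit criterion, skipping constant columns, and column-ablation importance are all substitutions chosen for brevity. Every function name below is hypothetical.

```python
from collections import Counter

# Hypothetical helpers for illustration only; these are not deepBreaks functions.

def hamming(a, b, skip=None):
    """Count mismatched positions between two aligned sequences, ignoring column `skip`."""
    return sum(1 for i, (x, y) in enumerate(zip(a, b)) if i != skip and x != y)


def majority_predict(train_seqs, train_labels, query, skip=None):
    """Baseline model: ignore the sequence and return the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def nn_predict(train_seqs, train_labels, query, skip=None):
    """1-nearest-neighbour under Hamming distance; ties go to the earliest training sequence."""
    nearest = min(range(len(train_seqs)),
                  key=lambda j: hamming(train_seqs[j], query, skip))
    return train_labels[nearest]


def loo_accuracy(predict, seqs, labels, skip=None):
    """Leave-one-out accuracy of `predict`, optionally with one column ablated."""
    correct = 0
    for i in range(len(seqs)):
        train_seqs = seqs[:i] + seqs[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if predict(train_seqs, train_labels, seqs[i], skip) == labels[i]:
            correct += 1
    return correct / len(seqs)


def prioritize(seqs, labels, models):
    """Pick the best model by LOO accuracy, then rank informative positions by ablation importance."""
    # Assumption: constant columns carry no signal, so they are not scored.
    informative = [p for p in range(len(seqs[0])) if len({s[p] for s in seqs}) > 1]
    scores = {name: loo_accuracy(fn, seqs, labels) for name, fn in models.items()}
    best_name = max(scores, key=scores.get)
    best_fn, baseline = models[best_name], scores[best_name]
    # Importance = accuracy lost when the position is hidden from the best model.
    importance = {p: baseline - loo_accuracy(best_fn, seqs, labels, skip=p)
                  for p in informative}
    ranked = sorted(importance, key=lambda p: (-importance[p], p))
    return best_name, scores, ranked, importance


# Toy alignment (made up for illustration): position 1 (G/T) tracks the phenotype,
# position 2 (A/C) is noise, and positions 0 and 3 are constant.
seqs = ["AGAT", "AGAT", "AGCT", "ATAT", "ATCT", "ATCT"]
labels = ["red", "red", "red", "blue", "blue", "blue"]
models = {"majority": majority_predict, "nearest_neighbor": nn_predict}

best_name, scores, ranked, importance = prioritize(seqs, labels, models)
print(best_name, ranked)  # prints: nearest_neighbor [1, 2]
```

On this toy data the nearest-neighbour model wins (leave-one-out accuracy 5/6 versus 0 for the baseline). Hiding position 1 drops its accuracy to 2/6. Hiding the noisy position 2 raises accuracy to 1.0, so that position receives a negative score. Column ablation is used here only because a nearest-neighbour model has no built-in feature importances. The models and importance measures in the actual tool may differ.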
APA Citation
Baghbanzadeh, Mahdi; Dawson, Tyson; Sayoldin, Bahar; Frazer, Seth A.; Oakley, Todd H.; Crandall, Keith A.; and Rahnavard, Ali, "deepBreaks identifies and prioritizes genotype-phenotype associations using machine learning" (2025). GW Authored Works. Paper 8123.
https://hsrc.himmelfarb.gwu.edu/gwhpubs/8123
Department
Biostatistics and Bioinformatics