deepBreaks identifies and prioritizes genotype-phenotype associations using machine learning

Document Type

Journal Article

Publication Date

11-7-2025

Journal

Scientific Reports

Volume

15

Issue

1

DOI

10.1038/s41598-025-25580-6

Keywords

Machine learning algorithm; Phenotype-genotype association; SNP prioritization; Sequence analysis

Abstract

Sequence data, such as nucleotide or amino acid sequences, are crucial to advancing our understanding of biology. However, analyzing sequencing data and genotype-phenotype associations presents several challenges, including noise introduced by sequencing, nonlinear genotype-phenotype associations, collinearity between input features, and the high dimensionality of the input data. Machine learning (ML) algorithms have proven effective at detecting intricate and nonstructural patterns, making them a valuable tool for studies of genotype-phenotype associations. Yet there remains a shortage of user-friendly ML implementations that leverage the unique features of high-volume DNA sequence data. Here, we introduce deepBreaks, a generic approach that detects important positions (genotypes) in sequence data that are associated with phenotypic traits. deepBreaks compares the performance of multiple ML algorithms and prioritizes positions based on the best-fit models. It is open-source software, with online documentation and examples available at https://github.com/omicsEye/deepBreaks.
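
The workflow the abstract describes (fit several models, keep the best-performing one, then rank sequence positions by how much that model depends on them) can be illustrated with a minimal sketch. This is not the deepBreaks API or its actual set of models: the function names (fit_mean, fit_knn, fit_stump, prioritize), the three toy models, the leave-one-out scoring, and the rotation-based permutation importance are all assumptions chosen so the example runs on the Python standard library alone.

```python
from statistics import mean

# Three toy stand-in models (assumptions, not the models deepBreaks uses).
def fit_mean(seqs, y):
    m = mean(y)
    return lambda s: m  # baseline: always predict the training mean

def fit_knn(seqs, y, k=3):
    def predict(s):
        # rank training sequences by Hamming distance to s
        order = sorted(range(len(seqs)),
                       key=lambda i: sum(a != b for a, b in zip(s, seqs[i])))
        return mean(y[i] for i in order[:k])
    return predict

def fit_stump(seqs, y):
    # pick the single position whose residues best split the phenotype
    overall, best = mean(y), None
    for pos in range(len(seqs[0])):
        groups = {}
        for s, v in zip(seqs, y):
            groups.setdefault(s[pos], []).append(v)
        means = {r: mean(vs) for r, vs in groups.items()}
        sse = sum((v - means[s[pos]]) ** 2 for s, v in zip(seqs, y))
        if best is None or sse < best[0]:
            best = (sse, pos, means)
    _, pos, means = best
    return lambda s: means.get(s[pos], overall)  # unseen residue: overall mean

def mse(predict, seqs, y):
    return mean((predict(s) - v) ** 2 for s, v in zip(seqs, y))

def loo_mse(fit, seqs, y):
    # leave-one-out cross-validation error for one model family
    errs = []
    for i in range(len(seqs)):
        model = fit(seqs[:i] + seqs[i + 1:], y[:i] + y[i + 1:])
        errs.append((model(seqs[i]) - y[i]) ** 2)
    return mean(errs)

def rank_positions(predict, seqs, y):
    # permutation importance: scramble one alignment column at a time and
    # measure the rise in error; cyclic rotations keep the result deterministic
    base, n, scores = mse(predict, seqs, y), len(seqs), []
    for pos in range(len(seqs[0])):
        rises = []
        for shift in range(1, n):
            permuted = [s[:pos] + seqs[(i + shift) % n][pos] + s[pos + 1:]
                        for i, s in enumerate(seqs)]
            rises.append(mse(predict, permuted, y) - base)
        scores.append((pos, mean(rises)))
    return sorted(scores, key=lambda t: -t[1])

MODELS = {"mean": fit_mean, "knn": fit_knn, "stump": fit_stump}

def prioritize(seqs, y):
    cv = {name: loo_mse(fit, seqs, y) for name, fit in MODELS.items()}
    best_name = min(cv, key=cv.get)          # lowest cross-validated error
    best_model = MODELS[best_name](seqs, y)  # refit on all data
    return best_name, cv, rank_positions(best_model, seqs, y)

# Toy aligned sequences (made-up data): the phenotype depends only on index 2.
seqs = ["ACGT", "ACGA", "ACTT", "ACTA", "AGGT", "AGGA", "AGTT", "AGTA"]
phenotype = [1.0 if s[2] == "G" else 0.0 for s in seqs]
best_name, cv_errors, ranking = prioritize(seqs, phenotype)
print(best_name, ranking[0])
```

On this toy alignment the selected model and the top-ranked position both point to index 2, the only column that determines the phenotype. Real analyses would need many more sequences, stronger models, and a treatment of collinear positions, none of which this sketch attempts.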

Department

Biostatistics and Bioinformatics
