Document Type

Journal Article

Publication Date

1-1-2016

Journal

PLoS One

Volume

11

Issue

4

Inclusive Pages

Article number: e0152725

DOI

10.1371/journal.pone.0152725

Abstract

The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation-disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall, with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has also been evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low-precision problem that affects most approaches. DiMeX was applied to a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators in enriching mutation databases.

Comments

Reproduced with permission of PLOS One.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

APA Citation

Mahmood, A., Wu, T., Mazumder, R., & Vijay-Shanker, K. (2016). DiMeX: A Text Mining System for Mutation-Disease Association Extraction. PLoS One, 11(4). http://dx.doi.org/10.1371/journal.pone.0152725

Peer Reviewed

Open Access

Included in

Biochemistry, Biophysics, and Structural Biology Commons

Biochemistry and Molecular Medicine Faculty Publications

DiMeX: A Text Mining System for Mutation-Disease Association Extraction.
