Birth defects can be linked to many factors: genetic, environmental, even pure chance. Characterizing how any single factor is linked to congenital abnormalities is a daunting task, given the number of possible causes and the ways they can interact.
In the face of this challenge, a team of researchers at the Icahn School of Medicine at Mount Sinai tapped artificial intelligence (AI) methods to shed light on associations between existing medications and their potential to induce specific birth abnormalities.
“We wanted to improve our understanding of reproductive health and fetal development, and importantly, warn about the potential of new drugs to cause birth defects before these drugs are widely marketed and distributed,” says Avi Ma’ayan, PhD, Professor of Pharmacological Sciences and Director of the Mount Sinai Center for Bioinformatics at Icahn Mount Sinai.
The team developed a knowledge graph—a descriptive model that maps out the relationships between entities and concepts—called ReproTox-KG to integrate data about small-molecule drugs, birth defects, and genes. In addition to constructing the knowledge graph, the team also used machine learning, specifically semi-supervised learning, to illuminate unexplored links between some drugs and birth defects.
Here’s how ReproTox-KG works as a knowledge graph to predict birth defects.
The study examined more than 30,000 preclinical small-molecule drugs for their potential to cross the placenta and induce birth defects, and identified more than 500 “cliques”—interlinked clusters between birth defects, genes, and drugs—that can be used to explain molecular mechanisms for drug-induced birth defects. Findings were published in Communications Medicine on July 17, and the platform has been made available on a web-based user interface.
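To make the idea of a "clique" concrete, here is a minimal sketch that treats a knowledge graph as a set of undirected edges and looks for drug, gene, and birth-defect triples in which every pair is directly linked. All node names (`drug_A`, `gene_1`, `defect_X`), the toy edge list, and the helper functions are hypothetical placeholders for illustration. They are not data or code from ReproTox-KG, and the study's actual clique-detection method may differ.

```python
from itertools import product

# Hypothetical toy edges; placeholder names, NOT data from ReproTox-KG.
edges = [
    ("drug_A", "gene_1"),    # drug linked to a gene
    ("drug_A", "defect_X"),  # drug linked to a birth defect
    ("gene_1", "defect_X"),  # gene linked to the same birth defect
    ("drug_B", "gene_1"),
    ("drug_B", "gene_2"),
    ("gene_2", "defect_X"),
]


def node_type(name):
    """Return the node type encoded in the placeholder name prefix."""
    return name.split("_")[0]


def build_adjacency(edge_list):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def find_cliques(edge_list):
    """Return (drug, gene, defect) triples whose three pairwise links all exist."""
    adjacency = build_adjacency(edge_list)
    by_type = {"drug": [], "gene": [], "defect": []}
    for node in sorted(adjacency):
        by_type[node_type(node)].append(node)

    cliques = []
    for drug, gene, defect in product(
        by_type["drug"], by_type["gene"], by_type["defect"]
    ):
        if (
            gene in adjacency[drug]
            and defect in adjacency[drug]
            and defect in adjacency[gene]
        ):
            cliques.append((drug, gene, defect))
    return cliques


for clique in find_cliques(edges):
    print(clique)
```

In this toy graph only `drug_A`, `gene_1`, and `defect_X` form a fully connected triple. `drug_B` reaches `defect_X` only indirectly through genes, so it is not reported. A triple like the first one is the kind of structure that could suggest a gene-level explanation for a drug's association with a birth defect.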
In this Q&A, Dr. Ma’ayan, senior author of the paper, discusses ReproTox-KG and its potential impacts.
What was the motivation for your study?
The motivation for the study was to find a use case that combines several datasets produced by National Institutes of Health (NIH) Common Fund programs to demonstrate how integrating data from these resources can lead to synergistic discoveries, particularly in the context of reproductive health.
The study identifies relationships between approved drugs and birth defects in order to flag existing drugs that are not currently classified as harmful but may pose risks to fetal development. It also provides a new global framework to assess the potential toxicity of new drugs and to explain the biological mechanisms by which some drugs known to cause birth defects may operate.
What are the implications?
Identifying the causes of birth defects is complicated and difficult. But we hope that through complex data analysis integrating evidence from multiple sources, we can improve our understanding of reproductive health and fetal development, and also warn about the potential of new drugs to cause birth defects before these drugs are widely marketed and distributed.
What are the limitations of the study?
We have not yet experimentally validated any of the predictions. The analysis does not currently take tissue or cell type into account, and the knowledge graph representation omits some detail from the original datasets for the sake of standardization. The website that supports the study may not appeal to a large audience.
How might these findings be put to use?
Regulatory agencies such as the U.S. Environmental Protection Agency or the Food and Drug Administration may use the approach to evaluate the risk of new drug or other chemical applications. Manufacturers of drugs, cosmetics, supplements, and foods may consider the approach to evaluate the compounds they include in products.
What is your plan for following up on this study?
We plan to use a similar graph-based approach for other projects focusing on the relationship between genes, drugs, and diseases. We also aim to use the processed dataset as training materials for courses and workshops on bioinformatics analysis. Additionally, we plan to extend the study to consider more complex data, such as gene expression from specific tissues and cell types collected at multiple stages of development.