DELPHI: an MIT framework using machine learning to estimate the impact of a scientific article

Two researchers at the Massachusetts Institute of Technology (MIT) have developed a machine learning framework designed to predict the impact of a new technology by analyzing the scientific articles published in its field. The tool also analyzes the potential models that researchers describe in these publications.

A framework designed using machine learning

James W. Weis, a research associate at the MIT Media Lab, and Joseph Jacobson, a professor in the Media Arts & Sciences program and head of the Media Lab's Molecular Machines research group, have published a paper describing the tool they developed: the Dynamic Early-warning by Learning to Predict High Impact (DELPHI) framework.

DELPHI is a machine learning algorithm trained on a large body of scientific articles. The researchers set out to exploit this corpus, which has grown steadily since the 1980s. The model was built on a complete chronological record of articles and looks beyond a publication's citation count: it draws on all the available metadata to capture how information actually propagates through the scientific world.
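To make the idea of a chronological record concrete, here is a minimal sketch. The record types `Paper` and `Citation` and the function `snapshot` are hypothetical and are not DELPHI's actual schema. The point it illustrates is that a model learning from history should only see the papers and citations that existed at a given cutoff year, so that later information does not leak into its training data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    # Hypothetical metadata record; DELPHI's real schema is not described here.
    paper_id: str
    year: int
    authors: tuple[str, ...]
    institutions: tuple[str, ...]
    discipline: str


@dataclass(frozen=True)
class Citation:
    citing: str  # paper_id of the citing paper
    cited: str   # paper_id of the cited paper
    year: int    # year the citing paper appeared


def snapshot(papers, citations, cutoff_year):
    """Return the papers and citations that existed by the end of cutoff_year."""
    visible_papers = {p.paper_id: p for p in papers if p.year <= cutoff_year}
    # Keep only citations made by the cutoff between two papers that already exist.
    visible_citations = [
        c for c in citations
        if c.year <= cutoff_year
        and c.citing in visible_papers
        and c.cited in visible_papers
    ]
    return visible_papers, visible_citations
```

Taking snapshots at successive cutoff years yields the kind of time series on which a model can be trained and later checked against what actually happened.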

James W. Weis explains how the tool works:

“Essentially, our algorithm works by learning patterns from scientific history, then matching those patterns on new publications to try to identify early high-impact signals. By tracking the early diffusion of ideas from these new papers, we can predict how likely publications are to go viral or likely to spread to the broader academic and scientific community in a meaningful way.”
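The idea of tracking early diffusion and comparing it with history can be illustrated with a deliberately crude sketch. It is an assumption for illustration only: the functions `early_citation_counts` and `historical_percentile` are hypothetical, and a simple percentile comparison stands in for the trained machine learning model described in the article, which this does not reproduce.

```python
from collections import Counter


def early_citation_counts(pub_year, citations, window=2):
    """Count citations each paper received within `window` years of publication.

    pub_year:  dict mapping paper_id -> publication year
    citations: iterable of (citing_id, cited_id, citing_year) tuples
    """
    counts = Counter()
    for _citing, cited, year in citations:
        # Skip citations to papers outside the dataset or outside the early window.
        if cited in pub_year and 0 <= year - pub_year[cited] <= window:
            counts[cited] += 1
    # Papers with no early citations still appear, with a count of 0.
    return {pid: counts[pid] for pid in pub_year}


def historical_percentile(value, historical_values):
    """Fraction of historical papers whose early count is strictly below `value`."""
    if not historical_values:
        raise ValueError("need at least one historical value")
    below = sum(1 for v in historical_values if v < value)
    return below / len(historical_values)
```

A new paper whose early count ranks near the top of the historical distribution would be flagged as a candidate; a real system would combine many such signals rather than a single count.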

The result is a graph in which each connection represents a citation of an article. At each end of a connection is a node containing all the information about a publication: its content, authors, institutions, and so on. The closer a node is to the center of the graph, and the more new connections it generates, the stronger the publication's estimated impact. Several of these simplified graphs are shown in the image below:

[Image: simplified graphs from the machine learning model, showing data nodes and their connections]
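The article describes centrality only qualitatively. As one hedged illustration, the sketch below scores the nodes of a small citation graph with PageRank, a standard centrality measure computed here by power iteration. The article does not say which centrality measure DELPHI uses, so the choice of PageRank and the function name `pagerank` are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Score the nodes of a directed citation graph with PageRank.

    edges: list of (citing_id, cited_id) pairs; each citation passes
    part of the citing paper's score to the cited paper.
    """
    nodes = {node for edge in edges for node in edge}
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a baseline share from random "teleportation".
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * scores[node] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # Papers that cite nothing spread their score evenly over all nodes.
                for target in nodes:
                    new_scores[target] += damping * scores[node] / n
        scores = new_scores
    return scores
```

For example, in a graph where papers B, C and D all cite A, A ends up with the highest score, matching the intuition that heavily cited, central nodes carry the most weight.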

The system estimates the scientific impact of each article, and the most central articles, the top 5% of nodes, are labeled "high impact". Five percent is the default threshold, but it can be set anywhere between 1% and 10% of nodes.
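The thresholding step can be sketched as follows, assuming each node already has a numeric impact score (for instance from a centrality measure). The function name `label_high_impact` is hypothetical; only the 5% default and the 1% to 10% range come from the article.

```python
def label_high_impact(scores, fraction=0.05):
    """Return the ids of the top `fraction` of nodes by score.

    fraction defaults to 5% and, following the range mentioned in the
    article, must lie between 1% and 10%.
    """
    if not 0.01 <= fraction <= 0.10:
        raise ValueError("fraction must be between 0.01 and 0.10")
    if not scores:
        return set()
    # Round to the nearest whole node, but always label at least one.
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])
```

With 100 scored papers, the default keeps the 5 highest-scoring ones, and `fraction=0.10` keeps 10.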

A system that infers an article's potential impact

The generated graphs suggest that high-impact articles spread widely: they reach small specialist communities and even fields unrelated to the one in which they were published. Of two papers with the same number of citations, DELPHI considers the one that reaches the wider audience to have the higher impact. However, while the program succeeds in identifying and highlighting the articles it considers impactful, it makes little use of lower-impact publications.
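The tie-breaking rule described above can be expressed as a sort key. This is a simplified sketch under stated assumptions: "audience" is approximated here by the number of distinct disciplines among the citing papers, and the names `reach_key` and `rank_by_reach` are hypothetical; DELPHI's actual measure of reach is not specified in the article.

```python
def reach_key(citing_disciplines):
    """Sort key: citation count first, then number of distinct citing disciplines."""
    return (len(citing_disciplines), len(set(citing_disciplines)))


def rank_by_reach(papers):
    """Rank papers by citations, breaking ties by breadth of audience.

    papers: dict mapping paper_id -> list of disciplines of the citing papers
    """
    return sorted(papers, key=lambda pid: reach_key(papers[pid]), reverse=True)
```

Two papers with three citations each are then separated by how many fields those citations come from, so one cited from biology, chemistry and physics ranks above one cited three times from biology alone.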

James W. Weis, one of the project's two investigators, discusses possible uses for the framework:

"The framework could be useful to encourage teams to work together, even if they don't know each other. For example, it could make it easier for them to pool their funding, come together and work on important multidisciplinary problems."

Joseph Jacobson, the second researcher, describes the objectives of the study:

“This study was about whether it was possible to create a process in a more scalable way, using the scientific community as a whole, as embedded in the academic graph, and being more inclusive in identifying high-impact research directions.”

The researchers caution, however, that DELPHI does not predict the future: machine learning is used to extract and quantify signals already present in the data set. Nevertheless, they were surprised by how early an article can be identified as high-impact.

A model for uncovering the hidden gems of scientific publishing

DELPHI was used to highlight about 50 scientific publications predicted to have a high impact by 2023. They span several fields, including nanorobots for cancer treatment, deep neural networks applied to chemistry and new discoveries in lithium batteries.

The two researchers believe that DELPHI can help institutions, governments and other decision-makers manage their investments in scientific research more effectively. The model could identify technologies regarded as the "rare gems" of modern science, helping decision-makers make better-informed choices.

James W. Weis also raises this point:

“I became increasingly aware that investors, including myself, were constantly looking for new companies in the same places and with the same preconceptions around research. Yet there is a huge wealth of highly talented people and incredible technologies that I began to glimpse, but that many were overlooking. I thought there had to be a way to work in this space and that machine learning could help us find all this untapped potential more efficiently.”

Translated from DELPHI : un framework du MIT utilisant le machine learning pour estimer l’impact d’un article scientifique