In March, Microsoft introduced Microsoft Biomedical Search, a tool that lets researchers search the entire biomedical literature using natural language queries rather than keywords. The prototype builds on work begun at the start of the pandemic: the creation of the COVID-19 Open Research Dataset (CORD-19) and, together with the Cleveland Clinic, a list of questions representative of the growing information needs of biologists and clinicians.
Tools for searching and retrieving scientific information play a fundamental role in research, as the fight against the current pandemic has highlighted. Microsoft's teams built on their CORD-19 work and on their collaboration with the Cleveland Clinic. In a post on the Microsoft blog, Eric Horvitz explained:
“With Microsoft Biomedical Search, we’ve brought together large biomedical literature datasets, representative query workloads, and advances in large-scale neural language models to improve biomedical research. In particular, we sought features that allow researchers to better specify what they mean with the precision of natural language.
Microsoft Biomedical Search highlights the most relevant results among more than 20 million documents drawn from CORD-19, Microsoft Academic, PubMed and PubMed Central, and is based on the PubMedBERT, MetaAdaptRank and SaliencyMeasure models.
PubMedBERT is a large-scale language model that has been pre-trained on biomedical text rather than on a mixture of general domain and domain-specific language. The model was pre-trained from scratch on 3 billion biomedicine-specific words. We found that the model outperforms all previous language models on biomedical natural language processing applications.
MetaAdaptRank helps to accurately determine relevance by tempering common problems associated with ranking search results. Information retrieval systems often fail to identify all relevant information because queries and documents use different terms to describe the same concept. For example, this mismatch can occur when a searcher is unfamiliar with new terminology. MetaAdaptRank can learn the semantics of specialized domains to more accurately rank results, even for topics or keywords for which information is scarce.
SaliencyMeasure uses reinforcement learning to predict the likely future importance of a scientific paper, which helps balance the ranking of older and newer publications or authors rather than relying solely on citations.”
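The vocabulary mismatch described in the quote, where a query and a document use different terms for the same concept, can be illustrated with a small sketch. This is not MetaAdaptRank and does not reflect how Microsoft's system works: the concept table, the function names and the example texts are all hypothetical, and a hand-written lookup stands in for the semantics a neural ranker would learn from data.

```python
import re

# Hypothetical hand-written concept map (an assumption for illustration only).
# A neural ranker would learn such associations rather than look them up.
CONCEPTS = {
    "heart attack": "myocardial_infarction",
    "myocardial infarction": "myocardial_infarction",
    "treatment": "therapy",
    "therapy": "therapy",
}


def tokens(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def keyword_overlap(query, doc):
    """Fraction of query tokens that literally appear in the document."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q) if q else 0.0


def normalize(text):
    """Rewrite known phrases to a shared concept label."""
    text = text.lower()
    # Replace longer phrases first so multi-word terms are matched whole.
    for phrase in sorted(CONCEPTS, key=len, reverse=True):
        text = text.replace(phrase, CONCEPTS[phrase])
    return text


def concept_overlap(query, doc):
    """Same overlap score, computed after mapping terms to concepts."""
    return keyword_overlap(normalize(query), normalize(doc))


query = "heart attack treatment"
doc = "Therapy options after myocardial infarction"

print(keyword_overlap(query, doc))  # 0.0: no shared words
print(concept_overlap(query, doc))  # 1.0: both query concepts are found
```

Pure keyword matching scores the document at zero even though it is clearly relevant. Once both texts are mapped to shared concepts, the match is complete. A fixed table like this one cannot cover new or rare terminology, which is the gap that learned ranking models aim to close.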