It’s the proverbial needle in a haystack. The more information there is online, the easier it is to overlook the most important stuff. Now an automated tool has been set the Herculean task of mining every science paper it can find online to help researchers come up with new ideas.
Semantic Scholar, launched this week by the Seattle-based Allen Institute for Artificial Intelligence (AI2), can read, digest and categorise findings from the estimated 2 million scientific papers published each year. Up to half of these papers are never read by more than three people. The system aims to identify previously overlooked information and connections with other research.
“Our vision is of a scientist’s apprentice, giving researchers a very powerful way to analyse what’s going on in their field,” says Oren Etzioni, director of AI2. A researcher will be able to ask what the literature says about middle-aged women with diabetes who use a particular drug, for instance.
The system works by crawling the web for publicly available papers and then scanning their text and images. By identifying citations and references, Semantic Scholar can determine the most influential or controversial papers. It also highlights key phrases from similar studies, extracting and indexing the data sets and methods used by each researcher.
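The article does not say how Semantic Scholar actually scores influence. To illustrate the general idea that a paper's citations reveal its influence, here is a minimal Python sketch under stated assumptions. It ranks papers in a small, made-up corpus by how many other papers cite them. The function name `rank_by_citations` and the paper IDs are hypothetical, and this is not AI2's method.

```python
from collections import Counter


def rank_by_citations(citations: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank papers by how many *other* papers in the corpus cite them.

    Hypothetical illustration: `citations` maps each paper ID to the list of
    paper IDs it references. This is plain citation counting, not the
    ranking method used by Semantic Scholar.
    """
    # Start every known paper at zero so uncited papers still appear.
    counts = Counter({paper: 0 for paper in citations})
    for paper, refs in citations.items():
        # Count each cited paper once per citing paper, ignoring self-citations.
        for ref in set(refs):
            if ref != paper:
                counts[ref] += 1
    # Most-cited first; break ties alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    corpus = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C", "B"]}
    print(rank_by_citations(corpus))
```

Raw counts like these favour older papers that have had more time to be cited, which is one reason a production system would likely weight or normalise them in ways this sketch does not attempt.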
AI2 is not the only organisation intent on digitising and analysing the world’s scientific discoveries. Meta, a big-data start-up in Toronto, Canada, announced a similar service this week called Meta Science, which scans publishers’ libraries and university websites to rank scientific papers. In 2013, a system using IBM’s Watson AI technology, called the Knowledge Integration Toolkit (KnIT), mined 100,000 papers to successfully predict the interactions of a tumour-suppressing protein. IBM says KnIT is now fully automated and can work without human oversight. The Defense Advanced Research Projects Agency (DARPA) in the US is also working on technology, code-named Big Mechanism, to read all papers on certain types of cancer to help identify potential treatments. It is scheduled for completion by the end of 2017.
Kenneth Forbus of Northwestern University in Chicago, Illinois, is confident that such services will prove useful. “Machines that help us filter could increase the rate at which we find, if not diamonds in the rough, then at least useful nuggets,” says Forbus. “One might miss something, but professors already routinely use graduate students and colleagues for the same service, so the risks are well-understood.”
At launch, Semantic Scholar is focusing on computer-science papers. It will gradually expand its scope to include biology, physics and the remaining hard sciences.
Etzioni says the plan is to increase the system’s power over time to see how deeply it can understand what a paper is about. “Ultimately, perhaps a human scientist doesn’t have to read it at all.”