Science publications - the next 'big data'?

Blog No.99
Author: Ramon Brasser

Several of my colleagues dutifully check the arXiv every morning for new papers in our field of planetary science. They scan the titles and abstracts and perhaps read the papers that catch their attention or fall within their area of expertise. I, on the other hand, gave up long ago. The sheer volume of papers posted each day makes it all but impossible for those of us who cannot speed-read, such as yours truly, to keep up with the recent literature.
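For those who still want the morning ritual with less effort, the checking itself is easy to automate. Below is a minimal Python sketch that pulls the newest submissions from arXiv's planetary-science category (astro-ph.EP) through the public arXiv API; the category and the number of results are illustrative choices on my part, not a recommendation.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Query the public arXiv API for the newest submissions in
# astro-ph.EP (Earth and Planetary Astrophysics). The category and
# result count are illustrative assumptions.
params = urllib.parse.urlencode({
    "search_query": "cat:astro-ph.EP",
    "sortBy": "submittedDate",
    "sortOrder": "descending",
    "max_results": 25,
})
url = "http://export.arxiv.org/api/query?" + params

with urllib.request.urlopen(url) as response:
    feed = response.read()

# The API returns an Atom feed; each <entry> element is one paper.
ns = {"atom": "http://www.w3.org/2005/Atom"}
root = ET.fromstring(feed)
for entry in root.findall("atom:entry", ns):
    title = entry.find("atom:title", ns).text.strip()
    link = entry.find("atom:id", ns).text.strip()
    print(f"{title}\n  {link}\n")
```

Of course, a script like this only moves the pile of titles onto your screen faster; it does nothing about actually reading them.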

This large volume of publications presents researchers with a dilemma: they can spend time reading, furthering their knowledge and hunting for open problems, but at the expense of doing science; or they can do science and risk wasting time on something already covered in the literature they ignored. Naturally this risk is always present, but since the Kepler mission the field of planetary science has seen exponential growth in the number of papers and active scientists, with no sign of letting up. Keeping up with recent developments is becoming unsustainable.

This excess of information calls for a different way of processing the scientific literature. Reading abstracts is not always enough, because they omit detailed information by design, and only a few papers move the field forward while most are incremental. Personal judgement is usually the best guide to which papers are worth reading and which should be binned, but this approach only works when the volume of literature is low.

With the job market and funding ever tightening, and demand for long-term jobs increasingly exceeding supply, young researchers are forced to publish at a ferocious pace in a rat race to gain a leg up on the competition, taking time away from reading and mulling over recent developments. This raises the question of how best to digest the current literature without becoming overwhelmed.

I am of the opinion that processing the scientific literature increasingly resembles a Big Data problem. We have all heard the Big Data hype, and how it will transform and improve our lives, if we believe its mantra and the goodwill of the big corporations that hoover up every bit of every trail we leave. While I will admit that Big Data has its merits and has certainly produced interesting developments, Big Data and machine learning cannot create new theories. All they can do is find patterns and perhaps (dis)prove existing theories and trends, and for this reason they are perfectly suited to sifting the scientific literature and determining which trends and projects deserve further scrutiny and which do not.
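To make the idea concrete, here is a deliberately naive sketch of such a filter: it scores a batch of abstracts against a short description of one's own research interests using TF-IDF and cosine similarity. It assumes the scikit-learn library; the profile text and abstracts are placeholders, and a real system would need far richer signals than word overlap.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A one-line profile of the reader's interests (placeholder text).
profile = "terrestrial planet formation, late accretion, dynamical evolution"

# Abstracts to triage; in practice these would come from the daily
# arXiv feed rather than being hard-coded.
abstracts = [
    "We simulate the late stages of terrestrial planet accretion...",
    "A new catalogue of transiting exoplanet candidates from Kepler...",
    "We study grain growth in protoplanetary disks with ALMA...",
]

# Represent the profile and the abstracts in one TF-IDF space, then
# score each abstract by its cosine similarity to the profile.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([profile] + abstracts)
scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()

# Read the highest-scoring abstracts first; bin the rest.
for score, abstract in sorted(zip(scores, abstracts), reverse=True):
    print(f"{score:.2f}  {abstract[:60]}")
```

Crude as this is, even such a ranking could turn a hundred daily abstracts into a short reading list; the hard part is deciding what the profile should say.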

At ELSI, our multidisciplinary pursuits demand that our literature intake be as broad as the topics we cover, and so the need to process this literature effectively, separating what is pertinent to our active research topics from what is not, grows ever more pressing. It is time we sat around the table and discussed how to optimise our knowledge intake while maximising our scientific productivity.

Cartoon 'Big Data' by Thierry Gregorius