Disappearing Fruit: How Big Data Affects Science and the Role of ELNs in Information Management
Let’s imagine Science as a fruit tree, and its fruit as scientific knowledge. If scientists were responsible for harvesting this tree, their predecessors would have picked all the low-hanging fruit. Centuries later, only the heaviest fruit, positioned far into the tree’s treacherous branches, would remain.
How does this relate to scientific research?
Research is becoming increasingly complicated, or as a professor of ethics at the University of Pennsylvania wrote in a blog post, “[it] is asking more precise questions but the answers are harder to get.” This is especially true in fields such as genetics and neuroscience, which have been strongly influenced by “big data.”
Unfortunately, those who work with big data run the risk of experiencing “data deluge,” a situation in which scientists generate immense amounts of information but are unable to effectively manage or use it. To combat this information overload, digital tools such as electronic laboratory notebooks (ELNs) and data-intensive computational programs apply information management principles and powerful computational approaches to the standardization, collection and integration of data.
The Age of Big Data
Big data is characterized by the three Vs of information: volume, velocity and variety. Bereft of low-hanging fruit, scientists have created new technologies that increase the quantity and type of information that can be stored (in other words, maximize the Vs of information). For example, Harvard professor Pardis Sabeti creates tools to analyze billions of base pairs in the human genome to identify genes that quickly “rose to prominence” during human evolution.
Another example: in November 2014, the prestigious journal Nature Neuroscience devoted an entire issue to highlighting the use of big data in neuroscience. Applications of big data in this field include collecting large proteomic data sets, analyzing the connections and activity among billions of neurons and much more. At the end of their article, the authors write, “Although it’s impossible to predict the size of the effect big data will have on the way neuroscience research is done and what progress will be made…it’s clear that the wave of big data is not coming, it’s here to stay.”
Naturally, this quantum leap in both complexity and the sheer amount of information that can be gathered has generated some concern. As author David Weinberger writes in his book Too Big to Know, “With the new database-based science, there is often no moment when the complex becomes simple enough for us to understand it…They are so complex that only our artificial brains can manage the amount of data and the number of interactions.”
Indeed, Tom Siegfried, a writer for Science News, warns readers that “collecting data and storing information is not the same as understanding it.” Siegfried writes that a lack of understanding predisposes scientists to using the wrong tools and, subsequently, to drawing false conclusions.
This leads to another problem: accountability. Most studies show that only 10-30 percent of experiments highlighted in scientific journal articles are reproducible. This is especially disheartening when considering the time and resources scientists invest trying to reproduce the work of others (which is likely irreproducible) or the $31 billion the US government spends on research ($21.7-27.9 billion of which is conceivably wasted, given that 70-90 percent of results may be irreproducible).
Yet the irreproducibility of 70-90 percent of scientific research can also seem entirely plausible. Science has become very complicated and data can be difficult to interpret. Now add competition, low pay and job instability to the mix and we’ve created the perfect storm. For example, in 2013, NIH funding dropped dramatically and so did the number of investigators. Some of these labs closed and others “survive[d] by other means;” however, the message was clear: science is complicated and data can be overwhelming, but when money and resources are scarce, it’s important to work quickly (sometimes at the expense of careful science), or your lab will lose funding.
Utilizing ELNs and Digital Tools for Effective Data Management
With all of these competing pressures, how can scientists and researchers improve the way science is done?
One important step is to utilize appropriate digital tools, like ELNs. As an editorial published in Nature put it, “Too often when errors or cases of fraud occur in science, the lab data required to reconstruct what happened have gone astray.” ELNs can provide a clear audit trail to monitor experiments and ensure data integrity, two key components of accountability. ELNs are also easily searchable and preserved for posterity, so when lab members move on, their information remains.
We’ve previously discussed the dangers of fabricated science, but the benefit of ELNs is not limited to the prevention of scientific fraud. ELNs enable scientists to systematically capture, manage and structure information, which can then be leveraged toward answering scientific questions and generating knowledge. And as we advance toward an age in which data deluge becomes the norm, ELNs and similar technologies will become agents of change, transforming global organizations by enabling collaboration, fostering innovation and driving growth.
Returning to the image of science as a fruit tree: ELNs and other modern technologies enable researchers to climb further into this tree, reach deeper into its branches and grab the heaviest fruit, and in doing so, teach us more than we believed possible. Regardless of the size of your datasets, the careful organization and processing of data wins the day and helps get papers published. To see how BIOVIA Notebook might assist your research, please visit our website.