The sharing paradox

The prisoner’s dilemma is the current scientist’s hara-kiri. Competing, marketing and even lobbying for resources appear indispensable to survive as a scientist in the 21st century. We need funding to have the means to produce data to publish papers to attract more funding. However, this virtuous loop is turning into a vicious cycle. In the best-case scenario, one must at least promise to generate new data, even though the analysis of existing data is far from exhausted. As the costs of science go up, budget cuts follow. While job opportunities are scarce, specialization entwined with multidisciplinarity requires more and more scientists, technicians, engineers, managers, and communicators. The fear of “publish or perish” kills the passion to know. In a word, the means defeat the purpose.

Open science in its manifold expressions (open access to publications, free software, open data) is a pressing alternative and, perhaps, the only way out. Its motto is as simple as it is compelling: in order to survive and prosper, we must share. But why is it so hard to share?

Material sharing implies division; virtual sharing allows for multiplication. We call this the sharing paradox. In the realm of material things, a simple principle applies: if I give you a piece of my apple, I end up with less. When we deal with virtual things, however, the governing rules seem counterintuitive: if I give you a piece of my data, I do not lose anything. The law of conservation of mass and energy seems to be violated even further: if I share it all, each of us will end up with more. By loose analogy with the second law of thermodynamics, we could say that the sharing entropy of an open system never decreases. While we know how to deal with physical objects, we are relatively naive when handling information. In other words, apple bites are not data bytes.

In my opinion, our greatest shortcoming is the inability to grasp the sharing paradox. For millions of years our ancestors evolved under the constraints of a strictly material world. As a result, our survival instinct treats everything in the outside world as a potential threat. This habit, acquired long ago by the human mind, is constantly at work in science too (e.g. my lab fellow becomes my enemy). Only in the last few years have we, human beings, had a radically different experience. As we interact collectively and instantaneously in a virtual universe, we are starting to realize that material constraints no longer bound reality. If something I own is something everyone can use, property becomes service and the space of the possible explodes. Data falls into this extraordinary category. Its transformative potential is still unforeseen.

Data is wealth. Multivariate by nature, it is amenable to several uses and users. Each time it is revisited, a new projection might reveal something new. However, we often treat data like paper napkins and condemn it to single use: one grant, one dataset, one paper. Why do we waste the very same (finite and shrinking) resources we strive to obtain? Part of the time and energy we spend writing grants to attract funding to generate more data could instead be devoted to thinking carefully about the data we already have. Open data fits perfectly with the principles of replacement, reduction and refinement (the 3Rs). Furthermore, being able to compare across datasets is a necessary condition for testing the replicability of scientific investigations. This is only possible with access to the data in its raw state, before preprocessing, filtering, analysis and plotting. Finally, taxpayers should not pay twice for the same thing. Duplicated efforts drain public resources and are unethical.

To sum up, scientific data is as indispensable as it is abundant. Cynicism, sensationalism and naive optimism must be replaced by pragmatism: sharing is mainly constrained by our willingness to share.