Monday, March 1, 2010

Open Notebook Science for Paleontology?

Open notebook science is kinda like open access for your data. In other words, it recognizes that a scientific contribution is more than just the resulting publication. These publications are often underlain by hours of thought, months of data collection, and weeks of analysis. In an open notebook approach, these "behind the scenes" activities are tossed out there for others to view, critique, utilize, and build upon. Note that this is different from open access, which is usually taken to cover only the final publication.

While at ScienceOnline2010 back in January, I sat in on a stimulating session about open notebook science. Our presenters (Jean-Claude Bradley, Steve Koch, and Cameron Neylon - all with blogs that are well worth checking out) shared their own experience with open notebook science, and the entire group discussed the ups, downs, plusses, minuses, and issues associated with the concept. It got me thinking - could I make at least some of my research open notebook? In this post, I want to explore the issue briefly, and solicit your feedback.

Why we need open notebook science
Good science is about reproducibility. There's no way around this. In a historical science like paleontology, "reproducibility" might involve remeasuring a specimen, retaking a photograph, or rescanning a bone. Some stuff we just can't reproduce. Once a bone is out of the ground, you'll never be able to retake the same precise stratigraphic or taphonomic data. Some data are ridiculously difficult to reproduce - not everyone can afford to fly to every country to remeasure some limb bones, or get the permission to rescan a specimen. Is it really necessary to have to reinvent the wheel?

And let's consider the long term. Whether we like it or not, we're all going to die someday. We can't take our data with us - why should they be locked up in some archive, or tossed out by whoever has to clean out the filing cabinets? Why don't we treat our data with the care that we show our specimens?

Objections (and solutions) to Open Notebook Science
In my own thinking on the subject, I've wrestled with a number of issues relevant to open notebook paleontology. Many of these were covered in the ScienceOnline session, and I would encourage anyone who is really interested to check out the YouTube videos when they get posted. In brief, objections include:

-Time and money. It takes time to digitize notes and put them into a form usable by others, and long-term data repositories cost money. This is a valid concern - particularly if you have years of undigitized data. When it comes to my museum research, my past three or four years of notes are almost entirely digital, though. And, the issue of a repository is a serious problem. Beyond journals' supplementary information, there is no permanent system for our field.
-Embarrassing errors. When taking notes, our interpretations of specimens change. Sometimes we make a mistake. Do we want to broadcast that to the world? Worse yet, what if someone else uses our mistake? This too can be a genuine concern - but I don't think it's an excuse for locking up raw data. A prominent caveat would probably be sufficient.
-Being scooped. Again, this is a legitimate concern that becomes irrelevant after publication. If you are worried about being scooped, just don't post in-progress data prior to publication. Or, consider the fact that having a time-stamped observation out there on the Internet is pretty unambiguous evidence of priority.
-Being scooped (2). I've heard multiple times (and used to subscribe to this philosophy myself) that one shouldn't release data until every single possible piece of information or side project is leached out of it. Wrong. Simply wrong. If your data are used to create a published summary table, graph, or even other types of figure, they need to be available. This doesn't mean you necessarily should release all of the "extra" data - but at the bare minimum, an interested individual should be able to see the information directly related to your methods, results, and conclusions. And the whole enchilada should get out there at some point.
-Locality data. Another common objection is that we shouldn't release precise locality data for sites, to avoid poaching. I agree with this 100 percent. But, there are still tons of other data that could be distributed.
-Image rights. Have you ever read the agreement that museums make us sign in order to take pictures? Sadly, most of us don't own the photos that we take of specimens. I would love, love, love to have a Flickr stream of every specimen photo I've ever shot, but it just ain't happening yet. It is understandable that museums don't want someone profiting off of a giant coffee table book of fossil photos - but I'll be the first to admit that 99.9 percent of my photos aren't commercially saleable. Could anyone conceivably profit off of 20 closeup photos of a fragmented ceratopsian jugal bone? And, don't forget that a significant number of specimens in American museums are property of the American people (situations may vary elsewhere). A museum is seriously forgetting one of its reasons for existence if the institution actively hampers scientific progress by not allowing non-commercial distribution of specimen photographs. I don't have a good solution.

Thoughts? Comments? Is this ever the sort of thing that paleontologists will buy into?

6 comments:

220mya said...

"Beyond journals' supplementary information, there is no permanent system for our field."

Well, it remains to be seen how permanent even these archives are. But I'm not worried - too many people have downloaded copies, so they're backed up somewhere.

The main reason I highlighted this statement is that I think a few NSF funded projects would beg to differ, such as Digimorph, MorphoBank, and the Paleobiology Database. These aren't perfect for open notebook science, but they do serve as permanent repositories for data that may not make it into the paper.

Andy said...

I agree that all of those are wonderful projects. . .but permanence is a relative thing nowadays. NSF funding is great for a few years, but what happens in two or three decades when priorities shift or the main people behind the efforts retire? The problem isn't unique to paleontology, but it will require more than just a string of NSF grants. I'd really like to see SVP and PS get behind a data archive where anyone can submit any data, but realistically this won't happen in the current financial environment. Long-term deposition of data requires long-term financial support from the stakeholders.

220mya said...

I agree that we don't know the long-term permanence of these databases, but the institutions have made long-term commitments towards them. Web-hosting is relatively cheap these days.

But what real guarantee of permanence will there be for a society-backed data archive? I'm in total favor of it, but I don't see how there is any better guarantee for this type of thing over the databases I just mentioned. Societies often become strapped for cash and could jettison the costs for maintaining a data archive to easily save a chunk of change.

Andy said...

Data curation is a neglected part of this cost, too. . .it's more than just web hosting. Software gets outdated, servers crash, data formats change. . .Which is, as you state, part of where institutions can step in.

I guess the main point here is that becoming dependent on any single funding source--whether it's NSF, an institution, or whatever--is not sustainable in the long run.

Jean-Claude Bradley said...

Andy - the more I follow what you are doing the more impressed I am! I think that the publication of a book as we discussed could be a solution that would answer many of the criticisms out there about archiving challenges. We are eager to help with providing images for the book - we just need your decision about whether to do it for each dinosaur or dinosaur family.

Bill Hooker said...

"Could anyone conceivably profit off of 20 closeup photos of a fragmented ceratopsian jugal bone?"

Rule 34.