Thursday, January 28, 2010

Where is paleontology?

Last week, many of the leading journals in evolutionary biology - including The American Naturalist, Molecular Ecology, Journal of Evolutionary Biology, Evolution, and a number of others - announced a data archiving policy. In short, this policy states that the data behind the results of a paper should be publicly archived in well-known repositories such as Data Dryad, GenBank, or TreeBASE. Do you notice anything missing in this illustrious list of publications?

Not a single one of those journals explicitly focuses on paleontology. Last time I checked, we paleontologists like to think of ourselves as evolutionary biologists. Time and time again, we lament how we're not allowed a place at "The High Table" of evolutionary thought, and how paleontology is viewed as largely irrelevant by the "people who matter." So why weren't any paleontology publications on this list? Will we see any on the list in the near future?

The article in The American Naturalist gives a good run-down of the arguments for sharing data, so I'll only briefly summarize them here:
  • It allows reproducibility of analyses.
  • It allows others to build upon your work more easily.
  • Papers that release their data may get cited more frequently.
  • The data will be lost to science otherwise.
  • It's the right thing to do.
And to counter some potential objections:
  • This would only request the release of data directly relevant to the study. Not your pages and pages of raw notes. Just that Excel spreadsheet that you already generated on your way to the analysis. Seriously. It's not a lot of extra work, if any.
  • This is not requesting the digitization and distribution of video, CT scan, or similarly large and unwieldy data (although that would be nice in the future).
  • No, it does not mandate the release of locality data, or similarly privileged information.
  • The policy does not require immediate release of the data if there's a good reason (e.g., another pending publication) to delay it. I'm not sure I entirely support this (if you're publishing the analysis, you should publish the data), but I understand it as a necessary compromise to get more individuals on board. I won't let the perfect be the enemy of the good.
Some of the most ground-breaking and high-profile work in paleontology is happening on account of large meta-analyses of data pulled together from the literature - largely thanks to efforts like the Paleobiology Database. This work has real implications for big questions facing our science and our world: Climate change. The pace of evolutionary radiations. The origins of modern biological diversity. These sorts of databases focus primarily on geographic, stratigraphic, and taxonomic data - but think how much more powerful they could be if all of the morphological data ever published were available! Or if the PBDB volunteers didn't always have to transcribe the information from a PDF file. And look at the great strides that molecular biology has made with the ready availability of sequence data on GenBank! This would not have happened with a mentality of data hoarding.

Look. Amateur hour is over. If we want to play in the big leagues, we have to start acting like a real science. Real science is reproducible. Real science is data-driven. Real science involves sharing data. Yes, I know it's hard. It's new. We haven't done things this way before. There are potential problems. Not everyone is adopting it quickly. But if we always wait five years to "see what happens," we paleontologists quite frankly don't deserve a place at the High Table. Let's be leaders, not followers.

Piwowar, H. A., Day, R. S., & Fridsma, D. B. (2007). Sharing detailed research data is associated with increased citation rate. PLoS ONE, 2(3), e308. DOI: 10.1371/journal.pone.0000308
Whitlock, M., McPeek, M., Rausher, M., Rieseberg, L., & Moore, A. (2010). Data archiving. The American Naturalist, 175(2), 145-146. DOI: 10.1086/650340

For previous posts on data sharing in paleontology, see here and here. Want to get involved? Spread the word. Talk to your local journal editor. Let the people who count know what you think.


Mike Taylor said...

Excellent news that evolutionary biology as a discipline is getting its ship in shape -- thanks for passing it on!

We can't be responsible for our whole discipline's conspicuous absence from this initiative of course, but I would be interested to know specifically how SVP missed this boat. Anyone have any idea whether the committee even knew this was on the horizon? JVP's absence is keenly felt.

Andy said...

The official announcement was a surprise, although Data Dryad had many of the supporting societies and journals already up on their website. Several of us have been working to raise awareness of the issue with the powers-that-be... here's hoping that the American Naturalist piece is an effective piece of ammunition to really get the ball rolling!

Prof. Wimsey said...

Paleobiology actually has been involved with Dryad. Starting this calendar year, our acceptance letters have included a "strong encouragement" for authors to upload data onto Dryad (complete with a URL linking them to the necessary pages).

Whether this becomes mandatory policy is something that the Paleo Society has not yet decided. (If I have my way, then it will be: but I'm not going to be editor of Paleobiology after this year!)

SVP is its own beast: hopefully some of the younger VPers can convince SVP to do the same.

Andy said...

@Peter - thanks for the heads-up on that! I am glad to see Paleobiology taking some step... any step. Here's hoping that you're able to get it mandated - my sad suspicion is that 3/4 of the authors won't deposit their data otherwise.

Ross Mounce said...

Brilliant. Thanks for making me aware of the Piwowar et al. (2007) paper. To me, it is stating the obvious that making your data (as an author) more freely available increases your chances of citation, but it's great to have evidence to back up this intuition.

Personally, I can't believe that (in 2010!) when I email an author politely asking for a copy of the cladistic matrix they used in their (already) published analysis, a small minority of them point-blank refuse my request. Seriously. Why?

I for one can't wait for Open Science to be adopted in Palaeontology.

Andy said...

Ross - the situation you describe, in which you were refused a copy of a data matrix supporting a published cladistic analysis, is flat-out unethical. It's a tough one - as a student, there aren't many courses of action (ranging from doing nothing to contacting the journal editors) that don't harm your career in some way or another. Sad.