Tuesday, May 20, 2008

Data and the Open Source Paleontologist 2

The previous post on this topic outlined some resources for the posting and dissemination of primary paleontological data on the internet. In this post, I'll take a look at why more people don't do so, and what we can do about it.

Why Aren't More Data Posted?
Myriad factors contribute to this issue - some of them are genuine roadblocks, and others are simply opportunities to change attitudes and common practice.

Laziness
Sometimes it's a lot of work to get your data posted online. You may have to reformat everything, or re-enter the data, or engage in digital gymnastics that take longer than the research itself took. In other cases, it's just one more thing to do on an already crowded research schedule. How to counter this? Perhaps my best suggestion is better awareness of the importance of these data being available - if people demand it, it will be viewed as an item of high importance, just as needed as the peer-reviewed publication itself. Some repositories, such as MorphoBank, also allow you to enter the data as you collect them, rather than doing the whole thing at the very end. This might also be a good talisman against the rush to upload a whole bunch of data files at the end of a program.

Museum Policies
In the case of posting photographs of specimens, many museums have policies that are unclear or seem to prohibit general dissemination of photographs. These policies are in place for good reason in some cases - they discourage commercial concerns from profiting from images of specimens without a museum's knowledge. Although it's my understanding that most museums don't have a problem with posting things into scientific databases, it's probably best to check. Does anyone out there have experience with this issue?

Priority of Publication
If your data are online, this means other people have access. This can lead to productive collaborations - or, it could potentially lead to being "scooped." Here, the safest thing is to delay uploading data until after the major resulting publication. The important thing is to get those data out there! And, if you use data from an online database, you have a responsibility to credit the person who did the primary work. Anything less just isn't very nice. There are always going to be people who are stingy with sharing already-published data, even when it isn't warranted (or in the case of CT scan data, even when the museum requests that a publicly available copy be reposited with the institution!). The most important thing is to work to change attitudes and foster a culture of openness. Recent events in paleontology have perhaps made this a little more difficult, but I like to think that things will work out in the long run.

What Can Be Done?
Above, I've outlined a few solutions to some of the problems (some more practical than others). Beyond those, I think we really need more databases, and more encouragement to use these new (and existing) databases. Gene squeezers have GenBank, but why aren't there more Paleobiology Databases out there? Advisors - make your students reposit their data online. Students - get your data out there, even if your advisors don't encourage it! And paleontologists in general - welcome to the 21st century! I hope that time and a new generation of tech-savvy paleontologists will change all of this for the better.

1 comment:

Robert Huber said...

Good analysis!
Indeed, everybody likes to get easy access to data, but the number of researchers who give their data to data centers is still too low.
Some of our thoughts on the special situation in paleontology can be found at our blog, e.g.:

http://stratigraphynet.blogspot.com/2008/02/paleontology-very-late-adaptors.html
and this (with links to further reading) might also be interesting for you: http://stratigraphynet.blogspot.com/2008/04/technological-twists-on-taxonomy.html