Thursday, April 17, 2008

Data and the Open Source Paleontologist

Paleontological research generates data, and lots of it--photographs, measurements, CT scans, character matrices, etc. Data are the cornerstone of most good papers, yet for reasons of space and journal style, they often never make it into print (or are relegated to that zone of "online supplementary information"). This is a Catch-22, because anyone hoping to evaluate, reproduce, or build upon your work needs these data.

With the growth of digital media and the internet, things are beginning to change. It's now much easier for paleontologists (and other scientists) to make available primary data - if they choose to do so. This post surveys a few on-line data repositories that are out there, and looks to the future. I'm going to focus on those that are most relevant to paleo types (sorry, no GenBank).

MorphoBank is an on-line data editor and repository for cladistic data matrices. Registration is required to start your own file, after which you can upload images and data matrices. The image upload is particularly nice, because it allows you to link a character coding in a taxon to an image of a particular specimen. This means that someone trying to figure out your character states actually has a prayer of understanding what is meant by "mastoid process elongate (0) or fungiform (1)." The only real downside is that, at present, there doesn't seem to be support for uploading large CT datasets.

Dryad is a relatively new site, intended to archive the basic data underlying publications in evolutionary biology. A number of partner journals have signed on (e.g., Evolution and Systematic Biology), but unfortunately no paleo journals are there yet. One set of paleo data is available on Dryad, related to the Xenoposeidon type specimen. Kudos to Mike Taylor and Darren Naish for that! Data that could be archived here include photos, data matrices, measurements, and other media. Because the site is at such an early stage, both the amount of available data and the search functions are currently rather limited.

This is another relatively recent website for which I have high hopes. The site focuses on finite element modeling in vertebrate biology, with background information and material properties databases. Of even greater interest is an area where published FE models can be downloaded for others to try out. It would be really, really nice if more researchers went ahead and put their models out there!

DigiMorph is one of the earliest data archives out there, focusing on CT scan data. Interested users can download movies of 3D reconstructions or slice sequences, download surface models (usually STL format), or read more about the scans. Unfortunately, for most specimens there is no way to download the actual data, so if you want to analyze some part of a specimen, you're out of luck (unless you contact the DigiMorph folks directly and have them mail you a DVD). I had high hopes for the UTCT Data Archive, which did post TIFF and JPG stacks of images. But this effort seems to have lost its wind, and very few datasets have actually been posted. Regardless, DigiMorph has done an admirable job of getting at least the basic CT data out there for a number of publications.

This is another new website, appearing in just the last few months. The basic goal is to make available 3D reconstructions generated from serial section data (whether CT or "old-fashioned" thin sections), in an environment where you can rotate and examine the specimens. Because it's in its early stages, content is mostly limited to frog specimens (but how cool they are!). All files are in OBJ format, for which a viewer for Windows and Mac is provided (I had no problem getting it to run in Wine, once I turned off virtual desktop). For objects with multiple parts (for instance, a frog head with bone and brain segmented separately), you can change colors on certain pieces or make them transparent. It's a nifty little toy for viewing morphology in 3D. The only downside is that the software features are pretty limited (turn part on or off, change color, change transparency), and you can't take measurements of any sort. Also, the raw data from which the reconstructions were generated aren't available. But it's another great way to get 3D morphological information out there!
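
A side note on measurements: OBJ is a plain-text format in which each vertex is stored on a line of the form `v x y z`, so a few lines of Python can pull rough dimensions out of a model even when the viewer can't. What follows is only a minimal sketch, assuming you have an OBJ file saved locally and that it uses standard vertex lines. The function names and the file name in the comment are hypothetical, and the units depend entirely on how the model was exported.

```python
def read_obj_vertices(lines):
    """Collect (x, y, z) tuples from the 'v' (vertex) lines of OBJ text."""
    vertices = []
    for line in lines:
        parts = line.split()
        # Vertex lines look like "v 1.0 2.0 3.0"; skip faces, normals, comments.
        if len(parts) >= 4 and parts[0] == "v":
            vertices.append(tuple(float(p) for p in parts[1:4]))
    return vertices


def bounding_box_size(vertices):
    """Return the extent of the model along the x, y, and z axes."""
    xs, ys, zs = zip(*vertices)
    return (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))


# Tiny made-up example standing in for a real file.
sample_obj = """\
# hypothetical model
v 0 0 0
v 2 0 0
v 0 3 0
v 0 0 1.5
f 1 2 3
"""

sample_vertices = read_obj_vertices(sample_obj.splitlines())
sample_size = bounding_box_size(sample_vertices)

# For a real file (hypothetical name), something like:
#   with open("frog_skull.obj") as f:
#       size = bounding_box_size(read_obj_vertices(f))
```

A bounding box is about the crudest measurement possible, but the same vertex list could feed distance calculations between landmarks you pick out by hand.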

Paleobiology Database
This database brings together faunal, floral, and stratigraphic data from a variety of published and some unpublished sources. It's a fantastic resource for looking at patterns of distribution, extinction, and diversification. Detailed morphological data (beyond body size or tooth measurements) and images are pretty much absent, because they are beyond the primary scope of the database.

Coming Next. . .
As you can see, a number of resources (and these are just a few highlights) are available already. But a casual user will notice that paleontological data are pretty scarce on many sites, and data capable of further analysis are scarcer still. In the next post, I'll examine the reasons behind this, and what we can do about it.

1 comment:

Robert Huber said...

Nice list! If you plan to continue here are some suggestions:

Another interesting data source for 'real' primary data (mostly micropaleontology), such as measurements, abundance counts etc. can be found at PANGAEA:

Some taxonomic and stratigraphic data can be found at our site:

Also stratigraphic data, samples etc can be found at

And there is (was?) which also has some paleontological as well as stratigraphic data.