Wednesday, September 7, 2011

The OSP on Twitter

Over the past few months, I have switched much of my regular on-line communication to Twitter. Like this blog, my Twitter feed (@andyfarke) covers open access issues, recent paleontological discoveries, and the like. I am a bit of a late adopter, but I have to say that I'm generally finding it quite useful. If you're not a Twitterhead, you can read the most recent posts in the blog sidebar.

Upcoming post: A survey of open access policies, OA fees, data availability, and the like for many major paleontological journals.

Friday, September 2, 2011

How do you read the literature? Thoughts on academic maturation

How much should you trust the scientific literature? Reflecting on my own academic maturation, as well as observing on-line discussions of dinosaur paleontology for over 15 years (yikes, I'm getting old!), I have concluded that most of us pass through three stages: 1) Credulity; 2) Cynicism; and 3) Maturity.

Credulity
This is inevitably one's first stop on the journey through the scientific literature: accepting everything that's published at face value. Credulity is also paired with the assumption that the most recent publication must be the most conclusive. For instance, let's say Dr. X described a new species in 2001. Dr. Y published a new paper in 2010, saying that the new species is invalid. Dr. Y must be correct, because she had the last word, right?

Another symptom of this stage is fanboy(girl)-ism. Anything published by Dr. Glamour is the bee's knees (it's widely featured in the news media, so it must be true)! Wow, Dr. Glamour published a new theory on the dinosaur extinction - it will revolutionize the science! Any nay-sayers are just jealous, or afraid of change.

I hit this stage during high school and college.

Cynicism
Suddenly, everything comes crashing down. You talk to another paleontologist, who tells you that Dr. Glamour's work isn't actually that highly regarded. Maybe he has a reputation for massaging his data just a little too much, or conveniently omits contradictory evidence in his papers. Then you find out that Dr. Z has just published a paper saying that Dr. X was actually correct in the first place, and Dr. Y's synonymization was a little too hasty. Your obvious conclusion: the scientific literature is untrustworthy. Everything ever written is a steaming pile of unreliable ramblings.

Most people don't go through a full-blown case of cynicism, of course. Usually we just get an incomplete case. Everything written by Dr. Glamour (but only some of the stuff by Dr. Y) is untrustworthy, etc. A related syndrome focuses on the methodology; a paper is considered horrible because it used or didn't use a particular technique.

I hit this stage between the end of my undergrad and the early to middle parts of my graduate career.

Maturity
Most of us reach this stage only after a lengthy amount of time in the field (or the end of our graduate student career). Our BS detectors are honed to an appropriate level, and we accept that many of the papers out there aren't half-bad, and a minor mistake or two isn't enough to relegate research to the dustbin.

For my part, I still occasionally waver between cynicism and maturity; I might cast an exceptionally suspicious eye on research coming from certain researchers or using certain techniques (even if it's not necessarily warranted). Maybe I even have a little credulity at first, if it's a technique or area of science I'm not yet completely familiar with. At the same time, having been around the block a few times as a scientist, I am a little more understanding when it comes to the perceived shortcomings of a paper. As long as the basic science is still good, live and let live. A paper can have a fantastic morphological description, but a pretty weak discussion. With a little practice reading the literature, it's becoming easier and easier to pick up on the high and low points of a publication.

Summing it up
We all relate to the academic literature in different ways, depending on our life experience, scientific goals, and "academic maturity." It's up to us - with the help of trusted friends and colleagues - to continually work to improve our own approaches.

Tuesday, June 28, 2011

How to Inspire a Future Paleontologist

I was sorting through some files today, and found this. Back when I was 10 years old, I knew I wanted to devote my life to paleontology, and paleontology research would be even better. So, I started writing letters to researchers I had read about in books and magazines. Some didn't respond (everyone is busy, so I can't fault them too much), and some sent really nice replies. It's those replies that propelled me into a serious career as a paleontologist. Thank you, to those who wrote back.
Little did I know that I would be visiting those collections as a researcher, only 10 years later.

Friday, April 8, 2011

Life After Death At Yellowstone: An Interview with Josh Miller

In my last post, I introduced a ground-breaking study recently published in PLoS ONE that shows how we can infer long-term trends in animal populations just from their bones. This work has big implications for ecology, conservation, and public policy, and is also a really neat piece of science. For this post, I talked to the author of the study, Josh Miller, about his work and some of the tidbits that didn't make it into the paper.

Yellowstone NP gets a lot of visitors, and you surely must have had some interactions with them during your fieldwork. How did they react to what you were doing?
JM: I work in areas that are generally well off trail and in places most Yellowstone visitors just don't see. Over the years, there have been very few times when tourists actually saw my teams conducting our bone work. Most of the time, conversations with the public occur in the evenings back at camp. We generally use the public campgrounds for our home bases, and my research will often come up in conversation with tourists. When folks learn what my teams and I are up to, they are always very interested and ask lots of questions. Our National Parks are an important resource, and I think people like to be reminded of their biological and scientific value. At the same time, I think it gives folks a way of looking at Yellowstone in a new and exciting way. I know lots of people who talk to us one day and keep an eye out for bones the next.

Miller studying bone survey data sheets on Northern Range, Yellowstone National Park. Photo by Scott Rose.

You looked at hundreds of bones during your survey. Was there any particular specimen that stuck out in your mind? What about it was interesting?
JM: I looked at over 20,000 bones during my work in Yellowstone. And you are right, there are a few that really stand out. Some of the most memorable bones are those of animals with severe bone maladies. In some individuals we found severe arthritis or broken bones that didn’t heal properly. Other memorable bones include rare and unusual species. One of the most exciting finds was the skull of a mountain lion. We just stumbled upon it one afternoon walking from one transect to another. This beautiful, rounded, huge cat skull just lying in the grass staring up at us - a rare and amazing sight.

This paper focused on bones from large animals, but surely there are a lot of small animal bones out there too - rodents, bats, rabbits, etc. Do you think they would show a similar correlation over time between abundance in life and death? Or are the taphonomic effects too different between large and small animals to expect the same pattern?
JM: Stay tuned! I paid careful attention to the bones of the small mammals we found. My bone survey teams were amazingly good at finding bones of all shapes and sizes (from bison skulls to limb bones of squirrels). One of the challenges, unfortunately, is the lack of high-quality data on the living populations in Yellowstone. One thing I'll say at the moment, however, is that the record of small bones is surprisingly rich and diverse on the Yellowstone landscapes.

I see that you used the open source stats program R to do your data analysis. Was this something you picked up just for your dissertation work? Why did you choose R over some of the other commercial packages that are out there?
JM: I was introduced to R during the early days of my graduate work. R is a very powerful statistics language, in part, because of the large community of scientists and academics that use R and contribute to its ever-expanding utility. Another reason I use R is that I can completely control all aspects of the analysis. In canned programs, much of the analysis sits under a black box and uncovering exactly how the data were analyzed can be very difficult. But most of all, R just fits how I do science.

Thank you for your time, Josh!

Citation
Miller, J. (2011). Ghosts of Yellowstone: Multi-Decadal Histories of Wildlife Populations Captured by Bones on a Modern Landscape. PLoS ONE, 6 (3). DOI: 10.1371/journal.pone.0018057

Note: I'm an academic editor at PLoS ONE, but had no role in the handling of this paper.

Sunday, April 3, 2011

Life After Death at Yellowstone

Taphonomy - the study of what happens to an organism after it dies - is integral to reconstructing the past. Perhaps the most important lessons come in inferring ecological interactions. Did that group of animals live and die together, or were they jumbled long after death? Were all of those shark teeth with the plesiosaur bones from a feeding frenzy, or just a fluke of currents? How closely does a set of fossils represent the relative abundance of the different species during their lifetime? Such examples are numerous, and thus we commonly think of taphonomy as a study in deep time. This is certainly true, but also certainly incomplete. In fact, some of the most ground-breaking taphonomic work has been done in contemporary ecosystems. Kay Behrensmeyer, for instance, has spent decades studying bone accumulations in Kenya, and a 1927 work by Johannes Weigelt (complete with photos of dead cattle) is still considered a classic.

A new study by paleontologist and taphonomist Josh Miller, just published in PLoS ONE, shows some of the great insights that can arise from looking at taphonomy in modern settings. Josh and his field assistants trekked through Yellowstone National Park (one of the western USA's oldest and best-known parks), cataloging the identity and physical condition of every animal bone sitting out on the surface (an elk skeleton from the project is shown at right; photo courtesy of and copyright Josh Miller). Using these data, Josh found that you can actually infer the major ups and downs of animal populations from their old bones. This is quite exciting, not just from a gee-whiz factor, but because it may be possible to infer population trends for areas where historical surveys are absent or spotty. Such data are important not only for ecologists, but for informed public policy. It sounds magical, so how was the study done?

Based on other studies (in combination with radiometric dating), it's known that bones in excellent condition usually came from animals that died only recently, whereas bones in crummy condition are from animals that died longer ago. By using the condition of the bones as a proxy for time since death, Josh estimated how long the various bones of various animals had been around. Then, based on the bone ages, he estimated the relative population of each type of animal a given number of years ago. We have very good wildlife census data for Yellowstone, and it turns out that estimates from the bones match the "real" values quite nicely. Boom years for animals (such as elk) mean lots of bones going into the system, bust years mean few bones, and these trends show up in bone surveys.
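To make the logic of that paragraph concrete, here is a minimal Python sketch of the general idea: assign each surveyed bone to a rough time-since-death bin based on its condition, then tally the share of each species within each bin. This is not the analysis from the paper (which was done in R). The weathering stages, the bin boundaries in `STAGE_TO_BIN`, the function name `relative_abundance`, and the survey records are all made up for illustration.

```python
from collections import Counter

# Hypothetical mapping from a bone's weathering stage (0 = fresh,
# 5 = badly weathered) to a rough time-since-death bin.
# These bins are illustrative only, not values from the paper.
STAGE_TO_BIN = {
    0: "0-5 yr",
    1: "0-5 yr",
    2: "5-15 yr",
    3: "5-15 yr",
    4: "15+ yr",
    5: "15+ yr",
}


def relative_abundance(bones):
    """Return {time_bin: {species: fraction of bones in that bin}}.

    `bones` is an iterable of (species, weathering_stage) pairs.
    """
    counts = {}
    for species, stage in bones:
        time_bin = STAGE_TO_BIN[stage]
        counts.setdefault(time_bin, Counter())[species] += 1
    return {
        time_bin: {sp: n / sum(c.values()) for sp, n in c.items()}
        for time_bin, c in counts.items()
    }


# Made-up survey records: (species, weathering stage)
survey = [
    ("elk", 0), ("elk", 1), ("bison", 1), ("elk", 1),
    ("elk", 3), ("bison", 2),
    ("elk", 5), ("elk", 4), ("elk", 4), ("bison", 5),
]

result = relative_abundance(survey)
for time_bin, shares in result.items():
    print(time_bin, shares)
```

In a real study, the per-bin shares would then be compared against census records for the matching years; the sketch stops short of that step because it depends on data not shown here.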

You can read all about it at PLoS ONE, or here, here, and here. I recently talked to Josh to get a few behind-the-scenes tidbits. Stay tuned for the interview later today! [update: now posted here]

Citation
Miller, J. (2011). Ghosts of Yellowstone: Multi-Decadal Histories of Wildlife Populations Captured by Bones on a Modern Landscape. PLoS ONE, 6 (3). DOI: 10.1371/journal.pone.0018057

Note: I'm an academic editor at PLoS ONE, but had no role in the handling of this paper.

Monday, March 28, 2011

Building Momentum for Open Data in Paleontology

Thanks to a variety of "real world" concerns and deadlines, I've been a little sparse on the blog for the past few weeks. But, that doesn't mean that important things haven't been happening elsewhere in the realm of digital paleontology. If you haven't already, take a look at and consider adding your signature to "An Open Letter in Support of Palaeontological Digital Data Archiving." Kudos to the folks who got the ball rolling on this effort! As paleontology becomes more data driven, and as more of those data are digitized, we need to get our act together as a community now.

Tuesday, February 22, 2011

Data Archival and the JVP

It finally happened - Journal of Vertebrate Paleontology has taken a few more tentative steps into the 21st century! Both in an editorial in the most recent issue (note: full text is paywalled), as well as in an updated version of the instructions to authors, the journal has announced a formal data archiving policy.

What does this mean?
Quoting from the JVP's new instructions to authors, "all data files needed to replicate phylogenetic or statistical analyses published in the journal should be made accessible via the JVP website as online supplementary material." In other words, if you analyzed numbers of any sort, you need to show your source data. This includes cladistic matrices (publication of these is already standard practice) as well as measurements or other data used in statistical analyses. Additional kinds of data - for instance, extraneous measurements unrelated to the study, raw field notes, or raw CT scans - are not included in this proposal (even if it's good scientific practice to make sure this information is available for posterity).

Why is this a good thing?
  • Data archival allows others to build upon previous work more easily. For instance, let's say I publish a statistical analysis of molar size in the early horses Mesohippus bairdi and Mesohippus westoni. Maybe there is another worker out there who wants to look at variation in some other Mesohippus species. If my dataset is available, it is much easier for another researcher to quickly advance beyond my work (assuming they trust my data, of course - see below).
  • Data archival allows new and unexpected uses for data (thus increasing citations). My p-values and arithmetic means of Mesohippus teeth are interesting, but not that useful outside the context of my paper. If I publish the raw data, though, other individuals can use these data (and cite my paper) in all other sorts of contexts. Maybe someone wants to throw the data in her study of horse tooth evolution (hey, it's another citation!). Maybe someone else is interested in Oligocene herbivore ecology as evidenced in molar properties (and there's another citation!).
  • Data archival ensures transparency. Everyone makes statistical or analytical mistakes. Unfortunately, these mistakes may render the results of a paper highly suspect at best, or worthless at worst. With the availability of raw data, it is much easier for someone to reproduce a study or correct misuse of statistics. (As a case study from my own work, I discovered that nearly all paleopathology studies in the literature were using incorrect statistical assumptions - and a reanalysis of the data forced some new interpretations!) Additionally, taxonomy frequently changes, meaning that the categories applied in a previous analysis may become hopelessly outdated. Not so, if you can go back to the author's original data, make a few corrections, and rerun the analysis!
  • Publicly funded research deserves to be public. So much of paleontology research is funded by government grants, or conducted on company time. It is not a good use of our limited resources to keep data locked up after the original study has been published. This is somewhat analogous to writing an NSF grant to collect fossils for one's personal collection. Why should data be any different?
Answers to some common objections
  • "I have other plans for the data." Some researchers want a monopoly on their data. They have this fear in the back of their head that someone is going to go out and do exactly the same next step study planned by the original researcher. I have several responses to this. First. . .really? Second, I would remind authors that it is bad science (perhaps even unethical) to publish research results that are not transparent to scrutiny. Third, I would remind authors that they are never obligated to publish all of the tangential data. If you are publishing a paper on dentary lengths in hadrosaurs, you don't have to release the data on predentary dimensions too! Finally, I would remind authors that this is just a lame excuse to put off their own follow-up research. We all know the stories of this or that researcher who has sat on a dataset for years. Science is not being helped by keeping those data secret.
  • "Interested researchers can just contact the authors." As an example of why this is a bad idea, please refer to the work of Leonard Radinsky. He published a number of wonderful morphometric studies of fossil mammals, clearly based on hundreds of measurements. But, he also passed away in 1985. Unless you have a Ouija board that actually works, it's highly unlikely that anyone will be able to exactly reproduce the results in his oft-cited "Ontogeny and phylogeny in horse evolution." Authors leave academia, pass away, or lose their data sheets all of the time. It's a pipe dream to assume that "data are available upon request." [to be fair to Radinsky, his paper did not indicate that the data were available - I just chose it as one prime example where the data are probably irrecoverable]
  • "It just encourages lazy research by data miners, because you should never trust anyone else's research data." There is a grain of truth in this - inter-observer error may creep into measurements, and maybe a certain author likes to measure plaster reconstructions. But once again, this is just a lame excuse for lazy research by the person who is objecting to data transparency! After all, if you can't trust the data, you can't trust the paper, so what's the point in publishing? It's a slippery slope. The benefits far outweigh the drawbacks.
  • "It's just more work for the authors." This too falls into the "lame excuse" category. If you've already gone to the trouble to put together an Excel spreadsheet for your statistical analysis, you can spend an extra 10 seconds transferring those data to the manuscript submission system. If it takes you longer than that, you may want to reconsider your data management practices.
Recommendations for JVP
I have just a handful of recommendations for the editors at JVP, based on my own experience as both a data user and a data generator. Some of these suggestions may already be incorporated, and others may be planned. Others may be impractical at this time. Either way, I think it is helpful to consider the following:
  • Make sure the data files are in a usable format. Historically, supplemental information at JVP has been launched as PDF files (with some NEXUS files). This is great for casual reading, but horrible for analysis. Just try copying 3,382 measurements from a PDF table into an Excel spreadsheet, and you'll see what I mean. This does not mean you need to choose a single format - why not have the data in PDF, Excel, and raw text? Multiple formats ensure maximum usability of the data across multiple platforms (as well as flexibility in the face of future software upgrades).
  • Consider a data embargo for reluctant authors. Many journals allow a six month or (maximum) one year embargo on supplemental data, to allow authors the chance to finish up any outside projects. Although I philosophically disagree with this option, I see its utility. And, it is an appropriate compromise between protecting author rights and protecting scientific integrity.
  • Consider partnering with DRYAD or a similar data repository. A number of other evolutionary societies are doing this - why shouldn't SVP be a part of this?
  • Solicit society input. The members of SVP and the authors of JVP probably have some great thoughts on what they would like to see in data archival. Why not solicit input from the community to find out what the community needs? This will only solidify ownership of the data archival efforts by paleontologists!
  • Check out a recent publication on this very topic. Michael Whitlock recently published a great review article [paywall] on best practices in data archival - many of the points mentioned above are contained there. (thanks to Randy Irmis for passing the link along)
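As a small illustration of the "usable format" recommendation above, here is a hedged Python sketch showing how a table of measurements can be written out as plain comma-separated text and read back without any retyping. The specimen labels, the column name `m1_length_mm`, and the values are placeholders invented for this example, not real data.

```python
import csv
import io

# Made-up measurements; specimen labels and values are placeholders.
rows = [
    {"specimen": "SPEC-001", "taxon": "Mesohippus bairdi", "m1_length_mm": 12.4},
    {"specimen": "SPEC-002", "taxon": "Mesohippus westoni", "m1_length_mm": 13.1},
]


def to_csv_text(records):
    """Serialize a list of dicts to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv_text(text):
    """Parse CSV text back into a list of dicts (all values as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv_text(rows)
recovered = from_csv_text(csv_text)
print(csv_text)
```

A plain-text table like this opens in a spreadsheet, in R, or in a text editor, which is the whole point: nobody has to copy numbers out of a PDF by hand.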
The Last Word
All in all, I am pleased to see JVP take these steps. Congratulations to the editors of the journal, for taking this stand for good science!

More Reading
Berta, A., and Barrett, P. M. 2011. Editorial. Journal of Vertebrate Paleontology 31: 1. doi:10.1080/02724634.2011.546742 [paywall]

JVP Instructions to Authors [link to pdf]

Whitlock, M. C. 2011. Data archiving in ecology and evolution: best practices. Trends in Ecology & Evolution 26: 61-65. doi:10.1016/j.tree.2010.11.006. [paywall]