Wednesday, May 28, 2008

Aetosaurs and the Open Access Dissertation

It's done. The Society of Vertebrate Paleontology has weighed in on allegations of plagiarism and claim-jumping centered on those spiny aetosaurs. The final verdict is "not guilty" on one charge and "inconclusive" on the other (but please read it for yourself), and I won't comment here, since others already have (summarized here). I do wish to discuss, however, one point from the official SVP document that has not been addressed elsewhere.

"Sixth, the expectation that theses and dissertations that have not been republished in widely read periodicals will be read by most workers or manuscript reviewers is unlikely to be realized. If students publish material in theses or dissertations that they intend to republish in other venues, they should be wary about circulating their work until publication is well under way, if they are concerned that their work is topical enough that other workers might want to draw immediately from their findings." [p. 3 of SVP executive committee statement; italics are my own]

My main concern here is with the statement that it's unlikely that dissertations and theses will be read by other workers. This may have been true 20 years ago, but today it is changing very rapidly. You can find dissertations on Google Scholar, Dissertation Express, Theses Canada Portal, and DATRIX, just to name a few (although it's admittedly easier with some of these options than others). UMI now offers the option to distribute your dissertation under an open access scheme (with options for an embargo, for those concerned about such things). I have chosen to release my dissertation on open access (and will update here when my dissertation is readily available). Searching for dissertations and theses on a research topic should be part of any basic literature search (although whether or not this would have avoided the problems leading to the ethics investigation is certainly debatable).

The responsibility runs both ways. Students have an obligation to ensure that their thesis or dissertation is available and accessible via the information superhighway. This means making it available through relevant databases (and UMI's dissertations and theses have been crawled by search engines since 2006, apparently, with more complete access since 2007), and in most cases could [?should?] probably entail open access (with or without embargo). All paleontologists have a responsibility, too - to keep on top of the literature and other researchers' work. Even without a search engine, it wouldn't take a genius to figure out that a student who has had one or more conference presentations on thesis-y sounding research may have a thesis in his or her name on that topic. And with a search engine, there really is less of an excuse now. Sure, there will still be dissertations that slip through the cracks - but is this any different from not finding a peer-reviewed article just because it was in a journal outside your normal reading list? So--make those dissertations and theses available, and spend a few minutes on Google!

[This discussion is not intended to comment on the correctness or incorrectness of the SVP's general ruling about the charges. As Kevin Padian said, "There’s something for everyone to like – and dislike – about the statement. . ." I'm just calling attention to an area that fits in nicely with the mission of this blog.]

Friday, May 23, 2008

The Open Source Dissertation

My university has done a wonderful thing in accepting only PDF files for the deposit of a thesis or dissertation with the graduate school. Gone are the days of printing 5 copies of a 300-page document on acid-free paper that costs 20 cents a page (and then finding out that one of the margins is 0.1" too wide, so please correct and resubmit before the deadline in three hours). The transition is a wonderful step forward, and also means that it is much easier to distribute the dissertation.

As a proponent of free and open source software (having made the big switch about a year ago), I wanted to do as much as I could within the realms of that universe. This posting summarizes the software I used, with the hope of inspiring others to follow a similar path (whether in whole or in part).

Data visualization: I processed all of my CT scan data in 3D Slicer. For segmenting structures, generating surfaces, and measuring volumes, look no further! [I still need to do a more complete post on this one.] Additional analysis was done in ImageJ.

Data analysis: Initial data entry in OpenOffice.org's Calc, with analysis primarily in R and an occasional venture to PAST.

Figures: Raster image editing was done in the GIMP, and line drawings or composite figures were assembled in Inkscape.

Word processing: All done in OpenOffice.org's Writer. The PDF output function was very nice for sending drafts to committee members and advisors, and the software's Microsoft Office compatibility is such that I could also send and receive marked-up documents (in .doc format) pretty easily. For the final document, I exported each chapter in PDF format.

Referencing: All of my references were sorted, organized, and rendered as bibliographies with Zotero. Along the way, I created custom styles for Journal of Vertebrate Paleontology and Zoological Journal of the Linnean Society. More on this process in another post.

Document assembly: To assemble all of my dissertation's chapters into a single PDF document, I used Ghostscript. The output was quite pleasing, and easily accomplished through the command line in a matter of seconds.
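For anyone who wants to script this step, here is a minimal sketch in Python of how the chapter PDFs might be concatenated by calling Ghostscript's pdfwrite device. This is not necessarily the exact command I typed; the function names and the chapter file names are hypothetical, and it assumes the Ghostscript executable is installed and available as "gs" on your path.

```python
import subprocess


def build_merge_command(chapter_pdfs, output_pdf, gs_binary="gs"):
    """Return the Ghostscript argument list that concatenates PDFs in order.

    The helper name and its parameters are illustrative assumptions.
    """
    if not chapter_pdfs:
        raise ValueError("need at least one input PDF")
    return [
        gs_binary,
        "-dBATCH",            # exit when done instead of waiting at the gs prompt
        "-dNOPAUSE",          # do not pause after each page
        "-q",                 # suppress routine startup messages
        "-sDEVICE=pdfwrite",  # write the result as a PDF
        f"-sOutputFile={output_pdf}",
        *[str(p) for p in chapter_pdfs],  # inputs are appended in the order given
    ]


def merge_pdfs(chapter_pdfs, output_pdf, gs_binary="gs"):
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    subprocess.run(build_merge_command(chapter_pdfs, output_pdf, gs_binary), check=True)


# Example call (hypothetical file names):
# merge_pdfs(["frontmatter.pdf", "chapter1.pdf", "chapter2.pdf"], "dissertation.pdf")
```

Keeping the command construction separate from the call that runs it makes it easy to print and inspect the command before letting it loose on a 300-page document.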

Presentations: For my oral dissertation defense, I created my presentation using OpenOffice.org's Impress.

Tuesday, May 20, 2008

Data and the Open Source Paleontologist 2

The previous post on this topic outlined some resources for the posting and dissemination of primary paleontological data on the internet. In this post, I'll take a look at why more people don't do so, and what we can do about it.

Why Aren't More Data Posted?
Myriad factors contribute to this issue - some of them are genuine roadblocks, and others are simply opportunities to change attitudes and common practice.

Laziness
Sometimes it's a lot of work to get your data posted online. You may have to reformat everything, or re-enter the data, or engage in digital gymnastics that take longer than the research itself took. In other cases, it's just one more thing to do on an already crowded research schedule. How to counter this? Perhaps my best suggestion is better awareness of the importance of these data being available - if people demand it, it will be viewed as an item of high importance, just as needed as the peer-reviewed publication itself. Some repositories, such as MorphoBank, also allow you to enter the data as you collect them, rather than doing the whole thing at the very end. This might also be a good safeguard against the rush to upload a whole bunch of data files at the end of a program.

Museum Policies
In the case of posting photographs of specimens, many museums have policies that are unclear or seem to prohibit general dissemination of photographs. These policies are in place for good reason in some cases - they discourage commercial concerns from profiting off of images of specimens without a museum's knowledge. Although it's my understanding that most museums don't have a problem with posting things into scientific databases, it's probably best to check. Does anyone out there have experience with this issue?

Priority of Publication
If your data are online, this means other people have access. This can lead to productive collaborations - or, it could potentially lead to being "scooped." Here, the safest thing is to delay uploading of data until after the major resulting publication. Either way, the important thing is to get those data out there eventually! And, if you use data from an online database, you have a responsibility to credit the person who did the primary work. Anything less just isn't very nice. There are always going to be people who are stingy with sharing already-published data, even when it isn't warranted (or in the case of CT scan data, even when the museum requests that a publicly-available copy be reposited with the institution!). The most important thing is to work to change attitudes and foster a culture of openness. Recent events in paleontology have perhaps made this a little more difficult, but I like to think that things will work out in the long run.

What Can Be Done?
Above, I've outlined a few solutions to some of the problems (some more practical than others). Beyond those, I think we really need more databases, and more encouragement to use both the new and the existing ones. Gene squeezers have GenBank, but why aren't there more Paleobiology Databases out there? Advisors - make your students reposit their data online. Students - get your data out there, even if your advisors don't encourage it! And paleontologists in general - welcome to the 21st century! I hope that time and a new generation of tech-savvy paleontologists will change all of this for the better.

Monday, May 19, 2008

Back From Hiatus - With a Ph.D.

After an unannounced hiatus, I'm now ready to resume posting! Many of you probably knew I had been dealing with a dissertation over the last month - submitting a final copy to my committee, preparing for an oral defense, and then working on revisions. Today, at 11:45 a.m., I received final approval of my document from the graduate school. Which means - I'm done! Thursday is the hooding ceremony.

So I have a few posts that have been on the back burner and will hopefully be up here soon. I'm first going to finish the "Data and the Open Source Paleontologist" series, and then move on to some of my experiences with crafting a dissertation primarily using open source software.