Friday, July 4, 2008

Fieldwork!

I'm off to the field this evening, and will return in a little over two weeks. Not that readers of this blog aren't used to the occasional unexplained hiatus. . .

Cool Tools for Google Earth

Google Earth by itself can be a useful tool for the field paleontologist, as outlined in my previous post. Yet some aspects of paleontological field mapping aren't supported "out of the box" within this software. For instance, many field localities are recorded in the "Township and Range" format of the Public Land Survey System. That's the "NW1/4 of the SW 1/4, Sec. 21, T14N, R21E" style of plotting things - pretty much anyone who's done paleontology in the United States knows what I'm talking about (and how miserable it can be as a method of mapping and relocating things, versus a high-precision GPS coordinate!). Unfortunately, the default settings in Google Earth offer no help with this. One or two handy plug-ins can save the day, though!

Township and Range Coverage
The Earth Point website has a KML file that provides township and range data for most of the western United States, and a handful of more easterly states. To access this, click here. For the field areas I've frequented, I've found the data to be quite accurate and easy to use. As you progressively zoom in, you can get right down to the section (and then click to get the full legal description). This is an extremely handy tool, and I strongly recommend it for any paleontologist using Google Earth! Another must-have is the Township and Range Decoder. Enter a legal land description, and get it converted into latitude-longitude format - or the reverse! So, so, so much easier than trying to fudge something on a topo map.
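Turning a legal description into coordinates needs the survey data itself, which is what the Earth Point tools supply. Splitting the description into its parts, though, takes only a few lines of Python. What follows is a minimal sketch, assuming descriptions follow the "NW1/4 of the SW 1/4, Sec. 21, T14N, R21E" pattern quoted above. The function name `parse_plss` and the regular expressions are my own illustrations, not part of any Earth Point tool, and real locality records vary far more than this handles.

```python
import re

# Hypothetical patterns for descriptions shaped like
# "NW1/4 of the SW 1/4, Sec. 21, T14N, R21E". Real records vary widely.
QUARTER = re.compile(r"\b(NE|NW|SE|SW)\s*1/4", re.IGNORECASE)
SECTION = re.compile(r"\bSec(?:tion|\.)?\s*(\d{1,2})\b", re.IGNORECASE)
TOWNSHIP = re.compile(r"\bT\.?\s*(\d+)\s*([NS])\b", re.IGNORECASE)
RANGE = re.compile(r"\bR\.?\s*(\d+)\s*([EW])\b", re.IGNORECASE)


def parse_plss(description):
    """Split a township-and-range description into its components.

    This only parses the text. It does not compute coordinates.
    """
    section = SECTION.search(description)
    township = TOWNSHIP.search(description)
    rng = RANGE.search(description)
    if not (section and township and rng):
        raise ValueError("could not find section, township, and range")
    return {
        # Quarter calls in the order written (smallest subdivision first).
        "quarters": [q.upper() for q in QUARTER.findall(description)],
        "section": int(section.group(1)),
        "township": (int(township.group(1)), township.group(2).upper()),
        "range": (int(rng.group(1)), rng.group(2).upper()),
    }
```

Calling `parse_plss("NW1/4 of the SW 1/4, Sec. 21, T14N, R21E")` returns the quarter calls, section number, township, and range as separate fields, which is handy for checking a spreadsheet of localities before pasting them into a decoder.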

Topographic Map Coverage
I haven't used this feature as much, but did uncover two potentially handy tools. Map Finder allows you to quickly and easily find 24K topo maps within the U.S.A. - and download them for free as a TIFF file. The whole setup seemed to work pretty well for me. The Google Earth Blog details another cool plug-in, which is supposed to put the map right into Google Earth. I haven't tried it yet, but certainly will at the soonest opportunity (after I get back from the field!).

Conclusions
Even if it ain't open source, Google Earth has become a standard tool in my digital paleontology arsenal. It has saved me oodles of time and money, both in the field and back at the lab. If you're a field paleontologist - check it out!

Thursday, June 26, 2008

Digital Prospecting With Google Earth

For those who do paleontological prospecting in new field areas, I probably don't have to spend too much time singing the virtues of good satellite imagery. Perhaps the single best thing about it is that it can allow you to quickly evaluate where the worthwhile exposures are, and where the low-relief, grassy pastures are covering up perfectly good fossiliferous rocks. Used in concert with a geological map, this "digital prospecting" can save a lot of time and annoyance out in the field.

There are two ways to go about this: one is with old-fashioned aerial photographs and a geologic map; the other is with Geographic Information System (GIS) software. Because I've barely touched GIS since I took a course as an undergraduate (and I am lucky enough to have a skilled friend who volunteers for the occasional map-making project; one of these days I'll get around to learning GRASS), in this post I'll focus on the "lazy researcher's GIS" -- Google Earth.

Now in version 4.3, this digital globe runs smartly in Windows, Linux, and presumably the Mac OS (although I've never tried the latter). Find a prospective field area, and do a virtual fly-over to locate promising outcrop. Mark prospective points, and transfer the coordinates to your GPS unit. It's that simple!
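Google Earth can save placemarks as a KML file, which is plain XML, so the marked points can be pulled out with a short script before they go to the GPS unit. Here is a minimal sketch in Python, assuming each placemark is a simple point. The function name `placemark_points` is hypothetical, and getting the resulting list onto a particular GPS unit is left to whatever software that unit uses.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the XML namespace, so any KML version's tags match."""
    return tag.rsplit("}", 1)[-1]


def placemark_points(kml_text):
    """Return (name, latitude, longitude) for each placemark in a KML string."""
    root = ET.fromstring(kml_text)
    points = []
    for elem in root.iter():
        if local(elem.tag) != "Placemark":
            continue
        name = None
        coords = None
        for child in elem.iter():
            if local(child.tag) == "name" and name is None:
                name = (child.text or "").strip()
            elif local(child.tag) == "coordinates" and coords is None:
                coords = (child.text or "").strip()
        if coords:
            # KML stores coordinates as longitude,latitude[,altitude].
            lon, lat = coords.split()[0].split(",")[:2]
            points.append((name, float(lat), float(lon)))
    return points
```

Note the order: KML writes longitude first, so the sketch swaps the pair into the latitude-longitude order most GPS units expect.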

So what are the upsides of using Google Earth?
- It's free!
- Pretty much global coverage.
- The user interface is intuitive, without a lot of annoying extras.
- High resolution (in many areas)

And the downsides?
- An internet connection is pretty much required, if you want to go someplace that isn't already in the cache. So, don't plan on being able to use it that well in the field.
- The DEMs (digital elevation models) are pretty crude in most areas, and don't necessarily show detailed topography all that well, if you want to pan around an actual landscape.
- It's tough to import GIS data, if you want to add geological data or something (although it can be done - to be addressed in the next post). Furthermore, basic GIS functions, such as intersections of layers, just can't be done easily within the program itself (as far as I know).
- Resolution varies across the maps. Sometimes the remotest areas have crisp, true-color imagery - and the field area just around the corner is a fuzzy, false-color mess.
- Township and range aren't supported by default (but see an upcoming post for a solution!).

Is Google Earth useful for paleontologists?
If you're in the early stages of a field project, or are trying to evaluate outcrop potential in a far-away locale, Google Earth is perfect. But, be aware that this program is not a complete substitute for a good GIS package (and a person to run it!) for many tasks, and satellite coverage limitations may cause problems in some regions. For the most part, though, Google Earth is a quick, cheap tool for planning out a field season (and one that I use quite frequently!).

Coming up. . .Handy tips and tricks to make the paleontologist's use of Google Earth easier.

Wednesday, June 18, 2008

JVP - Zotero Style!

In a previous post, I lamented how difficult it is to create custom style files in the program Zotero. As a brief recap, this reference manager does a fantastic job of downloading references from the web and creating citations and bibliographies in Word or OpenOffice.org Writer. But, if you want to venture beyond the default style library (which is steadily expanding with a number of add-ons), you have a little style-file writing to do.

Creating a Zotero style file is not for the faint of heart. These are written in an XML-based language called "CSL" (for "Citation Style Language"). As I was writing my dissertation, I needed to format one of my chapters in the style of the Journal of Vertebrate Paleontology. Of course, Zotero didn't have a style available for this, and I wasn't in any mood to format everything by hand. So, I decided to invest an afternoon in learning enough CSL to be dangerous.

In actuality, I didn't learn much CSL at all. I took an existing style (American Psychological Association) and retrofitted it for JVP. The APA style was close enough to start with, and I had to tweak author orders, abbreviations, etc. The result is uploaded here. Caveat emptor!!! As you will find, my work is nowhere near perfect - I got the citation styles down pretty well for journals and edited volumes, and beyond that things might be a little wonky. Anyone who wishes to do their own tweaking is welcome to do so - and I would appreciate it, in fact!

Because I'm not a real code-head, I had to rely on a few "crutches" to limp through modifying the style file. First, a style preview tool, written by Dan Stillman, was invaluable. Follow the directions here in order to use that. Second, I relied on the CSL schema, which was the final authority on what various parameters meant. After that, it was tweak, test, re-tweak, and re-test, until I got something I could live with. It took a few hours of time, but was well worth it.

Good luck! Anyone else have their own style files they've written?

Wednesday, May 28, 2008

Aetosaurs and the Open Access Dissertation

It's done. The Society of Vertebrate Paleontology has weighed in on allegations of plagiarism and claim-jumping centered on those spiny aetosaurs. The end verdict is "not guilty" on one charge and "inconclusive" on the other (but please read it for yourself), and I won't comment here where others have already (summarized here). I do wish to discuss, however, one point from the official SVP document that has not been addressed elsewhere.

"Sixth, the expectation that theses and dissertations that have not been republished in widely read periodicals will be read by most workers or manuscript reviewers is unlikely to be realized. If students publish material in theses or dissertations that they intend to republish in other venues, they should be wary about circulating their work until publication is well under way, if they are concerned that their work is topical enough that other workers might want to draw immediately from their findings." [p. 3 of SVP executive committee statement; italics are my own]

My main concern here is with the statement that it's unlikely that dissertations and theses will be read by other workers. This may have been true 20 years ago - today, this is changing very rapidly. You can find dissertations on Google Scholar, Dissertation Express, Theses Canada Portal, and DATRIX, just to name a few (although it's admittedly easier on some of these options than others). UMI now offers the option to distribute your dissertation under an open access scheme (with options for an embargo, for those concerned about such things). I have chosen to release my dissertation on open access (and will update here when my dissertation is readily available). Searching for dissertations and theses on a research topic should be part of any basic literature search (although whether or not this would have avoided the problems leading to the ethics investigation is certainly debatable).

The responsibility runs both ways. Students have an obligation to ensure that their thesis or dissertation is available and accessible via the information superhighway. This means making it available through relevant databases (and UMI's dissertations and theses have been crawled by search engines since 2006, apparently, with more complete access since 2007), and in most cases could [?should?] probably entail open access (with or without embargo). All paleontologists have a responsibility, too - to keep on top of the literature and other researchers' work. Even without a search engine, it wouldn't take a genius to figure out that a student who has had one or more conference presentations on thesis-y sounding research may have a thesis in his or her name on that topic. And with a search engine, there really is less of an excuse now. Sure, there will still be dissertations that slip through the cracks - but is this any different from not finding a peer-reviewed article just because it was in a journal outside your normal reading list? So--make those dissertations and theses available, and spend a few minutes on Google!

[This discussion is not intended to comment on the correctness or incorrectness of the SVP's general ruling about the charges. As Kevin Padian said, "There’s something for everyone to like – and dislike – about the statement. . ." I'm just calling attention to an area that fits in nicely with the mission of this blog.]

Friday, May 23, 2008

The Open Source Dissertation

My university has done a wonderful thing, in accepting only PDF files for deposition of a thesis or dissertation with the graduate school. Gone are the days of printing 5 copies of a 300-page document on acid-free paper that costs 20 cents a page (and then finding out that one of the margins is 0.1" too wide, so please correct and resubmit before the deadline in three hours). The transition is a wonderful step forward, and also means that it is much easier to distribute the dissertation.

As a proponent of free and open source software (having made the big switch about a year ago), I wanted to do as much as I could within the realms of that universe. This posting summarizes the software I used, with the hope of inspiring others to follow a similar path (whether in whole or in part).

Data visualization: I processed all of my CT scan data in 3D Slicer. For segmenting structures, generating surfaces, and measuring volumes, look no further! [I still need to do a more complete post on this one.] Additional analysis was done in ImageJ.

Data analysis: Initial data entry in OpenOffice.org's Calc, with analysis primarily in R and an occasional venture to PAST.

Figures: Raster image editing was done in the GIMP, and line drawings or composite figures were assembled in Inkscape.

Word processing: All done in OpenOffice.org's Writer. The PDF output function was very nice for sending drafts to committee members and advisors, and the software's Microsoft Office compatibility is such that I could also send and receive marked-up documents (in .doc format) pretty easily. For the final document, I exported each chapter in PDF format.

Referencing: All of my references were sorted, organized, and rendered as bibliographies with Zotero. Along the way, I created custom styles for Journal of Vertebrate Paleontology and Zoological Journal of the Linnean Society. More on this process in another post.

Document assembly: To assemble all of my dissertation's chapters into a single PDF document, I used Ghostscript. The output was quite pleasing, and easily accomplished through the command line in a matter of seconds.
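For anyone who wants to script the assembly step, here is a minimal sketch in Python that builds and runs a Ghostscript merge. It is not necessarily the exact command I typed. It assumes the Ghostscript executable is on the path as `gs` (on Windows it has a different name), and the function names and file names are purely illustrative.

```python
import subprocess


def ghostscript_merge_command(output_pdf, chapter_pdfs, executable="gs"):
    """Build the argument list for merging PDFs with Ghostscript's pdfwrite device."""
    return [
        executable,
        "-dBATCH",      # exit when the input files are done
        "-dNOPAUSE",    # do not prompt between pages
        "-q",           # suppress the startup banner
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={output_pdf}",
        *chapter_pdfs,  # inputs are appended in the order given
    ]


def merge_pdfs(output_pdf, chapter_pdfs):
    """Run Ghostscript and raise an error if it reports a failure."""
    subprocess.run(ghostscript_merge_command(output_pdf, chapter_pdfs), check=True)
```

Something like `merge_pdfs("dissertation.pdf", ["chapter1.pdf", "chapter2.pdf"])` would then stitch the chapters together in the order listed.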

Presentations: For my oral dissertation defense, I created my presentation using OpenOffice.org's Impress.

Tuesday, May 20, 2008

Data and the Open Source Paleontologist 2

The previous post on this topic outlined some resources for the posting and dissemination of primary paleontological data on the internet. In this post, I'll take a look at why more people don't do so, and what we can do about it.

Why Aren't More Data Posted?
Myriad factors contribute to this issue - some of them are genuine roadblocks, and others are simply opportunities to change attitudes and common practice.

Laziness
Sometimes it's a lot of work to get your data posted online. You may have to reformat everything, or re-enter the data, or engage in digital gymnastics that take longer than the research itself took. In other cases, it's just one more thing to do on an already crowded research schedule. How to counter this? Perhaps my best suggestion is better awareness of the importance of these data being available - if people demand it, it will be viewed as an item of high importance, just as needed as the peer-reviewed publication itself. Some repositories, such as MorphoBank, also allow you to enter the data as you collect them, rather than doing the whole thing at the very end. This might also be a good talisman against the rush to upload a whole bunch of data files at the end of a program.

Museum Policies
In the case of posting photographs of specimens, many museums have policies that are unclear or seem to prohibit general dissemination of photographs. These policies are in place for good reason in some cases - this discourages commercial concerns from profiting off of images of specimens without a museum's knowledge. Although it's my understanding that most museums don't have a problem with posting things into scientific databases, it's probably best to check. Does anyone out there have experience with this issue?

Priority of Publication
If your data are online, this means other people have access. This can lead to productive collaborations - or, it could potentially lead to being "scooped." Here, the safest thing is to delay uploading of data until after the major resulting publication. Whatever the timing, the important thing is to get those data out there eventually! And, if you use data from an online database, you have a responsibility to credit the person who did the primary work. Anything less just isn't very nice. There are always going to be people who are stingy with sharing already-published data, even when it isn't warranted (or in the case of CT scan data, even when the museum requests that a publicly-available copy be reposited with the institution!). The most important thing is to work to change attitudes and foster a culture of openness. Recent events in paleontology have perhaps made this a little more difficult, but I like to think that things will work out in the long run.

What Can Be Done?
Above, I've outlined a few solutions to some of the problems (some more practical than others). Beyond those, I think we really need more databases, and more encouragement to use these new (and existing) databases. Gene squeezers have GenBank, but why aren't there more Paleobiology Databases out there? Advisors - make your students reposit their data online. Students - get your data out there, even if your advisors don't encourage it! And paleontologists in general - welcome to the 21st century! I hope that time and a new generation of tech-savvy paleontologists will change all of this for the better.