Monday, December 24, 2007

Reference Managers on Parade - The Conclusion

Other Options
My wife, a physics graduate student, is constantly puzzled by the fact that the paleontology community hasn't adopted LaTeX. Odds are that most of you reading this (someone does read this blog, right?) have never heard of LaTeX. It's somewhat akin to HTML, in that it's essentially a markup language for scientists. Thus, it's a little scary for those who have never ventured beyond the confines of their word processor. But. . .it's incredibly powerful. There is a whole host of bibliography management tools for LaTeX - JabRef is one example. The main reason LaTeX hasn't entered my sphere is that I collaborate with a lot of people who don't use it - so, there isn't a lot of incentive for me to learn it. Maybe one day, though. . .

Closing Thoughts
There isn't really a "perfect" open source reference manager out there yet. All of the packages have significant strengths, but also sometimes significant weaknesses. I think that the next year will bring major gains in open source reference managers, and hopefully by this time next year there will be several extremely good options. For the time being, I recommend experimenting to find one that works for you. Zotero is my current reference manager of choice - its integration with Firefox and capability to easily dump formatted references into a word processor move it to the top of the pack.

Saturday, December 22, 2007

Reference Managers on Parade - Part IV

OpenOffice.org Bibliographic (OOoBib)
First, a disclaimer: This software doesn't exist yet. But, I've included it here just to generate a little excitement for what could be a great addition to OpenOffice.org.
Pros: According to the project website, this will greatly augment the bibliographic features for OOo Writer. We'll have to see what this entails in the long run.
Cons: Microsoft Office users are probably out of luck here. Switch to OpenOffice.org if you want to give it a spin! And the biggest current downside: this program isn't functional yet.
The Bottom Line: Look for this to appear sometime in the second half of 2008.

Thursday, December 20, 2007

Reference Managers on Parade - Part III

Zotero
Zotero is a very nifty little plug-in for Firefox that has very, very quickly become my reference manager of choice. It is in quite active development, and has a very promising future ahead of it, I think.
Pros: Web integration is insanely good! So you find the webpage for the latest article in Nature. A little icon appears in the browser's navigation bar. You click the icon, and all of the article's information - authors, abstract, direct URL, etc. - is dumped into your database. Zotero also accepts the standard "reference export" option, for sites that don't yet support direct export. Also, Zotero has two very functional plug-ins that allow users to "cite while you write" in Microsoft Word and OpenOffice.org Writer, and it functions in any operating system that supports Firefox. You don't have to be connected to the internet to use Zotero, either (because all files are stored locally).
Cons: The two biggest downsides that I've run into are 1) it is insanely difficult and not at all intuitive for the average user to create output styles in the current version, and the available output styles are quite limited (although they promise to correct this in the near future); and 2) character formatting (italics, underlining, etc.) is not an option within the database. Some journal homepages (notably JVP's BioOne page) aren't yet supported for direct linking. But. . .you can still import references using the standard "reference export" option (exactly as you do in Endnote).
Note: If you export from the BioOne website, use the Procite or Reference Manager format - the Endnote format doesn't seem to capture all relevant data. And BioOne, if you're listening, it's really, really annoying that you force the titles and author names of your reference exports into all caps. No journal on earth uses this format!
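Until BioOne changes this, a few lines of script can undo the worst of the shouting. What follows is only a rough Python sketch, not a Zotero feature: the function names are my own invention, and it assumes you have already pulled the title and author strings out of the export file yourself. It simply lowercases everything after the first letter, so proper nouns, acronyms, and names like "McDonald" will still need a fix by hand.

```python
def sentence_case(title):
    """Convert an ALL-CAPS title to sentence case; leave mixed-case titles alone."""
    if not title.isupper():
        return title
    lowered = title.lower()
    # Capitalize only the first character of the title.
    return lowered[:1].upper() + lowered[1:]


def name_case(name):
    """Convert an ALL-CAPS author name to capitalized words; leave others alone."""
    if not name.isupper():
        return name
    # str.title() capitalizes the first letter of each word and lowercases the rest.
    return name.title()


# Made-up example strings, not taken from a real export.
print(sentence_case("A NEW THEROPOD FROM UTAH"))
print(name_case("SMITH, J. A."))
```

Note that the place name in the example title comes out as "utah" - this kind of cleanup gets you most of the way there, not all of it.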
The Bottom Line: Zotero is *the* open source option for reference management, and it's only going to get better.

Tuesday, December 18, 2007

Reference Managers on Parade - Part II

Bibus
Bibus is one of the more functional open source bibliographic managers. Based on an SQL backend, it will run in Linux, Mac, and Windows OSes.
Pros: The interface is pretty intuitive, and it is easy to create style files for output in specific journal formats. Bibus also includes a feature akin to "Cite While You Write," compatible with both Microsoft Word and OpenOffice.org Writer. Manual input of references is pretty easy.
Cons: The Bibus development team is quite small, and it can be a looooong time between updates. There is no support for character formatting in the database either (italics, underlining, etc.), which is quite annoying if your references have scientific names in them. Additionally, it's not entirely straightforward (although certainly possible) to import bibliographic information from journal websites. In the current version of Ubuntu Linux, lots of folks are having trouble getting it to link in with OpenOffice.org. This is the main reason I abandoned Bibus for the package discussed in my next post. . .
The Bottom Line: Bibus is pretty functional, but has its quirks. It's a good choice for Windows and some Linux users, but requires a little effort sometimes.

Sunday, December 16, 2007

Reference Managers on Parade - Part I

Everyone has a massive reference library, but few know how to manage it. Ideally, you want an application that will let you record all of your papers and import formatted citations into a document. In the upcoming series of posts, I'm going to review some reference managers, both open source and commercial.

Endnote
This program is the classic reference management program, and probably one of the most widely supported by publishers. It has been around for years now, and it shows the polish and feature richness that you would expect for a mature program. This is not an open source or free program, but I include it here just as a standard of comparison.
Pros: The Cite-While-You-Write feature is quite handy; this tool allows you to build your paper's reference list automatically while you type the paper (hence the name for this feature!). Also, Endnote has a very broad output styles database, and it is quite easy to build reference styles (JVP even has a style available from the journal website). You can format italics within each bibliographic entry. It is quite easy to import references from journal websites, too. Available for both Windows and the Mac OS.
Cons: It's expensive - $250 to download, and $300 (oops, I mean $299.95) to have a physical copy shipped to you. Also, there are occasional functionality issues if you try to run the program under WINE (and Cite-While-You-Write doesn't work in OpenOffice.org). No native Linux version is available.
The Bottom Line: If you can afford it, Endnote is the way to go. It has polish and pizazz, and it is widely supported. Linux users are better off looking elsewhere, though.

Saturday, December 15, 2007

Books on the Job Hunt

Now it's time for a post about something not computer related - the job hunt! As some of you may know, I am in the middle of my job search. This has its ups and downs - there are stressful moments, entertaining moments, and moments of pure confusion. I've gotten a lot of really great advice from many friends, colleagues, and advisors, which has made things infinitely easier (and probably infinitely more successful, in the long run). But what do I do when my advisor doesn't want 20 phone calls a day about how to get started on writing this or that? A selection of handy books has been a real life-saver in this regard!

In this blog post, I'm going to discuss two books (and a website) that I've found especially handy in guiding my job search. They each have their pros and cons, and unique styles. Word of advice to fellow grad students: Read these books sooner rather than later - preferably at least six months in advance of when you plan your own search.

The Chicago Guide to Landing a Job in Academic Biology. By C. R. Chandler, L. M. Wolfe, and D. E. L. Promislow.
This book came out earlier this year. I first spotted it at the U of Chicago Press table at the SICB meetings, and knew I had to have it. Although the title says "Biology," it really would be of interest to most any paleontologist aiming for a job in higher education. The authors of this book lay out a nice and tidy sequence of events - from the perspective of the applicant as well as that of the hiring department. As I've read through it, I've found myself nodding my head in agreement with most everything they have to say. The writing style is informal, like you're having a conversation over a beer after hours, and the book is filled with anecdotes laying out the do's and don'ts of the job search from people who have been on both sides of the equation. The example CVs and letters are generally quite helpful, although it might be nice to see a few more samples. Also, they bring up many points about interviewing skills, applications, etc., that I never would have thought of and would never have thought to ask about. Two gaps in the book's coverage might reduce its usefulness for some folks. First, it doesn't really cover academic jobs outside of the university system - i.e., museum curatorships and the like. Also, it probably won't be that helpful for people aiming for positions as collections managers or preparators. I don't know of any source, beyond chatting over a beer at SVP, that really offers this sort of advice. Second, much of it is geared for people in the North American system - I would be curious to learn from my non-North American colleagues how things work! Despite these gaps, if you buy only one book on the job hunt, "The Chicago Guide to Landing a Job in Academic Biology" would be it.

The Academic Job Search Handbook (3rd Edition). By M. M. Heiberger and J. M. Vick.
This book, now in its third edition, is really intended to address all folks seeking work in academia - whether they're philosophers, paleontologists, or specialists in late 16th century Algerian literature. As such, there are many aspects of the book that aren't all that useful or relevant. I usually just flip past the several pages of sample CVs in the humanities and social sciences. It has roughly the same content as the Chicago Guide, but offers more detail in certain areas (such as potential questions during an interview). The authors take themselves a little more seriously than the authors of the Chicago Guide, but that's ok too! The Academic Job Search Handbook is a good choice to round out your job hunting book collection.

Chronicle Careers
This website, from the Chronicle of Higher Education, has some really nice forums and advice columns, which have been particularly helpful for me. They also have a yearly "CV Doctor," in which people send in their CVs for evaluation and comment. Of course, it covers all fields of academia (and all stages, from grad school to emeritus), but there really are some good things on this site. Ms. Mentor is always good for a chuckle, too.

Thursday, November 29, 2007

PAST in Linux

Talk about good timing. In my last post, I lamented the incompatibility of the latest versions of PAST with Linux. Just this morning, Alejo C. Scarano posted a tidy little work-around on the PAST users mailing list. Many thanks to Alejo for sharing this! Here's what to do:
  1. First, you need to get the latest version of WINE (0.9.49). For this, go to http://www.winehq.org/site/download and follow the instructions for your distribution.
  2. Install this latest version of WINE, following their directions.
  3. In the WINE configuration menu (winecfg), turn on "virtual desktop" under default settings for WINE, and then set PAST to run using the global settings. A more elegant way to do this is to call PAST from the command line, by using a command similar to:

    wine explorer /desktop=hl,1024x768 "c:\whereeverpastis\past.exe"

Friday, November 23, 2007

Statistics Software

Most any paleontologist will, at some time, have to delve into statistics in order to answer some sort of question related to his or her research. Unfortunately, many of these statistical tests exceed the options available in Excel (I find it highly unlikely that Excel will ever have a principal components analysis, for example). So, what's a researcher to do? In this post, I'll address some of the statistical packages available out there as freeware or open source software.

  • PAST. This is probably the easiest-to-use statistical package out there, and it is geared especially for paleontologists (as you might guess by the name, which is short for "PALaeontological STatistics"). You can run diversity indices, PCA, and a whole bunch of other methods. The interface is quite user-friendly, although it has occasional quirks in how it wants the data aligned in the columns. Bugs, once reported, are quickly ironed out, and new features are added relatively frequently. The statistical plots it produces are generally quite good, but there aren't a lot of options to customize them. The website and documentation are generally pretty good, if a bit simplistic (in the case of the documentation). Unfortunately, after version 1.56b, you can no longer run the software under WINE in Linux (but you can download version 1.56b from the PAST site). Available for Windows.
  • R. The gold standard in statistical analysis--this is for people who are really serious about their data. One big plus with R is that it handles large data files without batting an eye - this was a lifesaver when I had FEM outputs with over 150,000 values! (To be fair, PAST loaded this too, although much more slowly and only with a lot of data massaging. SPSS choked.) R has a very active development community, and you can find packages to do just about anything. The big downside (for some users) is that R is command-line only (although front ends such as R Commander now allow access to some, but not all, of R's features via a graphical user interface). But, it is incredibly powerful, and it is very easy to set up little scripts to run through whole masses of data in a matter of seconds. The graphical outputs are highly customizable and easily exportable into widely used formats. The user's manual is one of the best I've ever seen in open source software, too.
  • (S)MATR. This handy little program, available as a standalone executable for Windows (also running under WINE in Linux), an R package, or a MATLAB toolbox, will fill all of your reduced major axis regression needs. It's fast, powerful, and about the best way I've found to deal with data that don't meet the assumptions of Model I regression (à la Sokal and Rohlf). The downside is it doesn't produce graphical plots - but it does all the statistical tests that PAST doesn't.
Any search out on the web will also uncover other statistical packages. The above are just the ones with which I am most familiar.
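
For the curious, the core of reduced major axis regression is simple enough to sketch in a few lines. The Python below is just my own illustration of the textbook formula - it is not (S)MATR's code, it does none of the significance tests that make (S)MATR worthwhile, and the function name and sample numbers are made up. The slope is the ratio of the standard deviation of y to that of x, carrying the sign of the correlation, and the line passes through the means of both variables.

```python
import math


def rma_regression(x, y):
    """Return (slope, intercept) of the reduced major axis line for paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if ss_x == 0 or ss_y == 0:
        raise ValueError("x and y must each vary")
    if sp_xy == 0:
        raise ValueError("slope sign is undefined when the correlation is zero")
    # Slope magnitude is sd(y) / sd(x); its sign follows the correlation.
    slope = math.copysign(math.sqrt(ss_y / ss_x), sp_xy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up numbers, purely for illustration.
slope, intercept = rma_regression([1, 2, 3], [1, 3, 2])
print(slope, intercept)  # prints: 1.0 0.0
```

For those same three points, ordinary least squares (Model I) regression would give a slope of 0.5, which shows how much the choice of model can matter when both variables are measured with error.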

Wednesday, November 7, 2007

Open Source and Free Software: Pros and Cons

Just like commercial software, open source and free software has its pluses and minuses - for anyone who is new to the concept, it's important to be aware of all of these. A small list follows here.

The Cost: In terms of dollars, open source couldn't be better. You pay nothing (unless you choose to donate to the projects), and get a piece of software that you're free to use and install on as many computers as you wish. No annoying anti-piracy dialogs, no serial numbers, or anything. In most cases, you can give copies of the software to your friends without any restrictions.

Timeliness: This is where open source and free software often shines. New versions of Ubuntu, a Linux distribution, are released every six months - compare this to two and a half years for new versions of Mac OS X or five years for Windows. For the statistical analysis system R, there might be four months between releases--and smaller updates and new extensions are added constantly. For many open source programs, bug fixes and new features are added nearly continually. That is, if you're lucky. Some projects lose steam or just plain die--just as happens with some commercial projects. It's worth it to do a little investigation on an open source or free software project in order to see update histories and if there is a prospect for long-term continuation of the software.

User support: This is a little more variable. Some open source programs--such as R--have excellent documentation, in the form of lengthy user's manuals and active support forums. Others have no formal user's manual and little or no user community. It completely depends on the software package--again, you'll probably want to do a little research.

Compatibility: Here is another area where you'll want to do some homework. For most purposes, many open source programs will read documents from their closed source cousins relatively easily. For instance, OpenOffice.org can open most Microsoft Office documents--unless the latter have some really fancy formatting or odd macros. Export is also usually a pretty reliable thing--and let me emphasize usually. But don't forget--many of these issues plague commercial software, too!

Required Geekiness: Some Linux distributions (basically, "flavors" of the operating system) practically require a degree in computer science to install and use them. Others, such as Ubuntu, are now at the point where a reasonably computer-illiterate person could use them with ease. Similar concerns apply to other programs. If you can use Microsoft Office, you can use OpenOffice.org. But, it takes a decent bit of patience to get R working with your data (although it is worthwhile to note that graphical interfaces for R are now out there).

Saturday, November 3, 2007

Why Do I Use Open Source Software For My Research?

This is one of those really tough, multifaceted questions. As this is intended as another introductory post for the blog, I'm going to start out rather broadly and then move to specifics.

First, some definitions. Most software with which you're probably familiar, such as Microsoft Office, the Windows or Macintosh operating systems, or Endnote, is "closed source." This means that the source code (the lines of programming that tell the program what to do) is not available to the general public. It also typically, but not always, translates into "commercial software." For a commercial software model, this makes sense--why give away your trade secrets?

Another broad category of software is "open source." This means that the source code is available for anyone to download, modify, rebuild, or improve. If you run the Firefox web browser, you're already using open source software. Linux (in its various forms) is an open source operating system. Open source software is usually free, but many companies (such as Novell and Red Hat) sell technical support. Not all free software is open source, though. A good paleontological example of this is PAST. Because some of the software libraries used to develop this free statistical analysis program are copyrighted, it is not possible for the source code to be released.

For me, the descent into open source software started as a matter of necessity. A few years back, I started getting CT scans that I needed to analyze as part of my dissertation. I didn't really have a few thousand dollars floating around to buy one of the commercial packages (such as Amira), so I poked around on the internet and stumbled across the program called 3D Slicer. It turns out that this was perfect for what I needed to do (more on this in a later post)!

Fast forward a few years to this past March, when I bought a new desktop computer for my office. By this time, Windows Vista had come out, and it was the default operating system for most computers. The nature of my research--which includes lots of analysis of big CT scan datasets and finite element modeling--is extremely processor and memory intensive. The relatively massive memory requirements of the Vista interface, along with its new memory management strategy, did not mesh well with my computing needs (even after turning off unnecessary features). Thus, I turned to Linux and all of its associated free and open-source programs, and I haven't looked back.

Today, nearly all of my computer-based research time is spent in open source software. I write manuscripts in OpenOffice, browse in Firefox, manage my library in Zotero, do statistics in R, and analyze my CT scans in 3D Slicer. I also make use of some free (but not open source) programs such as PAST. I've retained a Windows installation on my hard drive, primarily because the commercial finite element analysis software I use is Windows-based (and there aren't any good open source alternatives that I have found to fit my needs yet). Otherwise, it's all Linux, all the time. Yes, that officially makes me a nerd.

So ultimately, then, why do I use open source and free software?
  • The price is right - free!
  • The software does the job I need it to do, and it is getting better all of the time.
  • Some features I have only found in open source or free software.
  • It provides a bit of an interesting challenge.
In the next series of posts, I am going to discuss the pros and cons of using open source and free software (lest you think I'm an anti-Microsoft, Linux fanboy). Stay tuned for more!

Thursday, November 1, 2007

So What's With the Title?

There's a great myth in paleontology these days--that you need technology, and expensive technology at that, in order to do quality research. This includes everything from CT scanning and computer reconstructions to statistical analysis.

Let's face it. Paleontologists are often uncreative when it comes to the use of technology. We use this protocol or that program because we saw someone else use it in an SVP talk. But what if you want to try something different? Do we always need commercial software for our research? Is that fancy-dancy analysis even necessary or useful?

Over the last few years, I've been on the slippery slide into open source software. It started when I wanted to analyze CT scan data without a $4,000 piece of software. The next thing I knew, I had Ubuntu Linux as my primary operating system and I was writing dissertation chapters in OpenOffice. This blog will review how these and other pieces of software may be of use for other folks in paleontology (or other sciences). I'll highlight some of the software I've been using, along with pros and cons.

Finally, I want to make this something other than just a technology blog. In the spirit of open source, I will discuss advice on graduate school, grants, and other topics, with the hope that this information will be useful for many people starting out in their education. And, because I'm a paleontologist, I'll probably discuss that topic a little bit, too.