Tuesday, March 6, 2012

Self-archival: a good start, but not the full solution

We all want our work to be discovered, read and cited. There is little doubt that closed-access systems hamper this - a paywall in front of an article is a hefty obstacle, and we all encounter them at least occasionally, no matter how extensive our library access is. From an author's perspective, freely available PDFs of their work are a major boost.

In recent discussions on Twitter and in the blogosphere, I've chatted with Mike Taylor, Ross Mounce, and others about self-archival as one of many mechanisms to bring about open access. Mike's recent blog post at SV-POW! summarizes much of the discussion to date, and I thank him for helping me to crystallize my thoughts on the topic.

For those who are not familiar with the term, self-archival refers to placing a freely downloadable copy of a publication (or other work) on one's personal (or departmental, or whatever) web page. In this post, I want to discuss the pros and cons of such an approach.

Pros
  • The PDF is freely available to anyone who wants to see it. No paywalls. No hassle.
  • Once picked up by search engines, your posting may be the first one web users find - even above the "official" journal page!
  • If users visit your website to get the PDF, they may also discover closely related work. This can be a big plus for getting the word out about your research program. 
Cautions
  • A personal archive is probably not a permanent archive. Barring special arrangements, your personal or institutional web page is not likely to last substantially beyond your lifetime. Free hosting services such as WordPress may not be around in 20 years (remember Geocities?), so it may be worthwhile to pay for hosting. And make sure your descendants keep paying for hosting, or that your departmental web administrator doesn't delete your page 15 years after you retire. I have little faith that the PDFs I post on my own web page will be around 200 years from now, at least at that website. That sure would stink for the researcher in 2212 who wants to read all about ceratopsian sinuses.
  • Author-hosted archives are not independent. There is nothing to prevent someone from removing embarrassing details or adding fraudulent information to their publications, and little that a casual reader can do to detect such fraud. The great majority of academic authors are honest - it's that tiny minority we have to watch out for. An independent archive, hosted by an institution, library, or publisher, provides a firewall protecting the literature from the authors.
  • As article-level metrics gain prominence, author-hosted PDFs may skew some statistics. For instance, let's say I publish a paper in PLoS ONE, and also post a copy of the PDF to my site. Because PLoS ONE records and posts view and download statistics for its own site, any downloads or views from my site are not recorded there. Thus, the statistics are spread across several venues. This is not a major issue in my opinion, but some people may care.
  • Under the terms of publication, a publisher may not allow you to post a PDF of your paper. Or, they may only allow you to post a pre-review copy. Or a post-review, unformatted copy. Things get complicated quickly, especially for those concerned about following the letter of the law.
The Upshot
If you are an active researcher, you should be posting PDFs of as much of your own work as you (legally) can. If you don't, you're missing out on innumerable opportunities to publicize your work and interact with colleagues. However, personal archiving is not enough to ensure permanence. For the long term, a bigger solution is needed: institutional archives, journal archives, society archives, whatever. The ultimate answer may take some time to sort itself out.

    Friday, March 2, 2012

    The Open Museum Notebook - Torosaurus Style

    A new paper on the Torosaurus / Triceratops issue was just published in PLoS ONE, bringing some additional analysis to the table. I won't comment on it any more here (I'm saving my thoughts for a formal reply on the PLoS ONE website itself), other than to refer you to my own paper and the Scannella & Horner response.

    In any case, I have a pile of notes from my own work on Torosaurus (or whatever we should call it), and figured it was time to distribute them a little more widely. So, I just uploaded my notes on the Yale Torosaurus specimens to figshare.com. There isn't anything earthshaking in there (most of the meat of it has been previously published), but at least now other folks can use them. The sketches distinguishing real bone from reconstruction should be particularly useful.

    My sincere hope is that at least a few other paleontologists will follow suit with their own notebooks - there are a lot of unused data that will never see the light of day otherwise. I also have a goal of gradually digitizing and posting my other museum notebooks, but that will probably take some time!

    Citation and Link
    Notes and Observations on Specimens of Torosaurus at the Yale Peabody Museum of Natural History. Andrew Farke. Figshare. Retrieved 15:40, March 02, 2012 hdl.handle.net/10779/664bf2cb5ac486da32c7fb7261e595cd

    Update: Since this was posted, I have uploaded a number of other notebooks. Find them on my figshare author page.

    Wednesday, February 8, 2012

    Restoring that sense of wonder

    These can be depressing times for a paleontologist - funding is poor for most, the job market is dim for many talented friends and colleagues, and rhetoric-ridden battles over scholarly publishing rage. That's enough to suck the joy right out of the field. At times like this, it's nice to step back for a second and think about the really cool stuff going on.

    So, I've put together a list of wondrous things that have happened in paleontology over the past several years. Why are they cool to me? Mostly because they challenge ideas that I acquired as a little, dinosaur-obsessed kid. They also challenge ideas I've acquired as an "educated" professional. Sometimes it's nice to have our comfort zone stretched.

    Symbols of the new paleontological revolution: an eye-catching Sinosauropteryx crouches on top of mammoth DNA, overlain on a thin-section of dinosaur bone (sources at end)
    In no particular order:
    • We know what colors were on parts of the body of some dinosaurs. Really. How cool is that? Sure, it's not perfect, and there is lots we'll never know, but the mere fact that you can plausibly reconstruct parts of the plumage of a feathered dinosaur is amazing. Especially because I had always believed the truism that we'd know the texture of dinosaur skin, but never the color.
    • I can download a genetic sequence from a woolly mammoth. Or a Neanderthal. Or any number of extinct organisms. I had always been told that Jurassic Park would never be a reality, and it probably never will be (at least for non-avian dinosaurs). But staring at the A's, T's, G's, and C's of an extinct organism still gives me goosebumps.
    • I can listen to a Jurassic katydid. Yes, yes, there are some assumptions in the reconstruction. But let's suspend criticism for a moment, and accept that it's probably at least a decent approximation. These are noises that haven't been heard in 165 million years.
    • We know the sex of some individual dinosaur specimens. Thanks to studies of medullary bone and comparative anatomy, the seemingly impossible is made real. Wow!
    • Similarly, we know the age of some dinosaur individuals at death (give or take a few years). The notion that sauropods only got big because they grew for a century can't be supported anymore. Once again - wow!

    This is just my personal list - what's on yours?

    Sources for image: Mammoth DNA sequence in background from GenBank Accession FJ655900 (published by Enk et al., 2009); dinosaur bone histological section modified from Woodward et al., 2011, Figure 1C (colors inverted and adjusted); Sinosauropteryx modified from original by Matt Martyniuk. Image released under Creative Commons Attribution-Share Alike 3.0 Unported license.

    Tuesday, February 7, 2012

    How Big Commercial Publishers Can Help Themselves

    Big commercial publishers - especially Elsevier - have been getting a lot of flak lately. There's the usual background noise about the high costs of institutional subscriptions and of individual PDFs for non-subscribers, and now we have concerns over SOPA, PIPA, RWA and the burgeoning Elsevier boycott. I think it's fair to say that the argument has been dominated by the publishers' critics. Nonetheless, someone invariably pipes up in comment threads (or in posts at sites like The Scholarly Kitchen) in defense of the publishers.

    Pro-commercial publisher arguments almost always include the term "added value" or something similar. In other words, the big publishers add something beyond the raw manuscript and figures that are provided by the authors. I think very few people will dispute this claim, at least at face value*. The publishers:
    • facilitate peer review by paying for a manuscript handling system (either licensing a commercial product or installing an open source product on servers they pay for) [note that this is not the same as doing the peer review, which is done by volunteer referees and unpaid or minimally-paid editors]
    • do some copy editing
    • format the manuscripts into a pretty PDF and web page
    • provide a veneer of respectability with well-known journal "brands"
    • distribute the journals to libraries and interested readers, via subscriptions, web hosting, and proprietary search engines
    • and other miscellaneous things
    [*To forestall the inevitable comments, yes, some of these "services" are of dubious value to many users]

    Look, I appreciate the fact that all of this costs money. Somebody needs to be paid to do the formatting into the appropriate medium (whether web page or PDF), technical staff need to make sure the authors submit the files in the right format, it costs money to run a server, programmers don't come cheap, and all of the various functions of a business/journal aren't free (office space, salaries for necessary employees, etc.).

    But does it really cost so much that publishers have to charge $37.95 for a single PDF file, or $392 for a personal subscription to a journal?

    Maybe the answer is yes (setting aside the 30%+ profits for many major publishers). Maybe it does cost a lot of money to produce an article. Fine. Just do a better job of convincing me that it's worth it, particularly when some of the most labor-intensive tasks (typesetting and peer review) are provided for free by the authors and their colleagues.

    Many large publishers have an established list of things they do that cost money. They've done a decent job of publicizing these talking points, judging by how often they show up in comment threads and by the fact that I was able to assemble the bullet points above virtually from memory.

    However, publishers have performed miserably at convincing us that $37.95 is a reasonable price for a PDF download. Elsevier and company could deflect much criticism if they were to be more honest and transparent about the costs behind a journal article. How much time/money actually goes into formatting? How much does it really cost to serve a file to the internet, over multiple years? What is the honest per-article cost for the manuscript submission system? How many people actually buy articles? Instead we're stuck with the broken record of "oh, this stuff costs money, OA advocates just think it all happens for free. . ."

    Finally, here's my most pressing question: If economies of scale apply to publishing, why are the largest publishers providing some of the most expensive services (in terms of solo journal subscription rates, individual PDF downloads, and open access fees)? Wow, would I love the answer to that one!

    Postscript: It seems that many folks are having similar thoughts. Check out Björn Brembs' round-up here.

    Monday, February 6, 2012

    PLoS ONE 2011 - Final Round-Up

    Back before the new year, I reviewed all 17 of the new fossil taxa that were published in PLoS ONE during 2011. Here, I look at the general trends for paleontology in the journal, both last year and over its entire history.

    Topics and Biases
    Paleontological Topics in PLoS ONE, 2011
    The chart above shows the general topics covered by PLoS ONE papers in paleontology during 2011 (for those of you adding up the numbers, a handful were counted in two categories). Just as for new taxa, there is a major skew towards archosaurs. Much as I love dinosaurs, we really need broader taxonomic coverage. Part of this is probably the result of different publishing cultures among different groups of specialists - dinosaur workers are comfortable with PLoS ONE, whereas trilobite workers aren't. We need some pioneers in invertebrate paleontology, paleoichthyology, and elsewhere.

    The Big Picture
    By my count, there were around 65 paleontology-related articles published in PLoS ONE last year (2011). This is up from 39 articles in 2010, and reflects a continuing increase since PLoS ONE was founded in 2006.
    Trends in Number of Paleontology Papers at PLoS ONE
    Compare this count of 65 for PLoS ONE with 95 papers in Journal of Paleontology and 120 papers in Journal of Vertebrate Paleontology during 2011. PLoS ONE is still smaller than some "conventional" journals, but I think it may well overtake these alternatives in annual volume within the next year or two. Whether or not this is a good thing for PLoS ONE and paleontology is another question - if the quality of the papers submitted to the journal, as well as of the editing process, can be maintained (or improved where necessary), perhaps yes.
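    To put rough numbers on the "year or two" guess, here is a back-of-the-envelope projection. It assumes, purely for illustration, that PLoS ONE keeps its 2010-to-2011 growth rate and that the two other journals stay at their 2011 volumes. Neither assumption is guaranteed.

    ```latex
    \begin{aligned}
    \text{growth, 2010 to 2011} &= \frac{65 - 39}{39} \approx 0.67 \quad (\approx 67\%) \\
    \text{projected 2012} &\approx 65 \times \frac{65}{39} \approx 108 \quad (> 95 \text{ for J. Paleontology},\ < 120 \text{ for JVP}) \\
    \text{projected 2013} &\approx 65 \times \left(\frac{65}{39}\right)^{2} \approx 181 \quad (> 120 \text{ for JVP})
    \end{aligned}
    ```

    Growth this steep rarely holds for long, so this is an upper-end sketch rather than a forecast.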

    Many paleontologists clearly are warming up to the idea of PLoS ONE. It is tough to know what factors are behind this - whether it's the availability of high-resolution color figures, cost-effective outlets for lengthy papers, frustration with "conventional" journals, the impact factor, broader acceptance of open access, or something else altogether. Other paleontology journals - and paleontological societies that publish their own journals - would be wise to see what they can do to match or improve upon the attractive points of PLoS ONE. As much as I love PLoS ONE, the last thing I want is a publishing monoculture. Unless other journals adapt, though, this may be the result.

    The oldest Eucalyptus in the world - from South America! Modified after Gandolfo et al., 2011



    [note: although I am a volunteer editor at the journal, this post reflects only my personal opinions]

    Sunday, January 22, 2012

    ScienceOnline2012 - Parting Thoughts

    My thoughts on Days 2 and 3 of ScienceOnline2012 are found elsewhere - here I sum up some other impressions.

    Twitter at ScienceOnline
    This is the first time I've actively tweeted through an entire meeting, and I found it to be a worthwhile addition. It was cool to see what other folks in my sessions were thinking (at times it was like passing notes in class), and also nice to be able to follow the sessions in other rooms. Over 300 active users participated (on and off-site), and over 17,000 tweets discussed the meeting (see this cool summary map)! It's this broad participation that took Twitter from being just a small piece of the meeting to an essential component - an important observation for groups like the Society of Vertebrate Paleontology that might want to acknowledge (or even encourage) Twitter.

    Some thoughts on the state of blogging
    One perception I have after ScienceOnline 2012 is that blogging - as an activity and as a medium of communication - seems to have reached a relatively mature state. Sure, there are incremental advances and changes, but by and large I don't really get the sense that there is much substantively new going on (other than new people joining the blogging fold on occasion). This is somewhat reflected by the blogging-relevant sessions at ScienceOnline2012 - they are much the same kind of stuff you might have seen at ScienceOnline 2010, or 2009, or 2011. Topics like getting students involved in blogging, increasing acceptance of blogging in academia, use of images on blogs, etc., are important but really not much advanced beyond where we were a few years ago. [brief note - this should not be interpreted as me saying that I think things are just OK as they are - in fact, it is a rather sad thing that some of these issues are still issues!]

    I don't mean this as a criticism, but simply as a statement of how things are. In fact, stability is partly a good thing, in that someone new to the world of blogging can jump in with clear role models, expectations, and pathways to success (whatever success may be). Many of the broad principles have been laid out, and now we're working on refining the details. Some big issues do remain (we can always increase the acceptance of quality blogging for academic career advancement, for instance), but many of these will probably just require the imperceptible cultural shifts that happen over time.

    Some thoughts on the state of online science
    Perhaps it just reflects my own intellectual trajectory, but it seems like we're approaching some measure of stability for many of the old issues in science communication. Open access - important, but not really novel anymore. Blogging - same thing. Social media - ditto. As each of these trends started, I took a wait-and-see approach before engaging myself. As such, I have missed out on getting in at the very beginning of some trends, but have also avoided wasting time on trends that haven't gone much of anywhere or have fizzled out (e.g., Second Life and Google Wave, to name just two). Based on my attendance at ScienceOnline 2012, the areas to watch include:
    • Crowdfunding: Small donations can add up to decent funding for a focused project, and present unique outreach opportunities. In a field of shoestring budgets like paleontology, I see crowdfunding as a potentially important new trend.
    • Article-level metrics and data set archival and citation: I've tied these two topics together because they reflect a major advance beyond the old journal-level metrics like Impact Factor. Neither topic is completely new, but I saw plenty of new tools at ScienceOnline that may move the discussions and usage of these metrics forward. Furthermore, there is still a long way to go for community buy-in.
    There may indeed be some major issues to watch in science art or writing that I have overlooked because I'm not really plugged in to those communities, so please comment if that's the case!

    Saturday, January 21, 2012

    ScienceOnline2012 - Day 3

    On Day 3 of ScienceOnline 2012 (my second day), we had a fun mix of split sessions and common gatherings. Areas of interest for me included:
    • Students as Messengers of Science: This discussion focused on how to engage high school and college students in science blogging. There are no easy solutions, but there were some tips to get them started. In particular, planning is key. What is the goal? Who are the potential readers? 
    • Why the Resistance to Science Blogging? This session was pretty much as advertised. Unfortunately, there was little new here - yes, there are downsides to putting yourself out there on a blog, but for the most part it seems like it will just take slow attrition of the skeptics to normalize blogging for non-blogging scientists. Same issues as in 2011, 2010, 2009. . .but little in the way of new solutions. One good piece of advice, though, concerned whether and how to put blogging activity on a CV. In many cases, there are impactful ways to describe this activity - online outreach editor, web editor, etc. These or similar terms can be honest, accurate descriptors that come across more positively to those who might be instinctively averse to the word "blog."
    • Raising Money for Your Science and Journalism with Crowd Funding: This session filled in many of the details related to yesterday's demo - and was quite interesting. One clear worry is that crowdfunding in science could be hijacked by "stodgy" forces that try to impose NSF-style limitations on the crowdfunding community (e.g., layers of vetting by experts, etc. - in fact, I think the odds are quite good that someone will un-ironically submit an NSF proposal in the near future to put together a service to validate and serve as a clearinghouse for crowdfunding science). This could have the chilling effect of squeezing out small players in favor of big institutions that are already comparatively well-funded. Vigilance is required - and the situation will doubtless change rapidly over the next few years. Either way, it has cool potential.
    • CyberScreen Science Film Festival: Again, what the label says. I'm hoping to find a link to a list of the films - there were some really excellent ones.
    • Closing Plenary Panel on Scientist/Journalist Relations: This isn't a new topic (see here for one recent post), and is getting a little tiresome for many. Lots of discussion, little movement from either side. My thought is that the real problem is not with the journalists or scientists at ScienceOnline, but the reporters who aren't science specialists, or who just copy press releases, or who throw stuff together without contacting relevant scientists.
    Next. . .parting thoughts.