Tuesday, March 6, 2012

Self-archival: a good start, but not the full solution

We all want our work to be discovered, read and cited. There is little doubt that closed-access systems hamper this - a paywall to an article is a hefty obstacle, and we all encounter them at least occasionally, no matter how extensive our library access is. From an author's perspective, freely available PDFs of their work are a major boost.

In recent discussions on Twitter and in the blogosphere, I've chatted with Mike Taylor, Ross Mounce, and others about self-archival as one of many mechanisms to bring about open access. Mike's recent blog post at SV-POW! summarizes much of the discussion to date, and I thank him for helping me to crystallize my thoughts on the topic.

For those who are not familiar with the term, self-archival refers to placing a freely downloadable copy of a publication (or other work) on one's personal (or departmental, or whatever) web page. In this post, I want to discuss the pros and cons of such an approach.

The Pros
  • The PDF is freely available to anyone who wants to see it. No paywalls. No hassle.
  • Once picked up by search engines, your posting may be the first one web users find - even above the "official" journal page!
  • Users who come to your website for the PDF may also discover closely related work. This can be a big plus for getting the word out about your research program.
The Cons
  • A personal archive is probably not a permanent archive. Barring special arrangements, your personal or institutional web page is unlikely to last much beyond your lifetime. Free hosting services such as WordPress may not be around in 20 years (remember GeoCities?), so it may be worthwhile to pay for hosting. And make sure your descendants keep paying for it, or that your departmental web administrator doesn't delete your page 15 years after you retire. I have little faith that the PDFs I post on my own web page will be around 200 years from now, at least at that website. That sure would stink for that researcher in 2212, who wants to read all about ceratopsian sinuses.
  • Author-hosted archives are not independent. There is nothing to prevent someone from removing embarrassing details or adding fraudulent information to their publications, and little that a casual reader can do to detect such fraud. The great majority of academic authors are honest - it's that tiny minority we have to watch out for. An independent archive, hosted by an institution, library, or publisher, provides a firewall protecting the literature from the authors.
  • As article-level metrics gain prominence, author-hosted PDFs may skew some statistics. For instance, let's say I publish a paper in PLoS ONE, and also post a copy of the PDF to my site. Because PLoS ONE records and posts view and download statistics for its own site, any downloads or views from my site are not recorded there. Thus, the statistics are spread across several venues. This is not a major issue in my opinion, but some people may care.
  • Under the terms of publication, a publisher may not allow you to post a PDF of your paper. Or, they may only allow you to post a pre-review copy. Or a post-review, unformatted copy. Things get complicated quickly, especially for those concerned about following the letter of the law.
The Upshot
If you are an active researcher, you should be posting whatever PDFs of your own work you (legally) can. If you don't, you're missing out on innumerable opportunities to publicize your work and interact with colleagues. However, personal archiving is not enough to ensure permanence. For the long term, a bigger solution is needed: institutional archives, journal archives, society archives, or whatever else works. The ultimate answer may take some time to sort itself out.


    Unknown said...

    Wow... I remember GeoCities, and friends who lost a LOT when it went offline :-( Good post.

    Mike Taylor said...

    "That sure would stink for that researcher in 2212, who wants to read all about ceratopsian sinuses."


    Andy said...

    Come now, don't laugh! My crystal ball tells me that obscure details of Triceratops anatomy will be a fundamental area of research in the 23rd century.

    Stevan Harnad said...


    1. Self-archiving of the author's refereed draft is done in order to maximize research usage and impact by providing access to all would-be users who lack subscription access.

    2. The long-term preservation problem is not for the author's self-archived refereed draft but the publisher's version of record (and that has nothing to do with open access or self-archiving).

    3. Once self-archiving prevails globally, through the adoption of universal self-archiving mandates by universities, research institutes and research funders, the solution to many other problems will not be far behind (publishing reform, copyright reform, digital preservation, improved and integrated webwide metrics, improved and integrated webwide search and navigation, version control, etc.).

    4. Self-archiving in the author's own institutional repository is definitely preferable to any other form of self-archiving (whether personal website, external host, or central repository) for many reasons, one of them being stability and long-term support. http://roar.eprints.org

    5. Yes, authors' self-archived drafts may sometimes contain errors, and are sometimes updated -- but having them is the difference between night and day for would-be users and research progress, compared to access-denial.