Sunday, January 9, 2011

Citation Format Wars

Over at SV-POW!, Mike Taylor recently addressed the issue of how to format in-text citations. Writing in his inimitable style, he makes the case that PLoS ONE is simply doing it all wrong; the majority of commenters there have agreed. I posted a lengthy comment there, but realized that it would be appropriate to revise and republish those thoughts here too.

First off, let's have a quick recap of the issue. When writing a scientific paper (or any paper, for that matter), it is essential to credit the sources of information and ideas. Not only does it allow the reader to learn more about the topic, it's the ethical thing to do. Rather than a simple reference listing at the end of the paper, most scholarly works also reference the relevant works within the text. This is called an in-text citation, and allows the reader to know precisely which information was associated with which author.

Two Worlds
Two styles of in-text citation dominate the scientific literature. The first of these is author-year, which looks something like this: (Farke, 2010). The second is numbered, which looks like this: [1]. This number then refers to a specific bibliographic entry at the end of the paper. Many variants of each style exist.

PLoS ONE uses numbered citations, in common with many other high-profile journals (such as Nature), and in marked contrast to most of the paleontological, geological, and anatomical literature (such as Journal of Vertebrate Paleontology, The Anatomical Record, Geology, and others). The SV-POW! post, of course, argues that the numbered format is vastly inferior to the author-year format. Let's boil the argument down to its essentials, and delve into the pros and cons of both formats in more detail.

Two essential reasons are given for why the author-year format is preferable: 1) ease of reading for readers familiar with the literature; 2) paleontologists don't like the numbered format. On this view, PLoS ONE chose a numbered reference format simply because they wanted to copy the glamour magazines. Do any of these arguments hold up?

Advantages of Author-Year (and Disadvantages of Numbered References)
Of course, there are some significant advantages to the author-year format. These include:
  1. It's easy for readers who are familiar with the literature to know exactly what's being discussed. If I quote from my 2010 JVP paper on ceratopsian sinuses, "Less detailed descriptions have been published for other chasmosaurine and some centrosaurine ceratopsids (e.g., Gilmore, 1917; Lehman, 1990; Sampson, 1995; Sampson et al., 1997)," a long-time ceratopsian worker will know right off the top of her or his head that I'm talking about the Gilmore Brachyceratops monograph, Tom Lehman's paper in the Dinosaur Systematics volume, Scott Sampson's description of the Two Medicine centrosaurines in JVP, and the ZJLS paper with Scott, Michael, and Darren. I see pages from those papers when I close my eyes, and I could almost write the citation for each of them off the top of my head.
  2. You don't have to flip back and forth between the main text and the reference list. For the ceratopsian expert described above, there's no need to waste time skipping around the paper (or PDF). It's just easier.
  3. It helps readers new to the field to become familiar with the major names and papers. See the names "Wedel," "Taylor," "Wilson," "Curry-Rogers," and others often enough, and you probably have a good picture of a few of the major recent workers in sauropods.
  4. It's easier for authors to keep their references straight. When writing and revising without use of a citation manager, the numbered system can get very unwieldy. If you add a reference in the middle of the paper, you not only have to renumber the entire bibliography after that reference, you also have to change the numbers within the manuscript itself. Miss one, and your readers are going to be grumpy when the number and citation don't match up.
  5. It's familiar to the paleontological community. As mentioned above, "It's Got What Paleontologists Crave."
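To make point 4 concrete, here is a minimal sketch of what renumbering involves when it is automated rather than done by hand. The `[@key]` marker syntax, the function name, and the filler sentences are all made up for illustration; they are not any particular citation manager's format.

```python
import re

# Hypothetical markup: citations are written as [@key] in the manuscript text.
CITE = re.compile(r"\[@([A-Za-z0-9_]+)\]")


def number_citations(text):
    """Replace [@key] markers with [n], numbering keys by first appearance.

    Returns the rewritten text and the list of keys in citation order,
    which is the order a numbered bibliography would have to follow.
    """
    order = []  # keys in order of first appearance

    def replace(match):
        key = match.group(1)
        if key not in order:
            order.append(key)
        return "[%d]" % (order.index(key) + 1)

    return CITE.sub(replace, text), order


# Filler text; only the citation keys matter here.
draft = "First claim [@gilmore1917]. Second claim [@sampson1995]."
revised = "First claim [@gilmore1917]. Added claim [@lehman1990]. Second claim [@sampson1995]."

numbered_draft, draft_order = number_citations(draft)
numbered_revised, revised_order = number_citations(revised)

print(numbered_draft)
print(numbered_revised)
```

Adding one reference in the middle bumps every later number, both in the text and in the bibliography order. That is trivial for a script and tedious and error-prone by hand, which is the whole complaint.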
Disadvantages of Author-Year (and Advantages of Numbered References)
As you might have guessed, there are some disadvantages, too:
  1. The author-year format is helpful only if you are already familiar with the relevant literature. Otherwise, you're still in the game of flipping back and forth to the reference section. Anticipating that most of my readers are savvy to vertebrate paleontology, but not to the latest in tectonics, contrast my above example in point 1 with this example (Najman et al., 1997, Geology 25:535-538): "Why is this so, as crustal thickening and metamorphism are thought to have occurred by this time (Frank et al., 1977; P. Zeitler in Hodges and Silverberg, 1988; Inger and Harris, 1992; Searle, 1996, and references therein; Vanny and Hodges, 1996)?" Although I understand the meaning of the sentence, the names and dates have absolutely no meaning to me, other than to help me find the appropriate citation in the back. I'm not familiar with that literature, so I'm annoyed by the extra text.
  2. Not every reader wants to become an expert on a given subspecialty. Believe it or not, I may not be reading a paper on Indian tectonics (or sauropod vertebrae) because I want to become an expert on said subject. Let's say that I'm chasing the above-mentioned example from Najman because I want to know the context for some fossils I found in a formation described in that paper. I just want the bare minimum of info, and I don't care about Frank, or Zeitler, or Hodges, or Silverberg, or Inger, or Harris, or Searle, or Vanny. Sure, maybe I'll chase some of those references for alternate opinions, but once that's done the names will probably never cross my mind again. This leads to the next point. . .
  3. The author-year format clutters the text. I'm not the first person to state this, and I won't be the last. By editing my ceratopsian quote above, you now get: "Less detailed descriptions have been published for other chasmosaurine and some centrosaurine ceratopsids [1-4]." Try the same with the Najman quote. Much shorter and more easily readable. A comment on the SV-POW! post by Zen Faulkes gives some more nice supporting opinions.
  4. Most of the rest of the scientific world uses numbered citations. I think people are giving Science and Nature a little too much credit for driving the numbered citation game. Yes, they certainly are the most visible journals to those of us in paleo/geo/zoological sciences, but that's a rather myopic view. I did a quick survey of the other 99 percent of the scientific literature, and numbered citations simply dominate. Even arXiv - the epitome of digital presentation with no real standard format - has a vast majority of papers with the [1,2,3] style (in fact, the only counterexamples I found were in a handful of biologically-oriented papers). The medical literature (medically oriented papers are the great majority of PLoS ONE submissions), computing literature, physics literature, etc., most often use numbered citations. Let's face it - paleontologists are not the biggest fish in the sea. It doesn't mean we're wrong or can't change things, just that it's a very uphill battle.
Closing Words
So, I have to say that the arguments for author-year and against numbered references are not as simple as one might hope. Major advantages and disadvantages characterize both formats. In the end, I suspect much of it comes down to "what we were born into." I like the author-year format because that's all I've ever known. My spouse, who is a physicist, surely thinks otherwise, but then again all she has ever known is the numbered format. She also thinks paleontologists are silly because we don't use LaTeX (and good luck getting that instituted, no matter how easy it would make things for us).

Interestingly, I came into this with a strong preference towards the author-year citation format, but after thinking about it I'm not sure that numbered citations are the Great Evil that they have been made out to be. What are your thoughts?

Update: The above-mentioned Zen Faulkes has a post strongly coming down on the side of numbered references. He argues that numbered references decrease overall manuscript length, greatly improve readability, and level the playing field for both readers and cited authors. The last argument is particularly novel, and strikes at the heart of the true purposes of citations. I'm not sure I totally agree, but it's definitely food for thought. [12 January 2011]

(As an interesting side-note, the author-year referencing style may be so common in the paleontological and zoological literature because of a historical accident - the format was apparently invented by a Harvard zoologist, and spread throughout the zoological part of the literature. I suspect the weight of the Harvard name didn't hurt.)

Disclaimer: Although I am a volunteer editor at PLoS ONE, this posting is written strictly as my private opinion.

Thank you to the many commenters at the SV-POW! blog, whose thoughts inspired this post.


Jeffrey W. Martz, PhD said...

I'd like to point out an additional advantage of the author-year citation: It's a lot easier to find a particular reference that you are looking for if the references are alphabetized, rather than trying to remember the number. Just to give one example where this is useful, I will sometimes be trying to remember the full citation for a particular article, and pick up a paper where I am pretty sure it was cited to get it. If the refs are alphabetized, I can go straight to it instead of having to scan the pages for it.

"The author-year format is helpful only if you are already familiar with the relevant literature. Otherwise, you're still in the game of flipping back and forth to the reference section."

Only the first time, though. Once you do the first flip and see the full citation, you will recognize it when you see it subsequently in the text. At least for me it's pretty common to find a useful new reference by seeing it cited once, and then encountering other information attributed to it later in the text. I'm more likely to recognize the citation from previous times I saw it if it is given by name, rather than number, which I am less likely to remember.

"The author-year format clutters the text."

I've personally never had any particular problem with this. Once you've been reading it a while, your eyes learn to skip over the citations if you don't happen to be interested.

I don't think the issue is whether or not numbered citations are a "great evil". The issue is simply whether or not they offer more or fewer advantages than the author-date. The only real advantage of numbered citations you gave was not "cluttering the text"; the first three disadvantages that you list for author-year really boil down to the same thing.

Mike Keesey said...

I think this is going to become a non-issue within our lifetimes. As digital media become more frequently used, we'll have a more elegant system than either of these. Probably the best system would be hyperlinked text with rollover popups listing the cited sources. That would be even less obtrusive than numbered references and much more informative than Harvard-style.

Of course, printed media are not going to vanish (nor should they), so this issue will still be relevant there. But people are probably going to increasingly favor digital media. (I, for one, can't even remember the last time I read an article from a scientific journal in print--probably a couple of years ago.)

Andrea Cau said...

From a reader's point of view, I see no big differences among them.
At the same time, I could also be a potential writer, so, when writing a paper, I prefer the most functional and fastest way to edit the ms. Re-ordering the citations every time a change in the text occurs (before and after the review) is an absurd loss of time (and energy).
Since this is the blog... are there OPEN SOURCE citation managers online?
If the answer is "yes", both methods are almost equivalent.
If "no", I think I would prefer the Author-Year one.

220mya said...

Regarding the pros/cons of point #1, I completely agree with Jeff. If I'm not familiar with a particular citation (e.g., Doofus, 2011), I'll surely flip to the back to see what it is the first time. However, after I've learned what reference "Doofus, 2011" is, it's a lot easier to remember when the author/year are cited than when it's just a number. It's very easy to forget whether "Doofus, 2011" was ref. 1 or ref. 3 in the numerical citation style.

I think your physicist wife example is a bit of a red herring. Yes - as paleontologists we are conditioned by the literature we read. However, PLoS One is largely a biosciences journal (and I would consider paleo to be part of the biosciences). Beyond Nature, Science, and a few other journals (where there are good space-saving reasons for numerical citations), the vast majority of biosciences journals use author/year citations, including Cell. Shouldn't PLoS One cater to its major audience (i.e., bioscientists)?

I agree with Mike Keesey's comments that new technology should make this argument obsolete in the near future. However, I don't think hyperlinks do the job, because they still navigate you away from the text that you were reading. Only mouse-over pop-ups will solve the problem, because you can stay where you were reading.

Finally, in terms of the practical aspects of writing a manuscript, citation management software (e.g., EndNote) does not fully solve the problem. If one uses this software to manage in-text citations, it can be a real problem if any of your collaborators has this same software installed. If they try to change an in-text citation, it causes huge issues in screwing up the order and content of the citations. Therefore, although I use EndNote to format my references, I do not use it to manage any in-text citations.

Anonymous said...

While the space saving argument doesn't hold much weight with largely electronic media, numbered citations do significantly cut down on length. I prefer to use the author, year style while writing but found on a recent grant proposal that by switching to a numbered format I bought myself a whole extra page to write about the project. Using the author, year style approximately 13% of the text would have been in-text citations.

For those who lament the demise of print media the cost savings associated with printing should be considered. That would also translate into thinner volumes saving space on library and office shelves. I'd love it if I could free-up 13% of my bookshelves and filing cabinets. That might help me get away from the "stratigraphic filing system" that seems to have taken over any horizontal surface in my office.

Jeffrey W. Martz, PhD said...

I still prefer to have a hard copy of a paper in hand to read or edit more than reading off a screen, but that is an issue of page cost at my end, not the journal's. This is probably going to continue to be the case for years to come, until things like iPads and Nooks get sophisticated enough that I can carry around a single clipboard-sized (and clipboard-weight) screen which can hold my entire library and has a stylus that I can scribble notes with. Still, I'm willing to lose a few more cents printing out 13% more pages for the reading convenience.

The point about numbered citations in grant-writing is an excellent point though. I wish I had thought of that a few months ago...

Mike Keesey said...

On Andrea's point, most word processing software automates numbered references. Since this is an open-source-themed blog, I'll mention OpenOffice's Writer, wherein adding a numbered reference is as simple as Insert > Footnote.

There is absolutely no reason these days for anyone to put themselves through the hell of manually creating numbered references.

(I do use Harvard-style on my blog, where there is no automated footnote system.)

Bill Parker said...

How about a compromise? The first time a reference is cited in a paper, it needs to be given in full text with the year, followed by the citation number in brackets. Then for the rest of the paper only the citation number can be used, to save space and reduce clutter.

This is analogous in some ways to the old-time usage of "ibid." or to the practice, used by some journals, of writing out all author names in full on first use instead of "et al." in citations with three authors.

Jon said...

Andrea--Zotero is probably the easiest open source ref manager (and it has been featured here before). BibTeX is also great, if you're a LaTeX user.

I think I agree with the previous sentiments that it simply isn't that large of a deal. If you read digitally, it will probably soon turn into hyperlinks. If you read a printout, just keep the references unstapled. It only matters if you are looking at the print journal, and that makes it inapplicable to PLoS.

Andy said...

Excellent discussion here, and many of you bring up some issues I hadn't thought of. Jeff, you're absolutely right that an alphabetized bibliography is essential for usability (no matter what one's level of familiarity with the topic). One alternative, I suppose, would be to combine alphabetized and numbered. The minor problem with this (I'm thinking out loud here) is that when folks cite a series of 4 papers in an intro, you get [3,12,19,43], rather than [1-4].

Mike brings up an interesting point about the advancing wave of digital formatting, but I think one way or another we're still going to be tied to a paper-influenced format for a long time. Of course, it's so easy to switch citation formats (particularly if documents are marked-up properly) that it will probably be a non-issue. I love the image of roll-over citations, and think the effective implementation of this is essential for scholarly publishing in a digital format.
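As a rough sketch of how cheap the switch is once citations are stored as data rather than typed-in text, here is one way it could look. The function names and the tuple layout are made up for illustration, and the bibliography below simply pools the names from the two quotes in the post; no real paper cites that mix.

```python
def author_year(group):
    """Render a citation group in author-year style."""
    return "(" + "; ".join("%s, %d" % (a, y) for a, y in group) + ")"


def compress(numbers):
    """Collapse runs of three or more consecutive numbers, e.g. 1,2,3,4 -> 1-4."""
    numbers = sorted(numbers)
    parts = []
    i = 0
    while i < len(numbers):
        j = i
        # Extend j to the end of the current consecutive run.
        while j + 1 < len(numbers) and numbers[j + 1] == numbers[j] + 1:
            j += 1
        if j - i >= 2:
            parts.append("%d-%d" % (numbers[i], numbers[j]))
        else:
            parts.extend(str(n) for n in numbers[i:j + 1])
        i = j + 1
    return ",".join(parts)


def numbered(group, bibliography):
    """Render a citation group as numbers, given the bibliography order."""
    index = {ref: n for n, ref in enumerate(bibliography, start=1)}
    return "[" + compress([index[ref] for ref in group]) + "]"


# The four ceratopsian references, cited together in the intro.
group = [("Gilmore", 1917), ("Lehman", 1990), ("Sampson", 1995), ("Sampson et al.", 1997)]

# Hypothetical bibliography in order of first citation.
by_appearance = group + [
    ("Frank et al.", 1977), ("Hodges and Silverberg", 1988),
    ("Inger and Harris", 1992), ("Searle", 1996), ("Vanny and Hodges", 1996),
]
alphabetized = sorted(by_appearance)

print(author_year(group))
print(numbered(group, by_appearance))
print(numbered(group, alphabetized))
```

With the bibliography in citation order the group collapses to a tidy range; with an alphabetized bibliography the same four papers get scattered numbers, which is the minor problem I was thinking out loud about above.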

@Randy, I mainly mentioned my spouse as a counter-example to Mike's claim that All Science used author-year. You're completely right that this format is common in biosciences. . .so maybe it's more of an issue for PLoS ONE than we thought? I guess to some extent the journal is "hampered" by a presumed common publishing platform and format between all of the PLoS-labeled journals (although again the argument would be why we can't have author-year for all of them, and just be done with it - maybe it does come down to GlamourMag syndrome?).

As a follow-up to Randy's comment on citation management, sharing documents between authors without such software is a major headache.

Anonymous's comment on numbered citations for grants is pure gold.

Anyhow, good stuff all around. There was also some interesting discussion at SV-POW! about how to reference a specific "page number" in digital documents (esp. when quoting text or referring to a particular section). Definitely worth checking out.

Heinrich Mallison said...

I must say I strongly dislike numbered citations for the simple reason of having to flip to the refs again and again and again each time the same paper crops up again, within the same article. Or do you learn the numbers by heart afresh for each paper you read?


However, slowly entering the shiny digital future, I know a wonderful way of combining the advantages of both methods, and avoiding the disadvantages:

I have seen journals that use numbers in their text, and online have a side bar giving the full citation in the same line. Others use pop-ups: hover the mouse over the number and you get the full citation.

So far, so good, but what do we do about print-outs and PDFs?
Well, the way I hear computer people talk, it should be a piece of cake for them to program websites and databases to give me all options:
1) author-year, reflist alphabetically sorted at the end
2) author-year, refs of each page as footnote
3) author-year, refs on margin
4) numbers, refs on margin
5) numbers, refs number-sorted at end
6) numbers, refs as footnote
7) any combination of the above.

So, geeks: shut up or money up!

Andy said...

@Heinrich - it would be completely awesome if journal websites instituted customizable presentations of their papers. Most major journals these days (including PE, I see!) are already marking up their published manuscripts appropriately, so it should be rather trivial (on some levels). At the same time, though, it would probably require a rewrite of the web publishing software that won't happen for a while.

I got an eReader (Kindle) for Christmas, and have been reading quite a bit of both scientific and non-scientific work on there. It's making me realize that academic publishing has a long, long way to go in order to catch up to the evolving reading technologies.

220mya said...

@Andy: One of my colleagues recently showed me PDFs of paleo papers she had uploaded on the Kindle. I was quite impressed with the readability and image quality. The only drawback was that the zoom feature seemed a bit clunky. Otherwise ereaders/iPads may be the wave of the future!

@Heinrich: I completely agree!

Andy said...

@Randy - I completely agree with all of your points about PDFs on the Kindle. The (current) lack of color can be a hindrance for some papers (yes, I know that the iPad has color, but its screen is not nearly as comfortable to read from for long periods; and, I don't own a pair of hipster glasses).

On a related note, someone at another forum pointed out that Nature has already implemented hover-over citations. Their appearance is a little clunky (it would be great to have a hanging indent, and full journal titles), but it's a great start. Science has a similar feature, and similarly clunky (and occasionally buggy, when I tried it; sometimes the reference didn't pop up).

Mike Keesey said...

"The (current) lack of color can be a hindrance for some papers"

It shouldn't be. Papers should not rely on being in color, for the simple reason that some people are colorblind.

"So, geeks: shut up or money up!"

Looks like a decent project when I have some spare time. Would make a good JavaScript library....

Heinrich Mallison said...

@ Mike: I'll take you up on that one ;)

@ Andy: and I'll add a gripe with that mark-up stuff right here and now (Wikipedia solved this better, albeit only slightly): you click the link, it takes you to the ref, you're happy. Now you click------ uhm, what DO I click to go back????

On Wikipedia, you can hover the mouse over the link and see which instance of this particular ref you're on (i.e., is it being used the first, second, five-hundredth time), click it and be taken to the ref list, then click the corresponding number to be taken back. OK, you must remember to check before you click, otherwise you're lost, but at least you CAN. In PDFs and on web pages it is not that easy. You may be able to use the back button, but sometimes that is not possible: it takes you back to the correct page but not the correct paragraph.

Andy said...

Most color-blind individuals have a red-green deficit, so this can (usually) be overcome by careful choice of color schemes (having a color-blind prof in grad school made me much more aware of this fact!). And, there are some types of graphics - for instance, photographs of anatomical dissections or outcrops - that really benefit from color but cannot be adjusted by a simple change in fill. However, in an age where many people still print out their stuff in grayscale (or journals only allow this), it's critical to make sure that the figures are at least passable in both formats.

Paleontology Student said...

As Mike T. Keesey said, no problem for digital media, but for printed media, are there any real obstacles to reaching a creative compromise solution?

How about restructuring the page layout so that it allows the use of numbered in-line citations AND the number-referenced corresponding author-year citation on the margin, in close proximity, or in a special box or something?
Best of both worlds?

Armin said...

I personally like the footnote variant with superscript numbers. The full reference is put on the bottom of the same page.

It does not clutter the text.
The reader does not have to flip back and forth.
It is easy to add small comments to the citation such as "cp".
The bibliography at the end is sorted alphabetically.

It was once quite common in Europe. Now most universities prefer numbered endnotes. Does anyone know why it went out of fashion?