Sunday, December 26, 2010

Common Mistakes in Scientific Writing [or, A Pedant's Paradise]

In scientific writing, proper terminology is everything. I learned early on that many of my favorite turns of phrase were technically incorrect - and I have been working to improve my writing and editing ever since. Below, I've included some of my "favorite" stylistic oddities. Hopefully this is useful for at least a few readers! This may be old hat for some of you - in that case, please post a comment with your own grammatical grumblings.

"Outcrops" as a verb
Despite rampant misuse, there is no verb form of "outcrop."
Incorrect: "The Barstow Formation outcrops in southern California."
Correct: "The Barstow Formation crops out in southern California."

"Monophyletic clade"
A clade is, by definition, monophyletic. So save space and use only one of the two words!
Incorrect: "Dinosauria is a monophyletic clade."
Correct 1: "Dinosauria is monophyletic."
Correct 2: "Dinosauria is a clade."

"Data is. . ."
The word "data" is plural; "datum" is the singular. You're bucking against popular culture, but think of how delightfully smug you can feel whenever you use the words correctly.
Incorrect: "The data is overwhelming."
Correct 1: "The data are overwhelming."
Correct 2: "The datum is overwhelming, which is odd because it's only a single measurement."

"e.g." and "i.e."
"E.g." abbreviates the Latin "exempli gratia," which translates roughly as "for example." "I.e." abbreviates the Latin "id est," which translates as "that is." The meaning of the former should be pretty clear; the latter is used to restate or clarify a point, not to list examples of it.
Incorrect 1: "Many dinosaurs are found in the Hell Creek Formation (i.e., Triceratops and Tyrannosaurus)."
Correct 1: "Many dinosaurs are found in the Hell Creek Formation (e.g., Triceratops and Tyrannosaurus)."
Incorrect 2: "Bird skeletons are pneumatized; e.g., they are filled with air sacs."
Correct 2: "Bird skeletons are pneumatized; i.e., they are filled with air sacs."

Lower/Upper vs. Early/Late
Unless you have had a solid introduction to geology (and even then, it's easy to forget), you probably don't know that there is a major, if nitpicky, difference between Upper Cretaceous and Late Cretaceous. The Upper/Lower designation refers to chronostratigraphic divisions of rocks (the bodies of rock deposited during a given interval of time); these are not the same as the geochronologic intervals of time themselves. In other words - Upper Cretaceous refers to a physical lump of sedimentary rocks; Late Cretaceous refers to the age of these rocks. Whenever I try to figure out which word to use, I concentrate on whether I'm talking about time (Early/Late) or position in the rock column (Lower/Upper).
Incorrect 1: "These Early Cretaceous rocks are full of fossils."
Correct 1: "These Lower Cretaceous rocks are full of fossils."
Incorrect 2: "Tyrannosaurus is Upper Cretaceous in age."
Correct 2: "Tyrannosaurus is Late Cretaceous in age."

Want some more? The style guide for the Journal of Vertebrate Paleontology (available in PDF format) has lots more great hints and tips!


Darrin Pagnac said...

Let me beat a dead equid a bit more. Unfortunately, the term "middle" is in common usage for both stratigraphic and temporal descriptions. Jim Martin has always encouraged his students to use the term "middle" for stratigraphy, and "medial" for temporal descriptions. I've done this in my publications and it has worked very well.

Thomas R. Holtz, Jr. said...

ka, Ma, and Ga are for dates; kyr, Myr, and Gyr are for durations. Nothing lasts for 10 Ma, any more than things last for 1492 AD.

Anonymous said...

"Data" is the Latin plural. We speak English, not Latin. If you just say what comes naturally, you'll find yourself using "data" in the singular. I bet you're not 100% consistent on that point anyway.

Do you insist on using "opera" as a plural too? It's the Latin plural of "opus" you know. Language evolves. Don't tell someone that they're wrong if they're following the predominant usage.

220mya said...

Andy - Great post! But I'm afraid you have not quite got the Lower/Upper and Early/Late thing correct. Indeed, Lower/Upper should be applied to rock units, but this is chronostratigraphy, not lithostratigraphy. Chronostratigraphy is the application of geochronology to bodies of rock, whereas lithostratigraphy divides rocks by their physical characteristics.

Anonymous - 'data' are in fact plural in English too - just see most of the entries here: [Definition of 'Data']. Regardless of its use colloquially, 'data' are most definitely plural in scientific writing; any usage to the contrary is a failure of the author and/or editor.

220mya said...

At the risk of winning a prize for pedantry, here's one that really gets me (originally courtesy of my PhD advisor):

'Since' as a synonym for 'because'
The word 'since' implies relative time as an adverb, preposition, and/or conjunction. However, it should not be used as a conjunction when implying causation (i.e., where you'd use 'because').

Incorrect: Since the foramen is absent, we cannot code character 32.
Correct 1: Because the foramen is absent, we cannot code character 32.
Correct 2: Since the beginning of the Cretaceous, flowering plants have diversified to become a major component of terrestrial ecosystems.

Andy said...

@220mya - thanks for the correction. Done.

@Anonymous - This post is primarily concerned with scientific, not popular, communication. In many cases (as outlined in this post, for instance) the predominant usage simply does not belong in a scientific paper. Data vs. datum is a prime example - good technical writing demands precise, correct usage. As 220mya points out, most major dictionaries give "data" as the plural form in scientific usage. (And for the record, I do strive to use the two correctly when speaking, but I don't get too upset if members of the public don't.)

@everyone - thanks for the contributions; keep 'em coming!

Tor Bertin said...

"Comprised of" in place of "composed of."

Matt BK said...

Regarding Lower/Upper and Early/Late, it's also only proper to capitalize the geochronologic terms (Early and Late), and their chronostratigraphic counterparts (Lower and Upper), when they refer to real, formally defined divisions. The Cretaceous Period has Early and Late epochs, but no Middle, whereas you can refer to the Early, Middle, and Late Jurassic. Hence, you can talk about Upper Cretaceous and Lower Cretaceous, but middle Cretaceous has to stay lowercase because it is not formalized.

Mike Taylor said...

Useful thread, Andy. I could not agree more about "monophyletic clade" -- it bugs me as much now as it used to when my primary-school friends referred to a "round circle".

Tom, I've been using Mya -- is that Just Plain Wrong?

Anonymous: while it's true that language changes, that doesn't mean we have to blindly follow every widespread mistake that gets perpetuated via 4chan and Reddit. I will never write "alot" when I mean "a lot", nor "could care less" for "couldn't care less". Still, it's not clear that singular-data is Just Plain Wrong in the same way as those. I think I'd write "data are" in my own work, but I probably wouldn't correct "data is" if I found it in a manuscript that I was reviewing.

On the Upper/Lower vs. Late/Early distinction: I understand it, but I don't see the point. Really, what information would be lost if we as a community just dumped Upper/Lower and used Late/Early everywhere?

On Since vs. Because: oh, please. The use of "since" in these cases is perfectly clear and unambiguous. Like a lot of the rules in the JVP style guide, it represents nothing more than someone's arbitrary preference. It is a COMPLETE waste of time.

Bill Parker said...

In North American journals we put a comma after e.g. (e.g.,) but in European journals a comma is not used.

Thomas R. Holtz, Jr. said...

"Mya" (and even lowercase "mya") are considered perfectly acceptable abbreviations by the USGS and the like, as they are unambiguous in being dates rather than durations. Ma is more formal, however.

Martin Brazeau said...

By far the most common:

"X is the most basal Y".

No terminal taxon in a cladogram is any more "basal" than another - by definition. What people mean is: "X is the sister group of all other Ys". Furthermore, we all know this, and only apply the term "most basal" when we have an imbalanced clade. Nobody calls the Actinopterygii the "most basal" osteichthyans, but people are perfectly happy calling chondrichthyans "basal gnathostomes".

Anonymous said...

Might seem weird that a linguist is reading this blog, but I had to weigh in, because if people who write style guides that insist on the "which/that" distinction and discourage "runs" as a verb were serious about language use, then more linguists would have jobs. "Data" is the Latin plural of "datum," a past participle equivalent to English "given" (as in "had given" or "been given"). This is largely unimportant, however, because we speak English, not Latin. Non-linguists (or grammarians) are often shocked to learn that the distribution of usage can be bimodal. If you actually look in most dictionaries, they do treat "data" as a mass noun, not a plural, even in scientific writing (see COCA: since 1990, the ratio of mass-singular to plural usage has been 7/16 in academic journals).

The primary problem is that dictionaries are written by grammarians, who don't like ambiguity and use etymology to dictate usage - this is like calling modern birds dinosaurs because they evolved from forms that people used to designate as "dinosaurs" - but the Latin past participle is different from the English one. And most style manuals are written by specialists in their field (which is for the best), who don't have the expertise to realize that grammarians are often uninformed.

There's a further issue with complex restrictive det-Phrases ("bacteria on teeth is a major problem" vs. "the bacteria on her teeth are a major problem"), because in English sometimes we nominate entire clauses, not just proper nouns, but I think I'll just show myself out...