A feisty embuggerance

When I grade my students’ paper proposals, I make a point of doing a brief Google Scholar search for each student’s proposal, which a) helps me evaluate how thorough they have been; and b) helps me help them find additional material (I then give them the sources I found, along with the keywords I used to find them). One of my students in my introductory linguistic anthropology course this term is doing a paper on linguistic aspects of laughter and humor. During my search, I encountered the following citation (direct from Google Scholar to you):

Embuggerance, E., and H. Feisty. 2008. The linguistics of laughter. English Today 1, no. 04: 47-47.

After I stopped laughing, I set to figuring out what was going on.

1) I quickly discarded the theory that an unlikely duo of scholars actually had this pair of names – although that would have been too awesome for words. In fact, no other article listed in Google Scholar has an author named ‘Embuggerance’ (although there are a couple of other Feistys).

2) I also considered the possibility that this was one of the many metadata errors in Google Scholar; for instance, there are thousands of articles whose purported authors are named Citations or Introduction or Methods, because Google Scholar misreads headings like “IV. Methods” as a name, “Dr. I.V. Methods”. But this seemed unlikely in the extreme in this case.

3) This left the possibility that these were pseudonyms adopted by particularly amusing authors as part of a parody article.

In this case the article is in fact a book review (which I could tell because it’s all on one page), so I didn’t recommend it to the student, but I did request it for my own edification. Lo and behold, it arrived today as a PDF.

‘The linguistics of laughter’ is a book review of The Language of Humour by Walter Nash. It’s perfectly ordinary and non-satirical, and it does not contain the words Embuggerance or Feisty. But next to it is another book review, entitled ‘Concise and human’, which contains the following passage (emphasis added):

Silverlight’s concise and human reports cover a surprising range of curious items, from Acid Rain through Bottom Line, Catch 22, Dinner/Supper, Embuggerance, Escalate, Feisty, Holistic, Krasis, Ms, Naff, Quorate, Shambles and Viable to Yomping.

The four emphasized words (Embuggerance, Escalate, Feisty, and Holistic) appear on a single line, and the fact that the Google Scholar metadata gives the ‘authors’ the initials E. and H. (from Escalate and Holistic), making them Dr. E. Embuggerance and Dr. H. Feisty, seals the deal. This is the source, and so something like option 2 above is correct. But this is really weird. Not only do the pseudo-authors appear in the middle of a contextualized sentence (not in headings), but the sentence is in the wrong review – a review that is itself listed (mostly correctly) in Google Scholar!
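How could one line of a word list turn into two authors? I have no idea what Google’s parser actually does, but here is a minimal Python sketch of one hypothetical heuristic that reproduces the result: read a comma-separated line as ‘Surname, Given name, Surname, Given name…’ and keep only the initial of each given name. The function name misread_as_authors is my own invention for illustration.

```python
def misread_as_authors(line: str) -> list[str]:
    """Hypothetical heuristic (not Google Scholar's actual code): treat a
    comma-separated line as 'Surname, Given, Surname, Given, ...' and
    reduce each given name to an initial."""
    # Drop empty tokens, e.g. from a trailing comma at the end of the line.
    tokens = [t.strip() for t in line.split(",") if t.strip()]
    # Pair tokens 0&1, 2&3, ...; a leftover odd token is silently dropped.
    return [f"{given[0]}. {surname}"
            for surname, given in zip(tokens[0::2], tokens[1::2])]


print(misread_as_authors("Embuggerance, Escalate, Feisty, Holistic,"))
# ['E. Embuggerance', 'H. Feisty']
```

The point of the toy is simply that a line break in the wrong place, plus a very trusting ‘Surname, Initial’ pattern, is enough to mint two scholars.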

To make matters even worse, at the end of the reviews section the phrase ‘Reviews by Tom McArthur’ appears – an attribution which is found in the metadata for ‘Concise and human’ but not for ‘The linguistics of laughter’. And, as if this were not bad enough, even though both reviews are listed as being from 2008, the PDF clearly shows them as being from 1985. If I were a gambling man, I’d wager that 2008 is the year when the metadata was added and/or the file was scanned.

Now, mostly this is just a humorous anecdote; I don’t mean this as an indictment of Google Scholar, which I consider to be the most useful way for most scholars to find academic literature, and which I use virtually every day. But one has to wonder at the process (automated or otherwise) that leads to this comedy of errors. A great deal of virtual ink has been spilled over at Language Log (here and here, for instance) on the metadata problems with Google Books / Google Scholar and its implications for linguistic research, for tenure cases that rest on faulty citation records, and other potential problems. Until there is a way for these sorts of errors to be corrected by end users, we may all be well and truly embuggered.

Google Street View, maple leaf edition

Turning from ancient epigraphy to contemporary epigraphy: Today, Google Street View went live in many Canadian cities, including Montreal. As I’m currently putting together a book prospectus for Stop: Toutes Directions, this is of great interest to me. Google’s images aren’t high enough quality to evaluate damage, wear, and vandalism, much less to actually photograph and read the vandalism. On the other hand, they do allow me to easily identify new (currently unsurveyed) areas where there is a lot of linguistic variability. It took me about two minutes, for instance, to find this intersection at the corner of Churchill and Cornwall in Ste-Anne-de-Bellevue, a bilingual community at the western tip of the island of Montreal, where a single four-way intersection has two ARRETs, one STOP, and one ARRET/STOP. We currently have only a handful of intersections with all three sign types in our database. Alternatively, one of our pet theories is that airports and border crossings tend to have greater numbers of bilingual stop signs, and this could be checked rapidly without a road trip. Just as Google Earth allows archaeologists to find new sites online but still requires a lot of ground-truthing, Google Street View is a handy tool that doesn’t let you skip the hard part. For any of my co-authors who may be reading, though, rest easy: I’m not about to freak out and ask you to start collecting new data online, although I did think about sending you a prank email to that effect before I thought better of it.

Variant Roman numerals: a project

Yesterday I thought of a great new project that could be a nice little article, or, if I had a grad student with a background in classical archaeology, a nice little thesis, or, if someone else wants to work with me, a co-authored paper. Heck, if you scam my idea, more power to you – I will cite you widely if it’s good, and mock you widely if not! You see, the Epigraphische Datenbank Clauss-Slaby is a searchable full-text database with over 350,000 Latin inscriptions (with over 20,000 images). You can enter a word (e.g., Germaniae) and it returns all the inscriptions that contain that word. Nifty, huh? Just by mucking about with EDCS today, I discovered two or three things in my forthcoming book that need revision, which makes me only a little bitter.

Now of course I’m not a classicist (I have three terms of Latin under my belt, but that’s hardly enough to make me an expert), but I do know a thing or three about Roman numerals. The study of Roman numerals is sorely neglected in modern epigraphy, which is a shame because there are some really interesting social questions to be asked relating to regional identity and literacy (the sort of stuff, e.g., that Greg Woolf does). We think that we know Roman numerals: just take I, V, X, L, C, D, and M, string them together in groups of no more than three, use subtractive notation for numbers like 9 and 44, and you’re done. But it isn’t so simple.
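The textbook recipe is mechanical enough to write down in a few lines. Here is a minimal Python sketch, assuming the modern convention (values 1 to 3999, subtraction only via the six pairs IV, IX, XL, XC, CD, CM); the name to_roman is mine, and this is precisely the convention that many inscriptions don’t follow.

```python
# Modern "textbook" values, largest first, including the six subtractive pairs.
TEXTBOOK = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Write 1 <= n <= 3999 in the modern textbook convention only."""
    if not 1 <= n <= 3999:
        raise ValueError("the textbook convention only covers 1-3999")
    parts = []
    for value, symbol in TEXTBOOK:
        # Greedy: use each value as many times as it fits (never more than 3).
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


print(to_roman(9))   # IX
print(to_roman(18))  # XVIII
print(to_roman(44))  # XLIV
```

Because the greedy loop only ever produces one spelling per number, it can never yield XIIX or XXXXX; those forms need a more forgiving reader.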

The Roman numerals are not a static and unified system; there are various expressions for the same number (e.g., XVIII vs. XIIX for 18, or XXXXX vs. L for 50). Back in the 1950s, Arthur and Joyce Gordon did some interesting statistical analysis, indicating some potential sources of this variability (chronological, regional, and textual), but they didn’t have the sort of massive resources that the EDCS provides. So, for instance, it is often said that IIIII for 5, XXXXX for 50, and CCCCC for 500 (i.e., not using the sub-base signs V, L, and D) are found particularly in African inscriptions. Well, a quick search for ‘CCCCC’ and ‘XXXXX’ suggests to me that this isn’t a full explanation. Are certain types of inscription more likely to contain these variants? Could we be dealing with a chronological difference? Could we be dealing with a variant typical of minimally literate writers, or writers of informal texts? Or could it be that the shorter forms are used when there’s less room on the medium, with longer variants used when space is not at a premium? I have no idea, but the only way to find out would be to build a list of inscriptions that use these variants, map them in time and space, and evaluate them in terms of the texts in which they occur.
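Building that list could start with a small helper that reads numerals from search results and flags the two kinds of variant mentioned above. This is a minimal Python sketch of my own, not an EDCS feature. It assumes numerals have already been transcribed as plain capitals (no vinculum, no variant letter forms), and the function names and flag labels are hypothetical coding categories.

```python
import re

# Sign values; the vinculum (x1000 bar) and letter-form variants are ignored.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def runs_of(numeral: str) -> list[str]:
    """Split a numeral into runs of identical signs: 'XIIX' -> ['X', 'II', 'X']."""
    return [m.group(0) for m in re.finditer(r"(.)\1*", numeral.upper())]


def read_roman(numeral: str) -> int:
    """Value of a numeral, tolerating additive runs (XXXXX) and
    multi-sign subtraction (XIIX)."""
    runs = runs_of(numeral)
    total = 0
    for i, run in enumerate(runs):
        sign = VALUES[run[0]]
        next_sign = VALUES[runs[i + 1][0]] if i + 1 < len(runs) else 0
        # A whole run written before a larger sign is subtracted from the total.
        if sign < next_sign:
            total -= sign * len(run)
        else:
            total += sign * len(run)
    return total


def variant_flags(numeral: str) -> list[str]:
    """Flag the two variant types discussed in the post (hypothetical labels)."""
    runs = runs_of(numeral)
    flags = []
    # IIIII, XXXXX, CCCCC: five or more base signs instead of V, L or D.
    if any(r[0] in "IXC" and len(r) >= 5 for r in runs):
        flags.append("five-fold base sign")
    # IIX in XIIX: two or more signs subtracted from a larger one.
    for i, run in enumerate(runs[:-1]):
        if len(run) >= 2 and VALUES[run[0]] < VALUES[runs[i + 1][0]]:
            flags.append("multiple subtraction")
            break
    return flags


for form in ["XVIII", "XIIX", "L", "XXXXX", "CCCCC"]:
    print(form, read_roman(form), variant_flags(form))
# XVIII 18 []
# XIIX 18 ['multiple subtraction']
# L 50 []
# XXXXX 50 ['five-fold base sign']
# CCCCC 500 ['five-fold base sign']
```

Grouping runs of identical signs is the design choice that matters: a sign-by-sign reader would take XIIX as 10 + 1 − 1 + 10 = 20, not 18. Each flagged inscription could then be tagged with date, region, and text type for mapping.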

Now, there are some methodological complexities: some of the interesting variation is between different forms of the same character, and the database offers no way to search for that. Some Roman numeral forms (such as the horizontal bar, or vinculum, placed over a numeral to multiply it by 1000) aren’t represented consistently, or at all, so one would need to rely on other published material to find the relevant inscriptions. And quite a lot of the project would require taking the database results and then checking them against the Corpus Inscriptionum Latinarum. But ultimately it would take what seems to be a rather dry subject (variability in Roman numerals) and potentially correlate it with variability in social identities (class, ethnic, professional). Well, I think it’s cool, anyway.

Ig Nobel 2009

The annual Ig Nobel awards “for achievements that first make people laugh, then make them think” were given out last night, and once again, anthropology has been well-represented. Catherine Bertenshaw Douglas and Peter Rowlinson won the award for veterinary medicine for their demonstration that cows that are humanized by giving them names produce more milk than those that remain, uh, anonymous. Although they are veterinary scientists, their work appears in the interdisciplinary anthropological journal Anthrozoös. Meanwhile, the Ig Nobel for physics went to the biological anthropologists Katherine Whitcome, Liza Shapiro and Daniel Lieberman for their work (which appeared in Nature a couple of years ago) explaining why pregnant women don’t tip over. This is extremely important, as it bears directly on the evolutionary costs and benefits of bipedalism, among other issues.

See the full list of winners here.

Bertenshaw, Catherine, and Peter Rowlinson. 2009. Exploring Stock Managers’ Perceptions of the Human-Animal Relationship on Dairy Farms and an Association with Milk Production. Anthrozoös 22, no. 1: 59-69.
Whitcome, Katherine, Liza J. Shapiro, and Daniel E. Lieberman. 2007. Fetal Load and the Evolution of Lumbar Lordosis in Bipedal Hominins. Nature 450: 1075-1078.

A typology of quotation marks

I’ve been thinking about quotation marks lately (okay, now I’ve lost 99% of my readership already, way to go, Steve!) and the different ways we use them. Because I have a strong interest in literacy and culture and the way in which language gets turned into text, these sorts of things excite me in a way that is probably not entirely healthy, but then again, if they didn’t, you wouldn’t have a post to read. So without further delay, I give you…

“A” “typology” “of” “quotation” “marks”

Quotative: This is the common case in which quotation marks serve to distinguish matter spoken or written in another context, with the presumption that the quoted matter is being reproduced somewhat faithfully. The material may have been spoken or written originally, but there is a much higher expectation of word-for-word reproduction when quoting written material, for the obvious reason that the writer can copy from a written source. This was the original sense of quotation marks in printed matter, and it remains the most common. It helps us to distinguish sentences like

Martha said, “Canada is a fascist dictatorship.”

from

Martha said that Canada is a fascist dictatorship.

In the first case we are clearly meant to understand that Martha spoke those words, whereas in the second Martha might well have said, “Our government is heading towards fascism” or any number of other things.

Neologistic: Quotation marks are frequently used when an author coins a neologism, or coins a phrase using already existing words. One is not quoting some earlier source directly; one is seeking instead to indicate the novelty of the term being used. So, for instance, in this post, I write:

The effect of this ‘conspicuous computation’ was to impress the reader with the vastness of the quantity, serving as an indexical sign of Rome’s military might.

I’m not quoting myself here – I’m coining a new phrase and using quotation marks to alert the reader to this fact. We get into trickier ground when we put quotation marks around a single, existing word that we intend to use in a new sense, as in the following passage:

Let me explain first what I understand by “sociolinguistic”. I use the term in its adjectival form and speak of “sociolinguistic” kinds of research rather than “sociolinguistics”. (Hymes 1971: 42)

The context strongly suggests neologism, but another reading is that Dell Hymes (the author, a renowned sociolinguist / linguistic anthropologist) is seeking to dismantle the entire concept of sociolinguistics, or at least to shift its meaning substantially in this context. If so, we’re dealing with another sense entirely.

Distancing: The quotation marks serve to distance the author from the matter in quotes, even though that matter is not a faithful reproduction of anything said or written elsewhere. One finds these very often in British newspaper headlines, possibly because British libel laws are very strict and one could be held liable for a statement that is neither a direct quotation of another source nor hard fact. They frequently have a quotative smell to them, insofar as they often relate to assertions or claims by another party, but in fact they are not quotative at all, and often appear to be paraphrases at best. I posted about this elsewhere a couple of years ago, and I still find this use jarring. An example:

‘Many killed’ in Yemen air raid

The BBC is not trying to say that someone wrote or said the words “many killed” in that order or even that the quote is an abbreviation of “many people were killed”. It is reporting on others’ claims, true, but the purpose is not quotative. We can think of the distancing quotes as being quotative minus the condition of (near-)faithfulness.

Ironic: Ironic quotation marks often also distance the author from the words written, but more importantly, distance the meaning of the quoted matter from its standard or accepted one. These are often called “scare quotes” by academics, a term which I find bothersome because they aren’t meant to scare anyone. I am indebted to my colleague Jacalyn Harden who came up with the metaphor of quotation marks as eyelashes – ironic quotes serve as a textual “wink” alerting the reader that some novel sense is intended. Wikipedia uses an example from my late mentor, Bruce Trigger:

Moctezuma II was reported to have had two wives and many concubines, by whom he had a total of 150 children. The king of Texcoco was said to have had more than two thousand “wives” by whom he had had 144 children, 11 born of his chief wife. (Trigger 2003: 178)

So we understand here that two thousand “wives” in the second sentence is not to be taken in the same sense as two wives in the first sentence. In both cases, a ruler is claimed (“reported” vs. “said”) to have some number of wives, so we can tell that the difference is not due to the quotative vs. non-quotative distinction. Because I knew the author of those words and worked on that very book, I can say with some confidence that Trigger did not mean them deconstructively (see below) or in any sense other than the ironic one. Rather, the quotation marks are there because having two wives is not at all uncommon (even having two wives simultaneously is hardly a historical anomaly), but having two thousand wives strains credulity: the semantic associations we derive from the word wife could never be extended to the relationship between one man and two thousand women.

Deconstructive: There are scare quotes, and then there are “scare quotes”, and these are the latter. Where ironic quotes use a word in a sense other than its accepted one, deconstructive quotes imply that the object being quote-marked does not in fact exist. So, for instance, when one talks about “race” as opposed to race, one is noting that there is no biological reality to the race concept. Perhaps the most fantastic and potentially incomprehensible example is the following, from the linguistic anthropologist Michael Silverstein:

The important fact, then, is that “I” am to a certain extent what “I” say about “what” “I” drink as much as what “I” say about “it” reflects what “I” can discern about “what” “it” is. (Silverstein 2006)

In Trigger’s example above, he cannot mean that wives do not exist at all – he explicitly rules this out by using the unmarked word wives in the first sentence and wife (“his chief wife”) in the second. There are wives, and then there are “wives”. But in Silverstein’s example, he is really saying that “I” and “what” and “it” (the latter two referring to ‘that which I drink’) do not exist as real entities – they are socially constructed, to use one well-understood if less-than-ideal term. In ironic quotation marks, “A” is not A, but B, while in deconstructive ones, “A” is not A and is not anything else either.

Emphatic: The quotation marks serve as visual emphasis alone, and are not meant to serve an ironic, distancing, or quotative function. Most writers, I suspect, would treat this usage as an error, but it is widespread enough to deserve our attention. It is most frequently found on mercantile and informational signs, especially handmade ones. I refer you to The “Blog” of “Unnecessary” Quotation Marks, which gives such great examples as:

Closed “Monday”

“Fire Exit”
Please Do Not Use
Alarm Is On

and my personal favourite:

Thank “God” For All the Troops

These can clearly be excluded from the five other categories. Instead, the quotation marks serve as a sort of typographic highlighter, a means of emphasizing some words in the text. This is confirmed by the contextual association of emphatic quotes with billboards, signs, placards, and other texts meant for wide public visibility, and by the fact that many of the quote-marked words are also emphasized in some other way: boldface, underlined, capitalized, or in larger letters than the rest of the text. Are they truly “unnecessary”? Yes, in the sense that there are other ways to emphasize text, and because this sense is non-standard, some humor derives from understanding emphatic quotes as meaning something else (usually ironic). For instance, take this discussion at the unnecessary quotations blog over the sign Sellersburg Welcomes “President” George W. Bush. Sly jab at perceived electoral fraud, or over-ebullient semantic extension of well-known punctuation? You decide.

It would be very interesting to expand this analysis to specify more clearly the “etymology” (ironic) of each of the six forms and then to examine the historical and semantic relations among them. For instance, I suspect that the quotative and neologistic usages are earliest but that the broad semantic aspect of distance is what unifies all the senses except the emphatic. I also think one could do some very interesting corpus linguistics using students to code instances of quotation reliably, both in terms of frequency in different texts and in terms of this semantic typology. Finally, I haven’t even discussed the use of single versus double quotes (which could have some interesting correlations with my typology), or talked about “embodied” (neologistic) quotation marks in the form of “air quotes” (quotative?). Well anyway, if I write the paper, it’ll give me something to “talk” (ironic) about.
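If students were coding instances against the six categories, one standard check that they apply the typology consistently (my suggestion; the typology itself doesn’t require any particular measure) is Cohen’s kappa, which corrects raw agreement between two coders for the agreement expected by chance. A minimal Python sketch, assuming each coder assigns exactly one of the six labels to each instance; the category strings and the codings in the example are invented for illustration, not data.

```python
from collections import Counter

# The six senses of quotation marks proposed in this post (labels assumed).
CATEGORIES = {"quotative", "neologistic", "distancing",
              "ironic", "deconstructive", "emphatic"}


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Chance-corrected agreement between two coders on the same instances."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("codings must be non-empty and the same length")
    unknown = (set(coder_a) | set(coder_b)) - CATEGORIES
    if unknown:
        raise ValueError(f"labels outside the typology: {sorted(unknown)}")
    n = len(coder_a)
    # Observed agreement: share of instances given the same label by both.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over categories of the two coders' label rates.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in CATEGORIES) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when both coders use one identical category throughout")
    return (p_observed - p_chance) / (1 - p_chance)


# Invented codings of four quotation-mark instances (not real data).
a = ["quotative", "ironic", "emphatic", "quotative"]
b = ["quotative", "ironic", "ironic", "quotative"]
print(round(cohens_kappa(a, b), 2))  # 0.6
```

Here the coders agree on three of four instances (0.75), but since both lean heavily on quotative, chance alone would give 0.375, so kappa comes out at 0.6 rather than 0.75.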

Works cited
Hymes, Dell. 1971. The Contribution of Folklore to Sociolinguistic Research. The Journal of American Folklore 84, no. 331 (March): 42-50.
Silverstein, Michael. 2006. Old Wine, New Ethnographic Lexicography. Annual Review of Anthropology 35: 481-496.
Trigger, Bruce G. 2003. Understanding Early Civilizations: A Comparative Study. Cambridge: Cambridge University Press.