Writing systems blogs: filling a gap

You may have noticed that I have (very slightly) changed the subheading for Glossographia from the former ‘Anthropology, linguistics, and prehistory’ to the new ‘Anthropology, linguistics, archaeology, and writing systems’. I don’t actually post on prehistoric archaeology very much at all, but I do post on the archaeology of literate societies reasonably often. And, in particular, I post on issues in epigraphy, writing systems, and literacy very often, so I thought it fitting to give those subjects their current place of prominence.

Unfortunately, in my experience, there just aren’t that many blogs that focus on issues relating to writing systems and literacy. Sure, each regional archaeological tradition in the ancient world has its blogs that occasionally discuss textual evidence, inscriptions, paleography, and so on; some of them, like Rollston Epigraphy, do so regularly, others sporadically. Even so, I don’t know of any academic blogs that focus on cuneiform or on Egyptian hieroglyphs (though I’d love to be shown to be wrong). And of course, Language Log and other general linguistics blogs do occasionally touch on written language. But in terms of actual academic blogs dedicated in part or in whole to writing systems (their typology, their history, their linguistic features) or literacy (the social and cognitive context and use of writing), there isn’t that much out there. Maya Decipherment is the most prominent, surely because of David Stuart’s importance in the field. There’s BabelStone, where Andrew West has, very quietly, been blogging for eight years on Ogham, on Central / Inner Asian scripts, and on issues relating to typography and Unicode. The wonderful Shady Characters focuses specifically on punctuation; check it out, because it deserves a massive readership. The Omniglot site covers writing systems in some detail, and its companion blog sometimes discusses them too. Years ago we used to have Abecedaria, which had general information on writing systems with a focus on the ancient Near East and Levant, but it’s long defunct. Michael Everson used to blog about the letter þorn at þorn.info, but that’s been quiet for a year or more. And … well, that’s about all I know of or read regularly. Do you know of any others?

So, the change in the header does not reflect an actual change in what I’ll be posting; it is simply a recognition that there really is a need for a blog focused on general issues in writing systems and literacy, and that since 2008, this has been one of the relatively few places that actually do that.

Is the Voynich Manuscript structured like written language?

This week has seen a bumper crop of news stories about a new piece of research in PLOS ONE by Marcelo Montemurro and Damian Zanette, both physicists specializing in complex systems. The paper in question is not about physics, however: it argues, on the basis of an information-theoretic analysis of the structure of its words, that the mysterious Voynich Manuscript has properties suggesting a language-like structure. This is certainly not a ‘decipherment’, but if correct, the result would be counter-evidence to certain versions of the theory that the VM is a medieval hoax, undecipherable because it is pseudo-writing: meant to look like language but carrying no decipherable content in any natural language.
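To make ‘information-theoretic analysis of the structure of words’ a little more concrete, here is a minimal Python sketch of one common kind of measure: whether particular words cluster in particular sections of a text more than they would in a randomly shuffled copy of the same text. It is an illustration under stated assumptions, not a reproduction of Montemurro and Zanette’s procedure; the function names (`split_into_sections`, `entropy_across_sections`, `section_information`), the number of sections, and the toy text are all hypothetical.

```python
import math
import random
from collections import Counter


def split_into_sections(words, n_sections):
    """Cut a word list into n contiguous, roughly equal sections (no word dropped)."""
    bounds = [len(words) * i // n_sections for i in range(n_sections + 1)]
    return [words[bounds[i]:bounds[i + 1]] for i in range(n_sections)]


def entropy_across_sections(sections, word):
    """Shannon entropy, in bits, of how a word's occurrences spread over the sections."""
    counts = [section.count(word) for section in sections]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def section_information(words, n_sections=4, n_shuffles=200, seed=0):
    """Frequency-weighted drop in each word's cross-section entropy versus shuffled text.

    Positive: the word clusters in some sections more than chance predicts.
    Near or below zero: the word is spread about as evenly as in a shuffle.
    """
    rng = random.Random(seed)
    freqs = Counter(words)
    actual = split_into_sections(words, n_sections)
    shuffled_total = dict.fromkeys(freqs, 0.0)
    for _ in range(n_shuffles):
        shuffled = list(words)
        rng.shuffle(shuffled)
        sections = split_into_sections(shuffled, n_sections)
        for w in freqs:
            shuffled_total[w] += entropy_across_sections(sections, w)
    return {
        w: freqs[w] / len(words)
        * (shuffled_total[w] / n_shuffles - entropy_across_sections(actual, w))
        for w in freqs
    }


# Invented toy text: topic words cluster by section, 'the' and 'and' do not.
toy = (
    "root leaf root leaf the and " * 5
    + "star moon star moon the and " * 5
    + "bath water bath water the and " * 5
    + "jar seed jar seed the and " * 5
).split()
info = section_information(toy)
print(f"root: {info['root']:.3f} bits, the: {info['the']:.3f} bits")
```

A measure like this detects clustering, which is a kind of structure; on its own it says nothing about whether that structure is linguistic, which is the crux of the objections that follow.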

Now, I am not a specialist in information theory, and I’m not truly a specialist on the Voynich Manuscript (although I have played one on TV), but I am a linguist and I do research on writing systems and allied representation systems like written numerals. And several things bother me about this paper. The first is that, as Gordon Rugg (the most significant modern proponent of a ‘hoax’ theory) has pointed out in a comment on the new paper, no one is seriously claiming that the VM is pure ‘noise’. It is clearly structured, and the mere fact that it has some structure, even one that resembles language in some ways, does not make it likely that it has a genuine linguistic structure, much less a decipherable one. Rugg’s own (plausible) theory involves the use of a medieval ciphering system to rapidly produce language-like but meaningless text as part of a hoax, and as far as I can see, Montemurro and Zanette have not evaluated this theory at all, other than to dismiss it.
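Rugg’s published proposal combines a table of syllables with a grille. The Python sketch below strips that down to its gist, assuming a tiny invented syllable table and independent random picks in place of the grille. It is meant only to show how a purely mechanical procedure yields text with word-internal regularities (fixed prefix, middle, and suffix slots) but no meaning. The syllables and the names `hoax_word` and `hoax_text` are hypothetical.

```python
import random

# Hypothetical syllable table: invented placeholder strings, not Rugg's actual table.
# Each list can only fill one slot in a word, which is what gives the output
# its word-internal regularity.
PREFIXES = ["qo", "o", "ch", "sh", "d"]
MIDDLES = ["ke", "te", "ka", "ol", ""]
SUFFIXES = ["dy", "in", "aiin", "y", "ar"]


def hoax_word(rng):
    """Assemble one meaningless 'word' from a prefix, a middle and a suffix."""
    return rng.choice(PREFIXES) + rng.choice(MIDDLES) + rng.choice(SUFFIXES)


def hoax_text(n_words, seed=0):
    """Generate n_words pseudo-words; seeded random picks stand in for moving a grille."""
    rng = random.Random(seed)
    return [hoax_word(rng) for _ in range(n_words)]


print(" ".join(hoax_text(12)))
```

Output of this kind is structured but carries no message, which is why it would make a useful comparison case for any statistical test of ‘language-likeness’.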

Furthermore, the only systems to which the VM is compared are two written languages in alphabetic scripts (English and Latin), one written language with a non-alphabetic script (Chinese), one computer language (Fortran), and one natural sequence (yeast DNA). But there is a wide variety of nonlinguistic, quasilinguistic, and paralinguistic phenomena beyond these, and they haven’t compared the VM to any of them. Montemurro and Zanette show conclusively that the VM has much more ‘information’ (structure) than the yeast DNA, which we would anticipate, but they do not do a good job of accounting for the different types of encoded information, and structured non-information, that might be comparable to the VM. What is the information structure of known codes and ciphers (both broken and undeciphered ones)? What is the information structure of semasiographic systems like the glyphic system at Teotihuacan? What is the information structure of the written output of psychiatric patients who suffer from graphomania? What is the information structure of pseudo-writing like the Codex Seraphinianus, which we know carries no message, since it is a modern piece of conceptual art? None of these comparisons would be conclusive, but all of them would be informative. Right now, the range of systems to which Montemurro and Zanette have compared the Voynich is simply too limited to be useful.

Montemurro and Zanette also seem unaware of parallel efforts to use the information structure of undeciphered scripts to evaluate how language-like they are. Two of the most significant are the attempt to show that Iron Age Pictish graphic symbols from Scotland constituted a phonetic script (Lee, Jonathan and Ziman 2010) and the efforts to show that the Indus script of Harappan-period India and Pakistan either does (Rao et al. 2009) or does not (Farmer, Sproat and Witzel 2004) resemble linguistically based writing systems. These theories have attracted a reasonable degree of attention from linguists, and Richard Sproat, in particular, has done a lot of work addressing the non-linguists’ methodological and conceptual approaches, some of which has been covered in extraordinary detail at Language Log. There’s a much longer discussion to be had there, but suffice it to say that most linguists are skeptical of studies undertaken without any linguistic expertise and assistance. Again, without taking a position on any of these controversies, it strikes me as irresponsible literature-searching that the Montemurro and Zanette study is so fundamentally unaware of similar efforts in major publications such as Science and the Proceedings of the Royal Society. If you’re going to use physics to study written language, even if you’re going to ignore every single linguist who’s written on these subjects, maybe you should at least be aware of high-impact articles written in the last ten years by physicists using methods very similar to your own.
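Several of these studies rely on conditional entropy: how predictable the next sign is, given the current one. Rao et al. (2009), for instance, compared the conditional entropy of Indus sign sequences with that of linguistic and nonlinguistic sequences. Rigidly ordered systems score near zero, unconstrained random sequences score near the maximum, and written language typically falls somewhere in between. Below is a minimal Python sketch of the basic plug-in estimate, assuming sign sequences are already transcribed as lists of symbols. It is not a reproduction of any of the cited papers’ methods, and the test sequences are invented.

```python
import math
import random
from collections import Counter


def conditional_entropy(seq):
    """Estimate H(next sign | current sign) in bits from adjacent pairs in seq."""
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])
    n_pairs = len(seq) - 1
    h = 0.0
    for (a, b), count in pair_counts.items():
        p_pair = count / n_pairs          # p(a, b)
        p_next = count / first_counts[a]  # p(b | a)
        h -= p_pair * math.log2(p_next)
    return h


rng = random.Random(42)
rigid = list("ABCD" * 500)                               # fully predictable order
random_seq = [rng.choice("ABCD") for _ in range(2000)]   # no sequential constraint
text = list("the cat sat on the mat and the dog sat on the log".replace(" ", "_"))

for name, seq in [("rigid", rigid), ("random", random_seq), ("text", text)]:
    print(f"{name:>6}: {conditional_entropy(seq):.3f} bits")
```

Plug-in estimates like this are biased downward on short sequences (the toy sentence is far too short to be reliable), which is one reason results from small corpora of undeciphered signs are so contested.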

For the record, I think that any information-based effort that does not involve linguists at a serious level is likely to make invalid assumptions and thus be highly prone to producing nice-looking gibberish. For example, Montemurro and Zanette seem to grant that the VM probably does not encode information alphabetically like English, and suggest instead that it recalls “scripts where -as in the cases of Chinese and hierographical Ancient Egyptian- the graphical form of words directly derives from their meaning.” (Montemurro and Zanette 2013: 4). Let’s set aside their use of the term hierographical, a bizarre nineteenth-century anachronism that was vaguely popular for a time before Egyptian hieroglyphs were deciphered with the help of the Rosetta Stone, but which has never, in any European language, been a preferred term. More significantly, it is a gross and entirely improper characterization of Chinese and Egyptian to argue that the “form of words directly derives from their meaning”. Both scripts have massive phonographic components, alongside signs that represent morphemes, words, and semantic categories, as every expert on writing systems has known for thirty years or more; the work of John DeFrancis certainly shows this eminently clearly. Even lumping the Egyptian hieroglyphic and Chinese scripts together in a single category ignores the massive differences between them. So in essence, Montemurro and Zanette seem to be suggesting that the VM has properties similar to no writing system ever known to have been used on earth, because they do not seem to know what sorts of writing systems they are comparing it to.

In short, I’m afraid what we have here is another case of non-specialists inappropriately applying the methods of one field to genuinely complex linguistic problems, in this case to evaluate a text whose would-be decipherers (a group riddled with charlatans and cranks) have offered us everything except an actual decipherment.

Plants (humans?) are incredibly cool, but don’t do math

There’s a fascinating article on BBC News today about a study proposing that an internal mechanism in the plant Arabidopsis thaliana (widely used in scientific experiments as a model organism) regulates starch consumption in the absence of sunlight in a way that requires the plant to mathematically ‘divide’ the quantities of two different types of molecules. Now, I’m not a botanist and I can’t say whether the result is correct, but I do take issue with the claim that “They’re actually doing maths in a simple, chemical way”. The last quote from the article is more accurate: “This is not evidence for plant intelligence. It simply suggests that plants have a mechanism designed to automatically regulate how fast they burn carbohydrates at night. Plants don’t do maths voluntarily and with a purpose in mind like we do.”
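The arithmetic being attributed to the plant can be made concrete with a toy numerical model. This is a sketch under stated assumptions, not the researchers’ model. It simply assumes that the burn rate is recomputed at each moment as (starch remaining) ÷ (hours until dawn), with invented starch units, and it compares that rule with a hypothetical fixed burn rate tuned for a 12-hour night. Both function names are hypothetical.

```python
def starch_left_at_dawn(starch, night_hours, n_steps=1000):
    """Toy model: at each step, burn starch at rate = starch / hours left until dawn."""
    dt = night_hours / n_steps
    for k in range(n_steps):
        hours_left = night_hours - k * dt
        rate = starch / hours_left  # the 'division' attributed to the plant
        starch -= rate * dt
    return starch


def starch_left_fixed_rate(starch, night_hours, rate):
    """Comparison: a fixed burn rate ignores how long the night actually is.

    A negative result means the store ran out before dawn.
    """
    return starch - rate * night_hours


for night in (8, 12, 16):
    divided = starch_left_at_dawn(100.0, night)
    fixed = starch_left_fixed_rate(100.0, night, rate=100.0 / 12)
    print(f"{night} h night: division model {divided:.3f}, fixed rate {fixed:.1f}")
```

In this toy model the recomputed rate stays constant through the night (initial starch ÷ night length), so the store declines linearly and runs out just at dawn whatever the night length. Nothing in the loop ‘knows’ that it is dividing; the division is our description of the rule.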

All sorts of natural processes can be modelled using mathematics; for instance, Fibonacci patterns appear in a variety of plants through phyllotaxis (the arrangement of leaves on stems). We don’t say that these plants ‘do math’, and the same principle applies to the new finding above. It’s incredibly cool that these mathematical patterns emerge, and it’s a very interesting question why they emerge biochemically. But that raises an even more interesting question: what do we mean when we say that humans ‘do math’?
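One common idealized model of spiral phyllotaxis places each new leaf at a fixed rotation of about 137.5° from the last, the ‘golden angle’. This angle is tied to the golden ratio, which is in turn the limit of the ratios of consecutive Fibonacci numbers. The short Python sketch below does that arithmetic. It describes the pattern; nothing in it implies that the plant computes anything. The names are illustrative.

```python
import math


def fibonacci(n):
    """First n Fibonacci numbers, starting 1, 1."""
    seq, a, b = [], 1, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq


fib = fibonacci(15)
ratios = [later / earlier for earlier, later in zip(fib, fib[1:])]
phi = (1 + math.sqrt(5)) / 2             # golden ratio, the limit of the ratios
golden_angle = 360 * (1 - 1 / phi)       # about 137.5 degrees

# Angular position (degrees) of each successive leaf if every new leaf
# is rotated by the golden angle from the previous one.
leaf_angles = [round(k * golden_angle % 360, 1) for k in range(8)]

print(fib)
print(f"last ratio {ratios[-1]:.6f} vs phi {phi:.6f}")
print(f"golden angle {golden_angle:.2f} degrees")
print(leaf_angles)
```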

Humans are organisms and thus part of the physical world, so many of the things they do unconsciously or without explicit reflection can be modelled mathematically. But this is not the same as saying that all humans do mathematics. This seems to be what the last quotation suggests: that ‘doing math’ involves conscious, explicit, purposeful reflection on the mathematical aspects of reality. Being able to throw a curveball is not ‘doing mathematics’; being able to model the trajectory of a curveball is. And the overlap between the sets of humans able to do each task is minimal.

Let me give another example related to the plant study above. A child has a pile of 23 candies and wants to share them among a group of five kids, herself included. Starting with the friend to her right, she gives one candy to each child in turn, going around the circle until they’re all gone. When the process is complete, the three children immediately to her right will have 5 candies each, and the other two (the distributor included) will have 4 each. We could, if we wished, define ‘division’ as ‘the process of dividing up a group of objects among another group’ and then say ‘thus, the kids are dividing 23 by 5 and getting 4 with a remainder of 3’. But I think most of us would be reluctant to argue that the distributing child understands division, or knows how to divide. Even though distributing the candy is a conscious decision, and even though it requires a general procedure (one candy to one child), it does not require that the child be able to do mathematics.
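The gap between carrying out a procedure and knowing division can be made concrete in a short Python sketch. `deal_round_robin` (a hypothetical name) only hands out one candy and moves on to the next child, with no division anywhere in it. Yet its result lines up with what explicit division, Python’s built-in `divmod`, reports: a quotient of 4 and a remainder of 3. The seating order in the list is an assumption chosen to match the scenario above.

```python
def deal_round_robin(n_candies, n_kids):
    """Hand out candies one at a time around the circle until none are left.

    Index 0 is the friend immediately to the distributor's right; the last
    index is the distributor herself, who takes hers at the end of each round.
    Nothing here divides: it only counts, steps and wraps around.
    """
    piles = [0] * n_kids
    kid = 0
    while n_candies > 0:
        piles[kid] += 1
        n_candies -= 1
        kid += 1
        if kid == n_kids:  # back to the start of the circle
            kid = 0
    return piles


print(deal_round_robin(23, 5))  # [5, 5, 5, 4, 4]
print(divmod(23, 5))            # (4, 3): quotient 4, remainder 3
```

The fact that the two results coincide is exactly what makes it tempting to describe the dealing as division, even though the procedure that produces it, like the child, never divides.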

For the same reason, I am sometimes skeptical when my colleagues in ethnomathematics describe the mathematics of some human activity in terms of fractal geometry or the Fibonacci series. It is, of course, possible that people have some awareness of the processes behind their activities, and when they can talk about that, it is ethnographically very interesting. For instance, if the child above says “Well, I know I have 23 candies and so they won’t go evenly, so there are going to be some left over at the end,” then we do indeed know that the child has some explicit knowledge of division. I worry, in fact, that because so many natural processes result in such sequences, we confuse the result with conscious awareness of the process. In doing so, we fail to investigate the explicit mathematical knowledge that humans actually do encode in all sorts of things they do, and we falsely attribute a sort of explicit consciousness to activities that have no explicitness underlying them (in humans, animals, plants, and even in nonliving things).

Lexiculture thanks

Thanks to all those, in the comments or elsewhere, who helped with additional suggestions for my Lexiculture project for my undergraduate course this fall. I now have over 50 words on my long-list for the students to choose from, which should be enough, but more ideas are, of course, welcome, especially if I decide to assign this project in multiple years.

The language politics of math and maths

This video from the Numberphile YouTube channel, entitled ‘Is it Math or Maths?’, combines two of my favourite passions, mathematics and linguistics, in a fascinating social analysis of prescriptivism, national identity, and scientific vocabulary.

Numberphile regularly features short, popular videos about interesting mathematical stuff, mostly at a layperson’s level. This video features Dr Lynne Murphy, who teaches lexical semantics at the University of Sussex, and blogs about American/British English differences at Separated by a Common Language.

For the record, as a Canadian, I say ‘math’ but I also say ‘zed’, because that’s the way we roll.