I’ve been playing around with the Google Books Ngram Viewer amidst grading and procrastinating from grading. Here are a few interesting results (along with the corpus and date range for each). Any others you can find?
soda vs. pop, American English, 1900-2008
sneaked vs. snuck, English One Million, 1800-2008
douche, English One Million, 1800-2008
milliard, British English, 1940-2008
encyclopedia vs. encyclopaedia, American English, 1800-2008, compared with encyclopedia vs. encyclopaedia, British English, 1800-2008
Unfortunately, there are some misdated works in the database. Some can be found by searching for 20th-century figures like Hitler and Stalin. I’m not sure how to report them.
Oh, yes, that is absolutely the case. The metadata issues with Google Books are well known; to their credit, I will say that things are much, much improved from last year, thanks to publicity over some of the more ridiculous errors, in particular from Mark Liberman at Language Log. Where there are only a few hits for a search word in a particular period, the Ngram Viewer is likely to show a lot of noise; on the other hand, where there are a lot of hits, a handful of misdated works makes up a tiny fraction of the total, so the noise is largely drowned out by the sheer size of the dataset.
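To put rough numbers on that intuition, here is a minimal Python sketch. The function name `noise_share` and all of the counts are made up for illustration; none of them come from the actual Ngram data. It assumes a fixed handful of misdated works leaking into a period and asks what fraction of the observed hits they account for.

```python
def noise_share(true_hits: int, misdated_hits: int) -> float:
    """Fraction of observed hits that come from misdated works.

    Both arguments are hypothetical counts for one word in one period.
    """
    total = true_hits + misdated_hits
    if total == 0:
        raise ValueError("no hits at all, so the share is undefined")
    return misdated_hits / total


# Illustrative counts only: the same 5 misdated hits against
# increasingly large numbers of correctly dated hits.
for true_hits in (0, 5, 500, 50_000):
    share = noise_share(true_hits, misdated_hits=5)
    print(f"{true_hits:>6} true hits -> {share:.2%} of the signal is noise")
```

With no genuine hits (say, a 20th-century name searched in the 1800s), everything you see is noise; with tens of thousands of genuine hits, the same five stray books barely register.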
Oesophagus vs esophagus seems to mirror the tendency toward the simplified spelling in British English that you discovered in encyclopaedia vs encyclopedia – very interesting!