I was listening to All Things Considered yesterday while preparing dinner. A short, interesting story came on: You Have An Accent Even On Twitter. The NPR host, Robert Siegel, interviewed Jacob Eisenstein, a post-doc at Carnegie Mellon who has been examining regional variation in Twitter usage.

Some highlighted examples of Twitter dialecticisms:

In New York, people tend to do “suttin” (i.e. something, and usually having nothing to do with Sutton Place).

The use of “hella” to mean “very,” as in “I’m hella tired,” is more common among people who’ve lived in Northern California.

(LOL is universally understood.)

I was sufficiently intrigued to track down Dr. Eisenstein’s paper, A Latent Variable Model for Geographic Lexical Variation, presented on January 8 at the annual meeting of the Linguistic Society of America in Pittsburgh. It’s a technical article befitting an MIT graduate, with un-trendy headings like “Cascading Topic Models” and “Inference,” plus plenty of heavy math. Still, I enjoyed the perusal.

Eisenstein and his colleagues started with Twitter’s Gardenhose sample stream from the first week of March 2010, which they say contained about 15% of all public messages. They whittled that down by keeping only tweets that were geotagged within the continental U.S., contained no URLs, and came from authors who had sent at least 20 messages during that week. Ultimately, they examined some 380,000 Twitter messages (tweets) from 9,500 users.
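For the curious, here is a minimal Python sketch of what that whittling-down might look like. It is not the authors’ code: the tweet fields (`user`, `text`, `lat`, `lon`), the rough latitude/longitude box standing in for the continental U.S., the simple URL pattern, and counting each author’s messages over the whole sample are all my own assumptions.

```python
import re
from collections import Counter

# Hypothetical tweet record: {"user": str, "text": str, "lat": float or None, "lon": float or None}

# Rough bounding box for the continental U.S. (an approximation, not the paper's region).
LAT_MIN, LAT_MAX = 24.5, 49.5
LON_MIN, LON_MAX = -125.0, -66.9

# Simple assumed URL test: anything starting with http://, https:// or www.
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)
MIN_MESSAGES = 20


def in_continental_us(tweet):
    """True if the tweet has coordinates inside the rough bounding box."""
    lat, lon = tweet.get("lat"), tweet.get("lon")
    if lat is None or lon is None:
        return False  # not geotagged
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def has_url(tweet):
    """True if the tweet text appears to contain a link."""
    return URL_PATTERN.search(tweet["text"]) is not None


def filter_tweets(tweets, min_messages=MIN_MESSAGES):
    # Assumption: count each author's messages over the whole sample,
    # before the location and URL filters are applied.
    counts = Counter(t["user"] for t in tweets)
    return [
        t for t in tweets
        if in_continental_us(t)
        and not has_url(t)
        and counts[t["user"]] >= min_messages
    ]


# Tiny made-up sample: a prolific Bay Area author and a one-tweet New Yorker.
sample = (
    [{"user": "a", "text": f"hella tired {i}", "lat": 37.8, "lon": -122.3} for i in range(20)]
    + [{"user": "b", "text": "suttin", "lat": 40.7, "lon": -74.0}]
    + [{"user": "a", "text": "see http://example.com", "lat": 37.8, "lon": -122.3}]
)
kept = filter_tweets(sample)
print(len(kept))  # 20: user b has too few tweets, and a's link tweet is dropped
```

The three tests are applied together in one pass, so the order of the filters doesn’t change the result under these assumptions.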

The findings are really cool. (To be clear – that would be “coo” in Southern CA, or “koo” in Northern CA.)

Good to know that “af” signifies “as f-ck” (as in “very”), and is more commonly typed in Los Angeles than elsewhere in the country. “Ima” for “I’m going to” is a New York kinda thing. “Gna” for “going to” is popular in Boston, but sounds familiar to this mother of a teenager in NYC.

From the Carnegie Mellon press release:

Studies of regional dialects traditionally have been based primarily on oral interviews, Eisenstein said, noting that written communication often is less reflective of regional influences because writing, even in blogs, tends to be formal and thus homogenized. But Twitter offers a new way of studying regional lexicon, he explained, because tweets are informal and conversational. Furthermore, people who tweet using mobile phones have the option of geotagging their messages with GPS coordinates.

…Automated analysis of Twitter message streams offers linguists an opportunity to watch regional dialects evolve in real time. “It will be interesting to see what happens. Will ‘suttin’ remain a word we see primarily in New York City, or will it spread?” Eisenstein asked.

I guess we’ll see how this progresses. It reminds me of about eight years ago, when I tried to crack the IM code: “POS” meant “parent over shoulder.” That was easy. “Code 9” meant suttin similar, if I recall.


