Yesterday I posted a denunciation of Google’s new Ngram Viewer as an example of what Marx called “socially unnecessary labor time”–work that takes skill and craft and time but that nobody wants or needs. Lots of people I respect think more of Google’s Ngram Viewer than I do. Friends who don’t follow the world of digital computing in the humanities thought my post on Ngram Viewer was spot on. Hmmm.
Ngram Viewer represents possibilities–that some day we could have a whole set of interesting and flexible new tools for searching large–really large–bodies of text. So why did they present this possibility in such an absurd way?
It occurs to me that one of the reasons Ngram Viewer is so irritating to people like me grows out of the way Google treats text. If you already know what an “ngram” is you might not want to read any further. If you don’t, you’ll find this both extremely interesting and possibly appalling.
If you enter a search term in the Ngram Viewer, let’s say “Bush tax cuts,” Google does not search for “Bush” or “tax” or “cuts” as meaningful words. It counts “ngrams”: contiguous sequences of one to five words, treated as nothing but strings of tokens. So “Bush tax cuts” yields the ngrams “Bush,” “tax,” “cuts,” “Bush tax,” “tax cuts,” and “Bush tax cuts,” each tallied with no regard for what any of them refer to. These are the “ngrams” of the title.
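If it helps to see the mechanics, here’s what “breaking a phrase into ngrams” amounts to in a few lines of Python. This is a sketch of the principle only, not Google’s actual code; the whitespace tokenizer and the function name are my own stand-ins.

```python
def ngrams(phrase, max_n=5):
    """Return every contiguous run of 1..max_n words in a phrase.

    A toy illustration: tokens are just whitespace-split strings,
    and nothing here knows or cares what any of them mean.
    """
    tokens = phrase.split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]

print(ngrams("Bush tax cuts"))
# ['Bush', 'tax', 'cuts', 'Bush tax', 'tax cuts', 'Bush tax cuts']
```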
Google ignores semantics: it ignores the meanings of the words themselves when it searches. It’s not searching in English. You are, but it’s not. It’s matching patterns of tokens. You do a search for a good Chinese takeout place: Google doesn’t read “chinese food” the way you do; it matches the ngrams that can be made out of the phrase “chinese food” against its index, with no idea what either word means.
Why does it do this? Because “meaning” as we know it gets in the way. You search for Chinese food, but “Chinese food” drags along all the other possible meanings and contexts: what counts as authentic, what counts as food and not culture. It’s easier to just break the phrase into patterns and count them, stripped of everything that makes them mean something to a person. That way you expose patterns that people aren’t really aware they use: you bypass the pitfalls of language, its ambiguity, its fuzziness, its tendency to depend on context for meaning. Google gives humans what they want by ignoring what they mean. If you don’t find that slightly troubling, or at least interesting, then you aren’t paying attention.
Ngram Viewer reflects this lack of interest in meaning. It makes no attempt to disambiguate words: it takes them entirely out of context and ignores their meaning altogether, which is why Patricia Cohen can think that the increased frequency of the word “women” equals the rise of feminism. It tells us how often an ngram appears, not how often a word, in any meaningful sense, appears. The word is just surface, like sheet metal on a car.
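Counting is the whole trick, and a toy version of it makes the point. Again, this is just a sketch of the principle (the two-line corpus and the helper below are mine, not the Viewer’s real pipeline): you tally strings, and “women” in a suffrage speech lands in the same bucket as “women” in a laundry manual.

```python
from collections import Counter

def ngram_counts(corpus_lines, n=1):
    """Tally raw n-gram frequencies in a toy corpus.

    Nothing here distinguishes the senses or contexts of a word;
    the same string is the same count, wherever it appears.
    """
    counts = Counter()
    for line in corpus_lines:
        tokens = line.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

corpus = [
    "women demanded the vote",
    "the women of the parish served tea",
]
print(ngram_counts(corpus)["women"])  # 2, with no sense of why
```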
In that sense, it shows how the people who worked on Ngram Viewer are completely captured by the technology they work with. They aren’t interested in meaning; they are interested in command of pattern. So they produced something that’s offensive to the practice of history, which depends on the meaning of words in historical context.
But then of course, what is reading but pattern recognition? It might seem alarming to think of the English of Shakespeare rendered into the ngrams of Google, but look how easy it is to find good Chinese takeout!
- In the meantime, Ngram Viewer is giving us this sort of thing. It’s entirely a map of the present, in which the person drawing up the word pairs wants to have what he already knows confirmed. That “latte” passed “lager” in frequency only reinforces what the person knew when he or she asked the question: it’s set up to avoid learning anything new. ↩