Monday, December 6, 2010

Fun with Collocates: Orange Vodka to Blue Skies

The SCOTS team at Glasgow University has posted a handy tool, the BNC ComPair, for visualizing collocates in the British National Corpus. Here's an example for orange and blue, also shown below.

The instructions clearly state: "Collocates of both words are shown, together with your search words. The collocates near each extremity have a strong collocational strength with that search word, collocates in the middle are used equally with both your words."



So vodka is more likely to be used with orange, skies is more likely to be used with blue, and light is used roughly equally with both. Interestingly, the most common orange collocate is k-type, as in orange dwarf. The results for pink and beige are even more curious.
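To make the "extremity versus middle" idea concrete, here is a minimal sketch in Python. It is not how the ComPair tool computes its layout (the instructions quoted above don't say); the `balance` function and the counts are invented for illustration, and the position is simply the share of a collocate's co-occurrences that fall with the second search word.

```python
def balance(count_with_a, count_with_b):
    """Return a position in [0, 1] for one collocate.

    0.0 means it co-occurs only with search word A, 1.0 only with
    search word B, and 0.5 means it is used equally with both.
    This is a hypothetical measure, not the tool's documented one.
    """
    total = count_with_a + count_with_b
    if total == 0:
        raise ValueError("collocate never occurs with either search word")
    return count_with_b / total


# Invented (orange, blue) co-occurrence counts, not taken from the BNC.
counts = {"vodka": (9, 1), "light": (5, 5), "skies": (1, 9)}

positions = {word: balance(a, b) for word, (a, b) in counts.items()}

# Print collocates from the "orange" extremity to the "blue" extremity.
for word, position in sorted(positions.items(), key=lambda kv: kv[1]):
    print(f"{word:6s} {position:.2f}")
```

With these made-up numbers, vodka lands near the orange end, skies near the blue end, and light in the middle, which is the same reading as the chart.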

But first let's try red and green.



Now the collocates range from deer to bethnal. As in Bethnal Green, presumably. As for black and white?



In this case the frequent collocates are pepper and hart. As in the White Hart, again presumably. Keep in mind that this is the British National Corpus, not an American one (compare British and American themselves, whose frequent collocates are Airways and Airlines, respectively). Keep in mind, too, that a collocation (as opposed to, say, server colocation) is "a sequence of words or terms that co-occur more often than would be expected by chance" (via en.wikipedia.org/wiki/Collocation, accessed Dec 6, 2010).
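One common way to put a number on "more often than would be expected by chance" is pointwise mutual information (PMI), which compares how often a pair is observed with how often it would turn up if the two words were independent. The sketch below is only an illustration: PMI is my choice of measure, not necessarily the one the ComPair tool uses, and the toy sentence is made up rather than drawn from the BNC.

```python
import math
from collections import Counter


def pmi(tokens, first, second):
    """Pointwise mutual information of the adjacent pair (first, second).

    Compares the observed bigram probability with the probability
    expected if the two words occurred independently of each other.
    Returns -inf when the pair (or either word) never occurs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = n_tokens - 1

    observed = bigrams[(first, second)] / n_bigrams
    expected = (unigrams[first] / n_tokens) * (unigrams[second] / n_tokens)
    if observed == 0 or expected == 0:
        return float("-inf")
    return math.log2(observed / expected)


# Made-up toy text, not BNC data.
tokens = "the white hart is a pub and the white hart sells black pepper".split()

print(pmi(tokens, "white", "hart"))  # positive: the pair beats chance
print(pmi(tokens, "hart", "white"))  # -inf: never adjacent in this order
```

A positive score means the pair shows up more often than chance would predict. On a corpus this small the scores mean very little, and PMI is known to favor rare words, which may be part of why oddities surface near the top of collocate lists.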

As for beige and pink?



Frequent collocates in this case would be floyd and maquimat. That would presumably be Pink Floyd. But maquimat? Maquimat? As in the unwikipediable maquimat?



Now as for beige maquimat, there are certainly many options, most of which appear to be offered by Lancôme. Confounding factors such as named entity recognition (NER) aside, it's striking that half of the top color-term collocates were unfamiliar.