Normalized Google Distance
Posted by Kelvin on 30 Oct 2006 at 09:00 pm | Tagged as: programming
http://blog.outer-court.com/archive/2005-01-27-n48.html has an interesting article on Normalized Google Distance.
In short, it uses Google page counts to estimate the semantic distance (or similarity) between two words.
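For reference, the distance in the linked article is computed purely from search hit counts. Here's a minimal sketch in Python (not from the article itself; the function name and arguments are mine):

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from raw hit counts.

    fx, fy -- number of pages containing x and y respectively
    fxy    -- number of pages containing both x and y
    n      -- rough total number of pages indexed by the search engine
    """
    if fx == 0 or fy == 0 or fxy == 0:
        return float("inf")  # no co-occurrence data, treat as maximally distant
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
```

The smaller the result, the more often the two terms show up on the same pages relative to how often they show up at all.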
I unknowingly used this in a recent project where we were attempting to detect the sentiment of blogs: for example, whether a blog post is positively or negatively slanted towards a movie.
The general idea was to come up with a list of positive and negative words, then extract adjectives from the blog post and run them side-by-side with the positive and negative wordlists against Google.
For example, given a blog post containing the adjectives "dry", "witty" and "obfuscated", and without a priori knowledge or a database of +ve or -ve adjectives, we would run Google queries pairing these adjectives with +ve words like "happy", "bright", "light" and "funny", and -ve words like "unhappy", "sad" and "troubled", and see which adjectives correlate more strongly with the respective +ve and -ve words. A rough sketch of the idea is below.
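Something along these lines (not the code we actually used, just a back-of-the-envelope sketch: `hit_count` is a placeholder you'd implement against whatever search API you have access to, `sentiment_score` and the wordlists are made up for illustration, and it reuses the `ngd` function from the sketch above):

```python
POSITIVE = ["happy", "bright", "light", "funny"]
NEGATIVE = ["unhappy", "sad", "troubled"]

def hit_count(query):
    """Placeholder: return Google's result count for a query."""
    raise NotImplementedError  # e.g. via a search API

def sentiment_score(adjective, n=8e9):
    """Average NGD of an adjective against the +ve and -ve wordlists.

    Returns (pos_distance, neg_distance); a lower distance means a
    stronger association with that wordlist.
    """
    f_adj = hit_count(adjective)

    def avg_distance(wordlist):
        distances = []
        for word in wordlist:
            f_word = hit_count(word)
            f_both = hit_count('"%s" "%s"' % (adjective, word))
            distances.append(ngd(f_adj, f_word, f_both, n))
        return sum(distances) / len(distances)

    return avg_distance(POSITIVE), avg_distance(NEGATIVE)

# An adjective leans positive if it sits closer to the +ve list:
# pos_d, neg_d = sentiment_score("witty")
# leaning = "positive" if pos_d < neg_d else "negative"
```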
How's that for AI?