Normalized Google Distance
Posted by Kelvin on 30 Oct 2006 at 09:00 pm | Tagged as: programming
http://blog.outer-court.com/archive/2005-01-27-n48.html has an interesting article on Normalized Google Distance.
In short, it uses Google page counts to estimate the semantic distance (or similarity) between two words.
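For reference, the distance in the linked article is computed purely from search hit counts. Here's a minimal sketch in Python (not from the article itself; the function name and arguments are mine):

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from raw hit counts.

    fx, fy -- number of pages containing x and y respectively
    fxy    -- number of pages containing both x and y
    n      -- rough total number of pages indexed by the search engine
    """
    if fx == 0 or fy == 0 or fxy == 0:
        return float("inf")  # no co-occurrence data, treat as maximally distant
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
```

The smaller the result, the more often the two terms show up on the same pages relative to how often they show up at all.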
I unknowingly used this in a recent project where we were attempting to detect the sentiment of blogs: for example, whether a blog post is positively or negatively slanted towards a movie.
The general idea was to come up with a list of positive and negative words, then extract adjectives from the blog post and run them side-by-side with the positive and negative wordlists against Google.
For example, given a blog post containing the adjectives "dry", "witty" and "obfuscated", and without a priori knowledge or a database of +ve or -ve adjectives, we would run Google queries pairing these adjectives with +ve words like "happy", "bright", "light" and "funny", and -ve words like "unhappy", "sad" and "troubled", and see which adjectives correlate more strongly with the respective +ve and -ve words. A rough sketch of the idea is below.
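Something along these lines (not the code we actually used, just a back-of-the-envelope sketch: `hit_count` is a placeholder you'd implement against whatever search API you have access to, `sentiment_score` and the wordlists are made up for illustration, and it reuses the `ngd` function from the sketch above):

```python
POSITIVE = ["happy", "bright", "light", "funny"]
NEGATIVE = ["unhappy", "sad", "troubled"]

def hit_count(query):
    """Placeholder: return Google's result count for a query."""
    raise NotImplementedError  # e.g. via a search API

def sentiment_score(adjective, n=8e9):
    """Average NGD of an adjective against the +ve and -ve wordlists.

    Returns (pos_distance, neg_distance); a lower distance means a
    stronger association with that wordlist.
    """
    f_adj = hit_count(adjective)

    def avg_distance(wordlist):
        distances = []
        for word in wordlist:
            f_word = hit_count(word)
            f_both = hit_count('"%s" "%s"' % (adjective, word))
            distances.append(ngd(f_adj, f_word, f_both, n))
        return sum(distances) / len(distances)

    return avg_distance(POSITIVE), avg_distance(NEGATIVE)

# An adjective leans positive if it sits closer to the +ve list:
# pos_d, neg_d = sentiment_score("witty")
# leaning = "positive" if pos_d < neg_d else "negative"
```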
How's that for AI?