google experiments

Sometimes you need relative frequencies of words or terms and you don’t already have a Perl script set up for your data. Google experiments give a quick approximation: search for the sequence of words in quotes and use the number of results reported at the top of the page as an approximate frequency. Strictly speaking, that number counts pages containing the phrase, not occurrences of it.
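As a minimal sketch, assuming you have already copied the hit counts off the results page by hand, you can turn the counts for a few alternative phrasings into relative frequencies by dividing each one by their sum. No search API is called here. The function name `relative_frequencies` and the example counts are hypothetical and for illustration only.

```python
def relative_frequencies(hit_counts):
    """Turn raw hit counts for alternative phrasings into relative frequencies.

    hit_counts maps each quoted phrase to the result count copied by hand
    from the search page (hypothetical input; no search API is called here).
    """
    total = sum(hit_counts.values())
    if total == 0:
        raise ValueError("all hit counts are zero")
    # Each phrase's share of the combined hits approximates its relative frequency.
    return {phrase: hits / total for phrase, hits in hit_counts.items()}


# Hypothetical counts, for illustration only.
counts = {'"interested in"': 300, '"interested on"': 100}
print(relative_frequencies(counts))
```

Because the counts are page counts, treat the shares as a rough ranking of the alternatives rather than as precise probabilities.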

I used this in my PhD proposal. I wanted IDF values for some words, so I looked up the Google page counts for a few very common words like "and" and took the largest count as an approximation of the total number of documents. Then I did the same for the words I needed IDF values for, and I had reasonable values for the figures in my proposal in five minutes or less.
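The IDF trick above can be sketched in a few lines, assuming the standard definition IDF = log(N / df). The largest hit count among a handful of very common words stands in for N, and a term's own hit count stands in for its document frequency df. The function name, the choice of the natural log, and the example counts are assumptions for illustration, not the exact procedure used in the proposal.

```python
import math


def estimate_idf(term_hits, common_word_hits):
    """Approximate IDF = log(N / df) from search-engine hit counts.

    term_hits: hit count for the quoted term, used as its document frequency df.
    common_word_hits: hit counts for very common words such as "and"; the
    largest one stands in for the total number of documents N.
    """
    n_docs = max(common_word_hits.values())
    df = max(term_hits, 1)  # guard against a zero count
    return math.log(n_docs / df)


# Hypothetical counts, for illustration only.
common = {"and": 1_000_000, "the": 2_000_000}
print(estimate_idf(2_000, common))
```

Taking the maximum over several common words rather than trusting a single word makes the estimate of N a little less sensitive to any one word's count. The log base only rescales every value, so any base works as long as you use the same one throughout.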

I also used this years ago to help a friend who was learning English, but I forgot about it until I saw a number of xkcd comics that use the method (and some BBSpot showdowns). It can also be used to estimate conditional probabilities, such as how likely one word is to follow another.
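For the conditional-probability case, a minimal sketch divides the hit count for the two-word phrase by the hit count for the first word alone, giving an approximation of P(next word | first word). The function name and the example counts below are hypothetical. Because both numbers count pages rather than occurrences, the ratio is only a rough estimate.

```python
def conditional_probability(phrase_hits, context_hits):
    """Approximate P(next word | context) as hits("context next") / hits("context").

    Both arguments are hit counts copied by hand from the search page. Because
    they count pages rather than occurrences, the ratio is only a rough estimate.
    """
    if context_hits <= 0:
        raise ValueError("context must have a positive hit count")
    return phrase_hits / context_hits


# Hypothetical counts: hits for "strong tea" and for "strong", for illustration only.
print(conditional_probability(300, 1_000))
```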

For anything more serious, Language Log advocates using COCA (the Corpus of Contemporary American English).
