by sketerpot 5868 days ago
It can be used as a text classifier. It takes as input a collection of (output, input_1, input_2, ..., input_n) tuples, stored in the newly-announced Google Storage, and then uses a variety of machine learning algorithms (which I would bet include some Bayesian stuff) to build a model. That model can then take (input_1, input_2, ..., input_n) tuples and predict the corresponding output.

So, that's the API. You can do a lot of the same things offline, with almost the same file format, using Weka:

http://en.wikipedia.org/wiki/Weka_(machine_learning)

So if you're interested in playing around with Google's Prediction API, you should probably download Weka and fiddle with it a bit. It's pretty easy to get started with, and it will definitely give you an idea of the sort of thing you can do here.
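
To make the (output, input_1, ..., input_n) shape concrete, here's a toy sketch in Python. It is a plain naive Bayes classifier with add-one smoothing, which is only my guess at the kind of thing involved; it is not Google's algorithm and not the Prediction API or Weka. The function names `train` and `predict` and the sample rows are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(rows):
    """Build a model from (output, input_1, ..., input_n) tuples of strings."""
    label_counts = Counter()            # how many rows carry each output label
    word_counts = defaultdict(Counter)  # per-label word frequencies
    vocab = set()
    for output, *inputs in rows:
        label_counts[output] += 1
        for text in inputs:
            for word in text.lower().split():
                word_counts[output][word] += 1
                vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, *inputs):
    """Return the most likely output label for (input_1, ..., input_n)."""
    label_counts, word_counts, vocab = model
    total_rows = sum(label_counts.values())
    words = [w for text in inputs for w in text.lower().split()]
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # log prior plus log likelihood of each word, with add-one smoothing
        score = math.log(count / total_rows)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: the first field is the output, the rest are inputs.
training_rows = [
    ("spam", "win a free prize now"),
    ("spam", "free money claim your prize"),
    ("ham", "lunch at noon tomorrow"),
    ("ham", "meeting notes from tomorrow's review"),
]
model = train(training_rows)
print(predict(model, "claim your free prize"))
print(predict(model, "notes from lunch"))
```

Each training row leads with the known output, and `predict` takes only the inputs. A real service would presumably do much more (feature selection, multiple algorithms, numeric inputs), so treat this as the shape of the problem rather than the solution.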

2 comments

Another interesting project is crm114: http://crm114.sourceforge.net/

It's ridiculously easy to use. I used it for identification of spam/scam messages and setting it up was just 5 lines of code.

I wrote a blog post about using it here: http://smokinn.com/blog/post/253

I built a Bayesian text classifier on App Engine a few weeks ago, but it was too slow to be of any use (the datastore, that is). Still fun to get it to work, though. I did stumble upon a Bayesian classifier web service: http://www.uclassify.com/Default.aspx

Will certainly check out Weka. It's installed on the PCs at my college, so I will when I get the chance.