Hacker News
by food79 6683 days ago
The Cadillac of Bayes classifiers is CRM114. It can use classifiers that are far more advanced than naive Bayes, such as clustering or hidden Markov modeling.

I cannot over-recommend CRM114. I've been using it to classify some database entries, and its accuracy is second to none. Its custom language makes working with strangely stored data (like database entries) easy, once you learn the strange language.
Is there any documentation that stands out for learning the alien language?

No doubt, I'll be combing through all of the CRM114 information on the website. Is there anything that is not referenced there that will be of use?

If you are doing a ham/spam type classification, then you won't need the alien language. I am almost a total tech novice and I was able to do well with just some bash scripts. Of course the docs will teach you about better ways to train the system, if you are interested in going from 98% correct classification to 99.5% correct.

learn ham.css < file_to_learn.txt

learn spam.css < file_to_learn.txt

classify < file_to_classify.txt

I am NOT doing ham/spam type classification. I need to define some classifications for specific types of content.
Then replace ham/spam with whatever those types of content are.
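
To make that concrete, here is a rough sketch of the same learn/classify workflow with arbitrary category names. This is not CRM114 and not its language. It is a plain multinomial naive Bayes with add-one smoothing written in Python, and the class name `TinyBayes`, the labels `invoice` and `bug_report`, and the sample texts are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class TinyBayes:
    """Toy multinomial naive Bayes with add-one smoothing (illustration only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> documents learned

    @staticmethod
    def tokenize(text):
        # Lowercase and split into simple word tokens.
        return re.findall(r"[a-z0-9']+", text.lower())

    def learn(self, label, text):
        # Analogous to "learn <label>.css < file": add one sample to a category.
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def classify(self, text):
        # Analogous to "classify < file": return the most probable category.
        if not self.doc_counts:
            raise ValueError("no training data")
        words = self.tokenize(text)
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            # Add-one smoothing, with one extra slot for unseen words.
            denom = sum(counts.values()) + len(vocab) + 1
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical categories standing in for ham/spam.
clf = TinyBayes()
clf.learn("invoice", "invoice number total amount due payment terms")
clf.learn("invoice", "please remit payment for the amount due")
clf.learn("bug_report", "stack trace error crash when clicking save")
clf.learn("bug_report", "application crash error on startup")

print(clf.classify("payment due for invoice"))
print(clf.classify("error and crash after update"))
```

With these toy samples the two calls print `invoice` and `bug_report`. The point is only that nothing in the workflow is specific to ham and spam: each category gets its own training data, and classification picks whichever category scores best.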
Thanks for the affirmation! It helps when jumping into territory with which I have no previous experience.
Beware, the source code to CRM114 ain't no freakin' Cadillac.