Hacker News
by getpokedagain 112 days ago
I worked (professionally) on a product a few years ago based on decision tree and random forest classifiers. I had no background in the math and had to learn this stuff, which has paid dividends as LLMs and AI have become hyped. This is one of the best explanations I've seen, and it has me super nostalgic for that project.

Gonna try to cook up something personal. It's amazing how people now use regression models basically all the time, and yet no one uses these things on their own.


25 years ago I worked on a product that was the best ID reader in the world at the time. The OCR engine was based on decision trees and a "Random Forest" (I suspect the name did exist) with only 3 trees. It was very effective, a secret weapon that kept the product competitive. I tried to train a NN with a framework called SNNS (Stuttgart Neural Network Simulator) as a 4th "tree" to complement the existing 3.

Today, handwriting OCR is a "hello world" sample in TensorFlow.

That's awesome, and based on my experience I'm not shocked this went well. I'm not sure what the features would be here, but I assume they could be specific pixel combinations or other things that are easy to label in a few ways. I hope you had fun with it.

My previous project was far from that. https://healthverity.com/audience-manager/

I had a lot of fun, really the last fun project I've had. I hope you had fun as well.

And I still can't find a big NN model which reads historical handwriting well.

In the interest of understanding, is there any code or similar for the approach? Does that OCR run anywhere today?

The technology was developed by my predecessor in the late 90s, when microprocessors were much less powerful and image sensor resolution was low. The relatively high accuracy under those constraints was the critical factor in choosing decision trees for the OCR engine. It was used until 2007, when I left the company.

I doubt it survived long afterwards, given how quickly the technology changed. Even the desktop OCR applications of the time didn't use decision trees, because desktop CPUs were much more powerful. The DT OCR engine was competitive only in that special use case.
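
Since the question of code came up: the original engine isn't public as far as this thread says, so the following is only a toy sketch of the general idea (a few small decision trees, each looking at different pixels, voting on a character). The 3x3 glyphs, the pixel subsets, and all function names are made up for illustration and are not taken from the product described above.

```python
from collections import Counter

# Hypothetical 3x3 binary glyphs, flattened row by row (illustration only).
GLYPHS = {
    "I": (0, 1, 0,  0, 1, 0,  0, 1, 0),
    "L": (1, 0, 0,  1, 0, 0,  1, 1, 1),
    "T": (1, 1, 1,  0, 1, 0,  0, 1, 0),
    "O": (1, 1, 1,  1, 0, 1,  1, 1, 1),
}

def build_tree(samples, pixels):
    """Build a tree from (label, bits) samples, splitting only on `pixels`.

    A tree is either a label (leaf) or a tuple (pixel, off_subtree, on_subtree).
    """
    labels = sorted(label for label, _ in samples)
    if len(set(labels)) == 1:
        return labels[0]
    # Keep only pixels that actually separate the remaining samples.
    useful = [p for p in pixels
              if 0 < sum(bits[p] for _, bits in samples) < len(samples)]
    if not useful:
        # This tree cannot tell the remaining glyphs apart: guess the most
        # common label (first in sorted order on a tie).
        return Counter(labels).most_common(1)[0][0]
    # Pick the pixel that splits the samples most evenly.
    p = min(useful,
            key=lambda q: abs(2 * sum(bits[q] for _, bits in samples) - len(samples)))
    off = [s for s in samples if s[1][p] == 0]
    on = [s for s in samples if s[1][p] == 1]
    rest = [q for q in pixels if q != p]
    return (p, build_tree(off, rest), build_tree(on, rest))

def classify(tree, bits):
    """Walk one tree down to a leaf label."""
    while isinstance(tree, tuple):
        p, off, on = tree
        tree = on if bits[p] else off
    return tree

def vote(trees, bits):
    """Majority vote across all trees."""
    return Counter(classify(t, bits) for t in trees).most_common(1)[0][0]

# Each tree sees a different (arbitrarily chosen) subset of the 9 pixels.
PIXEL_SUBSETS = [(0, 4, 8), (1, 3, 5), (2, 6, 7)]
SAMPLES = list(GLYPHS.items())
TREES = [build_tree(SAMPLES, subset) for subset in PIXEL_SUBSETS]

for label, bits in GLYPHS.items():
    print(label, [classify(t, bits) for t in TREES], "->", vote(TREES, bits))
```

With these particular subsets, two of the three trees cannot separate every glyph on their own, but the majority vote still labels all four correctly, which is the usual argument for using a small forest instead of a single tree. Each lookup is just a handful of pixel comparisons, which fits the point above about weak late-90s hardware.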