by kehrlann 3495 days ago
Agreed. Also, they don't discuss how they chose their features... It seems the problem was already solved before ML was even applied to it. Maybe the example is too naive?

HTML size and HTML tag count were a natural choice. If they hadn't worked out, the next step would have been to try something else. You're right that it's a very naive example and that, in a way, it was solved before any ML was applied. The surprising part for me was that the two features turned out to be interchangeable, i.e. either of them could be used. I would have expected HTML tag count to be much more accurate and reliable. Another interesting part for me was the threshold. It's somewhat clear that it should be somewhere between 20 and 1 sec probably, but where exactly?
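To make the "interchangeable features" point concrete, here is a minimal sketch of picking a cut-off on a single feature. It is not the article's code: the function name `best_threshold`, the toy numbers, and the binary label are all made up for illustration. It just tries every midpoint between consecutive feature values and keeps the one with the best accuracy.

```python
def best_threshold(values, labels):
    """Return (threshold, accuracy) for the rule: value > threshold -> True."""
    distinct = sorted(set(values))
    # Candidate cut-offs: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = (None, 0.0)
    for t in candidates:
        correct = sum((v > t) == y for v, y in zip(values, labels))
        accuracy = correct / len(values)
        if accuracy > best[1]:
            best = (t, accuracy)
    return best


# Hypothetical toy data, NOT taken from the article.
html_size_kb = [12, 15, 20, 90, 110, 150]
tag_count = [80, 95, 130, 700, 820, 1100]
labels = [False, False, False, True, True, True]

size_cut = best_threshold(html_size_kb, labels)
tags_cut = best_threshold(tag_count, labels)
print(size_cut, tags_cut)
```

When the two features rise and fall together like this, either one separates the classes equally well, which would explain why swapping them makes no difference. On real data the classes usually overlap, and that is where the "where exactly?" question about the threshold gets interesting.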
The slowest part of any parser is the lexer, so I'm guessing whatever processing they do on the parse tree structure is insignificant by comparison.
Would the person or persons downvoting all of the TFA author's comments mind explaining why? Seems a bit off.