Agreed. Also, they don't discuss how they chose their features... It seems the problem was already solved before ML was even applied to it. Maybe the example is too naive?
HTML size and HTML tag count were natural choices. If they hadn't worked out, the next step would have been to try something else. You're right that it's a very naive example and that, in a way, it was solved before any ML was applied. The surprising part for me was that the two features turned out to be interchangeable, i.e. either of them could be used. I would have expected HTML tag count to be much more accurate and reliable. Another interesting part for me was the threshold. It's somewhat clear that it should be somewhere between 20 and 1 sec, probably, but where exactly?
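To make the "where exactly?" question concrete, here is a minimal sketch of picking a cut-off on a single numeric feature by brute force. It is not the article's method. The function name `best_threshold` and the sample numbers are made up for illustration, and the rule "value >= threshold means positive" is an assumption.

```python
def best_threshold(values, labels):
    """Return (threshold, accuracy) for the rule `value >= threshold -> True`.

    Candidates are the midpoints between consecutive distinct values.
    Returns (None, -1.0) if there are fewer than two distinct values.
    """
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = (None, -1.0)
    for t in candidates:
        # Count samples where the thresholded prediction matches the label.
        correct = sum((v >= t) == y for v, y in zip(values, labels))
        accuracy = correct / len(values)
        if accuracy > best[1]:  # strict: keep the first best candidate
            best = (t, accuracy)
    return best


# Hypothetical feature values and labels, invented for this example.
values = [1, 2, 3, 10, 12, 20]
labels = [False, False, False, True, True, True]
print(best_threshold(values, labels))
```

With data like this, any cut-off above 3 and up to 10 classifies every sample correctly, so the training data alone doesn't pin down the exact value. Taking the midpoint of the gap is just a convention.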