| HN Mirror

Parsing HTML content won't get you the full benefit an inference engine would. An inference engine could easily learn that 90% of your users getting to your landing page are going to login & end up on their home screen so it would push the static resources for the home screen too. Similarly, it might know that it already pushed those resources previously in the session & only push the new static resources that are unique to you once you login (saving the round-trip of the client nacking the resource). Doing it via stateless HTML parsing is never going to work because you have no idea of the state of the session. That doesn't mean there's not a place for a mixture of approaches (& yes you could teach the HTML parsing about historical pushes but then you get back to the concern you raised about storing that data somewhere).

The HTML parsing approach is probably great from a 80% of the benefit for 20% of the effort on small-scale websites (i.e. majority). A super accurate inference engine might use deep learning to train what to serve on a very personalized level if you have a lot of users & the CPU/latency trade-off makes sense for your business model (i.e. more accuracy for a larger slice of your population). A less accurate one might just collect statistics in a DB & make cheap less accurate guesses from that (or use more "classic ML" like Bayes) if you have a medium amount of users or the CPU usage makes more sense and you're OK with the maintenance burden of a DB. It's a sliding scale IMO of tradeoffs with different approaches making sense depending on your priorities.