
Right now my parser combines several open-source parsers and keeps the best results they produce. These parsers take different approaches: some search the DOM structure using hardcoded patterns and keywords, while others use their own ML classification models. As for LLMs, I plan to try them too, at least for websites that cannot be parsed with the existing tools. I am also thinking about creating my own ML model trained on a large number of HTML files, but that option is too expensive for me so far.
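
To make the "combine several parsers and keep the best result" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two toy regex-based parsers stand in for real open-source tools, and `combine` with its majority-vote title and longest-text heuristic is a hypothetical merging rule, not the logic actually used.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Optional

Extraction = Dict[str, Optional[str]]
Parser = Callable[[str], Extraction]


def _strip_tags(fragment: str) -> str:
    """Remove tags and collapse whitespace (toy helper, not a real HTML parser)."""
    return " ".join(re.sub(r"<[^>]+>", " ", fragment).split())


def title_tag_parser(html: str) -> Extraction:
    """Hypothetical parser A: title from <title>, text from the whole page."""
    m = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
    return {"title": m.group(1).strip() if m else None,
            "text": _strip_tags(html) or None}


def h1_article_parser(html: str) -> Extraction:
    """Hypothetical parser B: title from <h1>, text from <article> only."""
    h1 = re.search(r"<h1[^>]*>(.*?)</h1>", html, re.I | re.S)
    art = re.search(r"<article[^>]*>(.*?)</article>", html, re.I | re.S)
    return {"title": _strip_tags(h1.group(1)) if h1 else None,
            "text": (_strip_tags(art.group(1)) or None) if art else None}


def combine(html: str, parsers: List[Parser]) -> Extraction:
    """Run every parser and merge their outputs field by field."""
    results: List[Extraction] = []
    for parse in parsers:
        try:
            results.append(parse(html))
        except Exception:
            # A parser that crashes on this page is skipped, not fatal.
            continue

    titles = [r["title"] for r in results if r.get("title")]
    texts = [r["text"] for r in results if r.get("text")]

    # Assumed heuristics: majority vote for the title, longest text wins.
    title = Counter(titles).most_common(1)[0][0] if titles else None
    text = max(texts, key=len) if texts else None
    return {"title": title, "text": text}
```

The per-field heuristics are the part worth tuning: "longest text wins" can favor a parser that also captured navigation or footer noise, so a real scoring function would likely need to be smarter. Pages where every parser returns nothing are the natural candidates for the LLM fallback.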