We've had the best success by first converting the HTML to a simpler format (e.g. Markdown) before passing it to the LLM. We've tried a few ways of doing this, namely Extractus[0] and dom-to-semantic-markdown[1].

Internally we use Apify[2] and Firecrawl[3] for Magic Loops[4] that run in the cloud, both of which have built-in options for simplifying pages, but for our Chrome Extension we use dom-to-semantic-markdown.

Similar to the article, we're currently exploring a user-assisted flow to generate XPaths for a given site, which we can then use to extract specific elements before hitting the LLM.

By simplifying the "problem" we've had decent success, even with GPT-4o mini.

[0] https://github.com/extractus

[1] https://github.com/romansky/dom-to-semantic-markdown

[2] https://apify.com/

[3] https://www.firecrawl.dev/

[4] https://magicloops.dev/
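To make the "simplify first" idea concrete, here is a rough sketch using only Python's standard library. It is not how Extractus or dom-to-semantic-markdown work internally, and it handles far fewer cases than those tools do; the names `MarkdownSimplifier` and `html_to_markdown` are made up for illustration. It drops script and style content and keeps headings, links, and list items as Markdown, which is the kind of reduced input you would then hand to the LLM.

```python
import re
from html.parser import HTMLParser

# Tags whose contents carry no readable text for the model (assumed list).
SKIP_TAGS = {"script", "style", "noscript", "svg"}
HEADINGS = {f"h{n}": "#" * n for n in range(1, 7)}
BLOCK_TAGS = {"p", "div", "section", "article", "ul", "ol", "br", "tr"}


class MarkdownSimplifier(HTMLParser):
    """Hypothetical helper: collects a Markdown-ish rendering of a page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.href = None     # href of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        if tag in HEADINGS:
            self.parts.append("\n\n" + HEADINGS[tag] + " ")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            if self.href:
                self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if tag == "a" and self.href:
            self.parts.append(f"]({self.href})")
            self.href = None
        elif tag in HEADINGS or tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse runs of whitespace but keep word boundaries.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    """Return a simplified Markdown-like string for the given HTML."""
    parser = MarkdownSimplifier()
    parser.feed(html)
    parser.close()
    lines = [re.sub(r"[ \t]+", " ", line).strip()
             for line in "".join(parser.parts).split("\n")]
    # Collapse stacked blank lines left behind by nested block tags.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


if __name__ == "__main__":
    sample = (
        "<html><head><style>body{color:red}</style></head><body>"
        "<h1>Pricing</h1><p>See the <a href='/plans'>plans</a> page.</p>"
        "<ul><li>Free</li><li>Pro</li></ul></body></html>"
    )
    print(html_to_markdown(sample))
```

The point is less the exact Markdown dialect and more the reduction in noise: the model sees a few lines of text and links instead of the full DOM, which is likely part of why a smaller model can cope.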
We even have an iframe-able live view of the browser, so your users can get real-time feedback on the XPaths they're generating: https://docs.browserbase.com/features/session-live-view#give...
Happy to answer any questions!