by dehugger
184 days ago
Better idea: how about you just put a link to a CSV dump of your inventory data, label it "AI Agents/Scrapers, click here to get all the inventory data", embed that on every page, then call it a day?

When you are being scraped there are two possible reactions:
1 - good, because someone scraping your data is going to help you make a sale (discoverability)
2 - bad, work to obfuscate/block/prevent access.

In the first case, introducing a complex new standard that few if any will adopt achieves nothing compared to "here's a link for all the data in one spot, now leave my site alone. cheers". In the second case, you actively don't want your data scraped, so why would you ever adopt this?

If you are reading all the inventory data into context then you are doing it wrong. Use your LLM to analyze the website and build a mapping for the HTML data, then parse using traditional methods (bs4 works nicely). You'll save yourself a gajillion tokens and get more consistent and accurate results at 1000x the speed.
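For anyone who wants the "build a mapping once, then parse the old-fashioned way" idea spelled out, here is a rough sketch. Everything specific in it is an assumption for illustration: the class names (`item`, `item-name`, `item-price`, `item-stock`), the sample HTML, and the `FIELD_CLASSES` mapping, which stands in for whatever an LLM would produce after looking at one page of a given store. It uses the standard library's `html.parser` instead of bs4 purely so it runs with no dependencies; with bs4 the same mapping would drive the lookups.

```python
from html.parser import HTMLParser

# Hypothetical mapping an LLM might produce after inspecting one page:
# output field -> CSS class of the element holding that value.
FIELD_CLASSES = {"name": "item-name", "price": "item-price", "stock": "item-stock"}
ITEM_CLASS = "item"  # hypothetical class on the element wrapping each product


class InventoryParser(HTMLParser):
    """Collects one dict per product, driven only by the class-name mapping."""

    def __init__(self, item_class, field_classes):
        super().__init__()
        self.item_class = item_class
        # Invert the mapping so a class name looks up its output field.
        self.class_to_field = {cls: field for field, cls in field_classes.items()}
        self.items = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.item_class in classes:
            self.items.append({})  # a new product record starts here
            return
        for cls in classes:
            if cls in self.class_to_field and self.items:
                self._field = self.class_to_field[cls]

    def handle_data(self, data):
        if self._field is not None:
            current = self.items[-1]
            current[self._field] = current.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        # Simplification: assumes field elements contain text only, no nested tags.
        self._field = None


def parse_inventory(html):
    parser = InventoryParser(ITEM_CLASS, FIELD_CLASSES)
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE_HTML = """
<ul>
  <li class="item"><span class="item-name">Widget</span>
      <span class="item-price">$4.50</span><span class="item-stock">12</span></li>
  <li class="item featured"><span class="item-name">Gadget</span>
      <span class="item-price">$19.00</span><span class="item-stock">0</span></li>
</ul>
"""

if __name__ == "__main__":
    for row in parse_inventory(SAMPLE_HTML):
        print(row)
```

The LLM only gets involved when the mapping needs to be created or repaired after a layout change. Every page after that goes through plain parsing with no tokens spent.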
|
Our spec handles this via @SEMANTIC_LOGIC and @BRAND_VOICE. It’s about how the AI represents your brand, not just the raw numbers.
Regarding bs4: mapping HTML to a thousand different store layouts is exactly what we are trying to escape. That is the 'fragility tax'. We are proposing a deterministic fast-lane that bypasses the need for custom scrapers for every single store.
You don't want the AI to 'guess' your data. You want it to 'know' your data.