by danShumway
1180 days ago
Scraping and structuring data seems to be an area where LLMs are just great. This is a use case with a lot of potential, and it's worth exploring. That being said, I still have to be a stick in the mud and point out that GPT-4 is probably still vulnerable to third-party prompt injection while scraping websites. I've run into people on HN who think that problem is easy to solve. Maybe they're right, maybe they're not, but I haven't seen evidence that OpenAI in particular has solved it yet.

For a lot of scraping and categorizing, that risk won't matter because you won't be working with hostile content. But you do have to keep in mind that a scraped website can contain text that prompts GPT to return incorrect data or carry out some kind of attack. GPT-4 is (as far as I know) vulnerable to a Bobby Tables-style attack, and I don't think there is (currently) any mitigation for that.
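A minimal sketch of the storage side of this, assuming the extracted values end up in a SQL database (Python's built-in `sqlite3` here; the `students` table, the `store_name` helper, and the scraped value are all hypothetical). Binding the model's output as a query parameter keeps a hostile string from being executed as SQL when it is stored. It does nothing about the prompt injection itself: a hostile page can still make the model return wrong values.

```python
import sqlite3

# Hypothetical value the model returned after scraping a hostile page
# (illustrative only, not real scraped data).
llm_output = {"name": "Robert'); DROP TABLE students;--"}


def store_name(conn, name):
    """Hypothetical helper: store one extracted value as plain data."""
    # The ? placeholder binds `name` as a value, so the SQL text itself
    # never changes, whatever the model returned.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
store_name(conn, llm_output["name"])

rows = conn.execute("SELECT name FROM students").fetchall()
print(rows)
```

The hostile string is stored verbatim as a name and the table survives; whether that name is *correct* is still up to the model.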
GPT-4 can't take all the blame for this. If you want a system where GPT can't drop tables, give it an account that doesn't have permission to drop tables. Build a middleware layer as needed for more complicated situations.
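A rough sketch of that idea using Python's built-in `sqlite3`. SQLite has no user accounts, so this swaps in a connection-level authorizer callback as a stand-in for a restricted account (with a server database such as PostgreSQL you would instead connect as a role that doesn't own the tables). `deny_drops` and `run_model_sql` are hypothetical names for the "middleware layer", not an existing API.

```python
import sqlite3


def deny_drops(action, arg1, arg2, db_name, source):
    # SQLite calls this while preparing each statement.
    # Refuse DROP TABLE; allow every other action.
    if action == sqlite3.SQLITE_DROP_TABLE:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


def run_model_sql(conn, sql):
    """Hypothetical middleware: run model-generated SQL, report refusals."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.DatabaseError as exc:
        # A denied statement fails at prepare time with a DatabaseError.
        return f"refused: {exc}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.execute("INSERT INTO students VALUES ('Alice')")
conn.commit()
conn.set_authorizer(deny_drops)  # install after setup so setup is unaffected

print(run_model_sql(conn, "SELECT name FROM students"))
print(run_model_sql(conn, "DROP TABLE students"))
```

The point of putting the check below the model is that it holds no matter what text the model was tricked into producing.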