Hacker News new | ask | show | jobs
by bluejay2387 1218 days ago
This is a well known problem with this technology (although I haven't seen an official term for it, so we have been calling them "recovery attacks'). It's apparently the reason companies like Amazon have banned internal use of services like ChatGPT. I should add while it has been proven to occur the likelihood of something like this is very low. It's going to be a rare occurrence.
1 comments

thanks for sharing. Do you think if all would be on the client's local server or cloud there would be still some, even rare, occurrence of that?
The problem could still occur but you would have to be capturing all the queries to your internal LLM systems and then using that data for training. You have complete control of the model so you could just choose not to do that and I would think data leaks of this nature would be less of a concern for an internal environment anyway. You would know that only authorized individuals would have access to the data. I suppose there could still be a very small chance of leaking data to unauthorized employees, but if a rogue employee wants to access data they should not have access to fishing an LLM would probably be the least productive way to do that. Your access logs for the LLM system would clearly display the attempts.

Some commercial services are starting to offer "Enterprise" licenses that prohibit the collection and use for training of your data and that would address the concern as well.