| HN Mirror


	by varenc 32 days ago
	Are you worried about Google's response to this? Google reportedly reacts to distillation attempts "with real-time proactive defenses that can degrade student model performance". So if they detected you, they could have intentionally fed you a dumber but plausible variant of Gemini: https://cloud.google.com/blog/topics/threat-intelligence/dis... But also, this model is small and just focusing on the tool use. In terms of token usage, you're probably not anywhere near the people that are trying to distill the entire model.

2 comments

madduci 32 days ago

Well, it's like robbing the robbers, when it comes to training data

varenc 31 days ago

This perspective is more cut and dried when it's someone like OpenAI scraping the whole internet explicitly for LLM training purposes. But Google has already been scraping the entire internet for 25+ years. At what point did building a smarter search engine transition from indexing to 'robbing'? And it's not like training Gemini is the first time they used their internet cache to build AI. AI, as academics use the term, has been in use on Google results for a long time.

Basically, if we were okay with Google scraping the internet to build a search index, what is the line they crossed that turned this from acceptable search engine indexing into theft?

tommica 32 days ago

Except one of the robbers is a massive corporation with an even bigger legal team...

incrudible 32 days ago

It is more like imitating the imitators. There is not much of a legal case here, but poisoning the data is fair game both for those producing original data and for those producing its regurgitations.

worthless-trash 32 days ago

I think it's very hard for the 'websites' to poison the data for AI though; we don't have a 'single point of ingestion' where we can measure when it's being pumped for training data.

andai 31 days ago

Give visitor a test. If user fails, user probably human.

wordsarelies 31 days ago

Well... really, thank the courts... the creator of the prompt gets to own the output...

janalsncm 32 days ago

You could run Gemma models locally to distill them. Or any other model with tool use.

HenryNdubuaku 32 days ago

Yeah, but we wanted Gemini
