by quantiq 1191 days ago
>I'm a little tired of the arguments that the large language models are just regurgitating memorized output

The arguments are valid and you haven’t provided a single counterpoint. Data leakage is a well-known problem in machine learning, and OpenAI has seemingly done very little to mitigate it.


My point is that they're not _just simply regurgitating training data_ and it's reductionist to suggest that's all they do. I don't doubt there's plenty of contamination in OpenAI's models, and I don't doubt there's some level of regurgitation happening, but that's not all that's going on and we need to take seriously the possibility that LLMs, combined with well engineered prompts, can and/or will be able to tackle problems that aren't in their training data. Where do you even draw the line anyway?

The conversation about contamination (also very important) doesn't need to be mutually exclusive with conversations about social and economic impact, and I'm pretty sure that, with respect to those issues, the results on standardized tests, however sensationalist and however contaminated, are an important wake-up call for ordinary people who haven't been following along. Something is happening now.