Hacker News
by darkwater 3 days ago
What kind of rigour can there be in a field (LLM usage as a user) where models, context sizes, tooling and, broadly, the "rules" (scare quotes) change every few weeks? There is literally no chance of taking a scientific approach to anything; the churn is too high. There are papers about model XYZ v 12345 from a few months ago that are already outdated, because model ABC on version 54321 addresses half of the issues shown in the paper while adding 3 new problems.
1 comment

With benchmarks, you can re-run them after a change. A measurement in a paper will go out of date quickly unless turned into a benchmark.
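
To make that concrete, here is a rough sketch of what "turning a measurement into a benchmark" could look like: a fixed set of cases plus a scoring function that takes any model as a callable, so the same number can be regenerated whenever the model changes. Everything in it is made up for illustration: `CASES`, `run_benchmark`, `compare`, and the two stand-in functions `model_v1` and `model_v2` are hypothetical, and a real setup would call an actual model instead of a lookup table.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical fixed test set: (prompt, expected answer) pairs.
# Keeping this frozen is what makes results comparable across model versions.
CASES: List[Tuple[str, str]] = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("3 * 3", "9"),
]


def run_benchmark(model: Callable[[str], str]) -> float:
    """Return the fraction of cases the model answers exactly right."""
    correct = sum(
        1 for prompt, expected in CASES if model(prompt).strip() == expected
    )
    return correct / len(CASES)


# Stand-ins for two model versions (assumptions, not real models).
def model_v1(prompt: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")


def model_v2(prompt: str) -> str:
    # Fixes one failure of v1 but introduces a new one.
    answers = {"2 + 2": "4", "3 * 3": "9"}
    return answers.get(prompt, "unknown")


def compare(models: Dict[str, Callable[[str], str]]) -> Dict[str, float]:
    """Re-run the same benchmark against every model version given."""
    return {name: run_benchmark(model) for name, model in models.items()}


if __name__ == "__main__":
    for name, score in compare({"v1": model_v1, "v2": model_v2}).items():
        print(f"{name}: {score:.2f}")
```

The point of the design is that the cases and the scoring stay fixed while the model is swapped out, so a result for last month's version is not lost; it just gets a new row next to it.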