I see what you're saying, but I don't think it applies in this case. Correct use of jargon helps domain experts communicate with higher precision, and papers tend to be written by domain experts for consumption by other domain experts.

Of course there are some (possibly many!) papers where jargon is abused to make something sound smarter. Sometimes this can also happen unintentionally.

In this case, "compute-optimal X" is standard terminology in large-scale ML model design for finding the best tradeoff with regard to compute when trying to achieve X.

Here, the paper is about finding the optimal model size tradeoff when training on LLM-generated synthetic data. Imagine you have a class of LLMs, from small to infinitely large. The larger the LLM, the higher the quality of your synthetic data, but you will also spend more compute to generate this data ("sampling" the data). Smaller LLMs can generate more data with the same compute budget, but at worse quality.
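
To make the tradeoff concrete, here's a rough back-of-the-envelope sketch. It assumes the common approximation that generating one token costs about 2 × (parameter count) FLOPs. The budget, model sizes, and sample length are made-up numbers for illustration, not values from the paper.

```python
def samples_per_budget(budget_flops: float, n_params: float, tokens_per_sample: int) -> int:
    """How many samples a model can generate within a fixed compute budget.

    Assumes inference costs roughly 2 * n_params FLOPs per generated token.
    """
    flops_per_sample = 2.0 * n_params * tokens_per_sample
    return int(budget_flops // flops_per_sample)


# Hypothetical numbers, chosen only to illustrate the tradeoff.
budget = 1e18   # total sampling budget in FLOPs
tokens = 512    # tokens per synthetic sample

small = samples_per_budget(budget, 9e9, tokens)    # 9B-parameter model
large = samples_per_budget(budget, 27e9, tokens)   # 27B-parameter model

print(f"small model: {small} samples, large model: {large} samples")
```

Under that approximation the sample count scales inversely with model size, so a model with a third of the parameters gets about three times as many samples for the same budget. Whether the extra samples make up for the lower per-sample quality is the empirical question.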

The paper runs experiments and finds that, in their setting, you don't always want the largest possible LLM for synthetic data (as many practitioners previously assumed). Instead, you can get further by making more calls to a smaller but worse LLM.