| HN Mirror



	by amazingamazing 514 days ago
	each card is not 20x more useful lol. there's no evidence yet that the deepseek architecture would even yield a substantially (20x) more performant model with more compute. if there's evidence to the contrary I'd love to see it. in any case I don't think an H800 is even 20x better than an H100 anyway, so the 20x increase has to be wrong.

2 comments

jdietrich 514 days ago

We need GPUs for inference, not just training. The Jevons Paradox suggests that reducing the cost per token will increase the overall demand for inference.

Also, everything we know about LLMs points to an entirely predictable correlation between training compute and performance.


tshaddox 514 days ago

Jevons paradox doesn't really suggest anything by itself. Jevons paradox is something that occurs in some instances of increased efficiency, but not all. I suppose the important question here is "What is the price elasticity of demand of inference?"
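To make the elasticity question concrete, here is a minimal sketch under a constant-elasticity demand curve, Q = k * P^(-e). The demand model, the function names, and the 20x price cut are assumptions for illustration, not measurements of the inference market. Under this model, total spend rises after a price cut only when e > 1.

```python
def total_spend(price, elasticity, k=1.0):
    """Total spend P*Q under an assumed constant-elasticity demand Q = k * P**(-elasticity)."""
    quantity = k * price ** (-elasticity)
    return price * quantity


def jevons_effect(old_price, new_price, elasticity):
    """Return True if total spend rises when the price moves from old_price to new_price."""
    return total_spend(new_price, elasticity) > total_spend(old_price, elasticity)


if __name__ == "__main__":
    # Hypothetical 20x drop in cost per token, tried at three elasticities.
    for e in (0.5, 1.0, 1.5):
        before = total_spend(1.0, e)
        after = total_spend(0.05, e)
        print(f"elasticity={e}: spend before={before:.3f}, after={after:.3f}")
```

With e below 1 the cheaper tokens shrink total spend, with e above 1 they grow it, which is the case people usually mean by "Jevons paradox".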


ckw 514 days ago

Personally, in the six months prior to the release of the DeepSeek-V3 API, I'd made probably 100-200 API calls per month to LLM services. In the past week I made 2.8 million API calls to dsv3.


shawabawa3 514 days ago

can i ask what kind of api calls you're making to dsv3? Crunching through huge amounts of unstructured data or something?


ckw 513 days ago

Processing each English (word, part-of-speech, sense) triple in various ways. Generating (very silly) example sentences for each triple in various styles. Generating 'difficulty' ratings for each triple. Two examples:

High difficulty:

        id = 37810
      word = dendroid
       pos = noun
     sense = (mathematics) A connected continuum that is arcwise connected and hereditarily unicoherent.
       elo = 2408.61936886416
 sentence2 = The dendroid, that arboreal structure of the Real, emerges not as a mere geometric curiosity but as the very topology of desire, its branches both infinite and indivisible, a map of the unconscious where every detour is already inscribed in the unicoherence of the subject's jouissance.

Low difficulty:

        id = 11910
      word = bed
       pos = noun
     sense = A flat, soft piece of furniture designed for resting or sleeping.
       elo = 447.32459484266
 sentence2 = The city outside my window never closed its eyes, but I did, sinking into the cold embrace of a bed that smelled faintly of whiskey and regret.
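
The comment does not say how the `elo` values were produced. One plausible approach, assumed here purely for illustration, is to ask the model which of two triples is harder and apply a standard Elo update to both. The function names, the K-factor of 32, and the 1500 starting rating are hypothetical.

```python
def expected_score(r_a, r_b):
    """Standard Elo expectation that item A 'wins' (is judged harder) against item B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, a_is_harder, k=32.0):
    """Return updated (r_a, r_b) after one pairwise difficulty comparison."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_is_harder else 0.0
    new_a = r_a + k * (s_a - e_a)
    # B's score and expectation are the complements of A's.
    new_b = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b


if __name__ == "__main__":
    # Hypothetical comparison: 'dendroid' judged harder than 'bed', both starting at 1500.
    dendroid, bed = elo_update(1500.0, 1500.0, a_is_harder=True)
    print(dendroid, bed)
```

Repeating such comparisons over many random pairs would spread the ratings apart, which is consistent with the wide gap between the two examples above, though the actual method may differ.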


mrbungie 514 days ago

People act like Jevons Paradox is a universal law thanks to Satya's tweet.


amazingamazing 514 days ago

the Jevons paradox isn't about any particular company's product, so it's irrelevant here. the relevant resource here is compute, which is already a commodity. secondly, even if it were about GPUs in particular, there's no evidence that Nvidia would be able to sustain such high margins if fewer were necessary for equivalent performance. things are currently supply constrained, which gives Nvidia price optionality.


Scoundreller 514 days ago

Uhhh, isn’t it about coal?


numba888 514 days ago

> there's no evidence yet that the deepseek architecture would even yield a substantially more performant model with more compute.

It's supposed to. There were reports that longer 'thinking' makes the o3 model better than o1, i.e. at least at inference time, compute still matters.


amazingamazing 514 days ago

> It's supposed to. There were reports that longer 'thinking' makes the o3 model better than o1, i.e. at least at inference time, compute still matters.

compute matters, but from what I've heard about o3 vs o1, performance doesn't scale proportionally with compute.

you shouldn't take my word for it - go on the leaderboards, look at the top models from now and the top models from 2023, and compare the compute involved for both. there's obviously a huge increase in compute, but the performance gain isn't proportional.
