Hacker News
by eastdakota 998 days ago
It's effectively a unit of time, benchmarked to what we could accomplish in that time as of Sept 27, 2023 (launch). The challenge is that, because we're abstracting away the underlying hardware, it's not the same as renting a VM for a period of time. We also don't want to create perverse incentives that keep us from making the underlying system faster. It's similar to how AWS standardized EC2 around a standard compute unit. Over time, as we continue to add faster hardware and better optimize models, we expect the cost of a neuron to trend down, but the amount of AI inference work you can do with a neuron will remain relatively constant.
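A minimal sketch of the accounting described above, assuming a neuron is a fixed quantity of inference work calibrated once on launch hardware. The `Hardware` class, `WORK_PER_NEURON`, and every number below are hypothetical; the sketch only shows that the neurons billed for a task stay constant while the cost to serve one neuron falls as hardware throughput rises.

```python
from dataclasses import dataclass

# Hypothetical model of a "neuron" as a fixed amount of work, not a fixed time.
# All figures are illustrative assumptions, not published numbers.


@dataclass
class Hardware:
    name: str
    work_per_second: float  # abstract work units this hardware completes per second
    cost_per_second: float  # dollars it costs to run this hardware for one second


# Work contained in one neuron, fixed at the launch benchmark (assumed value).
WORK_PER_NEURON = 100.0


def neurons_for_task(task_work: float) -> float:
    """Neurons billed for a task depend only on the work, not on the hardware."""
    return task_work / WORK_PER_NEURON


def provider_cost_per_neuron(hw: Hardware) -> float:
    """Time this hardware needs for one neuron of work, times its cost per second."""
    seconds_per_neuron = WORK_PER_NEURON / hw.work_per_second
    return seconds_per_neuron * hw.cost_per_second


launch = Hardware("launch GPU (hypothetical)", work_per_second=1_000.0, cost_per_second=0.001)
later = Hardware("faster GPU (hypothetical)", work_per_second=4_000.0, cost_per_second=0.002)

task_work = 5_000.0  # hypothetical size of one inference request
for hw in (launch, later):
    print(f"{hw.name}: {neurons_for_task(task_work):.0f} neurons, "
          f"${provider_cost_per_neuron(hw):.6f} per neuron to serve")
```

In this toy setup the faster hardware costs twice as much per second but does four times the work, so the same 50-neuron request becomes half as expensive to serve, which is the room the comment describes for the neuron price to trend down.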

Then call it something like Neural Time Unit (NTU) or Computational Time Unit (CTU), because “neuron” makes people think of neural networks, as in, you pay for the size of your model.

Could you give us an example of what that means in practical terms?

The post says that 1,000 neurons will give you 130 LLM responses, but of what length?

(LLMs are generally priced by input and output tokens: the more tokens, the longer the compute time. Without an idea of what you mean by a response, it's hard to understand.)
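The only hard number here is the quoted ratio, which works out to about 7.69 neurons per response. The sketch below computes that and then shows why response length matters, under a hypothetical assumption: that cost scales linearly with tokens and that the quoted figure refers to responses of some length `assumed_quoted_tokens`, which the post does not state. The function name and the linear model are illustrative only; real transformer cost is not exactly linear in length.

```python
# Figures quoted in the post: 1,000 neurons buys about 130 LLM responses.
QUOTED_NEURONS = 1_000
QUOTED_RESPONSES = 130

neurons_per_response = QUOTED_NEURONS / QUOTED_RESPONSES
print(f"{neurons_per_response:.2f} neurons per response")


def responses_for_budget(budget_neurons: float, tokens_per_response: int,
                         assumed_quoted_tokens: int) -> float:
    """Estimate how many responses a neuron budget buys at a given length.

    Hypothetical: assumes cost scales linearly with tokens, and that the quoted
    130-response figure was measured at `assumed_quoted_tokens` tokens per response.
    """
    scale = tokens_per_response / assumed_quoted_tokens
    return budget_neurons / (neurons_per_response * scale)


for length in (128, 256, 512):
    n = responses_for_budget(1_000, length, assumed_quoted_tokens=256)
    print(f"{length:>4} tokens/response -> ~{n:.0f} responses per 1,000 neurons")
```

Under these assumptions the same 1,000 neurons could mean anywhere from roughly 65 to 260 responses depending on length, which is why the response count alone is hard to interpret.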

Likewise, for 1,250 embeddings: how big is the text in that example?
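The quoted embedding figure implies 0.8 neurons per embedding. A small budgeting helper, assuming every text is the same size as the (unstated) example text behind that ratio; `neurons_for_embeddings` is a hypothetical name, and rounding up to whole neurons is an assumption about billing granularity.

```python
# Ratio quoted in the post: 1,000 neurons covers about 1,250 embeddings.
QUOTED_NEURONS = 1_000
QUOTED_EMBEDDINGS = 1_250


def neurons_for_embeddings(num_texts: int) -> int:
    """Neurons needed to embed `num_texts` texts, rounded up to a whole neuron.

    Assumes each text matches the size of the example text behind the quoted
    ratio; the post does not say how large that text is.
    """
    # Integer ceiling division avoids floating-point rounding at exact multiples.
    return (num_texts * QUOTED_NEURONS + QUOTED_EMBEDDINGS - 1) // QUOTED_EMBEDDINGS


print(QUOTED_NEURONS / QUOTED_EMBEDDINGS, "neurons per embedding")
print(neurons_for_embeddings(10_000), "neurons for 10,000 texts")
```

If the real documents are much longer than the example text, this estimate could be far off, which is the gap the question points at.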

I'm VERY excited to see you doing this and understand it's early stages, but I can't wrap my head around the pricing without context.

Please rename it, or at least make sure it corresponds to actual neural operations. It's terribly confusing for practitioners.

Really amazing to see this launch with Hugging Face! Hope to see it expand beyond text too.

“Neuron” is a cute name, but there’s too much conceptual overlap with floating-point ops, layers, model parameters, etc., which are time-independent. You should just call them inference credits or something. When a large model runs on multiple GPUs, it’s even more confusing what neurons, or dollars per second, would correspond to.

Sounds like 1 Neuron ~= X FLOPS
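One rough way to put a number on X: combine the post's quoted ratio (1,000 neurons for about 130 responses) with the common rule of thumb that a dense decoder's forward pass costs about 2 × parameters floating-point operations per token. The model size and response length below are hypothetical inputs, not anything the post states, and the rule of thumb ignores attention's dependence on context length.

```python
# Rough estimate of X in "1 neuron ~= X FLOPs", using the post's quoted figure
# (1,000 neurons ~ 130 LLM responses) plus assumptions the post does not state.
QUOTED_NEURONS = 1_000
QUOTED_RESPONSES = 130


def forward_flops_per_response(params: float, tokens: int) -> float:
    """Approximate forward-pass FLOPs for a dense decoder: ~2 * params per token.

    `tokens` counts all processed tokens (prompt plus output). This is a rule of
    thumb that ignores attention cost growth with context and hardware efficiency.
    """
    return 2.0 * params * tokens


def flops_per_neuron(params: float, tokens_per_response: int) -> float:
    """Spread one response's estimated FLOPs over the neurons it is billed."""
    neurons_per_response = QUOTED_NEURONS / QUOTED_RESPONSES
    return forward_flops_per_response(params, tokens_per_response) / neurons_per_response


# Hypothetical inputs: a 7B-parameter model processing 256 tokens per response.
x = flops_per_neuron(params=7e9, tokens_per_response=256)
print(f"1 neuron ~= {x:.2e} FLOPs (under these assumptions)")
```

Because this counts operations rather than elapsed time, the estimate does not change if the model is sharded across several GPUs, though it does change with model size, so a single X would only hold per model.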