Hacker News
by amayne 998 days ago
This is very cool. I'm still trying to understand the pricing. What is a "neuron" in this context? A token? A character?

"Neurons are a way to measure AI output that always scales down to zero (if you get no usage, you will be charged for 0 neurons). To give you a sense of what you can accomplish with a thousand neurons, you can: generate 130 LLM responses, 830 image classifications, or 1,250 embeddings."

130 LLM responses of what length? 1,250 embeddings of how much text?
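Working backward from the quoted figures gives only an average neuron cost per task type. This is a minimal sketch of that arithmetic. The dictionary keys are illustrative names, and the numbers are the ones in the quote above.

```python
# Figures quoted in the announcement: what 1,000 neurons buys.
NEURONS = 1_000
TASKS_PER_1000_NEURONS = {
    "llm_response": 130,
    "image_classification": 830,
    "embedding": 1_250,
}


def neurons_per_task(tasks_per_budget: int, budget: int = NEURONS) -> float:
    """Implied average neuron cost of one task of the given type."""
    return budget / tasks_per_budget


for task, count in TASKS_PER_1000_NEURONS.items():
    print(f"{task}: {neurons_per_task(count):.2f} neurons each")
```

That works out to roughly 7.69 neurons per LLM response, 1.20 per image classification and 0.80 per embedding. Because these are averages, they say nothing about how the cost changes with prompt or output length, which is exactly what the question is asking.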


It's effectively a unit of time benchmarked to what we can accomplish in that time as of Sept 27, 2023 (launch). The challenge here is that because we're abstracting away the underlying hardware it's not the same as renting a VM for a period of time. We also don't want to create perverse incentives that keep us from making the underlying system faster. It's similar to how AWS normalized EC2 instances to a standard compute unit. Over time, as we continue to add faster and faster hardware and better optimize models we expect the cost of a neuron will trend down but the amount of AI inference work that you can do with a neuron will remain relatively constant.
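Here is one possible reading of that explanation, as a minimal sketch. The neuron count for a task is fixed by how long the task took on launch-day hardware. Faster hardware later does not change that count. Only the dollar price per neuron can fall. `SECONDS_PER_NEURON_AT_LAUNCH`, the task duration and both prices are hypothetical placeholders, because the thread does not publish any of them.

```python
# HYPOTHETICAL: one neuron = the work launch-day (Sept 27, 2023) hardware
# completes in this many seconds. The real constant is not given in the thread.
SECONDS_PER_NEURON_AT_LAUNCH = 0.01


def neurons_billed(launch_benchmark_seconds: float) -> float:
    """Neurons for a task, fixed by its runtime on launch-day hardware."""
    return launch_benchmark_seconds / SECONDS_PER_NEURON_AT_LAUNCH


def invoice(launch_benchmark_seconds: float, price_per_neuron: float) -> float:
    """Dollar charge: a fixed neuron count times a (hypothetical) price."""
    return neurons_billed(launch_benchmark_seconds) * price_per_neuron


task_seconds = 0.5  # hypothetical task: 0.5 s on launch-day hardware

# Newer hardware might run the same task faster, but the neuron count
# stays tied to the launch benchmark; only the price per neuron drops.
print(neurons_billed(task_seconds))
print(invoice(task_seconds, price_per_neuron=0.00001))   # hypothetical launch price
print(invoice(task_seconds, price_per_neuron=0.000005))  # hypothetical later price
```

Under this reading, the work per neuron stays constant and speed gains reach the customer only as a lower price per neuron.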
Then call it something like Neural Time Unit (NTU) or Computational Time Unit (CTU) because neurons make people think of neural networks. As in, you pay for the size of your model.
Could you give us an example of what that means in practical terms?

The post says that 1000 neurons will give you 130 LLM responses - but of what length?

(LLMs are generally priced by input and output tokens. The more tokens, the longer the compute time. Without knowing what you mean by a response, it's hard to understand.)
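The following minimal sketch shows why per-token pricing makes "one response" ambiguous. Under simple per-token rates, the cost of a response scales with its input and output token counts. The two rates below are made-up placeholders and are not anyone's actual prices.

```python
# HYPOTHETICAL per-token rates, in dollars per 1,000 tokens (illustrative only).
INPUT_RATE_PER_1K = 0.001
OUTPUT_RATE_PER_1K = 0.002


def response_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one LLM response under simple per-token pricing."""
    return (input_tokens / 1000) * INPUT_RATE_PER_1K + (
        output_tokens / 1000
    ) * OUTPUT_RATE_PER_1K


short_cost = response_cost(input_tokens=50, output_tokens=100)
long_cost = response_cost(input_tokens=2000, output_tokens=1000)
print(f"short: ${short_cost:.6f}, long: ${long_cost:.6f}, "
      f"ratio: {long_cost / short_cost:.1f}x")
```

With these placeholder rates, the long response costs 16 times as much as the short one. A fixed count of "responses per 1,000 neurons" therefore only makes sense if the response length behind it is stated.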

Likewise: 1,250 embeddings. How long is the input text in the example?

I'm VERY excited to see you doing this and understand it's early stages, but I can't wrap my head around the pricing without context.

Please rename it, or at least make sure it corresponds to actual neural operations. It's terribly confusing for practitioners.
Really amazing stuff to see this launch with Hugging Face! Hope to see it expand beyond text too.

“Neuron” is a cute name, but there’s too much conceptual overlap with floating-point ops, layers, model parameters, etc., which are time-independent. They should just be called inference credits or something similar. When a large model runs on multiple GPUs, it’s even more confusing what neurons / dollars per second might mean.

Sounds like 1 Neuron ~= X FLOPS
Those don't explain the relation between neuron cost and length.
Those max tokens seem pretty low