by spelk 1 day ago
Please correct me if you have contradicting data but: Neuralwatt's price per token vs price for energy comparison doesn't seem to take into account the cost savings from cache hits that other providers offer on pure token rates. The comparison seems to assume every input token is a cache miss.

On top of that, the cloud offering doesn't seem that well-run. They randomly blocked a colleague's API key for a couple of days without any heads-up, had a weird rate-limiting bug, and have been deprecating models without redirects on very short notice, all while taking weeks to onboard new models. I assume some of these problems would be addressed if we had an SLA/enterprise contract.

It's a promising idea though. They offer a $5 trial credit (with an aggressive rate limit), so there's no harm in trying it out.

> doesn't seem to take into account the cost savings from cache hits

Absolutely false information.

From my usage panel for this month:

* Total Tokens: 1.1B
* Cached Tokens: 1.0B (97% of prompt tokens)
* Cost (energy pricing): $26.58

The energy pricing figure is higher than what I actually pay, because it's a mix of token billing and partial subscription (60% extra "power").

From the $50 subscription, I have about 3/4 left (4.21 of 16.0 kWh used this billing cycle). I also used $5.50 in token billing.

That was running 82% GLM 5.1 and 18% GLM 5.2. Yes, I have been busy ;)

My actual usage, in dollar terms, was ~$18.
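As a rough check on that figure, the numbers quoted above can be combined as follows. This is only a sketch: it assumes the $50 subscription is prorated evenly over its 16.0 kWh allowance, which the usage panel does not state, and the function and variable names are illustrative.

```python
# Figures quoted in the comment; the even $/kWh proration is an assumption.
SUBSCRIPTION_PRICE_USD = 50.0   # monthly subscription price
SUBSCRIPTION_KWH = 16.0         # energy allowance included in the subscription
KWH_USED = 4.21                 # energy consumed so far this billing cycle
TOKEN_BILLING_USD = 5.5         # separate pay-per-token spend


def effective_spend(sub_price, sub_kwh, kwh_used, token_billing):
    """Prorate the subscription by energy used, then add token-billed spend."""
    usd_per_kwh = sub_price / sub_kwh
    return usd_per_kwh * kwh_used + token_billing


spend = effective_spend(
    SUBSCRIPTION_PRICE_USD, SUBSCRIPTION_KWH, KWH_USED, TOKEN_BILLING_USD
)
print(f"${spend:.2f}")
```

Under that assumption the result comes out a little above $18.50, which is in the same range as the ~$18 stated above.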

For your information, that is cheaper than MiMo v2.5 Pro from Xiaomi, where I was getting around 450,000 tokens per cent. And they have the same 75% cheaper prices as DeepSeek. MiMo has an issue with cache retention between session prompts, which hurts them vs DeepSeek. Yes, DeepSeek v4 Pro is 2.5x cheaper, but it is nowhere near GLM 5.1, and especially not GLM 5.2.

In case you're wondering, the zai Light subscription has a limit of about 80M tokens per week. So on a tokens-per-cent basis, Neuralwatt is about 3x cheaper (and has no 5h or weekly limits to maximize/frustrate).

> all while taking weeks to onboard new models.

Took them 1 day to include GLM 5.2 ... Yes, they remove old models fast because they do not have the server capacity to keep old models around.

> I assume some of these problems would be addressed if we had an SLA/enterprise contract.

It's a small team, not a big huge company. In my experience so far, I have seen 2 timeouts, and sometimes slow speeds as servers get overloaded. For what I am paying for GLM ~5.1~ 5.2 ...

Your reply doesn't seem to be in good faith. Please provide your formula for calculating effective per token cost.

I am not sure why the small team argument is relevant. This is a crowded market; there are dozens if not hundreds of third-party inference providers in the world right now. I'm glad that's a good excuse that works on you, but I'm not sure why the average user should care.

The formula is very easy. Go to the Neuralwatt website and read ... $5 = 1 kWh of power for non-subscription usage. For subscription usage you get ~50% more.

Then you actually use the service and see how many tokens you use on average. You calculate the token use vs what you pay. This gives you a stable number to compare different services and models with, if you want the token cost. This is basic school-level reasoning and calculation.
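The calculation described here can be sketched in a few lines. It plugs in figures quoted earlier in the thread (1.1B total tokens, roughly $18 of actual spend, and about 450,000 tokens per cent for MiMo); the function name is hypothetical and the inputs are one month of one user's usage, so treat the output as illustrative only.

```python
def tokens_per_cent(total_tokens, spend_usd):
    """Return how many tokens one US cent bought over a billing period."""
    if spend_usd <= 0:
        raise ValueError("spend_usd must be positive")
    return total_tokens / (spend_usd * 100)


# Figures quoted earlier in the thread (approximate).
neuralwatt = tokens_per_cent(1.1e9, 18.0)
mimo = 450_000  # tokens per cent, as reported for MiMo v2.5 Pro

print(f"Neuralwatt: {neuralwatt:,.0f} tokens per cent")
print(f"Ratio vs MiMo: {neuralwatt / mimo:.2f}x")
```

With those inputs the result is roughly 611,000 tokens per cent. Because the figure is derived from real usage, it already reflects whatever cache hit rate the workload achieved.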

> I am not sure why the small team argument is relevant.

This is relevant to the previous poster's question regarding support and SLA/enterprise support.

> Your reply doesn't seem to be in good faith.... I'm glad that's a good excuse that works on you ...

Question: Do you have an issue with communicating with other people in real life?

The irony of questioning someone's communication skills immediately after this exchange is hard to miss.
Just asking because it seems there is an issue, given your tone and responses. This is out of concern...