Hacker News
by irthomasthomas 42 days ago
I like Chutes. I think I get about 5K prompts per day for $20/month, though they may have stricter limits for new customers.

This gives you practically unlimited usage of frontier models like Kimi, DeepSeek, and GLM. Their models are always full-size, never quantised, except where the lab itself provides a 4-bit or 8-bit model. You can see from the model config exactly which HF model it pulls and the serving configuration used.

Prompts are encrypted using a Trusted Execution Environment (TEE), so neither the model host nor a neighbour can view your prompts. That's as close as you can get to local-level privacy in the cloud.

1 comments

I tried looking into Chutes just now. Seems like there is no easy way to just pay & start using it with OpenCode or Claude Code, right? Their docs don’t seem to mention it. Do I really have to execute code with their API in order to use the models?
No, it's super easy. I think the confusion is due to the serving and hosting APIs that let you add your own GPUs to a pool and earn money. But for regular inference they have an OpenAI-compatible API and a basic chat app. You can sign up for a $3 subscription, or deposit $5, and use your API key.

https://chutes.ai/app/chute/2ff25e81-4586-5ec8-b892-3a6f3426...

    curl -X POST \
      https://llm.chutes.ai/v1/chat/completions \
      -H "Authorization: Bearer $CHUTES_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "moonshotai/Kimi-K2.5-TEE",
        "messages": [
          {
            "role": "user",
            "content": "Tell me a 250 word story."
          }
        ],
        "stream": true,
        "max_tokens": 1024,
        "temperature": 0.7
      }'
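
If you'd rather call it from a script than from curl, here is a rough Python sketch of the same request using only the standard library. The endpoint, model name, and parameters are copied from the curl command above. The function names (`build_payload`, `extract_delta`, `stream_chat`) are my own, and the streaming format is an assumption: I'm guessing the endpoint sends OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`), which I have not checked against the Chutes docs.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://llm.chutes.ai/v1/chat/completions"


def build_payload(prompt, model="moonshotai/Kimi-K2.5-TEE",
                  max_tokens=1024, temperature=0.7):
    """Build the same JSON body as the curl example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def extract_delta(line):
    """Return the text fragment carried by one streamed line, or None.

    Assumption: chunks follow the OpenAI streaming convention, i.e.
    'data: {json}' lines, with 'data: [DONE]' marking the end.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank separators, comments, keep-alives
    data = line[len("data:"):].strip()
    if not data or data == "[DONE]":
        return None
    chunk = json.loads(data)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    delta = choices[0].get("delta") or {}
    return delta.get("content")


def stream_chat(prompt, token):
    """Send the request and yield text fragments as they arrive."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        for raw_line in response:  # the response object iterates line by line
            text = extract_delta(raw_line.decode("utf-8"))
            if text:
                yield text
```

To use it, loop over `stream_chat("Tell me a 250 word story.", os.environ["CHUTES_API_TOKEN"])` and print each fragment with `end=""`. The parsing lives in `extract_delta` so you can check it against sample lines without spending any prompts.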