Hacker News new | ask | show | jobs
by traceroute66 10 hours ago
> You probably want to try renting some time on a dedicated box with roughly the specs you’re considering and running the open models

You don't even need to go that far. For example, with Exoscale Dedicated Inference[1] you just point it at the Hugging Face for the model and quantisation you want to test and it automagically spits out an OpenAI-compatible API endpoint.

[1] https://www.exoscale.com/ai-cloud-infrastructure/dedicated-i...

(I have no relationship with Exoscale, this particular product just crossed my radar recently)

1 comments

I think they're just suggesting renting as a way to test that the hardware they're considering purchasing would actually be able to do what they need.
> I think they're just suggesting renting as a way to test

Well, yes, I understood that.

Which is why I started with the words "You don't even need to go that far.".

To re-phrase what I said in clearer terms:

Instead of renting an instance, then messing around with configuring Linux and whatever via SSH or Ansible or whatever. Just point a Hugging Face link at this magic service and get a ready-to-go API back. Enabling you to test your desired model spec with minimum fuss.

Ultimately the guy wants his own hardware. So why waste time messing around with someone else's VM if you just want to test a specific model spec. That is the TL;DR.

Half of my point was to test the models, the other half was to try to get a sense of what the speed would be. Hard to do, but dropping $5k on a 128 gig machine thinking that will unlock good local AI and then realizing that you’ll need to spend >$20k more to run a decent model, and then finding out that even that gives you crap speed isn’t the best way to discover all this.

I very much want local AI to win this in the end, but it’s extremely expensive to run good models at good speed locally right now. Minimax M2.5/2.7, Qwen 3.6, etc are pretty good for basic stuff, but pretty far off from competing with Opus/Fable.