| HN Mirror



	by spindump8930 49 days ago
	Remember that the same model on different inference platforms will not necessarily give exactly the same results, which adds another axis of non-determinism to development. Things like quantization, custom model-serving silicon, batching, or other inference optimizations might mean a hosted model performs differently from the one served by the original provider :/ This paper isn't the exact same scenario, since it uses an auditable open-weight Llama model, but it shows the symptoms of this: https://arxiv.org/pdf/2410.20247
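
	One small, concrete reason such optimizations can change outputs: floating-point addition is not associative, so a kernel or batching strategy that sums the same values in a different order can produce slightly different numbers. A minimal sketch in plain Python (the values are illustrative and not taken from any real model):

```python
# Floating-point addition is not associative: summing the same numbers
# in a different order can give a slightly different result.
values = [0.1, 0.2, 0.3]

left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)       # False
print(abs(left_to_right - right_to_left))   # a tiny, but nonzero, difference
```

	When two candidate tokens have nearly equal logits, a difference this small can in principle be enough to flip a greedy choice, after which the rest of the generation may diverge.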

2 comments

gchamonlive 48 days ago

It's a shame people love to use hostile language (something I am also sometimes guilty of), but I think redsocksfan45's misconception is worth addressing. The comment is, however, (rightfully) dead. I'll address it anyway.

Model performance consistency is important, but not because you want inference determinism (which you can largely get by setting temperature to zero and using a fixed seed). The `another axis of non-determinism` can be illustrated by the question "if I move from openrouter to bedrock, will gpt-5.5 perform the same?", to which the answer is no, at least not necessarily.

This matters because workflows that used to work on one platform might degrade or outright break on another, even with the same model, which you have to account for when deciding which provider to use.
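
One way to check this before switching providers is to run the same prompts through both and flag where the outputs diverge. A rough sketch, assuming each provider is wrapped in a plain function that takes a prompt and returns text; `compare_providers`, `fake_provider_a`, `fake_provider_b`, and the 0.9 threshold are all hypothetical names and values for illustration, not any provider's real API:

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

# Hypothetical interface: any function that sends a prompt to a provider
# and returns the completion text.
Completion = Callable[[str], str]


def compare_providers(
    prompts: List[str],
    provider_a: Completion,
    provider_b: Completion,
    threshold: float = 0.9,  # assumed cutoff, tune for your workload
) -> List[Dict[str, object]]:
    """Run each prompt through both providers and flag dissimilar outputs."""
    report = []
    for prompt in prompts:
        out_a = provider_a(prompt)
        out_b = provider_b(prompt)
        # Character-level similarity in [0, 1]; 1.0 means identical text.
        similarity = SequenceMatcher(None, out_a, out_b).ratio()
        report.append(
            {
                "prompt": prompt,
                "similarity": similarity,
                "diverged": similarity < threshold,
            }
        )
    return report


# Stand-in providers so the sketch runs without network access.
def fake_provider_a(prompt: str) -> str:
    return "Answer to: " + prompt


def fake_provider_b(prompt: str) -> str:
    # Pretend this host behaves differently on longer prompts.
    if len(prompt) < 20:
        return "Answer to: " + prompt
    return "I cannot help with that."


results = compare_providers(
    ["2+2?", "Summarize the following contract clause"],
    fake_provider_a,
    fake_provider_b,
)
for row in results:
    print(row["prompt"], row["diverged"])
```

In practice a task-specific check (does the JSON still parse, does the test suite still pass) is likely more useful than raw text similarity, since two providers can word a correct answer differently.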


bossyTeacher 49 days ago

Anyone who has used gpt-x via openai vs microsoft has experienced this very clearly.


energy123 49 days ago

Which one is better?


dannyw 49 days ago

For OpenAI, going direct to OpenAI has always been better, except maybe in early 2023, when the OpenAI Platform was not that stable or reliable yet.

For Anthropic, it can vary based on model and time. For Opus 4.7, Bedrock is the clear winner in TPS (tokens per second) by a wide margin: https://artificialanalysis.ai/models/claude-opus-4-7/provide...


spindump8930 48 days ago

That artificial analysis page has some great references for this, thanks for sharing.


weli 49 days ago

As a rule of thumb, inference offered by the model labs is closer to the "true implementation" than inference from third parties. They have other problems though.
