| HN Mirror



	by sdesol 311 days ago
	Honestly, Gemini Flash Lite and the models on Cerebras are extremely fast. I know what you are saying: if the goal is to get a lot of results that may or may not be relevant, then yes, it is an order of magnitude slower. But if you take into consideration the post-analysis process, which is what inference is trying to solve, is it still an order of magnitude slower?

1 comments

More like 6-8 orders of magnitude slower. That’s a very nontrivial difference in performance!

How are you quantifying the speed at which results are reviewed?

It’s not speed, but cost to compute.