Hacker News
by sdesol 260 days ago
Honestly, Gemini Flash Lite and the models on Cerebras are extremely fast. I know what you are saying. If the goal is to get a lot of results that may or may not be relevant, then yes, it is an order of magnitude slower.

If you take into consideration the post-analysis process, which is what inference is trying to solve, is it still an order of magnitude slower?

1 comment

More like 6-8 orders of magnitude slower. That’s a very nontrivial difference in performance!
How are you quantifying the speed at which results are reviewed?
It’s not speed, but cost to compute.