by sgt101 2 days ago
For me there are a bunch of questions:

- was the pause in model scaling a result of the gains from RL & SFT being easier and quicker to obtain than the gains from scaling, or was it genuinely because scaling now has low ROI?

- are the power densities needed for high-quality on-device inference achievable? Can the best technically feasible architectures accommodate T-scale (trillion-parameter) models and run them off batteries that fit in your hand?

- will things slow down enough for edge deployments to realise value relative to centralised deployments?

- do edge use cases drive enough revenue to get this to happen?

- can local inference make up for model scale? Does that make sense in a latency/power race with the central infrastructure? Is there a sweet spot here?

I am not sure about any of the answers...