by sgt101
2 days ago
For me there are a bunch of questions:

- Was the pause in model scaling a result of the benefits of RL and SFT being easier and quicker to access than scaling, or was it genuinely the result of scaling now being low ROI?
- Are the power densities needed for high-quality on-device inference achievable? Can the best technically feasible architectures accommodate T-scale models and run them off batteries that fit in your hand?
- Will things slow down enough to allow edge deployments to realise value vs. centralised deployments?
- Do edge use cases drive enough revenue to make this happen?
- Can local inference make up for model scale? Does that make sense in a latency/power race with the central infrastructure? Is there a sweet spot here?

I am not sure about any of the answers...