by killbot5000 101 days ago
The foundation models themselves will be cheap to deploy, but we’ll still need general-purpose inference hardware to work alongside them, converting latent intermediate layers into useful, application-specific outputs. This may level off the demand for “GPU/TPU” hardware, though, by letting the biggest and most expensive layers move to silicon.

How specifically would that work? I’ve seen no framework for that happening.
The output of the transformer layers is a collection of embeddings in the latent concept space. Those can be fed into an additional model to extract semantic segments, bounding boxes, etc. IIUC this is how DINOv3 works.
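
To make the split concrete, here is a rough toy sketch of the idea, not DINOv3's actual code or API. Everything in it is made up for illustration: `FrozenBackbone`, `SegmentationHead`, `segment`, the sizes, and the random weights are all hypothetical. The backbone stands in for the big fixed layers that could be baked into silicon; the head is the small task-specific model that would still run on general-purpose hardware.

```python
import random

EMBED_DIM = 8      # hypothetical width of the latent embedding
NUM_CLASSES = 3    # hypothetical number of segment classes


def make_matrix(rows, cols, seed):
    """Build a deterministic pseudo-random weight matrix (illustrative only)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class FrozenBackbone:
    """Stand-in for the large fixed model: its weights never change."""

    def __init__(self, patch_dim):
        self._weights = make_matrix(EMBED_DIM, patch_dim, seed=0)

    def embed(self, patches):
        # One embedding per input patch, all in the same latent space.
        return [matvec(self._weights, patch) for patch in patches]


class SegmentationHead:
    """Small task-specific model that consumes the embeddings."""

    def __init__(self):
        # In practice these weights would be trained; here they are random.
        self.weights = make_matrix(NUM_CLASSES, EMBED_DIM, seed=1)

    def predict(self, embeddings):
        labels = []
        for embedding in embeddings:
            scores = matvec(self.weights, embedding)
            labels.append(scores.index(max(scores)))  # argmax over classes
        return labels


def segment(patches, backbone, head):
    """Run the fixed backbone, then the swappable head, on a list of patches."""
    return head.predict(backbone.embed(patches))


if __name__ == "__main__":
    # Four fake image patches, each flattened to 6 values.
    patches = [[0.1 * (i + j) for j in range(6)] for i in range(4)]
    print(segment(patches, FrozenBackbone(patch_dim=6), SegmentationHead()))
```

The point of the design is that only the head would need retraining or swapping per application (segmentation, bounding boxes, etc.), while the backbone stays fixed.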