by samstave
848 days ago
>>> How does it perform on 3090, 4090 or less? Are us mere mortals gonna be able to have fun with it?

>>> It's in sizes from 800M to 8B parameters now, will be all sizes for all sorts of edge to giant GPU deployment.

Can you fragment responses, such that if an edge device (mobile app) is prompted for [thing], it can pass tokens upstream on the prompt? Torrenting responses, effectively. And you could push actual GPU edge devices in certain climates... like dense cities, which are expected to see a Fton of GPU cycle consumption around the edge.

So you have tiered processing: speed is done locally, quality level 1 can take some edge GPU, and corporate shit can be handled in the cloud...

Can you fragment and torrent a response? If so, how is that request torn up and routed to appropriate resources?

BOFH me if this is a stupid question (but it's valid, given how quickly AI is becoming intrinsic to our society).
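To make the tiering I'm picturing concrete, here's a rough Python sketch. Everything in it is made up for illustration: the `Tier` names, the `quality` scale, the 200 ms latency cutoff and the `route` function are all assumptions, not how any real system does it. It only covers picking a tier for a whole request, not tearing a single response into fragments.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LOCAL = "local"  # on-device model: lowest latency
    EDGE = "edge"    # nearby edge GPU: "quality level 1"
    CLOUD = "cloud"  # datacenter GPUs: the heavy corporate jobs


@dataclass
class Request:
    prompt: str
    quality: int         # hypothetical scale: 0 = fast draft, 1 = better, 2+ = best
    max_latency_ms: int  # hypothetical latency budget for the caller


def route(req: Request) -> Tier:
    """Pick a tier from the request's quality and latency needs."""
    # Speed-sensitive or draft-quality work stays on the device.
    # The 200 ms cutoff is an arbitrary assumption.
    if req.quality == 0 or req.max_latency_ms < 200:
        return Tier.LOCAL
    # Mid-quality work goes to a nearby edge GPU.
    if req.quality == 1:
        return Tier.EDGE
    # Everything else goes upstream to the cloud.
    return Tier.CLOUD


if __name__ == "__main__":
    for r in (
        Request("autocomplete this", quality=0, max_latency_ms=1000),
        Request("draw me a logo", quality=1, max_latency_ms=2000),
        Request("render the ad campaign", quality=2, max_latency_ms=60000),
    ):
        print(r.prompt, "->", route(r).value)
```

In this sketch a tight latency budget wins over requested quality, so a high-quality request that needs an answer in under 200 ms still stays local.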