An "LLM crawler app" is needed: you should be able to shift tokenized workloads between executors, in a BGP-routing sort of sense...
Least-cost routing of prompt responses, especially when time-to-respond matters less than precision...
Also, does any LLM have a time-series ability? Meaning: "show me this [thing] based on this [input], but continually updated as I firehose the crap out of it."
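
A rough sketch of the least-cost routing idea above, in Python. Everything here is hypothetical: the `Executor` fields, the example executors and their prices, latencies, and quality scores are made up for illustration and do not describe any real provider. The point is only the shape of the decision: filter by a precision floor (and optionally a latency ceiling), then take the cheapest executor that is left.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Executor:
    """A hypothetical place a prompt could run (all numbers are assumptions)."""
    name: str
    cost_per_1k_tokens: float   # assumed price per 1,000 tokens
    expected_latency_s: float   # assumed typical time-to-respond
    quality: float              # assumed precision score in [0, 1]


def pick_executor(
    executors: Sequence[Executor],
    tokens: int,
    min_quality: float,
    max_latency_s: Optional[float] = None,
) -> Executor:
    """Return the cheapest executor that meets the quality and latency limits."""
    candidates = [
        e for e in executors
        if e.quality >= min_quality
        and (max_latency_s is None or e.expected_latency_s <= max_latency_s)
    ]
    if not candidates:
        raise ValueError("no executor satisfies the constraints")
    # Cheapest first; break cost ties by preferring the faster executor.
    return min(
        candidates,
        key=lambda e: (e.cost_per_1k_tokens * tokens / 1000, e.expected_latency_s),
    )


# Made-up routing table for illustration only.
EXECUTORS = [
    Executor("small-local", 0.0002, 1.0, 0.70),
    Executor("batch-large", 0.0030, 120.0, 0.95),
    Executor("realtime-large", 0.0100, 3.0, 0.95),
]
```

With no latency limit and a high precision floor, `pick_executor(EXECUTORS, 2000, min_quality=0.9)` picks the slow but cheap `batch-large`; adding `max_latency_s=10` forces the more expensive `realtime-large`. That is the "time-to-respond is not as important as precision" trade in miniature.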
--
What if you could get execution estimates for a prompt?
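
One way an execution estimate could look, as a back-of-the-envelope sketch in Python. The assumptions are loud ones: token count is guessed from character length (about 4 characters per token is a common rule of thumb for English, not an exact tokenizer), and the price, throughput, and expected output length are inputs the caller has to supply. The function name and the `Estimate` fields are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    prompt_tokens: int   # guessed from prompt length, not a real tokenizer
    total_tokens: int    # prompt tokens plus expected output tokens
    cost: float          # in whatever currency cost_per_1k_tokens uses
    seconds: float       # generation time only


def estimate_execution(
    prompt: str,
    expected_output_tokens: int,
    cost_per_1k_tokens: float,
    tokens_per_second: float,
    chars_per_token: float = 4.0,   # assumption: rough heuristic for English
) -> Estimate:
    """Estimate token count, cost, and generation time before running a prompt."""
    prompt_tokens = math.ceil(len(prompt) / chars_per_token)
    total_tokens = prompt_tokens + expected_output_tokens
    cost = total_tokens * cost_per_1k_tokens / 1000
    # Ignores prompt processing and queueing time; counts output generation only.
    seconds = expected_output_tokens / tokens_per_second
    return Estimate(prompt_tokens, total_tokens, cost, seconds)
```

An estimate like this could feed a router directly: compute it per executor, then route on the cheapest one whose estimated time is acceptable.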