A solution could be a model trained on the exact keystroke timeline of text being typed, so that it can predict how long the user will take to type the predicted text.
e.g. "I need a plane ticket to Ha" - 730ms -> "I need a plane ticket to Hawaii"
The model would detect deviations from the estimated time and invoke the main LLM when one occurs. This could work for spoken word too; it would just be trained on real speech instead of typing.
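A rough sketch of the deviation check, to make the idea concrete. Everything here is an assumption for illustration: the `TimingPrediction` type, the `should_invoke_llm` function, and the 30% tolerance are hypothetical, and the timing model itself is taken as given. "Deviation" is read here as either the typed text diverging from the prediction or the user taking noticeably longer than estimated.

```python
from dataclasses import dataclass


@dataclass
class TimingPrediction:
    """Hypothetical output of the timing model."""
    predicted_text: str   # full text the model expects the user to end up with
    expected_ms: float    # estimated time to finish typing it


def should_invoke_llm(prediction: TimingPrediction,
                      actual_text: str,
                      elapsed_ms: float,
                      tolerance: float = 0.3) -> bool:
    """Return True when the observed typing deviates from the prediction.

    The tolerance (30% here) is an arbitrary placeholder, not a tuned value.
    """
    # The user typed something the prediction did not anticipate.
    if not prediction.predicted_text.startswith(actual_text):
        return True
    # The text is still on track, so only an overrun of the estimate counts.
    return elapsed_ms > prediction.expected_ms * (1 + tolerance)


prediction = TimingPrediction("I need a plane ticket to Hawaii", 730.0)

# On track: text matches the prediction so far and the time is within bounds.
on_track = should_invoke_llm(prediction, "I need a plane ticket to Haw", 400.0)

# Diverged: the user typed "Hav" instead of "Haw".
diverged = should_invoke_llm(prediction, "I need a plane ticket to Hav", 300.0)

# Stalled: the text still matches, but the user is well past the estimate.
stalled = should_invoke_llm(prediction, "I need a plane ticket to Haw", 1200.0)
```

In this sketch the main LLM would only be called in the `diverged` and `stalled` cases; the same check could apply to speech by swapping keystroke timings for word or phoneme timings.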