by smpanaro
412 days ago
Not a public follow-up, but the iOS 17 speech-to-text model has a clever approach to KV caching that works within the ANE’s constraints (fixed-size inputs). I wrote about it here[0], but the gist is that you can keep a fixed-size cache and slide it in chunks with each inference. Not as efficient as a cache that grows by one each time, of course.

[0]: https://stephenpanaro.com/blog/inside-apples-2023-transforme...
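
To make the sliding idea concrete, here is a rough sketch in plain Python. It is not Apple's implementation: the sizes (`CACHE_LEN`, `CHUNK`), the helper names, and the string stand-ins for key/value tensors are all made up for illustration. The only point is that the cache has the same length before and after every inference, which is what a fixed-input-shape accelerator needs.

```python
CACHE_LEN = 8  # hypothetical fixed cache length (slots)
CHUNK = 4      # hypothetical number of new entries produced per inference


def new_cache(cache_len=CACHE_LEN, pad=None):
    """Return a fixed-size cache filled with padding slots."""
    return [pad] * cache_len


def slide_cache(cache, new_entries):
    """Drop the oldest len(new_entries) slots and append the new ones.

    The returned cache always has the same length as the input cache.
    """
    n = len(new_entries)
    if n > len(cache):
        raise ValueError("chunk is larger than the cache")
    return cache[n:] + list(new_entries)


cache = new_cache()
for step in range(3):
    # Stand-in for the keys/values a real model would emit for this chunk.
    chunk = [f"kv{step * CHUNK + i}" for i in range(CHUNK)]
    cache = slide_cache(cache, chunk)
    assert len(cache) == CACHE_LEN  # shape never changes between inferences
```

After the third step the cache holds only the most recent `CACHE_LEN` entries (`kv4` through `kv11`), and the oldest chunk has been discarded. A real model would also need an attention mask so the padding slots are ignored during the first few inferences.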