https://github.com/01-ai/Yi-Coder/issues/6#issuecomment-2334...
Works very well now! 65K input tokens with 8,192 output tokens are no longer an issue on my 4090 (it maxes out at 22 GB of VRAM).