by samsullivan
315 days ago
Answering correctly depends entirely on the attention blocks somehow capturing single-letter nuance, given word-tokenization constraints. Does the attention block in Kimi have an architecture that is more receptive to this?
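Here is a toy sketch of the constraint. It is only an illustration and not how Kimi or any specific model tokenizes. The vocabulary `TOY_VOCAB` and the helper `greedy_tokenize` are hypothetical. The sketch uses greedy longest-match splitting in place of a learned scheme like BPE, and "strawberry" is just an example word. Once the word becomes a couple of token IDs, its letters are no longer directly in the input. The model would have to have learned each token's spelling somewhere in its weights.

```python
# Hypothetical toy vocabulary; real tokenizers learn theirs from data.
TOY_VOCAB = {"straw": 0, "berry": 1, "s": 2, "t": 3, "r": 4,
             "a": 5, "w": 6, "b": 7, "e": 8, "y": 9}


def greedy_tokenize(word: str, vocab: dict[str, int]) -> list[int]:
    """Split `word` by greedy longest match against `vocab`, returning token IDs."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest substring starting at position i that is in the vocabulary.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {word[i]!r}")
    return ids


word = "strawberry"
ids = greedy_tokenize(word, TOY_VOCAB)
print(ids)               # [0, 1]: the model sees two opaque IDs
print(word.count("r"))   # 3: only visible at the character level
```

Nothing in `[0, 1]` encodes how many r's there are. Any letter-level answer has to come from what the network has associated with those two IDs, which is the gap the question above is asking about.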