Hacker News
by samsullivan 315 days ago
answering correctly depends entirely on the attention blocks somehow capturing single-letter nuance, even though tokenization hands the model multi-character tokens rather than individual letters. does the attention block in Kimi have an architecture that's better suited to this?
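
to make the tokenization constraint concrete, here's a rough sketch. the vocabulary and the greedy longest-match `tokenize` helper are made up for illustration (this is not Kimi's tokenizer or any real model's), and the example word is just a convenient one. the point is only that the model receives a couple of opaque ids, while a letter-level question is about characters it never sees directly:

```python
# Toy illustration only: TOY_VOCAB is a made-up vocabulary, not the
# tokenizer of Kimi or any other real model.
TOY_VOCAB = {
    "straw": 0, "berry": 1,
    "s": 2, "t": 3, "r": 4, "a": 5, "w": 6, "b": 7, "e": 8, "y": 9,
}


def tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split of `word` into vocabulary ids."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches {word[i:]!r}")
    return ids


word = "strawberry"
token_ids = tokenize(word)        # what the model actually receives
letter_count = word.count("r")    # what a letter-level question asks about

print(token_ids)     # [0, 1]
print(letter_count)  # 3
```

nothing in the ids `[0, 1]` spells out the letters, so any letter-level answer has to come from what the model has learned about those tokens.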