by solid_fuel 584 days ago
I wouldn't expect an LLM to be good at spell checking, actually. The way they tokenize text before manipulating it makes them fairly bad at working with small sequences of letters.
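To make the tokenization point concrete, here is a rough toy sketch. It is not any real model's tokenizer: `TOY_VOCAB` and `tokenize` are made up for illustration, and real vocabularies are learned and far larger. It only shows how a greedy longest-match tokenizer hands the model chunks rather than letters, so a small change in spelling or capitalization can produce a completely different sequence.

```python
# Hypothetical toy vocabulary, invented for this example only.
TOY_VOCAB = ["Git", "Hub", "hub", "Github"]


def tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match tokenization; unknown characters become single-character pieces."""
    pieces = []
    i = 0
    while i < len(text):
        # Pick the longest vocabulary entry that matches at position i,
        # falling back to the single character when nothing matches.
        match = max(
            (v for v in vocab if text.startswith(v, i)),
            key=len,
            default=text[i],
        )
        pieces.append(match)
        i += len(match)
    return pieces


print(tokenize("GitHub"))
print(tokenize("Github"))
print(tokenize("Gihtub"))
```

With this made-up vocabulary, the two capitalizations split differently and the misspelling falls apart into single characters, so the model never sees the words as similar strings of letters.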

I have had good luck using an LLM as a "sanity checking" layer for transcription output, though. A simple prompt like "is this paragraph coherent" has proven to be a pretty decent way to check the accuracy of Whisper transcriptions.
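A minimal sketch of that kind of check might look like the following. Everything here is an assumption for illustration: `ask_llm` is a hypothetical callable that sends a prompt to whatever model you use and returns its text reply, and the YES/NO wording is just one way to make the answer easy to parse.

```python
from typing import Callable, List

# Assumed prompt wording; the comment above only specifies "is this paragraph coherent".
PROMPT = "Is this paragraph coherent? Answer YES or NO.\n\n{paragraph}"


def flag_incoherent(paragraphs: List[str], ask_llm: Callable[[str], str]) -> List[int]:
    """Return the indices of paragraphs the model does not judge coherent."""
    flagged = []
    for i, paragraph in enumerate(paragraphs):
        answer = ask_llm(PROMPT.format(paragraph=paragraph))
        # Anything other than a clear YES is treated as suspect and flagged for review.
        if not answer.strip().upper().startswith("YES"):
            flagged.append(i)
    return flagged
```

Flagged paragraphs would then be candidates for re-transcription or a manual listen, rather than something to correct automatically.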

1 comment

Yes, this is a tokenization error. If you rewrite the sentence as shown below:

https://app.gitsense.com/?doc=905f4a9af74c25f&model=Claude+3...

Claude 3.5 Sonnet will now misinterpret "GitHub" as "Github".