Hacker News new | ask | show | jobs
by anonymoushn 43 days ago
I couldn't reproduce this behavior with Sonnet 4, and Sonnet 3.7 has been deprecated since I messed with this stuff. You can try tokenizing the string "<hello> </hello>"

I think the correct tokenization of the string will not have any tokens that contain mixed punctuation and letters, but the result of this approach does contain such claimed tokens.