by owl57 2047 days ago
Even if you need, say, word boundaries, you might be better off pretending that there are no non-ASCII non-word characters (that is, treating every non-ASCII byte as part of a word) than pretending that everything is valid UTF-8 or whatever. It depends on your inputs and requirements.
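A rough sketch of what that could look like in Python, working on raw bytes and never decoding. The names `WORD_BYTE`, `WORD_RUN` and `find_words` are made up for illustration, and the choice to count every byte >= 0x80 as a word byte is the assumption being demonstrated, not something a library does for you:

```python
import re

# Assumption: ASCII letters, digits and underscore are word bytes, and so is
# every byte >= 0x80. No non-ASCII byte is ever treated as a separator.
WORD_BYTE = rb"[0-9A-Za-z_\x80-\xff]"
WORD_RUN = re.compile(WORD_BYTE + rb"+")


def find_words(data: bytes) -> list[bytes]:
    """Return maximal runs of word bytes; the input is never decoded."""
    return WORD_RUN.findall(data)


# Valid UTF-8: multi-byte letters stay inside their words.
print(find_words("naïve café, ok".encode("utf-8")))

# Invalid UTF-8: no exception, the stray bytes just become part of a word.
print(find_words(b"foo\xff\xfebar baz"))
```

The trade-off is that non-ASCII punctuation and spaces (a no-break space, curly quotes) get glued into the neighbouring words, which is exactly the "pretend they don't exist" part. Whether that beats choking on invalid input depends on the data.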