Hacker News new | ask | show | jobs
by jart 1028 days ago
Haven't you heard of Postel's Maxim?

Web servers need to be able to receive and decode latin1 into utf-8 regardless of what the RFC recommends people send. The fact that it's going to become rarer over time to have the 8th bit set in headers, means you can write a simpler algorithm than what Lemire did that assumes an ASCII average case. https://github.com/jart/cosmopolitan/blob/755ae64e73ef5ef7d1... That goes 23 GB/s on my machine using just SSE2 (rather than AVX512). However it goes much slower if the text is full of european diacritics. Lemire's algorithm is better at decoding those.

1 comments

>Haven't you heard of Postel's Maxim?

Otherwise known as "Making other people's incompetence and inability to implement a specification your problem." Just because it's a widely quoted maxim doesn't make it good advice.