by ricardobeat
3267 days ago
'mañana'.padStart(7, '1') => "1mañana" // ok
'man\u0303ana'.padStart(7, '1') => "mañana" // oops
You are being disingenuous here. Those are different strings with different lengths (try copying this into the console):
'mañana'.length // 6
'man\u0303ana'.length // 7
The latter has two stacked characters. These issues are inherent to Unicode, and `padStart` is treating the strings correctly. If you need normalization, use the .normalize method you mentioned yourself. This is a major improvement: double-wide and stacked characters have been there since ES3, but now the language is providing standard tools to work with them.
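A minimal sketch of that advice, assuming an engine that implements `String.prototype.normalize` (any modern browser or Node). The variable names are illustrative only. Normalizing to NFC composes `n` + U+0303 into the single code point U+00F1 before padding, so both spellings of the word pad the same way:

```javascript
// The same visible word, encoded two ways.
const composed = 'ma\u00f1ana';    // ñ as one code point (U+00F1)
const decomposed = 'man\u0303ana'; // n followed by COMBINING TILDE (U+0303)

console.log(composed.length);   // 6
console.log(decomposed.length); // 7 (length counts UTF-16 code units)

// NFC composes n + U+0303 into U+00F1, so padding now behaves as expected.
const padded = decomposed.normalize('NFC').padStart(7, '1');
console.log(padded);                               // 1mañana
console.log(padded === composed.padStart(7, '1')); // true
```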
If only it were so simple...
`normalize` doesn't exist in IE at all, nor in Safari < 10, so to take this advice we need a polyfill. As you might expect, polyfilling Unicode normalization isn't pretty: it requires a massive lookup table.
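To make the "we need a polyfill" step concrete, here is a rough sketch of the feature detection involved. `toNFC` is a hypothetical helper, not part of any library; in an engine without native support it just returns its input unchanged, which is exactly the gap a polyfill such as unorm would have to fill:

```javascript
// Engines missing String.prototype.normalize (IE, Safari < 10) fail this check.
const hasNativeNormalize = typeof String.prototype.normalize === 'function';

// Hypothetical helper: use native NFC normalization when available.
// Without it, the string is returned as-is; a polyfill would be needed here.
function toNFC(str) {
  return hasNativeNormalize ? str.normalize('NFC') : str;
}

console.log(toNFC('man\u0303ana').length); // 6 in engines with native normalize
```

Note that the check itself is cheap; the cost described below comes from what has to be shipped for the fallback branch.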
The best polyfill out there, unorm, clocks in at ~38KB gzipped. Now, keep in mind that many web pages have half a dozen or more iframes, each of which would have to load its own copy, and it's unlikely the caching would overlap, for a number of reasons. Also keep in mind that building or loading code based on browser support isn't realistic in many cases, so if I want to use normalize, everyone pays the bandwidth penalty, not just the IE11 users. And of course this is only one part of the problem. Want to iterate over grapheme clusters? That'll be another massive library. Etc., etc.
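For reference, iterating over grapheme clusters is now possible natively in engines that ship `Intl.Segmenter`, a later addition that did not exist when this thread was written, so the point about older browsers still stands. A minimal sketch, assuming `Intl.Segmenter` is available; `graphemes` is a hypothetical helper name:

```javascript
// Split a string into user-perceived characters (grapheme clusters).
// Assumes Intl.Segmenter exists; older engines would need a library instead.
function graphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  // Each item yielded by segment() has a `segment` property holding the cluster text.
  return Array.from(segmenter.segment(str), (s) => s.segment);
}

const word = 'man\u0303ana';
console.log(word.length);            // 7 UTF-16 code units
console.log(graphemes(word).length); // 6 clusters: n + U+0303 stays together
```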
The browser JS ecosystem is full of these problems; it's not just text processing. If you've ever wondered why a site needs to load 2MB of JavaScript, it's because that's about what is needed to create a cross-browser compatibility layer and a reasonable standard library.