
by thomasfoster96 3268 days ago

Proposals [0] that made it into ES8 (“what’s new”):

* Object.values/Object.entries - https://github.com/tc39/proposal-object-values-entries

* String padding - https://github.com/tc39/proposal-string-pad-start-end

* Object.getOwnPropertyDescriptors - https://github.com/ljharb/proposal-object-getownpropertydesc...

* Trailing commas - https://github.com/tc39/proposal-trailing-function-commas

* Async functions - https://github.com/tc39/ecmascript-asyncawait

* Shared memory and atomics - https://github.com/tc39/ecmascript_sharedmem

The first five have been available via Babel and/or polyfills for roughly 18 months, so they’ve been in use for a while now.

[0] https://github.com/tc39/proposals/blob/master/finished-propo...
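
For anyone who hasn't tried these yet, here is a minimal sketch touching most of the additions listed above. The `inventory` data and the `formatRow` and `totalLater` helpers are invented for illustration and don't come from any of the proposals.

```javascript
// Illustrative data; every name in this sketch is made up.
const inventory = { apples: 3, pears: 12 };

// Object.values / Object.entries: own enumerable values and [key, value] pairs.
const total = Object.values(inventory).reduce((sum, n) => sum + n, 0);

// String padding plus a trailing comma in a parameter list.
function formatRow(
  name,
  count, // this trailing comma is legal as of ES2017
) {
  return name.padEnd(8, '.') + String(count).padStart(3, ' ');
}

const rows = Object.entries(inventory).map(([name, count]) => formatRow(name, count));

// Object.getOwnPropertyDescriptors: copy accessors instead of their current values.
const source = { get answer() { return 42; } };
const copy = Object.defineProperties({}, Object.getOwnPropertyDescriptors(source));

// Async functions: await suspends until the promise settles.
async function totalLater() {
  const value = await Promise.resolve(total);
  return value + 1;
}

console.log(rows.join('\n'));
console.log(total, copy.answer);
totalLater().then((value) => console.log(value));
```

Shared memory and atomics are left out because they need a worker setup to show anything useful.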

4 comments

yahelc 3268 days ago

Interesting that String padding made it in -- sort of jumps out as the simplest of these additions. I wonder how much of that had to do with negative PR for JS-land due to left-pad-gate.


DelaneyM 3268 days ago

You can read that disaster in two ways...

I consider the fact that a stupid-simple package was depended-upon by so many mature libraries as an indication it should be a language feature.

Working with network protocols I find myself needing padding functions all the time, and there isn't really an elegant way to do so inline, so I welcome this addition.
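
As a rough illustration of the kind of inline padding this enables, here is a sketch that renders bytes as fixed-width hex and binary fields. `toHex` and `toBits` are hypothetical helper names, not part of any library.

```javascript
// Render each byte as two hex digits, e.g. when dumping a packet header.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join(' ');
}

// Render a single byte as eight binary digits.
function toBits(byte) {
  return byte.toString(2).padStart(8, '0');
}

const header = Uint8Array.of(0x01, 0x0a, 0xff, 0x00);
console.log(toHex(header)); // 01 0a ff 00
console.log(toBits(5));     // 00000101
```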


mbell 3268 days ago

> I consider the fact that a stupid-simple package was depended-upon by so many mature libraries as an indication it should be a language feature.

Unfortunately neither the npm module nor the browser version really do what most people want and string handling in javascript is still a minefield.

    '\u{1F4A9}'.padStart(5, '1') => "111" // oops (\u{1F4A9} is at the end of this, HN filter)
    '\u{1F4A9}'.length => 2
    [...'\u{1F4A9}'].length => 1 // WTF?
    'mañana'.padStart(7, '1') => "1mañana" // ok
    'man\u0303ana'.padStart(7, '1') => "mañana" // oops
    'man\u0303ana'.length => 7
    [...'man\u0303ana'].length => 7 // WTF? Why doesn't this match the behavior of [...'\u{1F4A9}'].length ?
    'man\u0303ana'.normalize('NFC').padStart(7, '1') => "1mañana" // OK

I do understand the unicode issues here, but the inconsistency in the APIs from a user perspective and lack of any fully cross browser support for sane string processing in Javascript means we still have only a few options:

1) Don't do string processing in javascript at all.

2) Include a library to make it sane, these are usually huge as they usually need large lookup tables.

3) Accept that things won't always be correct.

This is one example but the lack of a sane standard library in Javascript is one of the biggest problems the web has right now. I'd be curious to know how many bytes of JS are loaded on the average website just to work around the lack of standard library support for basic functionality, I'd bet it's a very large number. Another fun one: Try to parse a URL and append extra query params to it, correctly.
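
For the URL case, one possible approach is the WHATWG `URL` and `URLSearchParams` interfaces. This assumes the runtime provides them (current browsers and Node.js do, older browsers would need a polyfill), and `addParams` is a hypothetical helper name.

```javascript
// Append query parameters without hand-rolled string concatenation.
// URLSearchParams takes care of encoding and of parameters already present.
function addParams(href, params) {
  const url = new URL(href);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.append(key, value);
  }
  return url.toString();
}

console.log(addParams('https://example.com/search?q=es2017#top', { page: '2', tag: 'a&b c' }));
// https://example.com/search?q=es2017&page=2&tag=a%26b+c#top
```

Note that the fragment stays at the end and that spaces are serialized as `+`, which is the form-encoding convention rather than what `encodeURIComponent` produces.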


ricardobeat 3268 days ago

    'mañana'.padStart(7, '1') => "1mañana" // ok
    'man\u0303ana'.padStart(7, '1') => "mañana" // oops

You are being disingenuous here. Those are different strings, with different lengths (try copying this into the console):

    'ma\u00F1ana'.length  // 6 (precomposed ñ, U+00F1)
    'man\u0303ana'.length // 7 (n followed by the combining tilde, U+0303)

The latter has two stacked characters. These issues are inherent to Unicode and `padStart` is treating the strings correctly. If you need normalization, use the .normalize method you mentioned yourself.

This is a major improvement: double-wide and stacked characters have been there since ES3, but now the language is providing standard tools to work with them.


mbell 3267 days ago

> If you need normalization, use the .normalize method you mentioned yourself.

If it were so simple...

`normalize` doesn't exist in IE at all and not in Safari < 10 so to take this advice we need a polyfill. As you may expect, polyfilling unicode normalization isn't pretty, it requires a massive lookup table.

The best polyfill out there, unorm, clocks in at ~38KB gzipped. Now, keep in mind there are a half dozen or more iframes on many web pages; each would have to load its own copy, and it's unlikely the caching would overlap, for a number of reasons. Also keep in mind that code builds / loading based on browser support isn't realistic in many cases, so if I want to use normalize, everyone pays the network bandwidth usage penalty not just the IE11 users. Of course this is only one part of the problem. Want to iterate over grapheme clusters? That'll be another massive library. Etc, etc.

The browser JS ecosystem is full of these problems, it's not just text processing. If you've ever wondered why a site needs to load 2MB of javascript, it's because that's about what is needed to create a cross browser compatibility layer and a reasonable standard library.


lstamour 3267 days ago

> Also keep in mind that code builds / loading based on browser support isn't realistic in many cases, so if I want to use normalize, everyone pays the network bandwidth usage penalty not just the IE11 users.

Switch to loading it via JS modules and using HTTP2 to keep connection lag low on cellular 3G connections? I agree, more needs to be done to promote these kinds of edge cases. A similar problem occurs with locale-aware date parsing and formatting.


yarg 3267 days ago

I don't think it's really fair to say that he's being disingenuous here. Regardless of the underlying byte form, the strings look indistinguishable, and users (devs) will often expect them to behave as such.

I think it would be less confusing to define .length as the number of characters and have an additional .size method returning the number of bytes (I'm assuming that's what .length returns, if not it's even more confusing).

Of course, that already wasn't done - meh.


mbell 3267 days ago

> have an additional .size method returning the number of bytes (I'm assuming that's what .length returns, if not it's even more confusing).

It's actually not the number of bytes, it's the number of...'codepoint pieces' is what it could be called I guess? Javascript's language level string implementation is something like UCS-2 with the addition of surrogate pairs being allowed, but counted as separate 'characters' for things like length and index access. It's some twisted middle ground between UCS-2 and UTF-16.
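
To make the distinction concrete, here is a small sketch comparing the three counts being discussed. It assumes a runtime with `TextEncoder` (current browsers and Node.js), and `measure` is a made-up helper name.

```javascript
// Compare the different "lengths" of a string.
function measure(str) {
  return {
    codeUnits: str.length,                           // UTF-16 code units
    codePoints: [...str].length,                     // the string iterator yields code points
    utf8Bytes: new TextEncoder().encode(str).length, // bytes when encoded as UTF-8
  };
}

console.log(measure('abc'));          // { codeUnits: 3, codePoints: 3, utf8Bytes: 3 }
console.log(measure('\u{1F4A9}'));    // { codeUnits: 2, codePoints: 1, utf8Bytes: 4 }
console.log(measure('man\u0303ana')); // { codeUnits: 7, codePoints: 7, utf8Bytes: 8 }
```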


jcranmer 3268 days ago

> [...'man\u0303ana'].length => 7 // WTF? Why doesn't this match the behavior of [...'\u{1F4A9}'].length ?

The key thing to remember is that iteration over Unicode strings only makes sense as iteration over code points, not UCS-2 characters, not bytes, not grapheme clusters. The JS String iterator was very deliberately made to iterate over code points. That length reports UCS-2 characters is a historical mistake. That padding is operating on UCS-2 characters is probably a reflection of the fact that the operation isn't well-defined beyond ASCII.


mbell 3268 days ago

> The key thing to remember is that iteration over Unicode strings only makes sense as iteration over code points, not UCS-2 characters, not bytes, not grapheme clusters.

There are tons of situations where iterating over grapheme clusters is what you want to do.
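
Runtimes that implement `Intl.Segmenter` make this feasible without shipping a large library. That API postdates this thread, so treat its availability as an assumption. A minimal sketch, with `graphemes` and `padStartGraphemes` as hypothetical helper names:

```javascript
// Split a string into user-perceived characters (grapheme clusters).
function graphemes(str, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(str), (part) => part.segment);
}

// Pad based on grapheme count instead of UTF-16 code units.
// Assumes `fill` is a single grapheme.
function padStartGraphemes(str, targetLength, fill = ' ') {
  const missing = targetLength - graphemes(str).length;
  return missing > 0 ? fill.repeat(missing) + str : str;
}

console.log(graphemes('man\u0303ana').length);          // 6
console.log(padStartGraphemes('man\u0303ana', 7, '1')); // one '1', then the unchanged decomposed string
console.log(padStartGraphemes('\u{1F4A9}', 5, '1'));    // four '1's, then the emoji
```

This only counts graphemes. It says nothing about display width, so double-wide characters would still misalign in a monospace layout.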


wruza 3268 days ago

And tons of situations where you want neither of the two (e.g. NFD vs. NFC). The Cairo graphics library has utilities for text rendering, explicitly called "toy text" functions in the reference, leaving serious rendering to Pango. That's fair. Languages should not call unicode strings "unicode strings" if these are not covered in detail by special libraries with distinct names for ucp/ucs/etc lengths, iterators, etc. There is no such thing as string length or "char" anymore. A string is blank or non-blank; anything beyond that is too complex to be part of any stdlib. Even "blank" is not so obvious today.


tracker1 3267 days ago

That's why I hated the "it might break stuff" arguments against making "string" interfaces against characters (including combinators), and always using UTF-8 for encoding internally, in memory. Would have made a lot of that easier.

As to your last bit, I tend to favor encodeURIComponent and have done it correctly... the main reason is to avoid "+" vs " " in query strings.


javajosh 3268 days ago

It seems like certain code points (like '\u{1F4A9}' aka the poop emoji) are a single character but the string reports a length of 2. That is the root of all of those problems. One of your "problems", the length of an array with a single string element, isn't a problem.


mbell 3268 days ago

> One of your "problems", the length of an array with a single string element, isn't a problem.

You're misunderstanding the code; it's using the spread operator on a string:

[...'word'] => ["w", "o", "r", "d"]

What is being demonstrated is that under the hood, javascript stores astral plane codepoints as surrogate pairs and strings operate on 'characters' which is why '\u{1F4A9}'.length => 2. But, when the spread operator is applied to a string, it breaks up the string into codepoints, not characters. This is also why [...'man\u0303ana'].length => 7, the combining tilde is a separate codepoint.

This is an example of how wonky string processing in JS is, [...string].length is actually the most straightforward way to get a count of codepoints in a string.


recursive 3268 days ago

> One of your "problems", the length of an array with a single string element

That's not what it is. Look again, and pay attention to the ... part.


javajosh 3268 days ago

That's really helpful, thanks.


colejohnson66 3268 days ago

Any Unicode code point above U+FFFF requires two UTF-16 code units (a surrogate pair).
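
The arithmetic behind that, as a sketch. `toSurrogatePair` is a hypothetical name; the constants come from the UTF-16 encoding scheme.

```javascript
// Encode a code point above U+FFFF as a pair of UTF-16 code units.
function toSurrogatePair(codePoint) {
  if (codePoint <= 0xffff || codePoint > 0x10ffff) {
    throw new RangeError('expected a code point in U+10000..U+10FFFF');
  }
  const offset = codePoint - 0x10000;    // 20 bits remain after the subtraction
  const high = 0xd800 + (offset >> 10);  // top 10 bits go into the high surrogate
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits go into the low surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f4a9);
console.log(high.toString(16), low.toString(16));            // d83d dca9
console.log(String.fromCharCode(high, low) === '\u{1F4A9}'); // true
```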


lngnmn 3267 days ago

Holy fuck! The Fractal Of Bad Design all over again...


andersonk 3268 days ago

Honestly, I feel most of the blame falls to NPM for allowing publishers to delete packages. This doesn't happen in other ecosystems (e.g. Java).


steveklabnik 3267 days ago

Afterward, they changed their policies so that this can't happen again.


pluma 3267 days ago

Most of the blame falls to npm Inc for bending over backwards to a corporation for a bogus trademark claim they weren't even involved in.

Sure, left-pad being deleted was what resulted in most people's problems but this was just the fallout from npm Inc forcibly reassigning an actively used package name from a major open source contributor to appease a company that didn't even threaten them directly.


hajile 3268 days ago

I believe the string padding proposal predated the "leftpad" incident by a year or so.


a13n 3268 days ago

I love the trailing commas for function calls. Wish we could get trailing commas for all of JSON too!


bfred_it 3268 days ago

It's called JSON5 and it also has comments. The standard JSON will never "change"
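
The gap is easy to see in a console: the trailing comma that a JavaScript literal accepts makes `JSON.parse` throw. A small sketch, where `parsesAsJson` is a made-up helper:

```javascript
// Returns true if the text is valid JSON, false if JSON.parse rejects it.
function parsesAsJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

const literal = [1, 2, 3,];              // fine in JavaScript source
console.log(literal.length);             // 3
console.log(parsesAsJson('[1, 2, 3]'));  // true
console.log(parsesAsJson('[1, 2, 3,]')); // false
console.log(parsesAsJson('{"a": 1,}'));  // false
```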


pas 3268 days ago

For that you'd need to wait until all major parser libraries support it, aaand until all deployments get updated. That makes this little change a project of at least 10 years.


WalterSear 3267 days ago

What I want are leading commas. Much tidier and one small thing you wouldn't have to edit/think about when adding and removing list items.


mstade 3267 days ago

Better yet: automatic comma insertion! (I'm only half joking.)


thekaleb 3267 days ago

Unless you need to add something to the beginning.


WalterSear 3266 days ago

I prefer comma first notation.

    const hello = [
    , 'one'   // wished-for syntax: today this leading comma would create an empty slot
    , 'two'
    , 'three'
    ]

would be easier to visually parse than:

    const hello = [
      'one',
      'two',
      'three',
    ]

Since the commas actually work as a guide.


ballenf 3267 days ago

Why? I just don't find myself doing too much JSON manual editing. A few config files and such. Occasional editing of a server response or call for testing.

I'm basing this on the assumption that the desire for trailing commas is making editing easier and reducing noise in diffs.

Just curious if there are niches where people are spending a lot of time manually editing JSON or if the use case is something else entirely.

I love them in code, despite initial resistance.


azernik 3267 days ago

The use case for me is JSON blobs (mostly configuration files like package.json) that are checked into version control. Same motivations - reducing line-oriented diff noise.


sAbakumoff 3267 days ago

I suspect that they borrowed this idea from Golang's "composite literal" spec, which is a really nice feature: https://dave.cheney.net/2014/10/04/that-trailing-comm


fanf2 3267 days ago

Trailing commas have been allowed in C composite literals since the dawn of time.


michaelmior 3267 days ago

Thanks for sharing! It's news to me that a separate repository is used for each proposal, although I can see some nice benefits to doing so.


thomasfoster96 3267 days ago

I think that’s because a lot of proposals are initially developed away from TC39 by individuals or small groups.


bjacobel 3267 days ago

Please don't call it ES8. It contributes to confusion around the language. ES6 was renamed to ES2015. There is no such spec as ES7 or ES8.


thomasfoster96 3267 days ago

I know I’m technically wrong and it’s too late to edit the comment...

...but in my experience usage of ES6/ES7/ES8 as names far outweighs usage of ES2015/ES2016/ES2017. Calling it ES2017, while technically correct, would in my opinion be far more confusing to most people than calling it ES8.


nickm12 3264 days ago

Usage is shifting, and ES2017 is the name we should standardize on to avoid confusion. For example, TypeScript accepts targets "ES6" or "ES2015" as synonyms, but it doesn't have targets "ES7" or "ES8", only "ES2016" and "ES2017".

I personally prefer the terse version, but there are advantages to using the year-based naming.


neurotrace 3267 days ago

In my experience, the usage has started swinging the other direction. While in general I agree that ES6, ES7, etc. would be less confusing, it will only confuse people more if the spec is officially called one thing but some people call it something else. If someone wanted to learn about, say, "ES10", they'd have to search for both ES10 and ES2019 to ensure they got what they wanted. I think it's better that everyone agrees to use the official naming scheme to avoid that kind of confusion. But, you know, that's just my opinion.
