by dcode 1386 days ago
Perhaps this is missing some necessary context. Note that the threads happened before a resolution, and were created only a few days before they were used as an argument in Wasm. There was, and still is, no such design principle, because it doesn't make sense. Similarly, the arguments in the referenced thread are nonsense. JSON and CSS modules are files, which have different requirements than API calls. importScripts is the same: it loads source code. Also note that JSON actually preserves idiomatic JS strings in data (APIs) through escape sequences, even though it is stored as UTF-8 (files). Preserving them in JSON makes sense, since JSON.stringify/parse are sometimes used for things like synchronous IPC, where maintaining integrity is critical. The Component Model should do the same, but refuses to.
The others are networking APIs, where UTF-8 is common. These are asynchronous and typically untrusted, the protocols mandate UTF-8, and failing to maintain synchronous state is much less problematic there. However, mutating string data as a side effect of a normal function call is nothing less than a hazard, since some strings then, seemingly at random, no longer compare equal after a call, and so on. WTF-16 (what JS effectively specifies) -> UTF-8 is lossy. The people in the threads know that very well, but nonetheless keep bringing up the same already-debunked arguments over and over again to get their way.
That thread also suggests encouraging UTF-8 for new JavaScript APIs, which plays into the desires on the Wasm end, yet is well known to be impossible to pull off in JS. Applying this to JS conflicts with the ECMAScript specification, because JS has the same hard backward-compatibility constraints as Java, Dart, Kotlin, C#, and other languages that evolved from UCS-2 to WTF-16.
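A minimal TypeScript sketch of the JSON vs. UTF-8 contrast drawn above, assuming a runtime that provides the standard TextEncoder/TextDecoder globals and well-formed JSON.stringify (ES2019, e.g. current browsers or Node 12+). The helpers viaJson and viaUtf8 are hypothetical illustrations of "a boundary that round-trips through JSON" and "a boundary that re-encodes as UTF-8"; neither is a Wasm or Component Model API.

```typescript
// Hypothetical helper: round-trip a string through JSON.
// Since ES2019, JSON.stringify escapes an unpaired surrogate as "\udXXX",
// so the serialized text is plain ASCII and JSON.parse restores the exact code unit.
function viaJson(s: string): string {
  return JSON.parse(JSON.stringify(s));
}

// Hypothetical helper: round-trip a string through UTF-8 bytes.
// TextEncoder replaces each unpaired surrogate with U+FFFD before encoding,
// so the original code unit cannot be recovered on the other side.
function viaUtf8(s: string): string {
  return new TextDecoder().decode(new TextEncoder().encode(s));
}

const lone = "a\uD800b"; // a valid JS (WTF-16) string, but not valid Unicode

console.log(JSON.stringify(lone));         // "a\ud800b"
console.log(viaJson(lone) === lone);       // true
console.log(viaUtf8(lone) === lone);       // false: the string was mutated
console.log(viaUtf8(lone) === "a\uFFFDb"); // true
```

The JSON path is lossless because the escape sequence carries the surrogate through a UTF-8 file unchanged; the UTF-8 path is not, which is where the "strings no longer compare equal" hazard comes from.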
None of these languages can change their string semantics, in particular not to something akin to UTF-8, since it's semantically more restrictive and would hence break the execution of existing, and arguably all future, code that uses idiomatic APIs like substring without being aware of potential mutation. It's subtle, I admit, but given that the arguments are factually invalid, the threads can only be one of two things: incompetence or dishonesty. As always in political contexts, the mere mortal wonders which is better. Given earlier discussions in which the same people participated, I would rule out incompetence. Go there and ask questions, and the same gatekeepers always show up, simulating good-faith responses. Killing you with kindness, or whatever that's called. Dare to be unimpressed and follow up, and someone quick-draws a CoC. That's why I decided to try something new there, pinging various people from the TAG in the hope of finally getting someone knowledgeable to look at this behavior, which obviously failed as well. Not sure if that helps, but that's some of the background :)
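To make the substring point concrete, here is a small TypeScript sketch, again assuming TextEncoder/TextDecoder are available. crossUtf8Boundary is a hypothetical stand-in for any API that converts its string argument to UTF-8 and back; it is not an existing API. Splitting a string with substring() between the two halves of a surrogate pair is harmless in plain JS, but not once each half crosses such a boundary.

```typescript
// Hypothetical stand-in for a call whose string parameter is transcoded to UTF-8.
function crossUtf8Boundary(s: string): string {
  return new TextDecoder().decode(new TextEncoder().encode(s));
}

// "\uD83D\uDE00" is U+1F600: one code point, two UTF-16 code units.
const text = "ab\uD83D\uDE00cd";
const head = text.substring(0, 3); // "ab\uD83D": ends with a high surrogate
const tail = text.substring(3);    // "\uDE00cd": starts with a low surrogate

// Within JS, the halves reassemble into the original string.
console.log(head + tail === text); // true

// After each half crosses the boundary, both surrogates have become U+FFFD.
const rejoined = crossUtf8Boundary(head) + crossUtf8Boundary(tail);
console.log(rejoined === text);               // false
console.log(rejoined === "ab\uFFFD\uFFFDcd"); // true
```

The same thing happens when a long string is streamed in fixed-size chunks of code units and a chunk boundary happens to fall inside a surrogate pair.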

FWIW, here's the presentation I wanted to give to the Wasm CG that explains the details and pitfalls in an easier-to-digest way: https://www.youtube.com/watch?v=Ri2NMnSQo4o

It is possible to specify new APIs that say "UTF-8 only; if you pass an unpaired surrogate, you will get an error", and many such APIs already behave that way. On the whole, they are not a major problem in practice. They are not fundamentally incompatible with JavaScript/Java/Dart/Kotlin/C#. It just means that if you wish to call an API that only accepts UTF-8, you must make sure your inputs are valid Unicode; the only lossy case is invalid Unicode strings, which are likely generated by accident. It is not dishonest to want to add APIs that behave that way.
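For comparison, a minimal TypeScript sketch of the "UTF-8 only, error on unpaired surrogates" API shape described above. encodeUtf8Strict and isWellFormedUnicode are hypothetical names; the check mirrors what ES2024's String.prototype.isWellFormed() does, written out by hand so it also runs on older engines. The only difference from a plain TextEncoder call is that invalid input throws instead of being silently replaced.

```typescript
// Returns false if the string contains any unpaired surrogate.
function isWellFormedUnicode(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: must be immediately followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next < 0xdc00 || next > 0xdfff) return false;
      i++; // skip the low half of the valid pair
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      return false; // low surrogate without a preceding high surrogate
    }
  }
  return true;
}

// Hypothetical strict encoder: rejects invalid Unicode instead of mutating it.
function encodeUtf8Strict(s: string): Uint8Array {
  if (!isWellFormedUnicode(s)) {
    throw new TypeError("string contains an unpaired surrogate");
  }
  return new TextEncoder().encode(s);
}

console.log(encodeUtf8Strict("h\u00E9llo").length); // 6
try {
  encodeUtf8Strict("a\uD800");
} catch (e) {
  console.log((e as Error).message); // string contains an unpaired surrogate
}
```

For the decoding direction, `new TextDecoder("utf-8", { fatal: true })` gives the equivalent strict behavior by throwing a TypeError on malformed bytes.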

It's fine to disagree with me or the committee, but your grand gesturing and your overstatements about how tightening a single bolt on strings will break web compatibility forever are exhausting to listen to, especially when you claim a large-scale political conspiracy and use words like "fundamentally incompatible with JavaScript". I don't see any of it.

Your behavior in those threads is absurd, unnecessary, and alarmist, and I really don't have any sympathy for you. Even this reply doubles down on the exaggeration.

I think large parts of WebAssembly are mismanaged. I have major complaints about the velocity, the instability, and the web APIs; I spend a lot of time in them; I'm grumpy; and I'm not usually one to go to bat for any of this. Even within WebAssembly/JavaScript/WASI interop, there are 20 things I'd put on the list ahead of this. If I have any advice for you, it's to pick a new battle, because this is maybe the highest complaint-to-impact ratio I've ever seen. You lost; just move on.

Allow me to focus on the technical arguments, which I think fall short. Sure, one can go and make a second string type, or a bunch of throwing APIs. I doubt that this actually improves a language in any tangible way, or that anyone would want it for a rational reason that is not induced from the outside. Without doing so, it's incompatible in the literal sense of the word. Whether that incompatibility matters depends, I guess. For someone without such a use case, perhaps not much, but for someone who has exactly that use case, say hashing substrings only to discover unexpected collisions, or streaming 1K chunks of strings over component boundaries only to discover mojibake after concatenating them, it may very well be significant, or even expensive. I mean, there are good reasons why all those languages try their best to prevent that in their native habitats.
What amazes me is that all of this came about because someone formulated a desire that could easily have been accommodated with an additional boolean flag for W/UTF, yet refuses to include such a trivial compromise, and that this desire surprisingly carries more weight than any evidence or precedent, such as WebIDL, JSON, or the various language standards. I find this highly concerning, since it conflicts with my understanding of responsible engineering. It also conflicts with Wasm's own communication, which literally states that Wasm executes in the same semantic universe as JavaScript and maintains backwards compatibility with the Web. If there is anything fruitful to spin a narrative around, perhaps it is that these decisions undermine the exact value proposition of AssemblyScript, which was supposed to be used in tandem with JavaScript and now becomes risky at the fundamental level of the most prominent higher-level data type: strings. The same goes, of course, for two AssemblyScript components communicating with each other.
That makes these string decisions particularly unfortunate for me personally, after having spent all that time and effort working towards Wasm's goals in good faith, which perhaps explains my persistence on the matter. Quite a dilemma.
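A TypeScript sketch of the hashing and "boolean flag for W/UTF" points above, under stated assumptions: encodeForBoundary and its preserveSurrogates flag are hypothetical and not part of the Component Model or its canonical ABI. With the flag off it behaves like TextEncoder (UTF-8, unpaired surrogates replaced by U+FFFD); with it on it uses a hand-written WTF-8 encoder (generalized UTF-8 in which unpaired surrogates get their own three-byte sequences). Two distinct JS strings produce identical UTF-8 bytes, so any hash computed over those bytes collides; under WTF-8 they stay distinct.

```typescript
// Minimal WTF-8 encoder: well-formed surrogate pairs are combined into one
// code point; unpaired surrogates are encoded as ordinary 3-byte sequences.
function encodeWtf8(s: string): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
    if (cp >= 0xd800 && cp <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
      i++; // consumed both halves of the pair
    }
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      // Includes unpaired surrogates (U+D800..U+DFFF) instead of replacing them.
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return new Uint8Array(out);
}

// Hypothetical boundary encoder with a W/UTF switch.
function encodeForBoundary(s: string, preserveSurrogates: boolean): Uint8Array {
  return preserveSurrogates ? encodeWtf8(s) : new TextEncoder().encode(s);
}

function sameBytes(x: Uint8Array, y: Uint8Array): boolean {
  return x.length === y.length && x.every((v, i) => v === y[i]);
}

// Two different JS strings, e.g. substrings that each ended mid-pair.
const a = "key\uD83D";
const b = "key\uD83E";
console.log(a === b); // false

// UTF-8: both end in EF BF BD, so their bytes (and any hash of them) are equal.
console.log(sameBytes(encodeForBoundary(a, false), encodeForBoundary(b, false))); // true
// WTF-8: ED A0 BD vs. ED A0 BE, still distinct.
console.log(sameBytes(encodeForBoundary(a, true), encodeForBoundary(b, true)));   // false
```

For well-formed strings, WTF-8 produces exactly the same bytes as UTF-8, so such a flag only changes behavior for strings that contain unpaired surrogates.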
Sure, there are technical arguments for this approach, but there are also technical arguments for "the strings in our strings API should really be valid Unicode strings". I have no horse in this race and I have no preference, but I do see both options as completely valid.

The problem is when you say things like people who prefer the latter approach "can only be either dishonest, or incompetent". Putting it kindly, you're basically only making enemies at that point, and at best you seem unwilling to consider other points of view. You seem absolutely baffled as to why your tone, phrasing, and language make others uncomfortable, even as you continue to insult those you're trying to influence.

I had never encountered you before this conversation, and I came away with a very negative impression. There are reasons you aren't being listened to, and they are problems with you and your behavior, not grand conspiracies.

There are battles worth staking your entire professional reputation over -- the GC repo is full of people doing that -- but this is definitely not one of them.

Well, I tried. Anyway, even if I were the abhorrent monster you keep painting me as in your almost exclusively ad hominem argumentation, while accusing me of what you are undoubtedly guilty of yourself, I'd argue that none of this justifies plainly ignoring technical concerns in a standardization effort. This exchange is an almost perfect reflection of the practices prevalent in the Wasm CG that allowed the Component Model, and likely other things elsewhere, to pass almost uncontested. And surely this is deliberate abuse, and I hope people can see that. Coincidentally, that's exactly my critique. I hope nobody is surprised that being on the receiving end of this for years, despite one's best efforts, is an extraordinarily frustrating experience, and that this is exactly the point.