by ReactiveJelly 1797 days ago
> On August 3rd, the WebAssembly CG will poll on whether JavaScript string semantics/encoding are out of scope of the Interface Types proposal. This decision will likely be backed by Google (C++), Mozilla (Rust) and the Bytecode Alliance (WASI), who appear to have a common interest to exclusively promote C++, Rust respectively non-Web semantics and concepts in WebAssembly.

> If the poll passes, which is likely, AssemblyScript will be severely impacted as the tools it has developed must be deprecated due to unresolvable correctness and security problems the decision imposes upon languages utilizing JavaScript-like 16-bit string semantics and its users.

So, the problem is that AssemblyScript wants to keep using UTF-16? I'm not sure I understand.

Is AssemblyScript the thing that lets you hand-write WebAsm?


Yes, it seems they want to use UTF-16 strings.

I’m confused why they can’t just switch their (nascent) language to UTF-8, and if so why the alarmist attitude? I didn’t think they were mature enough to claim no breaking changes, for example.

I'd probably prefer that we drag the web (and .NET and Java) platforms towards UTF-8, to be honest… but maybe that's just me.

Realistically speaking, you can't "switch" AssemblyScript to UTF-8 unless you also decide it only can run in UTF-8 host environments (i.e. not web browsers). Right now it uses UTF-16, which is what the web uses. If you move it over to UTF-8 now every operation that passes strings to web APIs has to perform encoding and decoding, and you end up with a bunch of new performance and correctness issues. It's a very complex migration.

P.S. the web will never switch to UTF-8. It would break too many web pages. Most browser vendors won't even accept breaking 0.1% of web pages, unless they're doing it to show you more ads (i.e. Chrome).
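
To make the "encoding and decoding on every call" point concrete, here is a minimal sketch of what a UTF-8 guest would have to do each time a string crosses into a JS API. `guestString`, `callHostWithString` and `hostApi` are hypothetical names for illustration, not AssemblyScript or browser APIs; only `TextEncoder` and `TextDecoder` are real.

```typescript
// Sketch only: models a UTF-8 guest talking to a JS host API.
// guestString, callHostWithString and hostApi are hypothetical names.

const encoder = new TextEncoder();        // JS string -> UTF-8 bytes
const decoder = new TextDecoder("utf-8"); // UTF-8 bytes -> JS string

// Stand-in for a UTF-8 string living in the module's linear memory.
function guestString(s: string): Uint8Array {
  return encoder.encode(s);
}

// Each call pays for one decode going out and one encode coming back.
function callHostWithString(
  utf8: Uint8Array,
  hostApi: (s: string) => string
): Uint8Array {
  const jsString = decoder.decode(utf8); // guest -> host conversion (copy + transcode)
  const result = hostApi(jsString);
  return encoder.encode(result);         // host -> guest conversion (copy + transcode)
}

const input = guestString("h\u00e9llo");
const output = callHostWithString(input, (s) => s.toUpperCase());
const roundTripped = decoder.decode(output);
```

With a UTF-16 guest the two conversions would reduce to copies (or to nothing, if the host string could be shared), which is roughly the performance argument being made here.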

JavaScript is called the entire "web" now? HTML and CSS work with UTF-8 just fine and the majority of the WWW uses UTF-8 to serve them.

The canonical representation of DOM content is DOMString (https://developer.mozilla.org/en-US/docs/Web/API/DOMString), which is not UTF-8. Your HTML being encoded in UTF-8 is irrelevant, it gets decoded when it's loaded into whatever the canonical representation is. Your HTML could be in Shift-JIS or ASCII or whatever and not UTF-8, same difference.

This is exactly right. UTF-8 is the transmission format that your HTML gets sent in, but it is not the format of strings in JavaScript at runtime.

The problem being discussed is about runtime interoperability between JS (with WTF-16 string format) and WebAssembly.

I think you know the poster is talking about programming languages, where the bulk of complexity lies.
Yes, but WebAssembly operates at the boundary with JS, and that is not UTF-8. JS uses WTF-16 at runtime, and if WebAssembly did too then this would make interop between Wasm and JS a first-class feature with maximal performance and without security and data integrity issues.
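
For anyone wondering what "WTF-16" means in practice: JS strings are sequences of 16-bit code units that are allowed to contain unpaired surrogates, which strict UTF-16 forbids. A rough sketch (the helper name `hasLoneSurrogate` is made up for this example):

```typescript
// Returns true if `s` contains a surrogate code unit without its partner.
// hasLoneSurrogate is a hypothetical helper, not a built-in.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the second half of a valid pair
      } else {
        return true;
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      return true; // low surrogate with no preceding high surrogate
    }
  }
  return false;
}

const emoji = "\uD83D\uDE00";   // U+1F600, stored as a surrogate pair
const half = emoji.slice(0, 1); // slicing by code unit splits the pair

const emojiIsClean = !hasLoneSurrogate(emoji);
const halfIsBroken = hasLoneSurrogate(half);
```

`half` is a perfectly legal JS string, but it has no valid UTF-8 (or strict UTF-16) encoding, and that is the mismatch the rest of this thread is arguing about.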
I don't understand. Can't you just abstract this?

Python has several internal string representations to reduce conversions.

Really "UTF-16 string" and "UTF-8 string" are code smells. Applications should be using character/code point sequences or byte sequences.

Using code unit sequences is... bizarre. (Yes, I know Java, C#, and JS have chosen to do that, but a new language has an opportunity to improve.)
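
A quick illustration of the code unit vs. code point vs. byte distinction in JS/TS terms, as a sketch (variable names are just for the example):

```typescript
// "a" followed by U+1F600, written with escapes so the source encoding does not matter.
const sample = "a\uD83D\uDE00";

const codeUnits = sample.length;                          // UTF-16 code units
const codePoints = Array.from(sample).length;             // the string iterator walks code points
const utf8Bytes = new TextEncoder().encode(sample).length; // bytes once encoded as UTF-8

const secondUnit = sample.charCodeAt(1);   // first half of the surrogate pair
const secondPoint = sample.codePointAt(1); // the full code point starting at index 1
```

Here `codeUnits` is 3, `codePoints` is 2 and `utf8Bytes` is 5, so "length" depends entirely on which of the three views the language exposes.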

You can abstract it, but AssemblyScript did not. So it's not a trivial change, it's a complex migration. Similarly, you can use UTF-8 in Java and C#, but you can't just "switch" them over to the encoding directly, it has to be exposed via new types/etc.
Note that AssemblyScript rides on TypeScript language syntax. How would

``` let foo: string = "whatever" ```

be able to work in anything like the TS/JS sense? How could that map to multiple string types? The idea is that both AS and TS use the same syntax for strings and are compatible across boundaries (TS on the JS side, AS on the Wasm side). Having multiple string types is possible, but it would greatly reduce developer ergonomics.

First you have to decide whether AssemblyScript is a language, or a compiler for an existing language.

If it's a language, then it gets to decide what ```let foo: string = "whatever"``` means.

If it's a compiler for an existing language, then semantics have already been decided, and the compiler has to implement it.

But none of that precludes abstraction to reduce data type conversions.

If I understand correctly, AssemblyScript has been building ahead of specs on Interface Types?

I sympathize with the pain, but the bleeding edge of tech does... bleed.
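
One way to picture "abstraction to reduce data type conversions" is a string wrapper that keeps the native representation and only produces (and caches) the other encoding when someone asks for it. This is a hypothetical sketch, not how AssemblyScript or any particular runtime works; `CachedString` and `encodeCount` are invented for the example.

```typescript
// Hypothetical wrapper: one logical string, UTF-8 view created lazily and cached.
class CachedString {
  readonly value: string;                  // native JS (16-bit code unit) representation
  private utf8: Uint8Array | null = null;  // filled in on first request
  encodeCount = 0;                         // illustration only: counts real conversions

  constructor(value: string) {
    this.value = value;
  }

  // Returns the UTF-8 bytes, transcoding at most once per instance.
  asUtf8(): Uint8Array {
    if (this.utf8 === null) {
      this.utf8 = new TextEncoder().encode(this.value);
      this.encodeCount++;
    }
    return this.utf8;
  }
}

const greeting = new CachedString("hello");
const firstView = greeting.asUtf8();
const secondView = greeting.asUtf8(); // served from the cache, no second transcode
```

Note that this only removes repeated conversions. It does nothing for strings that have no valid UTF-8 form in the first place.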

Because if they did, then interop with JS would require performance-losing conversion any time a string needs to be sent from one side to the other, making Web a secondary and irrelevant target compared to native.

That's not what the web needs. The web needs WebAssembly to work flawlessly with JavaScript for maximal potential, so the web will be great and not just a performance landmine that native developers will laugh (as much) at.

What should Blazor and TeaVM do when existing code allows for isolated surrogates? If they perform implicit conversions to UTF-8, they have the option to either trap or perform a lossy conversion, which has immense security and data integrity implications.
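
A sketch of the two options using the standard `TextEncoder`, which takes the lossy route and replaces an unpaired surrogate with U+FFFD. The helper `encodeOrTrap` and the example keys are hypothetical; the trap is emulated here by checking whether the round trip reproduces the input.

```typescript
const enc = new TextEncoder();
// ignoreBOM: true keeps a leading U+FEFF in the output so the round-trip check is exact.
const dec = new TextDecoder("utf-8", { ignoreBOM: true });

// Two different JS strings, each ending in a different unpaired high surrogate.
const keyA = "id-\uD800";
const keyB = "id-\uD801";

// Option 1: lossy conversion. Both lone surrogates become U+FFFD.
const lossyA = dec.decode(enc.encode(keyA));
const lossyB = dec.decode(enc.encode(keyB));
const collided = keyA !== keyB && lossyA === lossyB;

// Option 2: trap. Refuse any string that does not survive the round trip.
function encodeOrTrap(s: string): Uint8Array {
  const bytes = enc.encode(s);
  if (dec.decode(bytes) !== s) {
    throw new Error("string is not representable in UTF-8");
  }
  return bytes;
}
```

With the lossy option two distinct keys silently collapse into the same value, and with the trap option code that worked fine in pure JS now throws at the boundary.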
AIUI, AssemblyScript is a TypeScript-like language that is designed to compile to wasm.