Hacker News new | ask | show | jobs
by conrad-watt 1795 days ago
I agree with the linked quote - it captures an important reason why it is valuable to _enforce_ sanitisation at component boundaries, rather than merely documenting "please don't rely on isolated surrogates being preserved across component boundaries" (which would be a problem if we didn't enforce it, since an external component you don't control may be forced to internally sanitise the string if it relies on (e.g.) an API, language runtime, or storage mechanism that admits only well-formed strings).

EDIT: since a whole other paragraph was edited in as I replied, I will respond by saying that within a component, your string can have whatever invalid representation you want. Most written code will naturally be a single component (which could even be made up of both JS and Wasm composed together through the JS API). The code may interface with other components, and this discussion is purely about what enforcement is appropriate at that boundary.

EDIT2: please consider a further reply to my post, rather than repeatedly editing your parent post in response. It is disorientating for observers. In any case, my paragraph above did not claim that there will be one component per language, but that the code _one writes oneself_ within a single language (or a collection of languages/libraries which can be tightly coupled through an API/build system) will naturally form one component.

1 comments

Sure, we could resolve this problem by either a) giving these languages a separate fitting string type to use internally or externally (Rust for instance can use 'string' everywhere) or b) integrating their semantics into the single one so they are covered as well as first-class citizens. And coincidentally, that would fit JavaScript perfectly, which is rather surprising being off the table in a Web standard. Yet we are polling on having a "single" "list-of-USV" string type, likely closing the door for them forever with everything it implies.
There is no problem, assuming that one believes that the list-of-USV abstraction (i.e. sanitising strings to be valid unicode) is the right thing to enforce at the component boundary, _including_ when the internals of the component are implemented using JavaScript.

I appreciate that this is exactly the point where we currently disagree, and accept that I won't be able to convince you here. However, the AS website's announcement did not make the boundaries of the debate clear.