by dmvinson 2386 days ago
This feels like another step away from the free and open web many people are clamoring for. Distributing opaque binaries with websites instead of Javascript is a step past even the obfuscated minified javascript files meant to be confusing. At least those can still be debugged, stepped through, and explored freely by the end user if they want to learn or reverse engineer. Is there any tool or standard being worked on to make .wasm files coherent for users who have to run the code to view websites? This feels like a step backwards in so many ways, even if it is a technological marvel.
14 comments

Please, can we stop having this argument every time there is an article about webassembly? WASM is not any more obfuscated than minified JS. Being a bytecode doesn't make you "unfree". You have access to the same tools to debug JS and WASM. And the WASM specification is open. There is literally no difference between running JS or running WASM.
A binary format is no more obfuscated than minified JS? What now?

Going to need some clarification on how that's the case.

I guess he's saying that an unminifier tool is effectively no different than a decompiler, and both are basically unreadable without it.

That said, I don't know of any webassembly decompilers, although I guess they must exist by this point. But also historically decompilers have been imperfect as some of the structure of the code is lost in the compilation process and has to be inferred, sometimes incorrectly, by the decompiler. Compare to a minifier where all you lose is the variable names, comments, and possibly helpful whitespace. All of the structure of the code is still there and there are no heuristics necessary to recreate something that resembles the original source.

There are certainly wasm decompilers -- wasm2c, wasm2js, etc. You also have access to the browser's JS debugger for breakpoints, line by line execution control, dumping wasm's linear memory.

I haven't written any productive WebAssembly but I play Capture The Flag competitions, and it's become frequent for a wasm reverse engineering challenge to be thrown in. The tools are good enough to make that tractable, even for non-experts in wasm like me.

It helps a little that it's a stack-based rather than register-based VM. Usually more of the intent of code is preserved that way. It's like reversing a JVM class, rather than like reversing a native binary.

What are these Capture The Flag competitions? Do you mind posting a link?
Sure. My favorite explanation's a short video:

https://youtu.be/8ev9ZX9J45A

That is correct, a minified JS will preserve most of the semantics of the original program. And since the original source can come from any language, how would one know which decompiler to use?
You could already compile programs from other languages to javascript.

https://github.com/jashkenas/coffeescript/wiki/List-of-langu...

Wasm was effectively an extension of asm.js. It makes the experience of compile-to-web better, but it isn't much more opaque than other projects.

WASM is a binary and a text format. You can turn any WASM binary into the text format and have a readable version of the blob. Firefox can automatically show you the text version of a WASM blob. So no, there is no difference with a minified JS. Just because they are in a text format doesn't make them any easier to reverse engineer.
Unless I'm completely missing something the "text version" you're talking about is just WASM and there's quite a difference between that and minified JS.

ex:

    end $label121
    get_local $var7
    get_local $var9
    call $func3444
    get_local $var7
    call $func1500
How is that any more readable than deobfuscated js?

    func1500(func3444(var7, var9), var7)
or more likely:

    gw(kl(s,i),s)
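To make the comparison concrete, here is a minimal Python sketch of how the flat stack instructions above fold into the nested call shown. It handles only `get_local` and `call`, and the function arities are assumptions, since the thread does not show the real signatures. `fold_stack_code` is a hypothetical helper, not part of any existing tool.

```python
from typing import Dict, List


def fold_stack_code(lines: List[str], arity: Dict[str, int]) -> List[str]:
    """Fold flat wasm-text instructions into nested call expressions.

    Only `get_local` and `call` are handled. `arity` maps a function name to
    its assumed parameter count. Returns what is left on the operand stack.
    """
    stack: List[str] = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        op = parts[0]
        name = parts[1].lstrip("$") if len(parts) > 1 else ""
        if op == "get_local":
            # Reading a local pushes its value onto the operand stack.
            stack.append(name)
        elif op == "call":
            # A call pops its arguments and pushes one result (assumed).
            n = arity[name]
            args = stack[len(stack) - n:]
            del stack[len(stack) - n:]
            stack.append(f"{name}({', '.join(args)})")
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack


snippet = [
    "get_local $var7",
    "get_local $var9",
    "call $func3444",
    "get_local $var7",
    "call $func1500",
]
# Both arities are guesses made for illustration.
print(fold_stack_code(snippet, {"func3444": 2, "func1500": 2}))
```

With those assumed arities the sketch prints `['func1500(func3444(var7, var9), var7)']`, which matches the hand-written expression in the comment above.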
Yes this is what I am talking about. Once you know the instructions, I fail to see how it is more difficult to understand than javascript.
That's like saying x86_64 assembly is as easy to understand as C. High level languages exist to make code easier to understand.
That certainly is a take.
There is a difference in degree, though. Unminified JS is usually easier to read than wasm text, in general.

One practical factor: I often debug wasm files by compiling them to JS first.

At least wasm has structured control flow, which helps a lot. I wish wasm had even more readability features, personally.

It's a binary format originally based on minified JS, with a standard textual form, and which can be viewed and debugged with exactly the same tools (and ease) as minified JS.
Can you point me to an example of this?
Thanks for the downvote whoever. Honestly, I want an example of how it "can be viewed and debugged with exactly the same tools (and ease) as minified JS".

I see nothing that states that is the case.

WASM is essentially a more efficient version of Asm.js, which is just Javascript. WASM is a binary format. Asm.js is Javascript. They're equally obfuscated.
Maybe I’m misunderstanding something but WASM is meant to be compiled from other languages and that source is lost, unlike minified javascript, isn’t it?
Yes, but you'd see similar issues (to a different degree) with languages like Typescript and JSX as well.

Cross-compilation has been a thing for a while now -- WASM is the followup to ASM.js, which was already being used as a compile target for languages like C.

Now, reverse engineering ASM.js is easier than reverse engineering WASM (although ASM.js is still a giant pain). And reverse engineering minified Javascript is even easier -- most competent JS engineers could debug a React project without source maps, even if it took them longer.

But it's not clear to me that WASM makes the process meaningfully harder. As in, you're still going to want to use source maps like you use today, and it'll still be totally possible to figure out what a program is doing without the original source. It'll just be a pain.

And the benefits to the web as an open, language-agnostic platform that can be used for memory-intensive tasks outweigh the downsides of needing to work harder to reverse engineer software.

Would you consider the source lost if the (minified) javascript was compiled from TypeScript?
I would. And fair enough, I think the web closed around me and I didn't even realize it. But we are all doing ok so far, so I suppose wasm won't be that bad either.
Minifying JavaScript is as destructive as compiling from any language to WASM. The original source is also lost when minifying JS. The fact that the transpilation/compilation target is the same language doesn't mean it's any less destructive.
> can we stop having this argument every time there is an article about webassembly

No, we can't, it's a valid criticism, it's not going to go away. Minified JS is bad, webassembly is worse.

What is the criticism exactly? Saying WASM is worse than minified JS is factually wrong.

We can debate about minified JS if you so desire but it's a different debate.

This is a valid concern. Among many things it is much easier to audit higher level code than bytecode.
What can't you do? How is it any easier to audit minified JS than a WASM blob? Every time this argument is raised there is no valid argument to explain why WASM would be much worse than the current state of JS.
Minified JS (even with single letter variables and all) is still a high level language which is much easier for humans to follow than bytecode. That was only an example of it being a valid concern. I actually love WASM (especially the s-exp representation) and have implemented a compiler that compiles to it, but it's important to listen to valid concerns even if we really like a technology.
Two wrongs don’t make a right.
Then please enlighten us and share with us what is wrong.
"RISC-V is open, you can't compile proprietary binaries for RISC-V."
That is not what I said, don't try to create a fake argument.

Your real sentence should be "The RISC-V instruction set is open, therefore I can see whatever a binary is doing via the instructions it is executing." That doesn't mean it's free, and it doesn't mean it's easy to reverse engineer whatever the binary is doing, but you have everything you need to do it.

In my opinion it’s been a long time since direct access to Javascript has been useful. Yes you can unminify javascript but it’s still more work than most people will go through (especially if the code was generated by a compile-to-javascript language), so for most people things aren’t really changing that much.
I don't even think source code access is the most important part of the browser's dev console. Consider the network tab instead, being able to see exactly where your bandwidth is being spent and why.

This to me makes the browser vastly preferable to native apps. I didn't realize that the desktop app I use to easily translate languages[0] sends every keystroke to Google Analytics until I had to bother installing a proxy. Meanwhile this analysis is just an Opt-Cmd-I away in the browser.

[0]: https://apps.apple.com/us/app/translate-tab/id458887729

Totally agreed -- and I don't think WASM will change any of that.

The good parts of the web in terms of debugging is the separation of concerns -- having separate interfaces for CSS, HTML, network requests, and the DOM, and having each of those interfaces be relatively inspectable.

I am a little worried about frameworks that target WASM spitting everything onto a Canvas, bypassing HTML and CSS (cough Qt cough). That would be a substantial loss for the open web. But I don't lose any sleep over the idea of replacing Javascript.

Direct access to Javascript is useful on a daily basis for those concerned with a number of security threats.
Yes, it is a ton of work to step through and understand minified and obfuscated code, but it is a skill that many people learn and do if there is motivation. On your second point, I think the key is that the people who do have a reason to detangle the logic of minified JS can be very impactful. I consider open viewing of Javascript as similar to noncompetes in California. It allows one to view competitors' source code (if you have the motivation to work for it), which ultimately allows you to learn from and adapt their best practices. Yes, this can have negative effects, but it also may allow for a smaller company to leapfrog a larger incumbent who is too lazy to do some part of their processing server side. I could probably learn a lot about how to write (and block!) analytics tracking by reviewing the Google analytics javascript source code for example.

(Disclaimer: I've never looked at Google's analytics .js files and that may not be possible for some technical reason unknown to me)

> I think the key is that the people who do have a reason to detangle the logic of minified JS can be very impactful.

You don’t need javascript for this. People reverse engineer native binaries all the time. Reversing wasm isn’t much more difficult than minified javascript as my sibling commenter states.

Reading webassembly is not that difficult.

The main challenge is that variable and function names are not available, but minified js is no better in that regard.

> Distributing opaque binaries with websites instead of Javascript is a step past even the obfuscated minified javascript files meant to be confusing. At least those can still be debugged, stepped through, and explored freely by the end user if they want to learn or reverse engineer.

That's not true. There's nothing "free and open" about the tracking code embedded in every modern site, or the javascript blobs you get when you visit Google or Facebook. Minified/obfuscated Javascript is no different from a binary blob, except that it's much less efficient. Your chances of reverse-engineering one of those is about the same as reverse-engineering a wasm blob. Just because one is technically "human-readable" plaintext and the other binary doesn't make a difference, since you can't actually read either of them.

I respect your point and semi-agree, but as someone who ran a small business in high school that usually involved reverse engineering obfuscated Javascript, I think you're overstating how hard it is to follow the logic of Javascript blobs. Yes, whole program flows can be insanely difficult to follow, but narrowing in on the logic of key functions is often what one needs when trying to learn from others' code.
So are you saying that reverse engineering javascript is easy? Or that it's easier than reverse engineering wasm?

I don't know much about web assembly, but x86, which is much more complicated with thousands of instructions, has been successfully reverse engineered basically since forever. There are decompilers that can automatically reconstruct source code in C or C++ from a binary blob.

Compared to javascript, the best you can hope for is to just format the code so it's in a more readable structure, but that isn't going to untangle purposefully obfuscated logic. Add to that the fact that even a regular javascript program is an untyped mess, and it becomes clear that anyone specifically trying to confuse readers will have a very easy time of doing so. There are a lot of messy things you can do in javascript, almost COBOL levels of messy.

Also, I'm curious about this

> but as someone who ran a small business in high school that usually involved reverse engineering obfuscated Javascript,

What type of clients paid you to reverse engineer obfuscated javascript? Malware research? Something else?

> I don't know much about web assembly, but x86, which is much more complicated with thousands of instructions, has been successfully reverse engineered basically since forever. There are decompilers that can automatically reconstruct source code in C or C++ from a binary blob.

That's a bit of an overstatement.

Disassembly of native executables is essentially a solved problem, and has been for decades. There is some variation in terms of how you define disassembly and how you deal with code that specifically tries to defeat disassembly, but it's solved enough that objdump -d is a decently effective tool.

Decompilation is more difficult. There were academic-quality decompilers by around the 90s, but these weren't really usable and tended to break on anything more complicated than toy examples. The JVM breathed new life into decompilers, and it's not until this point that you get decompilers that can routinely output code that is recompilable (and only in the Java domain).

In the mid-noughts, decompilation efforts returned to targeting native binaries again. This is helped by the developers of IDA Pro (the main tool used for reverse engineering) building a decompiler view into their application. There's also been more efforts on accurate static binary translation into IRs such as LLVM, which is often close enough to C to be effective, and I'm more familiar with these efforts than I am with full decompilers.

The creation of fully recompilable C source code from binaries is still a challenge, in part because machine semantics are more well-defined than C, and you basically have a tradeoff between readable output and semantically-correct (free of undefined behavior). Control-flow recovery is still challenging; signatures are needed to deal with statically-linked pieces of the standard library; and structure and type recovery is routinely of extremely poor quality.

But if the WASM was compiled from javascript or some other language that isn't very C-like then the de-compiled C or C++ code is going to be very difficult to follow.

At the very least with obfuscated javascript you are going from js => js => js, rather than from js => WASM => C++.

But going to C instead of js is an implementation technicality. If WASM becomes commonplace I'd be shocked if some group of kind souls doesn't open source a decompiler to minified js.
There already is a compiler from wasm to asm.js, so I assume it wouldn't be too difficult to go from wasm to more typical js.
You're not going from JS to WASM, you're going from C/C++/Rust to WASM.
Meanwhile here I am running around with js off by default. Most of the internet still works. More sites work better with js off than work worse. I hope webassembly doesn't change that into a world where the "js off" analogy is "running my OS without the ability to execute programs."
> Meanwhile here I am running around with js off by default. Most of the internet still works.

Just wait 5 more years until 80% of the web switches to React / Vue / TheNewHypeSPAFramework, and with or without WASM, you will be unable to browse with "js off".

The blame here is not on WASM but on the abuse of client side rendering and "everything as an App" when most pages are just barely interactive documents.

The Web succeeded where Flash / ActiveX / Java applets / Silverlight failed because:

- it was open

- it was document oriented.

And we tend to forget that a bit too easily.

But now companies are paying more to push their WASM good message. Many of them actually believe it now, because they can't comprehend anything different.
Who are these people clamoring for an open and free web? I've certainly never met them. The clamoring I hear is for a faster and more usable web. The vast majority of web users couldn't decipher JS even if it wasn't obfuscated.
The people clamoring for a faster and more usable web are precisely the people who do not operate an adblocker.
WebAssembly should be relatively easy to decompile. The specification guarantees structured control flow. You will lose struct information and names, but that's virtually the only information you're not going to get from decompilation.
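As a rough illustration of why structured control flow helps, the nesting of a function body can be recovered from the flat instruction stream with nothing more than a depth counter, with no control-flow graph reconstruction. The Python sketch below assumes the flat (non-folded) text syntax with one instruction per line; `indent_structured` is a hypothetical helper written for this example.

```python
from typing import List

# Instructions that open a nested region in flat wasm text.
OPENERS = {"block", "loop", "if"}


def indent_structured(instrs: List[str], step: str = "  ") -> List[str]:
    """Indent flat wasm-text instructions according to block nesting."""
    depth = 0
    out: List[str] = []
    for ins in instrs:
        parts = ins.split()
        if not parts:
            continue
        op = parts[0]
        if op in ("end", "else"):
            # Both close the region opened by the matching block/loop/if.
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced 'end' or 'else'")
        out.append(step * depth + ins.strip())
        if op in OPENERS or op == "else":
            depth += 1
    if depth != 0:
        raise ValueError("unterminated block")
    return out


flat = ["block", "loop", "get_local $i", "br_if 1", "br 0", "end", "end"]
print("\n".join(indent_structured(flat)))
```

A real decompiler does far more than indent, but the same property (every branch targets an enclosing block or loop) is what lets it emit `while` and `if` statements directly instead of guessing at them.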
WASM just makes it easier to distribute obfuscated code, because it is obfuscated by default. Pre-WASM JS is capable of just as much obfuscation.

Minified Facebook or Google trackers were never libre or meant to be easily reversed. Web apps like Google Drive aren't free either just because you can run it in a browser on Linux. You aren't supposed to (legally?) be able to modify it and nor would you be able to in many cases where they try. It's just as proprietary as Microsoft Office. There are proprietary tools to do even more advanced obfuscation on top of minification (adds red herring code paths that do nothing), which some JavaScript malware vendors use to protect their implementations.

What we really want is libre JavaScript/WASM where vendors include permissive licenses and source maps or links to download the high level source. That's free software. The "free and open" web never really existed de jure; publishers' laziness to obfuscate created a de facto free and open web. Libreness depends on access to high level source, not reversibility, or else Photoshop is free too because you can attach a debugger to it.

WASM just exposes the truth that the web was an app store all along.

> Is there any tool or standard being worked on to make .wasm files coherent for users who have to run the code to view websites?

You can convert the binary files to/from the text (lisp like) format with readily available tools.

Also, the binary format is easily parsed -- I made a parser with Kaitai Struct in like an afternoon.
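For a sense of how little is needed to walk the binary format, here is a minimal sketch in plain Python (not Kaitai Struct, which the comment above used). It checks the `\0asm` magic and version, then lists each section's id and size using unsigned LEB128 decoding. The tiny hand-assembled module `TINY` is an assumption made up for the example: one function with no parameters, no results, and an empty body.

```python
from typing import List, Tuple

SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "data count",
}


def read_u32_leb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def list_sections(module: bytes) -> List[Tuple[str, int]]:
    """Return (section name, payload size in bytes) for each section."""
    if module[:4] != b"\x00asm":
        raise ValueError("not a wasm binary")
    if int.from_bytes(module[4:8], "little") != 1:
        raise ValueError("unsupported wasm version")
    pos = 8
    sections: List[Tuple[str, int]] = []
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_u32_leb128(module, pos + 1)
        sections.append((SECTION_NAMES.get(section_id, f"unknown({section_id})"), size))
        pos += size  # skip the payload without interpreting it
    return sections


# Hand-assembled module used only for this example.
TINY = bytes([
    0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00,  # magic + version 1
    0x01, 0x04, 0x01, 0x60, 0x00, 0x00,              # type section: () -> ()
    0x03, 0x02, 0x01, 0x00,                          # function section
    0x0A, 0x04, 0x01, 0x02, 0x00, 0x0B,              # code section: empty body
])
print(list_sections(TINY))
```

For `TINY` this prints `[('type', 4), ('function', 2), ('code', 4)]`. Decoding the payloads themselves takes more code, but every section follows the same id, size, payload framing.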

Doesn’t that reasoning also imply that GNU and Linux are awfully opaque and not free nor open, since they are typically distributed as binaries?
If the binaries were the only way to access GNU or Linux then yes, obviously. That's not the case now is it.
Right, because when you distribute free software in binary form, you make sure to make the source code available with a copyright license disclaimer allowing redistribution. This applies exactly to WebAssembly software just as it does with Java software. Software freedom is compatible with binary distribution.
No, because GNU and Linux are bound by license restrictions which require the unobfuscated source code to be made available.

If all websites made their source available as well as distributing the binary, there wouldn't be a problem.

Only if the sites using webassembly binaries also have the source available to download.
The source for GNU and Linux is viewable by everyone, which negates the inability to view what is happening inside a binary. This is the problem Javascript source maps are meant to solve for the web, and I would welcome WASM more if part of the standard was a requirement for a source map when browser Dev Tools are open.
> This is the problem Javascript source maps are meant to solve for the web

That's not the problem source maps are meant to solve. They exist to debug transpiled code.
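For readers who have not looked inside one: a source map's `mappings` string is a list of Base64 VLQ segments, each describing how a position in the generated file relates to a position in an original file. The sketch below decodes a single segment in Python. `decode_vlq_segment` is a hypothetical helper, and the field layout in the comments follows the Source Map v3 format as I understand it.

```python
from typing import List

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_vlq_segment(segment: str) -> List[int]:
    """Decode one Base64 VLQ segment from a source map 'mappings' string.

    A 4-field segment holds deltas for: generated column, source file index,
    original line, original column. A 5th field, if present, is a name index.
    """
    values: List[int] = []
    acc = 0
    shift = 0
    for ch in segment:
        digit = B64.index(ch)
        acc |= (digit & 0x1F) << shift  # low 5 bits carry data
        if digit & 0x20:                # bit 6 set: more digits follow
            shift += 5
            continue
        sign = -1 if acc & 1 else 1     # lowest bit of the value is the sign
        values.append(sign * (acc >> 1))
        acc = 0
        shift = 0
    if shift:
        raise ValueError("truncated VLQ segment")
    return values


print(decode_vlq_segment("AACA"))
```

Here `"AACA"` decodes to `[0, 0, 1, 0]`: same generated column, same source file, one line further down in the original, same original column.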

> The source for GNU and Linux is viewable by everyone, which negates the inability to view what is happening inside a binary.

That's not true. It is non-trivial to verify that the binary you received was built with the source code that's openly available. The point of FOSS is that you always have the option to build your own binaries so that you can be 100% certain of what is running on your machine. Most people aren't going to do that, so they need to place their trust on a third party (like whoever built their kernel). FOSS just makes that trust optional instead of mandatory (like it is with something like Windows)

I think it's quite different, because it used to enforce (past tense) availability of the source code, which was quite neat from the consumer perspective. Linux distros had a very different dynamic, so that enforcement was effectively unnecessary.

It stopped working as enforcement ages ago, though.

Well, for one thing, mature debuggers exist (the equivalent to the tooling he inquired about for .wasm).

More importantly, however, anything GPL must make source available and reasonably accessible. There is no such guarantee or even expectation for random programs on the web.

Indeed, huge amounts of JavaScript that’s already out there does not come with a free software license. The FSF has been complaining about this for years.

As for debugging, this is not a particularly hard or fundamental problem. It’s basically solved already.

https://developers.google.com/web/updates/2019/12/webassembl...

My employer quite deliberately publishes source maps alongside our javascript, so people who want to tinker and learn don't lose the ability to see what we're doing, but end users (our press pack says we serve pages to around 1% of people in any given month) don't have to pay the cost of downloading unminified JS for every page load. I suspect that if we ever start using WASM, we'll do the same thing.

Just because you _can_ use the compilation step to (go some way to) hide your source doesn't mean you _have_ to. And relying on your secret sauce being private while you publish it in obfuscated form for all the world to decipher feels like a losing strategy.

This might actually be a great direction to go.

I don't think there's any particular reason that WASM has to be more obfuscated than JS. You can already throw a WASM file into a bytecode-to-text translator which is about as useful as deobfuscating a minified JS file, and I assume decompiling/debugging tools will only get better in the future.

For a long time now, I've been thinking of a future where your OS properly isolates all the programs that run on it and even gives us the ability to have direct control over how programs interact with the rest of the system. OS's seem too mired in backwards-compatibility requirements to make big changes like that any time soon, but that's basically the way our browsers already work. Download some code, and execute it (relatively) safely because it's sandboxed from the rest of the system. Our browsers are basically the new OS, and this time around we can do it right using what we learned from OS's (and hopefully backport these browser features into the next generation of OS's).

For example, an app asks for a filesystem handle. You can hand it one that refers to a real location on your OS fs, or you can hand it a completely virtual fs that won't affect anything else on the system.

Whenever an app asks for a resource, being able to hand it a virtual or sandboxed one instead is a huge gain for user-control.
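A minimal Python sketch of that idea, under the assumption that the app only ever touches files through a handle it is given. All names here (`FileSystem`, `RealDirectory`, `VirtualDirectory`, `untrusted_app`) are hypothetical and do not correspond to any existing browser or OS API.

```python
from pathlib import Path
from typing import Dict, Protocol


class FileSystem(Protocol):
    """The only interface the app is allowed to see."""

    def write_text(self, name: str, text: str) -> None: ...
    def read_text(self, name: str) -> str: ...


class RealDirectory:
    """Handle backed by a real directory; refuses paths that escape it."""

    def __init__(self, root: Path) -> None:
        self._root = root.resolve()

    def _path(self, name: str) -> Path:
        path = (self._root / name).resolve()
        if self._root not in path.parents:
            raise PermissionError(f"{name} is outside the granted directory")
        return path

    def write_text(self, name: str, text: str) -> None:
        self._path(name).write_text(text)

    def read_text(self, name: str) -> str:
        return self._path(name).read_text()


class VirtualDirectory:
    """Handle backed by memory only; nothing reaches the disk."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}

    def write_text(self, name: str, text: str) -> None:
        self._files[name] = text

    def read_text(self, name: str) -> str:
        return self._files[name]


def untrusted_app(fs: FileSystem) -> str:
    # The app cannot tell which kind of handle it was given.
    fs.write_text("notes.txt", "hello")
    return fs.read_text("notes.txt")


print(untrusted_app(VirtualDirectory()))
```

The point of the design is that the user, not the app, picks which handle to pass in. Swapping `VirtualDirectory()` for `RealDirectory(some_path)` changes what the app can affect without changing the app.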

Disassembled webassembly is probably way more readable than minified/packed js
> Distributing opaque binaries with websites instead of Javascript is a step past even the obfuscated minified javascript files meant to be confusing. At least those can still be debugged, stepped through, and explored freely by the end user if they want to learn or reverse engineer.

If you really think this is true, look into Google's recaptcha blob.

Why should someone else's website code be forced to be "free"?
Because otherwise it's not a website?

"By the end of 1990, the first web page was served on the open internet, and in 1991, people outside of CERN were invited to join this new web community.

As the web began to grow, Tim realised that its true potential would only be unleashed if anyone, anywhere could use it without paying a fee or having to ask for permission.

He explains: “Had the technology been proprietary, and in my total control, it would probably not have taken off. You can’t propose that something be a universal space and at the same time keep control of it.”

So, Tim and others advocated to ensure that CERN would agree to make the underlying code available on a royalty-free basis, forever."

Can you see how rude it is to not do the same?

1. We're well past websites into web apps. You are not entitled to the source code of these apps, like Gmail/GSuite - that are client heavy web apps, with logic, state, custom-IPs & algorithms.

2. The underlying code of the web's infrastructure is available on a royalty-free basis, and shall remain as such!! There's immense benefit in maintaining this equal-opportunity status-quo.

"Web apps" are an oxymoron. Transforming the browser into an OS inside the OS is just bad practice (but the reason that happened was because Microsoft sucked, and Google wanted more control over computing - see also Valve with Steam patching games).