Hardening Drupal with WebAssembly (wasmlabs.dev)
75 points by gzurl 1120 days ago
10 comments

No outgoing HTTP requests, and no ability to connect to an external database such as MySQL; only SQLite is used.

No explanation for what vulnerabilities standard Drupal is actually being hardened against.

> No explanation for what vulnerabilities standard Drupal is actually being hardened against.

But... Are you not afraid of these unknown, not-yet-public vulnerabilities that this solution clearly protects us against?!

> But... Are you not afraid of these unknown, not-yet-public vulnerabilities that this solution clearly protects us against?!

No, not really. First of all, a heavy disclaimer: I am speaking as a rather cynical, old, downright ancient PHP developer and not in any other capability. Not as a former Drupal security team lead, not as a developer for whatever clients I have, just plain old me.

The biggest threat in my opinion on a properly hosted Drupal is information disclosure and privilege escalation within that Drupal instance. Arbitrary code execution is utterly pointless because the user that runs Drupal will have no privileges and there'll be nothing installed on the host, nothing to run anyways. And guess what, none of that benefits from this WASM sorcery. SQL injection and XSS -- and potentially allowing private files to be downloaded -- is the real problem while this protects from arbitrary code execution.

yes, I think it seems to focus on rce vulnerabilities which in drupal are relatively rare. But also it seems almost like a research project or a work in progress, hence may not be immediately ready for prod? with that said, would be nice if both these things were stated upfront to set the expectations correctly. Like, add "experimental" to the title maybe.
it's not only that rce vulnerabilities are rare, I contend they hardly matter. You should not have anything in your php containers an RCE can run.
(Wasm Labs dev :)) For me, this is the key point. The end goal is to limit the attack surface for future vulnerabilities. This is not something specific to Drupal or PHP, but an example about technologies / apps that can take advantage of Wasm.
Don't disagree - but if I have a limited amount of resources to harden my Drupal server, it might be best to start looking at hardening around the most commonly exploited Drupal vulnerabilities.

Having said that, searching for Drupal in the CISA Known Exploited Vulnerabilities catalog shows a number of remote code execution vulnerabilities that this would help mitigate: https://www.cisa.gov/known-exploited-vulnerabilities-catalog

Indeed! As an example, SA-CORE-2020-013 can be mitigated with Wasm. And that one is classified as Critical.
Take a look at the linked article about how some PHP vulnerabilities can be mitigated with WebAssembly: https://wasmlabs.dev/articles/mitigating-php-vulnerabilities...
Yeah, would have been nice to have a list of the last dozen or so medium+ Drupal CVEs and if they were mitigated to any degree by this setup.
SA-CORE-2020-013 is one of them.
Does it prevent it, or just limit the arbitrary PHP execution to inside the WASM sandbox? If the latter, that's still helpful, but it leaves quite a few of the typical end goals in place, like altering the content presented to visitors.
yup, PHP RCE inside the PHP WASM sandbox is not much different in scope than PHP RCE inside a drupal apache container more or less - you get RW on the complete drupal instance.
We are working on adding all that functionality, explanation here: https://news.ycombinator.com/item?id=36115146
For a different approach that has FastCGI and MySQL support: https://wasmlabs.dev/articles/wordpress-nginx-fcgi-mysql/
I don't see a breakdown of the costs of this layer of hardening. There's nothing on performance, memory usage, library and database support, and so on. Just a few modules that don't work optimally, notably OPcache, which speeds up PHP quite significantly.

Just stuffing something in a sandbox is easy. Set up a VM and don't add a network card and you've got a near perfect sandbox without any performance overhead. Maybe even do so for each request through Firecracker if you fear memory persistence. What matters is all the added complexity and the downsides of each solution.

Without any comparison, this attempted sales pitch for WASM is little more than a showcase, like when someone runs WINE in the browser.

You are completely correct. These are just the early steps to get it onto people's radars. We follow the Unix approach of "Make it work, make it right, make it fast". We are on the "make it work" step and moving to "make it right", for which we will focus on running as many unmodified applications as possible. This requires better support for PHP features that still depend on WebAssembly-related features that are not fully baked yet (e.g. threading, WASI preview 2), as well as adding over-the-network database support (e.g. MySQL), since most apps don't use SQLite in production.

Once all that is in place, you can move into the "make it fast" part, but for many scenarios where security is an issue (high profile websites) even a significant performance penalty (that can be offset with beefier hardware or caching) may be desirable.

Ok, considering that it comes from VMware, I think they aim to create some kind of a multi-architectural execution environment that can host apps in a hardware-independent way on multiple clustered nodes.

I applaud this development, but at the same time any kind of run-time code translation imposes performance penalties. WASM is no exception. It would be enlightening to hear the authors' stance on this topic. CPU utilization is a big thing for hosting companies, so they will be hesitant to use something like this in production.

VMware is a big company and there's a lot of interesting ways Wasm is being used internally. In our particular case, the aim of our group is not the one you outline. It is a bit higher level: how can we bring the most developers possible into the Wasm ecosystem? PHP + mod_wasm support is a way of getting the "LAMP stack" crowd an easy on-ramp. There are other ecosystems like Rust that already have had great Wasm support since early on and that's not necessary. There is work going in the direction that you point out, and there was a WasmEdge article a few hours ago on how to run on ARM, etc. https://news.ycombinator.com/item?id=36110253
So a GraalVM competitor but on WASM.

I do want to see the WASM ecosystem grow and thrive, but it will take some time, although what I have seen so far is amazing. I am most excited for WASI. I haven't checked up on it in a while, but it's basically the WebAssembly version of Node.js, except as a spec that anyone implementing a WASM runtime can support, so there will ultimately be various options to run your code.

Yes, I agree on WASM versatility, but: when I use my Mac M1 (with an ARM64 processor) and try to run an x64-based executable, there is a small but noticeable lag on the app's start. It is caused by dynamic instruction translation, essentially JITting x64 code to the ARM64 instruction set on the fly.

While the lag is small, it is unpleasant enough to make my work less comfortable when compared to the native ARM64 images.

The story repeats itself with Windows 11 ARM64 - it can run x86 and x64 images but there is a price to pay - they are not as snappy as native ARM64 images due to the dynamic translation costs.

Having a WASM on board poses the same kind of dilemma which directly impacts hosting bills.

Actually, that is one of the Wasm key features: being a platform-independent binary not tied to any specific HW architecture.

Translating x86_64 code to arm64 requires more CPU cycles than JITting Wasm into arm64 or x86_64.

But even more: in the use case covered in this article, the PHP interpreter won't change, so it can be AoT compiled to the target architecture upon deployment. No JIT compilation needed :-)

I used to work with Drupal. It was a security nightmare, like almost all community-based extension systems (see also npm). Modules made Drupal popular, but they also made it close to impossible to have any sense of security, as only very popular modules would be audited, and you basically needed a panoply of smaller ones depending on your use case.

I wonder if that's changed over time. For instance, I could see GPT-3.5 doing a good job of systematically inspecting modules.

This is also the case with similarly popular platforms like WordPress. There is always tension between expanding what you can do to include nearly everything people can think of and, at the same time, keeping some basic checks/safeguards. There is also the issue that the most popular platforms and plugins will receive the most attention from attackers. Technologies like Wasm at least provide hope of alleviating some of those concerns by "sandboxing" the modules (e.g. if a module is a spellchecker, it should not be allowed to connect to the network).
I'm not familiar with how it used to be, but Drupal does have a security advisory process[1] for contributed modules. I don't know, but I'd imagine having moved the contrib ecosystem to Gitlab, and composer-based dependency management, has made some dimensions of security auditing easier to do at scale. I'd say it's still true that you can't easily stay 100% within that security advisory umbrella.

1. https://www.drupal.org/drupal-security-team/security-advisor...

Is there already a law in place, or are we supposed to use Godwin's?
I took a quick look at the examples that come with mod_wasm module: from what I could see, the HTTP/WASM interface is based on CGI.

I kind of like it because CGI is architecturally elegant, but what about performance implications? By default, the CGI is not the fastest thing on Earth. Every time a HTTP request arrives, a CGI process should be initialized over and over again wasting CPU time on BSS section / runtime library initialization. Does it work any differently when it comes to WASM?

> I kind of like it because CGI is architecturally elegant, but what about performance implications? By default, the CGI is not the fastest thing on Earth.

(Wasm Labs dev here) We haven't yet measured performance in a meaningful way that we can use to compare, mainly because our PHP builds also have certain limitations mentioned in the article, such as not having OPcache available in PHP. It is, however, an iterative process, and we are interested in this; we also understand that it's interesting for the community.

> Every time a HTTP request arrives, a CGI process should be initialized over and over again wasting CPU time on BSS section / runtime library initialization. Does it work any differently when it comes to WASM?

It's a little different in Wasm, in the sense that there is no need to fork/exec (if talking about bare CGI). The PHP executable gets loaded into the runtime once (which compiles it from Wasm to the native ISA), and then for every request we create a Wasm execution context and run the PHP Wasm module that interprets the PHP script.

Creating a Wasm execution context is certainly much lighter than a fully fledged fork/exec.
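
To make the "load once, new context per request" idea concrete, here is a minimal Python sketch. It is only a toy model and does not use mod_wasm or any real Wasm runtime API; the class names and the `php-cgi.wasm` file name are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


class CompiledModule:
    """Stands in for the PHP interpreter after the one-time Wasm -> native compile."""

    compile_count = 0  # how many times the expensive step ran

    def __init__(self, name: str) -> None:
        CompiledModule.compile_count += 1
        self.name = name

    def instantiate(self) -> "ExecutionContext":
        # Cheap step: fresh, isolated state that reuses the compiled code.
        return ExecutionContext(module=self)


@dataclass
class ExecutionContext:
    module: CompiledModule
    memory: dict = field(default_factory=dict)  # per-request state, never shared

    def run(self, script: str) -> str:
        self.memory["script"] = script
        return f"{self.module.name} ran {script}"


def serve(module: CompiledModule, scripts: list[str]) -> list[str]:
    # One new execution context per request; no fork/exec and no recompile.
    return [module.instantiate().run(s) for s in scripts]


php = CompiledModule("php-cgi.wasm")  # hypothetical name; done once at server start
responses = serve(php, ["index.php", "user/login.php"])
```

In this model the expensive step runs once no matter how many requests arrive, and each request starts from empty state, which is roughly the trade-off described above.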

Wasm Labs dev here :)

In mod_wasm, there are some differences with a pure CGI implementation. When Apache boots, it loads the configuration and initializes the WasmVM. When a new HTTP request arrives, the VM is ready so you don't need to initialize a different process to manage it.

You still need to process the request and pass the data to the Wasm module. This step is done via STDIN through the WebAssembly System Interface (WASI) implementation [0]. The same happens in the opposite direction, as the module returns the data via STDOUT.

So, the CGI pattern is still there, but it doesn't require new processes and all the code runs in a sandbox.
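
As a rough illustration of that CGI pattern (not mod_wasm's actual code), the sketch below shows a handler in Python that takes request metadata from CGI-style environment variables, reads the body from a stdin-like stream and writes the response to a stdout-like stream. `REQUEST_METHOD`, `PATH_INFO` and `CONTENT_LENGTH` are standard CGI variables; the handler name and the sample request are assumptions.

```python
import io
from typing import Mapping, TextIO


def handle_request(environ: Mapping[str, str], stdin: TextIO, stdout: TextIO) -> None:
    """CGI-style handler: metadata in env vars, body on stdin, response on stdout."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")
    # Text streams keep the sketch short; real CGI bodies are bytes.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length else ""

    # A CGI response is headers, a blank line, then the body.
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write(f"{method} {path} body={body!r}\n")


# Simulate what the host would pass in for one request.
out = io.StringIO()
handle_request(
    {"REQUEST_METHOD": "POST", "PATH_INFO": "/node/1", "CONTENT_LENGTH": "5"},
    io.StringIO("hello"),
    out,
)
```

The handler never opens a socket or spawns a process itself; everything it sees arrives through the three arguments, which is what lets a host swap real process pipes for in-sandbox streams.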

However this is not the only way you can run a Wasm module. In this specific case, we use CGI via WASI. In other cases, you may compile a module to fulfill a specific API, like ProxyWasm [1] to create HTTP filters for proxies like Envoy.

- [0] https://wasi.dev/

- [1] https://github.com/proxy-wasm/spec

> when a new HTTP request arrives, the VM is ready so you don't need to initialize a different process to manage it.

I suspected that, pretty smart. I presume that a WASM module state can be cached in a similar way, essentially imitating "fork" syscall semantics while keeping the pool of request handlers in an always-ready hot state.

Or better PHP-FPM?
Kind of what you want, with wasmEdge and FastCGI https://wasmlabs.dev/articles/wordpress-nginx-fcgi-mysql/
Have people also faced these same types of problems with Wordpress? How did you deal with it?
That article doesn't explain what the WASM code does in Drupal.
I understood that Drupal/php is compiled to WASM and then run on a WASM runtime.

So no WASM inside Drupal, but Drupal run as, and in WASM.

Exactly. The Drupal code is unmodified and runs on top of the PHP interpreter compiled to Wasm [0]. The Drupal PHP code is not compiled to Wasm.

- [0] https://github.com/vmware-labs/webassembly-language-runtimes

Is that possible, though? To compile PHP to WASM seems possible¹, but not sure if Drupal can be compiled: drupal does (did? back when I did Drupal it certainly did. I certainly did) some really funky metaprogramming and eval() stuff. Like keeping PHP in the database.

¹https://github.com/wapm-packages/php

I think you missed this paragraph:

The Apache HTTP server and the Drupal packages remain unchanged. However, instead of loading the libphp.so extension module, it incorporates mod_wasm.so. In addition, instead of relying on the traditional PHP interpreter, it utilizes a PHP build in the WebAssembly binary format.

So instead of trusting libphp.so, you have to trust mod_wasm.so that is running a php interpreter internally. Is this a win? Genuine question.
The biggest gain comes from mod_wasm using a Wasm runtime to run the PHP interpreter in a sandboxed environment. And mod_wasm can be used the same with Python or Ruby, or anything that builds for WebAssembly.

- No errors (even fatal) from the Wasm module will bring down Apache HTTPD.
- mod_wasm's code base is small enough to inspect thoroughly and test thoroughly to ensure it is secure.
- Indeed you will now have to trust the embedded Wasm runtime (wasmtime as of today). As a naïve analogy, this is the equivalent of trusting a virtual CPU(Wasm)+OS(WASI), which gets tested in thousands of other scenarios than the ones used in mod_wasm and issues get fixed and corrected asap.

"Sod it, let's just stick it all behind another vm"
The article claims NASA use Drupal, which was quick to verify as true, but NASA also uses Django, or at least the Django project still claims they do.

Why on earth are they using both?

Do you think NASA is some monolithic entity? If your company had a budget of 20+ billion dollars you might use two webframeworks.

   > Why on earth are they using both?
You lost the chance for a great pun here.
Yeah, there's enough space.
For the same reason almost ALL old companies that have existed in the tech sector for more than 20 years use more than one language/framework/programming practice/etc... Things grow, things get acquired, things languish.
This article explores how Drupal can benefit from the capabilities-based security model offered by WebAssembly, a portable binary format that allows execution of code in a safe and efficient manner. By deploying Drupal within a WebAssembly-based stack, it gains an additional security layer, protecting against a wide range of vulnerabilities, including those that may not be public yet but can be preemptively mitigated through these mechanisms.
> capabilities-based security model offered by WebAssembly

What? Since when does WebAssembly natively ship with a "capabilities-based security model"?

> protecting against a wide range of vulnerabilities, including those that may not be public yet but can be preemptively mitigated through these mechanisms

Not yet public vulnerabilities?

Who was this article/summary written by? Reeks of GPT or at least someone who doesn't actually know the subject very well.

Capabilities-based means that by default Wasm cannot do anything with the outside world. You have to explicitly declare the specific access you are giving. For example, if the runtime implements WASI filesystem access, you need to specify which parts of the underlying filesystem will be accessible to the module.
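
A toy Python model of that deny-by-default idea, for illustration only: it is not the WASI API, and the `Sandbox` class and the paths are hypothetical. The guest can open a path only if it sits under a directory the host explicitly granted.

```python
from pathlib import PurePosixPath


class Sandbox:
    """Deny by default: only explicitly granted directories are reachable."""

    def __init__(self, preopened_dirs: list[str]) -> None:
        self._granted = [PurePosixPath(d) for d in preopened_dirs]

    def can_open(self, path: str) -> bool:
        target = PurePosixPath(path)
        # Refuse any attempt to climb out of a granted directory.
        if ".." in target.parts:
            return False
        return any(target == d or d in target.parents for d in self._granted)


sandbox = Sandbox(["/var/www/drupal"])  # hypothetical grant made by the host
allowed = sandbox.can_open("/var/www/drupal/sites/default/settings.php")
denied = sandbox.can_open("/etc/passwd")
escaped = sandbox.can_open("/var/www/drupal/../../../etc/passwd")
nothing_granted = Sandbox([]).can_open("/var/www/drupal/index.php")
```

With no grants at all, every path is refused, which is the starting point the comment describes.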

Not yet public means exactly that. You may have a buffer overflow issue in your code that you are unaware of. There are technologies that help mitigate those when/if discovered. Wasm is one of them but not the only one (e.g. most modern compilers have specific settings to harden the binaries against some issues).

I'm 99% sure this was written by some flavour of GPT.
Yes, sure. WasmGPT this time :-)