by dgreensp 1733 days ago
I think the one-sentence version of this is that Workers are meant for small, undemanding tasks (for example, they have tight memory limits and don’t have great performance), so using them to do “serious number crunching” at the edge, which is the advertised use case, seems questionable.

I think the blurb about the downsides of Wasm is just too generic, it’s a sort of “why Wasm isn’t preferable to JS in all cases” for the uninitiated. It may not be meant to imply that number crunching is the use case.


> I think the one-sentence version of this is that Workers are meant for small, undemanding tasks

Not at all! We're building a platform on which you can build your entire app. What you say may have been the case four years ago when Workers launched, but since then we've added Durable Objects, Cron triggers, much longer time limits, etc. We very much believe Workers can be a stand-alone alternative to other cloud providers.

> for example, they have tight memory limits and don’t have great performance

This isn't true.

"Performance" is a vague term, you need to clarify the use case and what you're measuring. But, I can't think of what you could mean by "don't have great performance", that seems to imply that they execute code slower or something, which just isn't true at all. In many cases, Workers perform much better than you could achieve with any other platform, due to the ability to spread work and data across the network and move it close to where it's needed.

The "memory limit" on a single worker instance is 128MB, but Cloudflare runs many instances of the worker around the world, so across the network you're really getting many gigabytes of memory. By building a distributed system based on Durable Objects, you can harness the memory of many instances for a single task. Workers definitely biases towards distributing load across the network rather than running a single fat instance of your server, but that just means Workers makes it easy to build apps that scale much higher.
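As a rough illustration of spreading one task's state across many 128MB instances, here is a minimal sketch. The hash choice, the function names, and the idea of one shard per instance are assumptions made for the example, not anything Workers prescribes.

```typescript
// Per-instance memory limit quoted in the comment above.
const MEMORY_PER_INSTANCE_MB = 128;

// 32-bit FNV-1a hash; chosen here only because it is short and deterministic.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Hypothetical routing rule: a given key always maps to the same shard index.
function shardIndexFor(key: string, shardCount: number): number {
  return fnv1a(key) % shardCount;
}

// Aggregate memory if each shard is backed by its own 128MB instance.
function totalMemoryMb(shardCount: number): number {
  return shardCount * MEMORY_PER_INSTANCE_MB;
}
```

With 64 shards that is 64 × 128MB = 8192MB in aggregate, although no single instance ever holds more than 128MB, so the data has to be partitionable by key.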

What this article is highlighting is that Wasm is still an immature technology. That is, unfortunately, just a fact. There's still work to be done, and progress is being made, but it's still early. The code footprint issue (because every app must bring along its own language runtime) is the biggest blocker. We hope to see that solved with dynamic linking.

But, Workers isn't primarily based on Wasm. The vast majority of Workers are written in JavaScript, where these issues don't exist. Workers runs JavaScript just as fast as any Node.js server, and runs it closer to the client resulting in better latency.

I don't see how your response addresses the article's issues where workers don't have enough memory, space, or runtime to perform number crunching, meaning that for people with those use cases it's not a full "alternative to other cloud providers".

I have a server engine that runs a lot of small (and potentially unique) tasks (say 0.2% of 1 vCPU and 30MB of RAM), but they require durable disk (rocksdb or sqlite, for instance, plus some random files), and I want to locate them in specific regions depending on the server they are targeting. Would Cloudflare Workers be good enough for that, or should I stick to DO's cheap VMs?

Native edge computing (i.e. containers or VMs at the edge) is probably more suitable than Workers for such a task. Granted, Workers do have persistence options (e.g. KV store), but you'll have to retarget your existing app towards them.

Disclaimer: I work at StackPath which offers containers and VMs at the edge with anycast IPs which is perfect for this use case.

Probably!

But, it's a little hard to answer the question based on your description, because you've described your setup in traditional server terms that don't translate directly to how Workers does things. Workers doesn't have tasks running in VMs or containers, it's a serverless distributed systems platform where code runs in response to events across a wide network.

So, in order to tell you how to build your application on Workers, I'd need to know what the application actually does.

Most likely, though, you would replace your rocksdb/sqlite storage with Durable Objects storage. You could locate different Durable Objects in specific regions as desired.
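A minimal sketch of what that replacement could look like, assuming a key-value interface modeled loosely on the `get`/`put` calls of Durable Objects storage. `KeyValueStorage`, `InMemoryStorage`, and `TaskState` are hypothetical names; the in-memory class exists only so the sketch runs outside the Workers runtime.

```typescript
// Stand-in for the small slice of storage used here (an assumption, not the
// real runtime type).
interface KeyValueStorage {
  get(key: string): Promise<number | undefined>;
  put(key: string, value: number): Promise<void>;
}

// In-memory implementation so the example can run anywhere.
class InMemoryStorage implements KeyValueStorage {
  private readonly data = new Map<string, number>();

  async get(key: string): Promise<number | undefined> {
    return this.data.get(key);
  }

  async put(key: string, value: number): Promise<void> {
    this.data.set(key, value);
  }
}

// Hypothetical per-task object: state that used to live in a local sqlite row
// is read and written through the storage interface instead.
class TaskState {
  private readonly storage: KeyValueStorage;

  constructor(storage: KeyValueStorage) {
    this.storage = storage;
  }

  // Increment and persist the run counter for a task, returning the new count.
  async recordRun(taskId: string): Promise<number> {
    const runs = (await this.storage.get(taskId)) ?? 0;
    await this.storage.put(taskId, runs + 1);
    return runs + 1;
  }
}
```

The point of the design is that the task code only sees `get` and `put`, so where the object physically lives becomes a platform decision rather than something baked into file paths.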

Workers doesn't really do regions. The code runs in whatever datacenter is closest to the user. It also doesn't have the concept of durable disk. You can use KV, with its trade-off of eventual consistency, or use something like FaunaDB or Firebase, but that means each request has to wait on a round trip to the backing service. It's also not super great for long-running tasks. The possibility is there with Workers Unbound, but I found the pricing structure for it to be a bit opaque.

> We very much believe Workers can be a stand-alone alternative to other cloud providers.

Time for Durable Workers: Run one Worker (per Durable Object instance) with higher RAM and wall-time ceiling, capable of serving WebSockets and WebRTC data-channels (and gRPC, if we are being ambitious)?

> The "memory limit" on a single worker instance is 128MB, but Cloudflare runs many instances of the worker around the world, so across the network you're really getting many gigabytes of memory.

Throw in the zonal Cloudflare cache (which is free up to 500MB per cached object) and some clever workarounds, and a lot could be done. Maybe Cloudflare dev-rel could add it to a "Workers SDK" of sorts to make this easier, or the eng team could work towards seamlessly exposing it to Workers instances as swap space? :D

Small note on this:

> What this article is highlighting is that Wasm is still an immature technology.

I assume you meant on the server, where wasm is fairly new; that's true. In the browser, wasm is a mature and stable part of the Web platform.

Otherwise very good points!

I don't, actually. Wasm is immature in the browser, too, for the same reasons. Writing a client-side web app entirely in Wasm generally doesn't work well today because you'll force the client to download much more code before the page can run. Lots of work is being done to improve this, like adding built-in garbage collection to the Wasm runtime so that apps written in GC'd languages don't need to ship their own. Dynamic linking will help too, assuming that language runtimes are allowed to be cached across different sites.

In both environments there are certainly use cases where Wasm provides huge advantages. But those use cases are still narrow. Over time it'll grow but there's still tons of work to do.

Ah, I think we disagree on the goals of wasm.

Yes, if you want to write a client-side webapp you run into limitations. That wasn't one of our main goals when we created wasm, though! It would be great if that materializes - more options are always good - but JavaScript is frankly the right tool for 99% of sites and we never intended wasm to directly compete with JS there.

Wasm is stable and mature for solving the needs of sites like Google Earth, Unity games, Figma, Meet, Zoom, etc. Those require more than what JS can offer and wasm is the perfect fit for the relevant parts of them.

On those websites wasm is often the difference between shipping and not shipping. That's a huge deal, and why wasm has been focused there. Other use cases like replacing JS with wasm might offer some benefits in speed, perhaps, but the impact of that would be smaller (but it could eventually apply to a wider set of sites, potentially).

Right. Same for server. You can do most of what you want to do in JavaScript and it'll be fine. Use Wasm to fill in the gaps where JavaScript doesn't work. That's a good strategy today with Cloudflare Workers, too.

It's when people want to write entire apps entirely in their language of choice, and want to accomplish this using Wasm, that the technology is still missing things. A lot of people want to do this, both on the browser side and the server side.

I agree that's a common desire, but how likely is it that actually becomes feasible? I think programmers underestimate the degree to which languages and VMs are coupled.

Adding GC to WASM makes it essentially like the JVM because it has to know about the layout of every type (to find pointers, etc.) As far as I can see, this effort is like bolting a VM that's 2x-5x as big (in terms of semantics) on top of the existing small WASM VM.

I think they will end up with something like the union of JVM and CLR [1], and even that's not enough.

JS already has garbage collection, but its runtime data types can't really host something like Java or OCaml efficiently.

----

The CLR is supposedly language-agnostic, but I'd argue it's not. Visual Basic was "broken" for this reason -- VB.NET is more like C# than VB6. The old code doesn't run.

I've heard PowerShell described as a weird shell-like syntax for writing C# programs.

And I remember F#'s behavior around null, algebraic data types, and exceptions was heavily influenced by the CLR. In some ways it's probably closer to C# than its prime influence of OCaml.

So while I don't know anything about the WASM GC effort (and haven't kept up with it), I'm skeptical that we'll get a true polyglot experience. What's more likely is that some languages will be favored over others, with the "losers" experiencing 2x - 10x slowdowns.

And this doesn't even get into the runtime library issues. For example PyPy is essentially perfectly compatible with CPython at the language level, and has been for over a decade. Still, many applications have difficulty migrating to it because they lose bindings to native libraries, like linear algebra with NumPy, and OpenGL, Win32 bindings, etc. (these are enormous)

I expect the analogous issue to be a big problem for using WASM in a polyglot fashion too.

----

As a separate issue, WASM is still not up to par with native code in terms of protections around the stack and the heap: https://lobste.rs/s/a9ghhz/maintain_it_with_zig#c_ghawis . Thus it favors Rust over C/C++, since Rust enforces more integrity at compile time.

Real apps need to poke many holes in the VM to get anything done, and those attack vectors matter more as that happens. WASM follows the principle of least privilege better but it has regressed in other dimensions (at least if you want to run legacy C code, which was the original use case advertised)

----

[1] Random article from Google suggests that this is HARD, and these are two of the most similar VMs out there: https://www.overops.com/blog/clr-vs-jvm-how-the-battle-betwe...

CLR includes instructions for closures, coroutines and declaration/manipulation of pointers, the JVM does not

Another smaller difference is that the CLR was built with instructions for dealing with generic types and for applying parametric specializations on those types at runtime.

Still, as you wrote earlier this year [1], there are valid use cases for something chunkier than isolates (and yes, I know that lifting and shifting existing web apps isn't one of them). I'm looking forward to your edge containers becoming more widely available.

[1]: https://blog.cloudflare.com/containers-on-the-edge/

> I think the one-sentence version of this is that Workers are meant for small, undemanding tasks (for example, they have tight memory limits and don’t have great performance)

That could describe most web app functionality: things like registration, authorization/authentication, sending emails, storing and retrieving data, etc.

> so using them to do “serious number crunching” at the edge, which is the advertised use case, seems questionable.

Cloudflare workers don't run in the background. They block the HTTP request. For serious computation, Cloudflare should offer background workers that can run for extended periods of time. [1]

[1] This could be worked around by triggering an async request, but there is no push API to notify the "App" of the result.

Workers does have a Cron like functionality. My memory is fuzzy but it's been around for a while.

https://developers.cloudflare.com/workers/platform/cron-trig...

Assuming you poll for results every second, that's 86.4k requests a day for each potential user you have. Another solution is to have a single cron worker that watches for all notifications, kind of a basic implementation of a push API.

A request per second per user is just bad architecture. You will need something like what you're describing to batch the processing in some way.
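The arithmetic behind that comparison can be sketched as follows. The function names are made up for the example, and the 60-second run interval in the usage note is only an illustrative assumption.

```typescript
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

// Requests per day when every user polls on their own fixed interval.
function pollsPerDay(users: number, pollIntervalSeconds: number): number {
  return users * (SECONDS_PER_DAY / pollIntervalSeconds);
}

// Scheduled runs per day when one batched job checks all pending notifications.
function batchedRunsPerDay(runIntervalSeconds: number): number {
  return SECONDS_PER_DAY / runIntervalSeconds;
}
```

For 1,000 users polling every second, `pollsPerDay(1000, 1)` gives 86,400,000 requests a day, while a single batched job running every 60 seconds gives `batchedRunsPerDay(60)`, or 1,440 runs a day, regardless of the number of users.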
> Cloudflare workers don't run in the background. They block the HTTP request. For serious computation, Cloudflare should offer background workers that can run for extended periods of time.

You can use `event.waitUntil()` to schedule a task that runs after the HTTP response has completed, and you can use cron triggers to schedule background work in the absence of any HTTP request at all. You can even build a reliable async queuing system on top of cron triggers and Durable Objects, though at the moment it's a bit DIY -- we're working on improving that.
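A minimal sketch of the `event.waitUntil()` pattern, assuming a stand-in event type so the example runs outside the Workers runtime. `BackgroundCapableEvent`, `recordVisit`, and `handleRequest` are hypothetical names; only `waitUntil` itself comes from the comment above.

```typescript
// Stand-in for the part of the fetch event used here (an assumption, not the
// real runtime type).
interface BackgroundCapableEvent {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical background task, e.g. writing an audit log entry.
async function recordVisit(log: string[], path: string): Promise<void> {
  await Promise.resolve(); // yield, so the work finishes after the caller returns
  log.push(`visited ${path}`);
}

// Build the response immediately and hand the remaining work to waitUntil,
// so the runtime keeps the task alive after the response has been sent.
function handleRequest(
  event: BackgroundCapableEvent,
  path: string,
  log: string[],
): string {
  event.waitUntil(recordVisit(log, path));
  return `hello from ${path}`;
}
```

The response body is available before the log entry is written, which is the behavior the comment describes: the HTTP request is not blocked on the background task.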

Thanks. I didn't know about waitUntil, that can unlock some powers.

> we're working on improving that.

I'd assume you are working for Cloudflare. Do you see it going the way of firebase?

I'm the tech lead for Cloudflare Workers. :)

I don't know much about Firebase, to be honest, so I'm not sure how to answer that question. But, our aim is that every type of server compute should be something you can build on Workers. Meanwhile, our design philosophy is that Workers should feel like you're programming one big, globe-spanning computer, rather than lots of individual servers.

Given HTTP pipelining and fetch() promise semantics, I'm not sure there's a practical benefit to pushing results instead of "blocking" the original request.

The benefit is when you want to run something in the background whether or not the user's page is still active: once the computation is complete, it pushes a notification to the user or triggers other workers.

I for one hadn't thought about the cost of shipping your own standard library with every bundle, so it was informative for me.

WASM tasks shouldn’t need a full standard library. If you statically compile against any library it should only keep the pieces used.

That's fair, but still, you'll be pulling in a lot of really fundamental stuff that JavaScript gets for free. String manipulation, fundamental data structures, iterators, HTTP, JSON (de)serialization, etc.

One of the examples he mentioned was 200KB.

That's still a lot more than "a single file of javascript", certainly, but it's not that bad.

Trade-offs all the way down, as ever.

I'm not sure, but if you need any string manipulation at all, won't it be really hard to avoid the Rust std?

I think their point is that what's referred to as "tree-shaking" in the JS world is (I'm pretty sure) normal and standard for most statically-compiled languages like Rust. So you won't bring in anything you don't need, though you will still be bringing in lots of stuff you do need.

Generally true, but just to add to that, in languages like Rust and C++ you often end up including stuff you don't need due to toolchain limitations. For example, indirect calls (from Rust traits or C++ virtual methods) are hard to get rid of in general.

This is a pretty significant cause of bloat in practice, sadly, but toolchain improvements may help in the future. So for now, JS's advantage of the standard library being in the VM is pretty significant.