by bertolo1988 3171 days ago
You can get rid of the need for multithreading by deploying more containers on the same machine or via orchestration.

I don't understand why people keep insisting that the lack of multithreading support is a JavaScript problem when there are better and more scalable ways of using your machine resources.

6 comments

> You can get rid of the need for multithreading by deploying more containers on the same machine or via orchestration.

What if you have a large shared in-memory data structure that you want to update with lots of irregular translations in parallel? Like many graph problems? How are you going to do that with multiple containers? As an industry we just don't understand how to distribute that kind of problem effectively.

Shared data structure among multiple threads... this sounds utterly familiar and evil! Redis is single-threaded, probably one of the fastest, has different data structures, can handle high loads, and its code is easy to reason about: something that just works.

One of the reasons Node is successful is the simplicity of single-threaded code. It's way easier to reason about. I would question the usage of Node if you are doing something CPU-bound with it. You can use Go or C# with tasks for that.

Think outside of web workloads! Not everyone is writing a web app.

Think about something like Delaunay triangulation or mesh refinement. These are critical path bottlenecks for a great many applications and in practice very parallel, but they're irregular so we cannot easily distribute the data structure. The best results we have are for shared memory thread models. We don't know how to do it any other way!

That's why the post you just replied to suggested that Node may not be the right tool for those kinds of tasks.
The problem is that I see a lot of projects written in languages with single-threaded runtimes (python more often than node) that become difficult/expensive to scale and extend down the road. I loathe the idea of a rewrite, but sometimes the initial language choice lacked forethought to the point where it makes sense to rewrite in something that actually can make use of all of a machine's processing resources.

Things like greenlet and gevent (and likely napa.js) are band-aids over the underlying problem.

For these workloads, I would consider a compiled language with great parallelism/concurrency, e.g. Rust or Go.
So just because shared memory is hard, you are OK with sacrificing performance and replacing memory access with I/O hops? That sounds like overkill and not suitable for every task.
I like node.js and use it very often (had my first package reach over 100 stars, woohoo :) ) but I don't understand why it needs to be suitable for every task.

If you really want to do something creative with shared memory, I guess you could do that in a "native module" written in C++ or even Rust[1].

I'm not saying that it's not doable with JS, it's just that it's already been done (as in, has a solution that works).

[1]: https://github.com/neon-bindings/neon

Why should I learn a new language for that? It's good to have as many options as possible in JS, and you take the one that fits you best.
Because JS isn't good at everything just like C++ and Rust aren't good at everything.

Right tool for the job.

When all you have is a hammer...
>Why should I learn a new language for that?

You make it sound like it was difficult to learn. Underneath, C++, Java, Pascal, C#, JavaScript and Python have many similarities, and jumping from one of those languages to another in the list is very easy compared, for example, to jumping from any of them to Forth, Prolog, SQL, ML, Haskell, or Lisp.

Some of them are also really similar syntactically, for example this group: [C, C++, Java, C#]; or this other group: [Pascal, Algol, Go], so even the syntax doesn't get in the way when jumping from one to another.

Thus, software engineers usually know more than one language and apply whichever best suits the program.

Because languages are tools, and you should learn to use more than one tool. If you know JavaScript, you basically already know most C-based languages syntactically; it takes very little effort to learn at least one of them for tasks JS isn't suited for.
But what about dealing with FFI, build dependencies and toolchains? Sounds like it is just shifting complexity into a different place, not actually solving it.
Have you read the docs? Just go read this https://github.com/Microsoft/napajs/blob/master/docs/api/mem... and tell me you don't feel uncomfortable. This is the kind of baggage you are bound to get with such solutions and once you end up writing a steaming pile of code, then you need a thread safe logging and debugging story.
> Shared data structure among multiple threads... this sounds utterly familiar and evil!

This seems like a bit of FUD.

With multiple threads and shared data, you don't necessarily have to share every data structure with every thread. You can set things up so that little or nothing is shared. That's (also) what access control and immutability are for in programming languages, among other features.

Of course, different languages support these features in different ways, I don't want to get into the specifics, but in pretty much all mainstream languages you can create a similar share-nothing or share-almost-nothing design and it's not even hard, it might even be easier.

I really don't understand modern web/JS developers. They seem to ignore traditional solutions and/or proclaim them as evil, and then they go on to employ a 'new' solution that is 3× as complex, performs 5× worse and requires 10× as many dependencies/tools/frameworks/etc. Why? I suspect there's a LOT of largely irrational fear of concepts and languages that are unfamiliar. "Fear driven development" in fashionable lingo.

TL;DR you don't need to be scared of threads, you just need to be scared of threading architectures that share too much.

>I really don't understand modern web/JS developers. They seem to ignore traditional solutions and/or proclaim them as evil, and then they go on to employ a 'new' solution that is 3× as complex, performs 5× worse and requires 10× as many dependencies/tools/frameworks/etc.

It is, perhaps, because a significant number of Node.js developers came from front-end-only development and are thus unfamiliar with the traditional approaches (in this case, using threads). An example is the many cases in which a document store such as MongoDB is (wrongly) used for data that is mostly relational.

Simply put, they never were taught the traditional approaches first.

Basically your argument boils down to "it's easier to write single-threaded code than multi-threaded". Well no shit, but the benefit is in many cases colossal, so I'd say that's not a good argument to dismiss this complaint.
> Redis is single-threaded, probably one of the fastest, has different data structures, can handle high loads, and its code is easy to reason about: something that just works.

Redis probably isn't a great example here. I've worked on projects where a single Redis instance was not enough (would easily peg its single CPU to 100% and have query latency in the multi-second range). In the end, sharding the data among several Redis instances was successful, but also brought its own problems. The ideal is that we just have languages, runtimes, data stores, etc. that abstract these details away from us so we can focus on our application logic, not on how to make it faster.
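To make the sharding trade-off concrete, here is a minimal sketch of hash-based key routing. It is not how that project or any particular Redis client does it; the function names, the key format and the choice of an FNV-1a style hash are all assumptions made for illustration.

```javascript
// FNV-1a style 32-bit hash, computed over UTF-16 code units
// (identical to byte-wise FNV-1a for plain ASCII keys).
function fnv1a(str) {
  let hash = 0x811c9dc5; // 32-bit offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Hypothetical router: maps a key to one of `shardCount` instances.
function shardFor(key, shardCount) {
  return fnv1a(key) % shardCount;
}

// Groups keys by the shard that owns them. An operation that needs
// several keys at once (say, a set intersection) has to visit every
// shard that appears in the result.
function groupKeysByShard(keys, shardCount) {
  const groups = new Map();
  for (const key of keys) {
    const shard = shardFor(key, shardCount);
    if (!groups.has(shard)) groups.set(shard, []);
    groups.get(shard).push(key);
  }
  return groups;
}
```

Single-key reads and writes stay cheap under this scheme, but anything spanning keys on different shards has to be stitched together in the application, which is one of the "own problems" sharding brings.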

To be fair though, one of Redis's biggest weaknesses is its single threaded nature in instances where you, e.g., have huge sets and need to compute expensive set intersections/etc...

Redis also might not be the best choice if that's your primary use case... but still.

Once upon a time nobody seriously thought JavaScript would ever play any role outside of the browser, and even the role in the browser was small enough many people preferred to disable it.

Then we got a very fast JIT, and suddenly you could do reasonable compute heavy stuff very fast, and then it became viable to also write the server side in JS, because of programmer efficiency and library reuse and other reasons.

The "right tool for the job" can seriously change when tools improve and develop, and just because there already are other tools for the same job should not stop anybody from trying.

I cannot think of a better example of that than JavaScript.

Do it in C++ because CPU bound performance is terrible in JS anyways
That's reasonable, but I'm refuting the claim that 'you can get rid of the need for multithreading by deploying more containers on the same machine or via orchestration', not asserting that JS is the right language in the first place.
Yeah, I agree that isn't always the right choice. If all you need is horizontal scalability, then more processes is fine, but it won't work when you actually need multiple cores working on the same task
> What if you have a large shared in-memory data structure that you want to update with lots of irregular translations in parallel

Use C++.

If you ever decide to scale and distribute it for real, save the state in a database and orchestrate containers. For this solution I would recommend using Node.js or Go.

My point is not that JS is the correct language for all tasks.

My point was that this isn't true:

> You can get rid of the need for multithreading by deploying more containers on the same machine or via orchestration.

If you have a large shared memory data structure and irregular updates then this approach won't work, no matter what language you are using. If you say use a database instead, well then the database just has to solve exactly the same problem, and they'll use shared memory parallelism as well.
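For readers who have not seen it, this is roughly what the shared-memory primitive looks like in JavaScript itself. It is only a sketch: the per-node counter layout and the `addEdge` helper are invented for illustration, and the two views live on one thread here, standing in for what two workers would each hold if the same buffer were handed to them.

```javascript
// One shared block of memory: one Int32 counter per graph node.
const NODE_COUNT = 4;
const shared = new SharedArrayBuffer(NODE_COUNT * Int32Array.BYTES_PER_ELEMENT);

// Two views over the SAME bytes. Workers that receive `shared`
// would each build a view like this; nothing is copied.
const viewA = new Int32Array(shared);
const viewB = new Int32Array(shared);

// Hypothetical irregular update: adding an edge bumps the degree of
// both endpoints. Atomics.add makes each increment indivisible, so
// concurrent updates to the same node are not lost.
function addEdge(view, from, to) {
  Atomics.add(view, from, 1);
  Atomics.add(view, to, 1);
}

addEdge(viewA, 0, 1); // written through one view...
addEdge(viewB, 1, 2); // ...and through the other

// Either view observes all updates, because the memory is shared.
const degrees = Array.from(viewA, (_, i) => Atomics.load(viewA, i));
console.log(degrees); // [1, 2, 1, 0]
```

Per-slot atomics are the easy part. A real graph update touches several locations that must change together, which is where locking or transactional schemes come in and where the difficulty discussed in this thread lives.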

You can punt the problem further down the stack, but someone, somewhere, at some point is going to need to solve the problem, and they're going to use shared memory parallelism to do it.

A container per thread sounds like the least efficient solution possible.
Assuming everything you do is non-blocking, this effectively means a container per core.

Not ideal, but not too bad either.

And if it isn't non-blocking? What if I'm writing a game and want to run parts of the rendering, AI, physics, etc. in parallel? Do I create a bunch of containers for each subsystem of my game and have them all communicate via IPC? This doesn't sound like a recipe for good memory/performance characteristics.
But we’re talking about javascript here, a language that is async about almost everything.

I completely agree with what you’re saying for those other problems you describe. But for the typical node.js webapp, doing the one-container-per-core is perfectly fine.

It's not up to me to know how to manage processes or threads. That's the OS's job, not mine.

My app should be stateless and scale by replication. This is the most efficient solution in every possible way.

Sounds like your app should be.... functional.
A thread is just a process that shares memory; a container is just a process that has a different network/filesystem/etc than the rest of the processes.

Containers don't cost meaningfully more than threads unless you create expensive unique resources for each one.

Network/Filesystem/Memory are not expensive resources? It's a lot of overhead. If you're going to claim that you can share memory between containers, then arguably you don't have multiple containers but rather a single one.

This is way more overhead than threads which can share all of those resources.

Network namespaces, virtual ethernet interfaces, iptables rules, & union filesystems are all very cheap and have little to no overhead for normal use cases. N processes in 1 container isn't a perf win over N processes in N containers.

Shared process memory isn't the easy memory-consumption win it sounds like, locking is hard to get right, potentially very destructive to the parallel performance that was the point of the whole exercise, and marries you to a single physical box.

Even if you want to take advantage of shared-address-space shared memory you probably want to do it in a more principled way than fork()

One-copy-per-thread and share-by-communicating both give you braindead simple scaling without dealing with that.

This is actually very similar to your idea. Each worker is executed in a separate V8 instance and napa provides a way to communicate between workers. A bit more efficient, since you don't carry a container runtime around.
So, Perl ithreads then?

It works for specific types of work, but not necessarily as a low cost abstraction unless you pre-thread. In other words, it works well for some cases where threads are used, and horribly for others.

Why would you need containers to run multiple instances?
I think this has two major downsides:

1. Setting up containers and the communication between them is really complex, and it only makes sense for a deployed, long-running server process.

2. Communication between containers is very expensive. You're never going to beat an in-process pointer to shared memory.

Web workers would make a lot more sense for most javascript programs.

1 - How is REST complex?

2 - That is only true if your bottleneck is somehow communication. Usually it is not.

And a huge advantage: You will write stateless and easy to scale apps that do not care how the machine resources are being handled.

You need to run multiple applications side-by-side and they need to be able to talk to each other. With containers that means figuring out the network layer and service discovery.

It also means accounting for each system failing in your own app, retries with exponential backoff, timeouts, logging errors, circuit breakers, plus all the exotic ways a network layer can fail - if your RPC protocol has arbitrary limits (message size, timeouts, etc)
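As a rough idea of what just one item on that list involves, here is a minimal sketch of retries with capped exponential backoff. The function names, default values and the injectable `sleep` option are assumptions for illustration; a production version would also want jitter, per-attempt timeouts and a way to tell retryable errors from permanent ones.

```javascript
// Delay before the next attempt: base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 100, maxMs = 5000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Calls `operation` until it succeeds or the retries are exhausted.
// `operation` is any function returning a promise (e.g. an HTTP call).
async function retryWithBackoff(operation, options = {}) {
  const {
    retries = 3,
    baseMs = 100,
    maxMs = 5000,
    // Injectable so callers (and tests) can avoid real waiting.
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      // Only wait if another attempt is still coming.
      if (attempt < retries) {
        await sleep(backoffDelayMs(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

A call might look like `retryWithBackoff(() => callOtherService(), { retries: 5 })`, where `callOtherService` is a placeholder for whatever request crosses the container boundary. None of this is needed when the same call is a function invocation inside one process.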

You will probably want to use Kubernetes with Istio, not raw Docker. All very doable, but definitely not simple.

I agree that services make sense, but there's a level between single-threaded and micro-services where having concurrency within your application is useful.

I tested this with Docker and could not observe a big performance penalty in CPU usage. Of course you need more memory, and the biggest thing is that you have to build your application in a way that lets you do this later.