by matt_daemon 1059 days ago
I wish there wouldn’t be such a song and dance about “moving away from Python”. There’s nothing wrong with creating ML tools in Elixir, but it’s always Python is slow, Python has no concurrency support, blah blah
5 comments

I come from Ruby but the reactions can be similar, happy to give my data point.

The thing is Elixir is really good at an increasing number of things.

If you need to write an HTTP proxy in the middle of your application, since Elixir processes & incoming HTTP workers are cheap, you do not need to go evented: it just works.

If you need to have reactive web apps with automated changes pushed to the client, it's the same: there is no need for external tools (e.g. any cable) at a certain scale.

If you need to do some scripting, there is `Mix.install/2` for describing and using dependencies in a single file.

If you start crawling too many web pages or processing too many APIs, the concurrency support kicks in and there is less need to scale (or you can do it later), which translates into fewer machines, fewer ops problems (or delayed ones), etc.

And now you are starting to be able to use machine learning, deploy the same type of code on GPU, embed machine learning models right in the middle of your web app without much work, etc., which in turn makes it a nice platform for apps / SaaS.

Elixir really is becoming a Swiss-army knife which scales easily :-)

And Nerves makes it easy to do IoT stuff / almost-embedded
add Rustler and you can even do some systems stuff! (We use it at work)
it's still not good at the most important thing of all: being approachable, instrumented and intuitive to people less dedicated to programming.

I say this as someone that likes elixir, but after seeing it fail miserably at my org, I'm very skeptical it can be thrown around like a spring or node or django project. It needs real support from the org and requires module design skills that are not present in most random devs from a random org.

Could you expand on what you mean by this?

Module design doesn't seem any harder than class design in JS or Python.

Do you mean the language is generally harder for non-developers? Or that Elixir is harder for JS/Python developers to pick up and write good code? Or something else?

Writing well designed Elixir code does seem to require a fairly different approach from most common OO languages, at least at a surface level. (Although IMO that's more because you can copy OO patterns you've seen before without thinking much about why they're good patterns than because good design in Elixir is much different from OO)

sure thing! I'll try to make points related to what I observed and experienced in my org, starting with one huge huge huge preamble which is: I work in an average org with average joe devs. an average joe dev to me is someone who "just wants a job" and is not very interested in furthering their own professional development or learning new things unless trained & forced by the job. it's perfectly fine to be an average dev, I understand I'm sounding like a snob but the difference is real, exists, it's tangible as I touch it every day.

having said that, designing a proper elixir module (which basically is a bunch of functions that operate on a certain data structure, usually represented by a map of a certain shape) requires a certain level of fundamental understanding of the operation you're doing, which in my experience is one of the hardest things to get right.

It also requires a different way of exploring code, as you can't do the familiar `.` and see what happens. You can't just do `price.toCentsValue` but need to do `price |> Price.toCentsValue`, so you need to know that the `Price` module exists. It might not exist, or the function might be buried in `Cart` or in a helper in some controller because you did not clearly understand the domain and the modules' responsibilities. Attaching behaviour to data is powerful and explorable, and most people are used to it, even if it's the wrong place in principle. With modules this is flipped: it's now data that must thread through operations, and that's not super easy to grasp.
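
To make the contrast concrete, here is a minimal sketch in Python rather than Elixir. The `Price` class, the `price_to_cents` function and the field names are all made up for illustration. The first style hangs behaviour off the data, so typing `.` discovers it; the second keeps plain data plus a free-standing function that you have to know about, which is roughly what the module style feels like.

```python
from dataclasses import dataclass


# Style 1: behaviour attached to the data (discoverable via `price.`).
@dataclass
class Price:
    amount: int      # whole currency units (hypothetical field)
    cents: int = 0   # fractional part (hypothetical field)

    def to_cents_value(self) -> int:
        return self.amount * 100 + self.cents


# Style 2: plain data plus a function living in some module.
# The caller must already know that `price_to_cents` exists and where it lives.
def price_to_cents(price: dict) -> int:
    return price["amount"] * 100 + price["cents"]


oo_result = Price(amount=10, cents=50).to_cents_value()
module_result = price_to_cents({"amount": 10, "cents": 50})
print(oo_result, module_result)
```

Both calls compute the same value; the difference is only in how you find the operation when all you have in hand is the data.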

Also, tooling is not that good (dialyzer sucks, intellij plugin not that good, vscode lsp good but still not a proper experience for people coming from c#/java, type annotations are not that readable...), pattern matching and destructuring on fn arguments confuse people and are not super easy to read, and there are a million other papercuts related to tooling and syntax.

We don't have many elixir codebases (let's say around 15-25%) and I've seen incessant whining about "we don't want to maintain elixir" simply because the majority of people cannot be bothered to learn another mountain of quirks and papercuts (every lang has them) plus also losing the familiar way of working they already have, and having to remember that for the spot ticket that appears once in a blue moon on jira for elixir. That's why I think elixir needs extra support from the organization, basically in mandating it to be the primary language, teaching people proper design + proper code navigation and structure techniques, etc.

I hope I've been clear in my long winded ramblings; and I still wish a great future for elixir, so it becomes more approachable in average places like mine

> It also requires a different way of exploring code, as you can't do the familiar `.` and see what happens. You can't just do `price.toCentsValue` but need to do `price |> Price.toCentsValue`, so you need to know that the `Price` module exists. It might not exist, or the function might be buried in `Cart` or in a helper in some controller because you did not clearly understand the domain and the modules' responsibilities. Attaching behaviour to data is powerful and explorable, and most people are used to it, even if it's the wrong place in principle. With modules this is flipped: it's now data that must thread through operations, and that's not super easy to grasp.

Not sure, I think you're picking and choosing things to ramble about. While you can't just do `price.toCentsValue` and have to call the `Price` module, in an OO language you would need to instantiate `price = new Price(amount 10)` or something before calling `toCentsValue`. This means you're aware of the Price class, same as being aware of the Price module. If anything, being aware of the module is in fact better IMO, since it allows you to trace explicitly through what the code is going to do. Your point is correct if you're working through macros though.

I'm not really picking anything, just sharing how it went with elixir in my org. I'm sure we could have done better, but the pit of success was not there for the polyglot teams. the elixir-only teams did a bit better, but mostly they just learned the ecosystem and then GTFO'ed out to greener pastures since they just wanted to write elixir and only elixir.

about your specific counterpoint, I think it's partially valid; I don't instantiate modules, I might not have named structs but just operate on Maps so I don't know the supposed module name, I might have misplaced the function in another module (while it's very intuitive to stick it to the object in oop)...and a million other things that are way too different for the average joe.

The org I work at has the same issue with Elm, there is one Elm product and no one really wants to take those tickets because everyone hates dealing with it. It has become a total pain in the ass, and I believe someone is rewriting the app in React
I'm guessing the people who actually love Elm left.

You really do have to love a paradigm to work with it, because paradigm shifts have a cost. Elm's paradigm shift is "do everything declaratively/immutably/non-side-effecty" and the massive (IMHO) benefit of going about things that way is "no runtime errors" (!!!) in addition to quite performant code.

But yes, the cost is there, and it is that you sometimes just want a side-effect to get something done, you sometimes just want to call into another library to get something done, etc.

The thing is, if Elm allowed this, or made it easier, you'd lose the Elm guarantee of "no runtime errors". Which, frankly, is a pretty big one: just inspect any popular domain's web page and you're likely to see dozens of JS errors that are simply hidden from most users, contributing to a "janky" web experience.

I literally just inspected this very page I'm typing this comment on and I see:

    This page is in Quirks Mode. Page layout may be impacted. For Standards Mode use “<!DOCTYPE html>”.
    Error: Promised response from onMessage listener went out of scope
The cost of perfection is high. You really have to love the ideal.

But yes, if no one does, then Elm is a boondoggle.

I'm dealing with something similar with NixOS. NixOS's big guarantee (which is also big) is basically "no build or runtime failures that are due to misconfigured dependencies". But there's a steep learning curve and scattered documentation. The core idea is amazing, though, and that's what I love. The rest I tolerate while climbing the mountain.

That's going to be the case with almost any legacy project though
> I work in an average org with average joe devs. an average joe dev to me is someone who "just wants a job" and is not very interested in furthering their own professional development or learning new things unless trained & forced by the job

so basically you're all doing things you hate, which is literally the worst possible work environment, and you're trying to use this as a data point for why Elixir "isn't for average coders". I have news for you, dude, you're not even at the "average coders" level, you just work in a "code mill", I'm guessing India.

you have an absolutely skewed perception of average in our field. I'm guessing "less than 3 years of experience".

never mind the random stab, it's just to make you understand how random some answers are. in any case, no one is trying to touch your precious language. if you can't understand the context in which elixir failed (hot startup that HAD to hire everyone and their dog in 3 months + polyglot environment), not my problem

your last paragraph largely answers your questions.

it doesn’t take much to yeet some python scripts (and more) at the wall and get something to stick. SO, GPT, pick your poison, it’s painfully easy. solved problems as far as the eye can see, with a little glue or tape to hold it together.

elixir demands more of an investment. more than approximately zero is quite a bit if you already have momentum.

(elixir can absolutely be yeeted. not arguing otherwise.)

Honestly, in the long run Elixir code written by LLMs is going to be better, because with mutable languages the LLM won't have enough attention or context to be sure that passing mutable data to functions doesn't absolutely wreck your data.
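
A minimal sketch of the kind of hazard meant here, in Python (the function names and the cart data are hypothetical). The first function silently changes the caller's list; the second returns a new value and leaves its input alone, which is the only option you get with Elixir's immutable data.

```python
def add_discount(cart: list, discount: dict) -> list:
    # Looks harmless, but `append` mutates the caller's list in place.
    cart.append(discount)
    return cart


def add_discount_pure(cart: tuple, discount: dict) -> tuple:
    # Builds a new tuple; the caller's tuple cannot grow or shrink.
    return cart + (discount,)


original = [{"sku": "A", "price": 10}]
updated = add_discount(original, {"sku": "DISCOUNT", "price": -2})
mutated_len = len(original)   # the caller's list changed behind its back

frozen = ({"sku": "A", "price": 10},)
updated_pure = add_discount_pure(frozen, {"sku": "DISCOUNT", "price": -2})
frozen_len = len(frozen)      # the caller's tuple is untouched

print(mutated_len, frozen_len)
```

Whoever (or whatever) writes the call site of `add_discount` has to remember that `original` is no longer the original, and nothing in the signature says so.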
On the other hand, it may filter less dedicated people from showing up at your door.
this has been a point of contention with numerous hr departments that need to hire fast "cuz investors!!11". this is the reality of life sadly
This is the real problem.

Most orgs that fail on elixir fail due to management.

absolutely, that's what I said when I talked about strong org support.

also, if you don't have strong org support, you risk onboarding people that just want to work with that specific technology and that WILL gtfo as soon as they need to change team to one that doesn't use it, or if the technology is sunset, etc., so it's even more risky to have an exotic stack in the middle of more common stacks

I do agree with the need to make it more approachable indeed.

It is too much a "language of experts" at the moment, although it is not caused by the language itself, more by the topics covered in general.

"Everything is working fantastically with our python ML project but we're rewriting it in Elixir anyway" would be a weird article wouldn't it?
This article in particular doesn't feel like there's any song and dance. The very first line is directed at people already using Elixir who are looking to stay in Elixir-land while getting deeper into ML.
Those are real issues though.
Is concurrency useful for ML?
If your data loading pipeline grows even slightly complex, then yes, you absolutely need concurrency in order to deliver your samples to the GPU fast enough.

The current workarounds to make this happen in Python are quite ugly imho, e.g. PyTorch spawns multiple Python processes and then pushes data between the processes through shared memory, which incurs quite some overhead. TensorFlow on the other hand requires you to stick to their tensor DSL so that it can run within their graph engine. If native concurrency were a thing, data loading would be much more straightforward to implement without such hacks.
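
For illustration, a minimal sketch of the underlying idea using only the standard library: a background thread fills a bounded queue while the consumer (the training loop) drains it. `load_batch` and `prefetch` are hypothetical stand-ins, not any framework's API. In CPython this only helps when loading is I/O-bound; CPU-heavy preprocessing in threads still contends for the GIL, which is why the frameworks resort to the workarounds above.

```python
import queue
import threading


def load_batch(index: int) -> list:
    # Stand-in for reading and decoding samples from disk (hypothetical).
    return [index * 10 + i for i in range(4)]


def prefetch(num_batches: int, buffer_size: int = 2):
    """Yield batches while a background thread loads the next ones."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for index in range(num_batches):
            buffer.put(load_batch(index))  # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()
        if batch is done:
            return
        yield batch


batches = list(prefetch(3))
print(batches)
```

The bounded queue gives backpressure: the loader never runs more than `buffer_size` batches ahead of the consumer.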

Yes, it can be.

1. Loading data

2. Running algorithms that benefit from shared memory

3. Serving the model (if it's not being output to some portable format)

There are also general benefits of using one language across a project. Because Python is weak on these things, we end up using multiple languages.

You end up having to do a lot of things in an ML training run, some of which you can do in parallel because they're not important right now (e.g. saving metadata) or because you'd otherwise be resource limited (e.g. loading data and formatting batches for training).
And for this you cannot use Python's multiprocessing because ... ? Sure, moving data between processes is slow because of pickling [0]. However, I'm using parallel processing for the things you suggested, and for these it works great.

If I really had the use case and needed threads, I'd much rather use C++ bindings in a Python package than rebuild the whole thing. Guess it depends on the scale we are talking about.

[0] https://pythonspeed.com/articles/faster-multiprocessing-pick...
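
A minimal sketch of where that pickling cost comes from (the `batch` contents are made up). Roughly speaking, every object that crosses a process boundary through `multiprocessing` queues or pools goes through a serialize/deserialize round trip like this one, and the receiver ends up with a copy rather than the same object.

```python
import pickle

# Hypothetical payload standing in for a batch of samples.
batch = {"ids": list(range(1000)), "label": "train"}

payload = pickle.dumps(batch)      # serialize on the sending side
received = pickle.loads(payload)   # deserialize on the receiving side

# Equal in value, but a separate copy in memory.
print(isinstance(payload, bytes), received == batch, received is batch)
```

For small messages this is negligible; it starts to matter when large arrays are shipped between workers on every step.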

It's at handling all the things that can go wrong when communicating and coordinating across processes, across machines, or troubleshooting bottlenecks on running systems that Elixir (and Erlang) excel.
Concurrency generally makes things run faster. If you test your ML methods your tests will complete faster if the ML methods are able to use and take advantage of concurrency. Some people consider that useful.
No, parallelism is useful, concurrency without parallelism is not useful.

Go and elixir provide some parallelism but the primary focus for both languages is concurrency.

It's not. Until you need to deploy it.
yes, but generally they are not the kind of make-or-break issues that e.g. Julia's correctness problems are.
Correctness problems? Could you expand on that? Not doubting, just curious. Thnx
Python is slow and concurrency is not great.