by awaythrowact 1795 days ago
@cbkeller

I know in Julia I can just precede every function call and object instantiation with "modulename." and solve the namespace problem that way. What I want to do instead, and what I do in Python, is bind one namespace of function methods to each object, so that as I code, I don't have to remember which module each object came from. That is the appeal to me of "model.fit" over "module.fit(model)".

EDIT: This is not some "shave a few seconds off coding time" quality of life issue. This is a mission-critical requirement in many enterprise contexts.

Scenario A: Model Development Team trains a model, serializes it, and sends the serialized model object to Production. You want Production to have to look up which software Package the Model Development Team used for each model, just so they know which "predict" function to call? No, "predict" should just be packaged with the model object as a method, along with "retrain", "diagnose", etc.

Scenario B: I want to fit a dozen different types of models, from multiple packages, on the same data, to evaluate how they each do, and build some sort of ensemble meta model. So I need to loop over each model object in my complicated ensembling logic. You want me to also programmatically keep track of which module each model comes from?
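A minimal Python sketch of what Scenario B looks like when each object carries its own methods. The two model classes are hypothetical stand-ins for models from unrelated packages, not real library APIs; the point is only that the ensembling loop never needs to know where a model came from.

```python
# Hypothetical stand-ins for models shipped by two unrelated packages.
class MeanModel:
    """Pretend this came from package A: predicts the training mean."""

    def fit(self, xs, ys):
        self.mean_ = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.mean_ for _ in xs]


class LastValueModel:
    """Pretend this came from package B: predicts the last training target."""

    def fit(self, xs, ys):
        self.last_ = ys[-1]
        return self

    def predict(self, xs):
        return [self.last_ for _ in xs]


def ensemble_predict(models, xs_train, ys_train, xs_new):
    """Fit every model, then average their predictions point by point."""
    fitted = [m.fit(xs_train, ys_train) for m in models]
    per_model = [m.predict(xs_new) for m in fitted]
    return [sum(col) / len(col) for col in zip(*per_model)]


# The loop calls model.fit / model.predict without tracking any module names.
preds = ensemble_predict(
    [MeanModel(), LastValueModel()],
    xs_train=[1, 2, 3],
    ys_train=[2.0, 4.0, 6.0],
    xs_new=[4, 5],
)
```

With "module.fit(model)" instead, the loop would need a lookup from each model to the module that owns its "fit" and "predict", unless every author extends one shared generic function.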

These are important, happens-every-day use cases in the enterprise.

I think the best solution for this in Julia is to package the model state with all the "method" functions in one struct. Again, this is what MLJ does. This is the closest Julia gets to OOP. But then you either need a lot of boilerplate code, or a Macro to unpack it all for you behind the scenes.


hmm, but doesn't it just mean that ppl should extend the same `fit!` method rather than define their own?

The bigger issue for production in my experience is about packaging the right model with the right version. I don't think anyone has to do `module1.fit` everywhere, since `fit` would've likely come from the same source.

That solves both of your scenarios

> since `fit` would've likely come from the same source

No.

As described in the great-great-great-great-great-great-great-great-great-parent comment, the problem I am describing is that of trying to combine models from multiple software authors. You may not have that problem. It may not be a common problem among Julia's academic user base. I do have that problem.

Thanks for reading.

Oh, I see -- yes, fair enough!

Just to be clear, I like Julia, and think it has advantages over Python. I'm writing all this as someone who is cheering for Julia to break out of its HPC niche. Thanks.

Thanks so much for taking the time to outline your thoughts... I share the same goals, and input from industry experience like this is very valuable. This has spawned some discussion on the Julia Slack about how best to target your use case.

Can I trouble you to make a post either on Discourse or on the Slack? I'd really like this to get in front of the broader Julia community and core devs, and you're the best person to do that. Maybe there's a solution of which I'm unaware... or there could be some design discussions to come out of this.

https://julialang.org/community/ (slack and discourse link)

Always happy to help. Especially the last day or so - I've been waiting on some long training loops so it's been a pleasant diversion.

To be 100% honest with you, there's pretty much 0% chance of me adopting Julia in the next 12 months. I evaluated it before embarking on my current project, but ended up going with Python, and now I have several thousand lines of Python code that work fine, and I'm not going to rewrite it all in the near future. At some point in the medium term, I'll re-evaluate Julia. But until then, I don't want to lead anyone on any wild goose chase. Even if you solved this problem, and all my problems, I'm just not in a position to switch right now. So for that reason I think I'll hold off on issuing a community wide call for help. But I'm cautiously optimistic that at some point in the future I'll be writing Julia professionally.

A lot of this is also probably cultural rather than language features. The first thing they teach any Python data science boot camp initiate is: never "from numpy import *", always "import numpy as np" and yet in Julia "using" appears more common than "import"...
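A small Python illustration of why that convention exists, using only the standard library's `math` and `cmath` modules, which both define `sqrt`. The variable names are illustrative. The unqualified imports here name `sqrt` explicitly so the shadowing is visible; a star import rebinds every overlapping name the same way, without any hint in the source.

```python
# Unqualified imports: the second import silently rebinds the name `sqrt`.
from math import sqrt
from cmath import sqrt  # shadows math.sqrt from the line above

shadowed = sqrt(4)  # calls cmath.sqrt, so the result is a complex number

# Qualified imports: both functions stay reachable under their module alias.
import math as m
import cmath as cm

real_root = m.sqrt(4)      # float result from math
complex_root = cm.sqrt(4)  # complex result from cmath
```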

I also wouldn't read too much into my one example. It was initially meant just as an illustrative point, but somehow I was so bad at explaining it that it took tons of comments for me to get my minor point across. I do think the MLJ guys are probably on the right track, and that should work fine for most people who don't mind Macros. Maybe I'm in the minority in hating Macros.

The more commonly cited issues around compile time caching, tooling, etc. are boring to list off yet again, but probably the right areas of focus for the community, in terms of both impact and ease of implementation.

More generally, I really do think you're better off talking to people like @systemvoltage, who have actually given Julia more of a chance than I have. If I worked for Julia Computing I'd be reaching out to him and begging to get his brain dump on all the challenges he faced. In any business, it's always easier to reduce churn and learn from your (unhappy) customers than it is to convert new prospects, whether that business is programming languages or investment banking.

Best of luck. Sincerely.

That makes sense, thanks! Good luck to you as well.