Hacker News
by amkkma 1798 days ago
Thanks, I see where you are coming from.

>It should be the responsibility of the developer of "model" to select the "fit" algorithm appropriate for "model." (They don't have to implement it, but they do have to import the right one.) The developer of "fit" should not be responsible for handling every possible "model" type. You could have the developer of "model" override / extend the definition of "fit" but that opens up its own can of worms.

It's really the same thing as Python, just better. I don't see the distinction you are drawing.

In Python you have a base class with default behavior. You can subclass it and inherit or override.
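A minimal Python sketch of that pattern, for reference. The class names and the toy fitting rules are made up for illustration: `ConstantModel` inherits the base class's default `fit`, while `SlopeModel` overrides it.

```python
class Model:
    """Base class: provides a default fit that subclasses may inherit."""

    def fit(self, x, y):
        # Default behaviour: remember the mean of y as a constant prediction.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, x):
        return [self.mean_ for _ in x]


class ConstantModel(Model):
    """Inherits fit and predict unchanged."""


class SlopeModel(Model):
    """Overrides fit and predict with a least-squares line through the origin."""

    def fit(self, x, y):
        self.slope_ = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        return self

    def predict(self, x):
        return [self.slope_ * a for a in x]


x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(ConstantModel().fit(x, y).predict([10.0]))  # [4.0]
print(SlopeModel().fit(x, y).predict([10.0]))     # [20.0]
```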

Julia has abstract types with interfaces. Instead of relying on implementation details like fields, you provide functions, so more types of models can work even if they don't have that one specific field. Otherwise everything is the same where it counts: you can compose, inherit, and override. Even better, you can work with multiple models and types of data, inheriting where you see fit.
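To make the "functions instead of fields" point concrete, here is a rough Python analogue using `functools.singledispatch`. It dispatches on the first argument only, so it stands in for just one slice of what Julia does. `coefficients`, `DenseLinear`, and `SparseLinear` are hypothetical names: generic code calls the `coefficients` interface function, so a model that stores its data differently still works.

```python
from dataclasses import dataclass
from functools import singledispatch


@singledispatch
def coefficients(model):
    """Interface function: each model type says how to get its coefficients."""
    raise NotImplementedError(f"coefficients not defined for {type(model).__name__}")


@dataclass
class DenseLinear:
    coef: list  # stores coefficients directly as a field


@dataclass
class SparseLinear:
    nonzeros: dict  # index -> value; there is no `coef` field at all
    size: int


@coefficients.register
def _(model: DenseLinear):
    return list(model.coef)


@coefficients.register
def _(model: SparseLinear):
    return [model.nonzeros.get(i, 0.0) for i in range(model.size)]


def l1_norm(model):
    # Generic code talks to the interface function, never to a specific field.
    return sum(abs(c) for c in coefficients(model))


print(l1_norm(DenseLinear([1.0, -2.0])))         # 3.0
print(l1_norm(SparseLinear({2: -4.0}, size=3)))  # 4.0
```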

I don't see any benefit to Python's restrictions here, either in ease of use or in expressiveness.

For all intents and purposes it's a strict superset.

Even better, you can use macros and traits to group different type trees together.

https://www.stochasticlifestyle.com/type-dispatch-design-pos...
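A rough Python sketch of the trait idea (the pattern Julia folks often call "Holy traits"), with hypothetical type names: a `storage` function maps types from unrelated trees to a trait value, and the real work dispatches on that trait, so those types share one implementation. Python has no built-in trait mechanism, so this is an emulation, not how Julia spells it.

```python
from functools import singledispatch


# Trait values: empty marker classes.
class Dense:
    pass


class Sparse:
    pass


# Concrete types from unrelated type trees (hypothetical names).
class NumpyLikeArray(list):
    pass


class GpuArray(tuple):
    pass


class CooMatrix(dict):
    pass


@singledispatch
def storage(obj):
    """Trait function: maps a concrete type to a trait value."""
    raise TypeError(f"no storage trait for {type(obj).__name__}")


storage.register(NumpyLikeArray, lambda obj: Dense())
storage.register(GpuArray, lambda obj: Dense())
storage.register(CooMatrix, lambda obj: Sparse())


@singledispatch
def _total(trait, data):
    raise NotImplementedError(type(trait).__name__)


@_total.register
def _(trait: Dense, data):
    return sum(data)


@_total.register
def _(trait: Sparse, data):
    return sum(data.values())


def total(data):
    # Dispatch on the trait value, so every Dense type shares one method.
    return _total(storage(data), data)


print(total(NumpyLikeArray([1, 2, 3])))          # 6
print(total(GpuArray((4, 5))))                   # 9
print(total(CooMatrix({(0, 0): 5, (1, 2): 7})))  # 12
```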

These seem to be in contradiction:

>It should be the responsibility of the developer of "model" to select the "fit" algorithm appropriate for "model."

>You could have the developer of "model" override / extend the definition of "fit" but that opens up its own can of worms.

It's the same in Python: either you inherit fit or you override it. What's the difference with Julia?

Except that in Julia, all types and functions come with a lattice of multiple-dispatch methods, parametric types, and possibly traits that you can override, customize, and compose, so even if the model author has to override fit, they can do it with small, composable building blocks.
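Python has no built-in multiple dispatch, so here is a toy emulation (the `MultiMethod` class and all type names are hypothetical) that picks a `fit` method from the types of both the model and the data, falling back to a generic method. Its tie-breaking simply prefers the more specific first argument, which is a simplification; Julia's actual specificity and ambiguity rules are more involved.

```python
from itertools import product


class MultiMethod:
    """Toy multiple dispatch: chooses an implementation from the types of all arguments."""

    def __init__(self, name):
        self.name = name
        self.methods = {}  # maps a tuple of classes to an implementation

    def register(self, *types):
        def decorator(func):
            self.methods[types] = func
            return func
        return decorator

    def __call__(self, *args):
        # Walk each argument's MRO; product() varies the last argument fastest,
        # so the first argument's specificity wins ties (a simplification).
        for types in product(*(type(a).__mro__ for a in args)):
            func = self.methods.get(types)
            if func is not None:
                return func(*args)
        names = ", ".join(type(a).__name__ for a in args)
        raise TypeError(f"no method {self.name}({names})")


class Model:
    pass


class LinearModel(Model):
    pass


class Data:
    pass


class DenseData(Data):
    pass


class SparseData(Data):
    pass


fit = MultiMethod("fit")


@fit.register(Model, Data)
def _(model, data):
    return "generic fit"


@fit.register(LinearModel, DenseData)
def _(model, data):
    return "linear model on dense data"


@fit.register(LinearModel, SparseData)
def _(model, data):
    return "linear model on sparse data"


print(fit(LinearModel(), DenseData()))   # linear model on dense data
print(fit(LinearModel(), SparseData()))  # linear model on sparse data
print(fit(Model(), SparseData()))        # generic fit
```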


I agree you can achieve the same benefits with macros. Indeed, I see that MLJ, Julia's attempt at a scikit-learn-style project, makes extensive use of macros. But I personally think macros are an antipattern: in large projects they can introduce subtle bugs, especially if you're using multiple modules that each rewrite your code before compile time without knowing about each other. I know others in the Julia community agree that macros are dangerous.

I think abstract types are a brittle solution. The "can of worms" I alluded to is something like this: library TensorFlow implements a model "nn", library PyTorch also implements a model "nn", and both want to override "fit" to handle their new type "nn". Good luck combining them in the same codebase. This problem is less pronounced in OOP, where each development team controls its own methods. Julia devs can solve this by having every developer of every "fit" function and every developer of every "model" struct agree beforehand on a common abstraction, but that's an expensive, brittle solution that hurts innovation velocity.
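A Python sketch of the arrangement this comment prefers, with hypothetical stand-in modules (`tensorflow_like` and `pytorch_like`, not the real libraries): each library ships its own class named `nn`, and `fit` lives on the class, so the two `fit`s never compete for the same name.

```python
import types


def make_library(library_name):
    """Build a stand-in library module exposing its own class called `nn`."""
    module = types.ModuleType(library_name)

    class nn:
        def fit(self, data):
            # Each team owns this method; nobody else has to agree on it.
            return f"{library_name}.nn.fit on {len(data)} rows"

    module.nn = nn
    return module


tensorflow_like = make_library("tensorflow_like")
pytorch_like = make_library("pytorch_like")

data = [[0.0], [1.0]]
m1 = tensorflow_like.nn()
m2 = pytorch_like.nn()
print(m1.fit(data))  # tensorflow_like.nn.fit on 2 rows
print(m2.fit(data))  # pytorch_like.nn.fit on 2 rows
```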

I think the closest I can do in Julia via pure structs is for the developer to define and expose their preferred fit function as a field in the struct, something like "fit = model['fit_function']; fit(model,X,Y)", but that introduces a boilerplate tax on every method I want to call (fit, predict, score, cross-validate, hyperparameter search, etc.). (EDIT: indeed, I think this is pretty much what MLJ is doing: having each model developer expose a struct with a "fit" and "predict" function, and using the @load macro to automate the above boilerplate, putting the right version of "fit" into global state when you @load each model. But as described above, I don't like macro magic like this.)
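The "function stored in the struct" pattern described here, sketched in Python with a dataclass. `ModelSpec`, `fit_function`, and `predict_function` are hypothetical names, not MLJ's API. Note the lookup step before every call, which is the boilerplate the comment is complaining about.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelSpec:
    params: dict
    fit_function: Callable      # the developer's preferred fit, stored as a field
    predict_function: Callable  # likewise for predict


def _mean_fit(params, x, y):
    return {**params, "mean": sum(y) / len(y)}


def _mean_predict(params, x):
    return [params["mean"] for _ in x]


model = ModelSpec({}, _mean_fit, _mean_predict)

# The "boilerplate tax": every call first looks up the function stored in the struct.
fit = model.fit_function
model.params = fit(model.params, [1, 2, 3], [2.0, 4.0, 6.0])
predict = model.predict_function
print(predict(model.params, [10, 20]))  # [4.0, 4.0]
```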

None of these require macros.

If (and only if) "fit" specialises on "model" (I have not looked at our hypothetical model and fit GF, but for the sake of argument I will assume that it does), "model = AnOtherModel" will cause "fit(model, x, y)" to be equivalent to Python's "model.fit(x, y)". If you need to provide a custom "fit" method, you do so by adding a method specialised on AnOtherModel to the "fit" GF (generic function).

At no point is there a macro involved.
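In Python terms, a generic function with a method specialised on one class can be sketched with `functools.singledispatch`, which dispatches on the first argument only (CLOS and Julia can dispatch on more). `AnOtherModel` is the hypothetical class from the comment above.

```python
from functools import singledispatch


class AnOtherModel:
    def __init__(self):
        self.fitted = None


@singledispatch
def fit(model, x, y):
    # Default for the generic function: no applicable method for this model type.
    raise NotImplementedError(f"no fit method for {type(model).__name__}")


@fit.register
def _(model: AnOtherModel, x, y):
    # Method specialised on AnOtherModel, added to the generic `fit`.
    model.fitted = (len(x), len(y))
    return model


model = AnOtherModel()
fit(model, [1, 2], [3, 4])  # plays the role of Python's model.fit(x, y)
print(model.fitted)  # (2, 2)
```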

As for the "module a, model nn" and "module b, model nn", I would naively assume that they actually are different models, and therefore something specialising on "a.nn" will not get dispatched to when you pass a "b.nn".
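A Python sketch of that naive assumption, with hypothetical modules `a` and `b`: both define a class named `nn` and both register a method on one shared generic `fit`. The classes share a name but are different types, so dispatch never confuses them.

```python
import types
from functools import singledispatch


@singledispatch
def fit(model, data):
    raise NotImplementedError(type(model).__qualname__)


def make_library(library_name):
    """Stand-in for a package that defines its own `nn` type and extends `fit` for it."""
    module = types.ModuleType(library_name)

    class nn:
        pass

    nn.__module__ = library_name

    @fit.register(nn)
    def _(model, data):
        return f"{library_name} fit on {len(data)} rows"

    module.nn = nn
    return module


a = make_library("a")
b = make_library("b")

data = [[0.0], [1.0], [2.0]]
print(fit(a.nn(), data))  # a fit on 3 rows
print(fit(b.nn(), data))  # b fit on 3 rows
print(a.nn is b.nn)       # False: same name, different types
```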

Disclaimer: I don't actually know Julia, at all. But I have written substantial amounts of CLOS code (and Python, but I like CL and CLOS better).

I know nothing about Lisp, so at the risk of talking past each other...

I never said macros were required. I said implementing this type of code without OOP required more boilerplate, and MLJ uses macros to reduce that boilerplate.

As I understand module imports in Julia: each module developer exports a list of public-facing objects. Obviously "fit" and "model" would be among them. If you import two modules that both export a new "nn" subtype of the shared parent type "model", and both extend "fit", "predict", etc. to accept their own subtype "nn", then you have to manually specify which module you are referring to every time you call nn, fit, predict, or whatever. Is that wrong? If I just import PyTorch, import TensorFlow, and then call "mymodel = TensorFlow.nn; fit(mymodel, mydata)", then Julia doesn't know that the "fit" I am calling is the TensorFlow implementation and not the PyTorch implementation. And what if I had WANTED to use module A's "fit" on module B's model, and they intentionally adopted the same abstract type system to enable this interoperability? So instead I have to write "mymodel = TensorFlow.nn; TensorFlow.fit(mymodel, mydata); TensorFlow.predict(mymodel, mynewdata)". Obviously the extra typing is mildly annoying, but the bigger problem is potentially introducing bugs by mismatching modules, plus the cognitive overload of having to keep track of modules. Python-style OOP is a more elegant solution to the namespace problem and results in more readable, maintainable code, at least in my opinion. Anyway, maybe Julia has a more elegant solution I'm not aware of; if so, I'd love to hear it.

In Julia it will just dispatch to the correct function.

In other words, one package would define `fit(mymodel::TensorFlowModel)` and the other would define `fit(mymodel::PyTorchModel)`, and then when you call `fit` it'll just dispatch to the appropriate one depending on the type of `mymodel`.

This dispatch-oriented style also allows a shocking degree of composability, e.g. [1], where a lot of packages will just work together, such that you could for example just use the equivalent of PyTorch or TensorFlow on the equivalent of (e.g.) NumPy arrays without having to convert anything.
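The composability in that talk comes from Julia's dispatch and generic code; a loose Python analogue is duck typing. In this sketch (the function is hypothetical), code written only against iteration and arithmetic works unchanged on a list, an `array.array`, a tuple, and `Fraction` values, with no conversion step.

```python
import array
from fractions import Fraction


def mean_squared_error(predictions, targets):
    # Written only against iteration and arithmetic, not a concrete container type.
    pairs = list(zip(predictions, targets))
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))                    # 2.0
print(mean_squared_error(array.array("d", [1.0, 2.0]), (1.0, 4.0)))  # 2.0
print(mean_squared_error([Fraction(1, 2), Fraction(3, 2)], [0, 0]))  # 5/4
```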

If you mean "what about the case where both packages just call their model type `Model`", while I've never run into that, the worst-case scenario is just that you have to fall back to the Python-style explicit Module.function usage (which was always allowed anyway). And if you don't like names being exported, you can always just `import` a package instead of `using` it:

  help?> import
  search: import

  import

  import Foo will load the module or package Foo. Names from the imported Foo module can
  be accessed with dot syntax (e.g. Foo.foo to access the name foo). See the manual
  section about modules for details.

[1] https://www.youtube.com/watch?v=kc9HwsxE1OY

I very frequently run into namespace collisions like that. I think they are quite common in large codebases.

I am aware of the ability to do eg "import TensorFlow; model = TensorFlow.model; TensorFlow.fit(model,data)"

As I mentioned previously, I find Python's OOP "model.fit" syntax to be better, for a variety of reasons.

Thank you for your engagement. Have a nice day.

There's some serious misunderstanding here. You do not have to disambiguate the function call, only the construction of the object. You would write

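  m1 = TensorFlow.model()   # only the constructor is module-qualified
  fit(m1, data)             # plain `fit`; dispatch selects the method from m1's type
  m2 = Pytorch.model()
  fit(m2, data)             # same function, different method

Julia knows which version of model you are using.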

TensorFlow.fit and Pytorch.fit are just different methods of the same function.

You've formed some strong opinions based on a pretty big misunderstanding.

Do you have an example of a case where you ran into this in Julia with two packages that you wanted to use together? If the packages are still actively developed, I suspect the developers would be interested in resolving the situation to allow interop.

Your explanation is spot on for Julia.