by nerdponx 1880 days ago
I really wish the Julia ecosystem would stop assuming that you always interact with your computer through the Julia REPL and started supporting proper command line interfaces. This is one of the big annoyances and mistakes of the R ecosystem, and I think it's unwise to carry that mistake over to Julia.

Also, big "ugh" to browser-based tooling. I want to browse webpages in my browser, I don't want to do my data science work there. We don't even have a good native client for Jupyter notebooks yet, let alone for this new Jupyter alternative that doesn't support the existing Jupyter kernel protocol.

Not only that, but Pluto also apparently has some obnoxious UX limitations that remind me of other less-than-usable wannabe-Jupyter-notebooks (e.g. Apache Zeppelin, Databricks): https://towardsdatascience.com/could-pluto-be-a-real-jupyter...

In short: nice idea, but I'd rather see continued unification around Jupyter and a proper IDE that can at least emit and interact with Jupyter-compatible data.

On the other hand, the Jupyter notebook JSON format is bad for a variety of reasons (e.g. you need special tools for readable Git diffs) and I really wish we had all settled on R Markdown instead. But R has its own NIH tooling problem and nobody was ever going to adopt it because the R community itself (driven by RStudio) has little interest in sharing or interoperability with other languages.
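
To illustrate the diff complaint, here is a minimal sketch (not anyone's official tooling) of why raw `.ipynb` files diff badly and what a "special tool" has to do. The notebook below is a made-up example in the nbformat 4 layout, and `notebook_to_text` is a hypothetical helper name; it keeps only the cell sources and drops outputs and execution counts, which are the parts that change on every run.

```python
import json

# A hypothetical, minimal notebook in the nbformat 4 JSON layout.
notebook_json = json.dumps({
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Analysis\n"],
        },
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["42\n"]}
            ],
            "source": ["x = 6 * 7\n", "print(x)\n"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
})


def notebook_to_text(raw: str) -> str:
    """Return only the cell sources, dropping outputs and execution counts."""
    nb = json.loads(raw)
    chunks = []
    for cell in nb["cells"]:
        source = cell["source"]
        # nbformat allows the source to be one string or a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        chunks.append(f"# --- {cell['cell_type']} cell ---\n{text.rstrip()}\n")
    return "\n".join(chunks)


text = notebook_to_text(notebook_json)
print(text)
```

Diffing the extracted text instead of the JSON hides the churn from re-running cells, which is roughly the job a plain-text format like R Markdown does by construction.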

</cynical-angry-rant>

12 comments

Confession: after doing Data Science work for the past 4 years I STILL don't really understand why people like Jupyter.

R was my first programming language and I got really spoiled with RStudio where everything "just works" and the "highlight code -> run in REPL" workflow is super smooth and tightly integrated. All I want is for that to work in other languages, but it seems like if you want it in Python you need to be running PyCharm or a similarly-heavyweight IDE (seriously, despite all the hype of VSCode there are still a ton of issues with just highlighting code and running it in an IPython terminal) and for Julia it just doesn't exist. If you really want a Jupyter-like workflow you can just use R Notebooks, which are literally just better in every way.

Well, R isn’t the best language when it comes to building systems. Most R code is essentially one file written to produce an output once (for a paper, project, etc.). This means that people want a better language for building systems, and Python fits. That explains why people moved to Jupyter.

I don’t like RStudio for the same reason I don’t like Matlab. I already have my editor and terminal workflow. I don’t want to use/learn a new tool for the privilege to use the language. Notebooks hit an acceptable middle ground where I can launch them via terminal. Notebooks have plenty of problems. Mainly, running cells out of order is just an incredibly dumb thing to be possible. This same problem is present in RStudio which you seem to enjoy (highlight and REPL) and you want it in other languages. If the code isn’t written to run in an order, a tool shouldn’t allow it.

> Well, R isn’t the best language when it comes to building systems. Most R code is essentially one file written to produce an output once (for a paper, project, etc.). This means that people want a better language for building systems, and Python fits. That explains why people moved to Jupyter.

I definitely agree that Python is a better general purpose computing language than R, but R's deployment story (i.e. packages) is much, much better than that of Python (pip/poetry/pipenv/conda/whatever came out this week). I honestly don't think that's the reason though, it's more that Python has much, much, much better developer mindshare.

Jupyter is a whole other world though. IPython was the best thing ever as a proper REPL for Python, and Jupyter was good for being able to do graphics with your code. That was all standard in the R world, with Sweave (which I wrote my thesis in), so it didn't appear to add a lot of value (to me, at least).

> I don’t like RStudio for the same reason I don’t like Matlab. I already have my editor and terminal workflow. I don’t want to use/learn a new tool for the privilege to use the language.

I am 100% with you on this, but RStudio is just a nicer interface over the tools for literate programming in R, and the wonderfulness of Rmd vs ipynb is a thing of joy (to me, at least).

> Mainly, running cells out of order is just an incredibly dumb thing to be possible. This same problem is present in RStudio which you seem to enjoy (highlight and REPL) and you want it in other languages. If the code isn’t written to run in an order, a tool shouldn’t allow it.

So, this is a tricky one. I agree in principle, and I have a habit of continually re-running my documents to ensure that this doesn't cause problems, but there are definitely valid use cases for out-of-order execution. Consider that you may often fit a model (which can take ages) and iterate on the visualisation/analysis code, but you don't want to re-run the modelling code every time you change a plot, which your solution would require.

Most of the tools claim to allow you to cache particular blocks, but I've never been able to get it to work reliably.
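
For what it's worth, the manual version of block caching can be sketched in a few lines of Python. This is only an illustration under assumptions: `cached` and `fit_model` are hypothetical names, the "model" is a stand-in dict, and the cache lives in a temporary directory.

```python
import pickle
import tempfile
from pathlib import Path


def cached(path: Path, compute):
    """Load a pickled result from `path`, or compute it and store it there."""
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


calls = {"fit": 0}


def fit_model():
    # Stand-in for a slow model fit (hypothetical).
    calls["fit"] += 1
    return {"slope": 2.0, "intercept": 1.0}


cache_file = Path(tempfile.mkdtemp()) / "model.pkl"

model = cached(cache_file, fit_model)        # computes and writes the cache
model_again = cached(cache_file, fit_model)  # reads the cache, no refit
print(calls["fit"])
```

Note what the sketch does not do: it never notices that the data or the modelling code changed, so a stale cache is silently reused. Getting that invalidation right is the hard part.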

Yeah, I find that the out-of-order execution issue is common with people who have a software development mindset, but for data analysis/science it is basically the only sensible way to work. The "load data" command might be one line but takes 3 minutes to run, while a huge chunk of code that plots the data might take 1 second and I might want to tweak it 50 different ways before settling on something that I like/delivers insight. Producing a standalone script that develops the same insight you get from "playing" with the data is an afterthought in some cases.
As long as you're aware of the dangers, it's fine. Personally I try to model offline from analysis to avoid this issue, and set eval to no in org for those cases where I've built the model inline with the analysis.

Unfortunately, it generally takes a couple of terrible situations before people learn the problems with this.

I agree that data analysis needs a tool to persist data while iterating over certain functions. But in this vein, said tool should aim to try to prevent the user from having to run the load_data() function more than once. Not encourage it by allowing someone to permanently manipulate the output of load_data().
This is an option in many tools, but it doesn't tend to work that well in practice.

I do agree that this is the ideal though (As an example if Pluto is always reactive, then this workflow becomes much more difficult as when you change a downstream datapoint, the model will be re-run).

That workflow has worked for me for Julia in Spacemacs and VS Code. Pretty sure it works in Atom, too.
Spyder is basically an RStudio clone for Python, but I never had a great experience using it. Not really sure why, somehow I just ended up using Jupyter because that's what my coworkers all used. When doing "solo" stuff, it doesn't matter because I dislike every interface so I'm never happy anyway...
> I dislike every interface so I'm never happy anyway...

It sounds like you might have some good constructive criticism after trying several options. Care to elaborate on the shortcomings of various options?

Pluto notebooks are Julia scripts, usable at the command line.

Edit: Pluto uses Julia's package manager; moreover, Manifest.toml can be used to pin all of your project's dependencies so the notebook is repeatable, from a code perspective.

That's good to know. But I was talking about the package manager and starting the Pluto server.
You can start the Pluto server from the command line:

> julia -e "using Pluto; Pluto.run()"

Also, the package manager can be used from inside Pluto. To install something, you can just write in a cell:

> using Pkg

> Pkg.add("Package Name")

How does RStudio have little interest in interoperability with other languages? They produce the reticulate package[1] to allow calling Python code from R, they have added support for Python to RMarkdown and RStudio[2], they let you host Python apps on their RStudio Connect product[3], and they sponsor Ursa Labs to work on the Arrow project for easy data interchange[4].

1) https://rstudio.github.io/reticulate/ 2) https://solutions.rstudio.com/python/ 3) https://blog.rstudio.com/2020/12/16/rstudio-connect-1-8-6-py... 4) https://ursalabs.org/

Then things are a lot better now than they were when I was using it actively, and I take back that criticism.
You're not wrong in the substance of your rant though, but unfortunately it does appear that the ship has sailed and we're stuck with jupyter.

I really need to get a handle on ein, for when I'm inevitably dumped back into a notebook-driven environment (hard to avoid in DS these days).

To me this seems like an improvement in the direction that you want, in particular that notebooks are reactive. All too often I get a Jupyter notebook from someone else and try to run it on my machine only to find that some intermediate step does not work any more, because the original developer ran something out of order or removed a critical step. A reactive notebook seems more likely to still work after a lot of changes are made while experimenting.
Not just more likely, it will work. Pluto notebooks are deterministic, and do not have the hidden global state that plagues Jupyter.
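
A toy illustration of the hidden-state problem, in Python rather than Julia. It is a sketch of the failure mode only and says nothing about how Pluto is implemented. The three "cells" are made up, and `run` is a hypothetical helper that executes them in one shared namespace, the way a classic notebook kernel does.

```python
# Three hypothetical notebook cells sharing one global namespace.
cells = {
    "a": "x = 1",
    "b": "y = x + 1",
    "c": "x = 10",
}


def run(order):
    """Execute the cells in the given order in a single shared namespace."""
    namespace = {}
    for name in order:
        exec(cells[name], namespace)
    return {key: namespace[key] for key in ("x", "y")}


# Same notebook text, two different execution histories.
top_to_bottom = run(["a", "b", "c"])
interactive = run(["a", "c", "b"])

print(top_to_bottom)
print(interactive)
```

The notebook file is identical in both cases, yet `y` differs because it depends on the click history. A reactive notebook removes that degree of freedom by deriving the run order from the dependencies between cells.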
> I really wish the Julia ecosystem would stop assuming that you always interact with your computer through the Julia REPL and started supporting proper command line interfaces.

What does it even mean? What is a CLI for a programming language if not a REPL?

I also do not really get the complaint, but it is along the lines of people wanting to write `julia-pkg install Pluto` instead of `julia -e 'using Pkg; Pkg.add("Pluto")'`. It seems it is a big pet peeve for some people.
Yes. I agree with this complaint. The REPL is useful in some cases but in general I avoid interacting with it whenever possible. My impression is that workflow is highly task-dependent (perhaps obvious) but there are many of us who just want to write a script, run the script, and repeat.
Check out https://github.com/fredrikekre/jlpkg. It does pretty much exactly what you are describing.
The `-e` thing gets very messy quickly if you need to pass non-trivial data from the outside world into the application (have you ever tried to "parameterize" a Sed script?). It also doesn't compose well with other CLI tools.

I think these are two perfectly reasonable things to be annoyed by.

Thanks for the example.

As someone who has never used Julia... Wow, that looks exceptionally painful compared to most other modern languages.

The package manager that comes with Julia is actually way better than what is available in python, and it has an unmatched "foreign language dependencies" support. It just happens to be mostly used from the REPL, not the command line (hence the execute -e flag above).
> > I really wish the Julia ecosystem would stop assuming that you always interact with your computer through the Julia REPL and started supporting proper command line interfaces.

> What does it even mean? What is a CLI interface for a programming language if not a REPL ?

I guess they mean that the julia interpreter should be a good unix citizen (which it is not quite at the moment). For example, while you can in theory create "julia scripts" by adding a julia shebang, this usage is not really well thought out and has several friction points. Most notably, a very slow startup time, even of several seconds if you import some common packages. This makes said julia scripts essentially unusable.

The usual response of the julia community to these complaints is that "you are holding it wrong", and that you should use julia inside the proper REPL. Some people do not like this answer, and there's a tiny bit of drama around that.

I think there needs to be a distinction between on one hand Julia's startup time, which is an inevitable consequence of its compilation model and unlikely to change, and on the other hand whether there is a lack of command line functionality in Julia, e.g. the package manager. The latter is much easier to amend.
But is this "compilation model" inherent to the language? It seems to be an implementation choice. An independent interpreter for the same Julia language, but with fast startup, is conceivable.
Julia already comes with an interpreter, try starting your session with `julia --compile=min`.

One part of the ongoing effort to reduce latencies is to allow package authors to specify optimization levels on a per-module basis. This is great for plotting packages for example, since they usually don't benefit much from overly aggressive optimizations, so spending less time optimizing code generally leads to a snappier experience. It is now even possible to opt into a module-specific fully interpreted mode, which can make a lot of sense for typical scripting tasks.

That's great! Hoping to see julia get snappier at every release! (as it seems to be going)
Plenty of people use the REPL in terminal and sublime text or vim or whatever. I also dislike browser-based tooling and think Julia has done a good job avoiding RStudio-style dependencies.

But if your point is the inability to do `julia script.jl`, yeah that's a pain point. Fortunately there has been some tooling to make running many jobs in a row easier: https://github.com/dmolina/DaemonMode.jl

How is it that I do `julia script.jl` all the time? Or by “inability” do you mean that it’s slow because of the startup time? If you need a utility that starts up instantly, create a sysimage.
In contrast to interpreted languages, creating a sysimage is yet another step (in addition to installing a third party package).

In contrast to AOT-compiled languages, PackageCompiler.jl doesn't statically analyze your code. So you need a "precompile script" that hopefully hits all callable methods (such a script will have to be made manually). The resulting "binary" is also massive.

Yeah people in python are used to doing `python script.py` all the time, and that's not very convenient in julia.

sysimages are great, as is daemonmode. But really just do Revise at the REPL.

Right.

I was also a bit harsh; you can at least do `julia -e 'using Pluto' -e 'Pluto.run()'`.

What do you mean by “RStudio-style dependencies”?
Just that there seems to be an expectation of using RStudio when using things in R that seem generic.

One pain point is that R Markdown uses a different pandoc installation when executed by RStudio than from the terminal.

This preference seems to depend a lot on where you come from. Having come from Scheme / Lisp (same as some of the original Julia developers AFAIK), I find I prefer the REPL in Emacs for coding. I do use Jupyter quite a bit for running simulations, doing data analysis, etc. For me, the main reason to use Jupyter has been (i) interacting with sessions on remote machines without needing to bother with X, and (ii) being able to easily incorporate LaTeX and share whole documents (math + working code) to collaborators and students.
PS I have tried Pluto and find that I don't like it very much in its current form, though I kind of like the idea.
Is Julia different from Python in this regard? I use Python mostly by executing scripts, but it’s nice to have the REPL and IPython and Jupyter. With Julia I’m free to just run “julia script.jl”, aren’t I? There’s probably more to your complaint than I naively realize, though. Maybe Python has better IDE support?
Python has a decent command line argument parser in its standard library, and there are several even-better options in the 3rd party library ecosystem, e.g. https://pypi.org/project/click/.
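
For anyone who hasn't used it, a minimal sketch of the standard library route with `argparse`. The script purpose, the argument names and the `data.csv` path are made up for illustration, and the argument list is passed explicitly so the snippet runs anywhere.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical analysis script."""
    parser = argparse.ArgumentParser(description="Hypothetical analysis script.")
    parser.add_argument("input", help="path to the input data file")
    parser.add_argument("--iterations", type=int, default=10,
                        help="number of iterations to run")
    parser.add_argument("--verbose", action="store_true",
                        help="print progress information")
    return parser


# Parse an explicit argument list instead of the real command line.
args = build_parser().parse_args(["data.csv", "--iterations", "50"])
print(args.input, args.iterations, args.verbose)
```

In a real script you would call `parse_args()` with no arguments so it reads `sys.argv`, and `--help` output comes for free.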
Julia has ArgParse.jl for 3rd party argument parsing. It seems pretty fully featured.
Good to know, thank you.
I wish there was a plain text format as base that everyone agreed on no matter what UI or backend is used; that would suddenly make it usable in any text editor and people could build tools and plugins that "just work" no matter whether Jupyter or something else is used.

The closest we got was the org-mode file format with human-readable data for everything, but it seems tightly coupled with Emacs unless you only want to use it as a Markdown replacement.

But is tied to R, which maybe isn't the right approach.

Personally I love org mode, but we'd need a jupyter plugin to convert ipynb to org and back to make it work.

Big ugh to browser based tooling, and yet also continued unification around Jupyter? Are there any plans to have a non-browser Jupyter?
I use PyCharm's Jupyter plugin and it's seemed far better for me. I work in Python every day but I'm more on the data engineering and app security/architecture side of things than straight up Data Science. I don't use notebooks as often as I'd like but I live in PyCharm.
> Are there any plans to have a non-browser Jupyter?

Sure. VSCode with python and Jupyter extensions

Is VSCode's Jupyter extension much better than PyCharm's? Just curious, I prefer PyCharm over VSCode for normal python dev work. By a lot but I get it's personal preference, so I'm curious.
Can't compare with PyCharm, but it's a lot better than the web version, I believe.
I tried this for the first time the other day and it was a great experience. Ironically the most cumbersome part continues to be Python environment management. I'll spare you my usual rant about that, but hopefully by Python 4 they'll find a solution.
What difference does this make, though? Isn't VSCode an Electron app? All of its UI is based on web stuff, anyway!
I assume that at least the browser shortcuts are out of the way?
QtConsole is actively maintained. I don't use it but I do like it a lot.
nbterm [0] was recently released. You can also use Jupyter as a command line interface through Jupyter Console.

[0]: https://blog.jupyter.org/nbterm-jupyter-notebooks-in-the-ter...

It's "ugh" in the Jupyter world too.

A good quality standalone "notebook editor" would be an incredible tool. Nteract exists, but is not "good".

> Also, big "ugh" to browser-based tooling.

Hear hear! A simple web-view inside a native application window is a huge improvement imho. If only JupyterLab provided a simple interface to access menu elements as well, you could easily have a nearly complete native experience.

Hard disagree.