Think Julia: How to Think Like a Computer Scientist | HN Mirror


	Think Julia: How to Think Like a Computer Scientist (benlauwens.github.io)
	207 points by cdsousa 2861 days ago

6 comments

ssivark 2861 days ago

PSA: The goal of the Julia 1.0 release was to stabilize language constructs for library authors to build upon. It will take time for them to update their libraries to be compatible with v1.0 -- so if you want to get a feel for the language, and things are breaking, stick to v0.7 for the near future.

beyondCritics 2861 days ago

Sadly not even 0.7 is usable right now, see https://docs.julialang.org/en/v1.0.0/ ("The only difference between 0.7 and 1.0 is the removal of deprecation warnings."). I am currently using version 0.6.4. I feel it is still a great piece of software, even if it is a few years old.

nur0n 2861 days ago

A good review of the changes between 0.6 and 0.7/1.0 is here: https://white.ucc.asn.au/2018/06/01/Julia-Favourite-New-Thin...

chrispeel 2860 days ago

You probably meant to say "stick with v0.6.4".

cuchoi 2861 days ago

If I don't care about parallelism or speed, is there a reason to learn Julia?

fusiongyro 2861 days ago

I've only been learning for about a week, but I think if you're a nerd for language design, you will appreciate it on an aesthetic level as a very tight design around a powerful concept. Common Lisp also has multiple dispatch, but I feel the integration of it into all the nooks and crannies of Julia really pays off. Julia's performance doesn't appear as a side effect of building on the LLVM or because they over-optimized the core, as it does in many young performance-oriented languages. Instead, it appears as a tangible benefit of a multiple-dispatch oriented design that makes it easy to add information to the system to improve performance without compromising the clarity of a sketch.

I have often felt that there are many discontinuities between "pretty" Haskell as it is taught and pragmatic Haskell. I haven't used Julia enough in anger to say it for sure, but I see in the way it works great potential for the pragmatic code to be as beautiful as the high-level and abstract code.

For a long time I have felt that Haskell represented the most mathematical language. Julia really shows that there are other ways of building a mathematical language with taste and style. It's oriented to practitioners and applied math folks rather than computer scientists and pure mathematicians. I have enjoyed seeing the differences between these systems quite a bit, and I think Julia has a bright future as a practical, daily-use system for science.

sgt101 2860 days ago

There's an interesting issue wrt. maths and Julia; yesterday there was a story (https://news.ycombinator.com/item?id=17781475) on unmaintainable code. One of the clauses mentions the use of non-standard characters as variable names (δ σ π ρ, for example) and cites the issue as having to deal with the code in a simple text editor.

I recently wrote a simulator intended as the demonstration of some issues in a paper. I found that using non-standard characters enabled me to create a clearer implementation of the calculations in the paper in the code - so I think that it's a great thing that you can do this in Julia and that it should be encouraged.

In 2017 programmers have access to super powerful computers - surely spending some cycles to render and manipulate these characters is appropriate? What do people think?

eggy 2860 days ago

This is why I love J and APL: succinctness of expression. However, these are the very same reasons these PLs are criticized. I think if you do a lot of math with symbols, you appreciate them, and if you are a code maintainer, and not a mathematician, it takes some getting used to.

ethelward 2860 days ago

The only issue here is with input devices. The Greek alphabet is standard Unicode, so it's not more expensive to render or manipulate for your text editor compared to standard latin.

nolemurs 2860 days ago

To be fair, input devices are a pretty serious issue here. Probably the single most common operation I do on code is search it. If I can't type what I'm searching for easily, that's pretty annoying.

ethelward 2857 days ago

> To be fair, input devices are a pretty serious issue here

That's true. I'm currently using an emacs extension that allows me to convert LaTeX-like strings (\alpha, \Gamma, ...) to Greek Unicode, but I understand that's more of a hack than an actual solution.
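For readers without such an editor extension, the same idea can be sketched in a few lines of Python. This is only an illustration, not how the emacs extension works: the function name `latex_to_greek` and the `SPELLING` table are made up here, and the lookup relies on the official Unicode character names (which spell lambda as "LAMDA").

```python
import re
import unicodedata

# Unicode spells this letter "LAMDA"; map the LaTeX spelling onto it.
SPELLING = {"LAMBDA": "LAMDA"}

def latex_to_greek(text):
    """Replace LaTeX-style names such as \\alpha or \\Gamma with Greek letters."""
    def substitute(match):
        name = match.group(1)
        # A capitalized name (\Gamma) maps to the capital letter, otherwise small.
        case = "CAPITAL" if name[0].isupper() else "SMALL"
        letter = SPELLING.get(name.upper(), name.upper())
        try:
            return unicodedata.lookup(f"GREEK {case} LETTER {letter}")
        except KeyError:
            # Not a Greek letter name (e.g. \frac): leave the text untouched.
            return match.group(0)
    return re.sub(r"\\([A-Za-z]+)", substitute, text)

print(latex_to_greek(r"\alpha + \Gamma"))
```

Anything that is not a Greek letter name passes through unchanged, so the function can be run over ordinary LaTeX-flavored text.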

ssivark 2861 days ago

I find Julia to be a wonderfully expressive language to write in, without sacrificing any speed. The abstractions allowed me to write less code (especially boilerplate), and maintain cleaner code structure -- mapping to my understanding of the problem. Feels much friendlier than Python to me, but YMMV. (My experience basically revolves around prototyping various numerical/ML algorithms for exploration and understanding, and not the kind where you just call a library to solve a task. Projects mostly in the range of 20 to 2000 lines of code. Having used both Python and Julia for such tasks, I lean towards Julia when I have the choice)

A couple of my previous HN comments on Julia's indexing: https://news.ycombinator.com/item?id=15472933 and https://news.ycombinator.com/item?id=15473169

Plus, I found it exciting to read community discussions for a language growing towards 1.0, to understand the different approaches that were considered, and why certain choices were made. From what I've seen, the whole development process was quite transparent, and the developers have always indulged sincere questions/suggestions from participants, dealing in concrete examples instead of sweeping generalizations and polemic. I don't know what standard to compare this to, but I've enjoyed the experience.

nur0n 2861 days ago

Yes. I don't do any scientific computing but I still really like Julia. The speed is there if you need it, but what attracts me more is how "ergonomic" the language is. To put it differently: Julia makes it really easy for me to map my thoughts into code. I have a habit of trying out many different languages across different paradigms and IMHO no one has come close in that regard. There is no one thing I can point to that is responsible for this. I believe it is due to a cohesive set of design decisions by the creators. The only downside is that currently there are not many Julia 1.0 packages available for general purpose programming (understandable given that 1.0 just came out).

Disclaimer: I recently started contributing to Julia (but I wouldn't have bothered in the first place if I didn't think it was so cool ;) )

cjhanks 2861 days ago

As somebody who appreciates Julia, my opinion is - probably not. That is, unless you want the opportunity to create a killer library that shows how the language features map well to other domains.

But, I think a lot of people in scientific computing are tired of the typeless mess that Python/Numpy/Scipy code-bases evolve to be. And for those people, I think it has a lot of merit.

At the end of the day, the language was designed to fill one major gap. A lot of time and effort in R&D is spent either architecting sane C++ memory models or reverse engineering existing Python code. Alternative well-performing and safe languages like Java simply are not fast enough - to get the features of the modern CPU, you need to be native. And as a side note, MATLAB usually cannot be run in a production environment.

mlthoughts2018 2861 days ago

Having spent years working with numpy and Cython, then switching to Scala for years as well, I much prefer dynamic typing.

Strong type safety is mostly just a waste of time.

srean 2860 days ago

As a long time Python/Cython user I can say that I have sorely missed static types on many occasions, especially for long running tasks. In fact I would sometimes use Cython not for performance but as a type checker.

I can describe a recent example. I had to ensure that an integer is always an int64 as the logic passes through different python modules and libraries. It was an absolute hell to track down all the places where things were dropping down to int32. With static types this would have been a no-brainer. This is not to say that I do not enjoy its dynamic typing where it is appropriate.

Hopefully Python 3 will make things better with optional types. But it's still not statically typed, just a pass through a powerful linter.

mlthoughts2018 2860 days ago

While I certainly concede that dynamic typing will have painpoints like this, I just think on balance they create far fewer problems than the maintenance and inflexibility of type system enforcement patterns.

That said, I find your particular example with int64 extremely hard to believe. I assume you’re using numpy or ctypes to get a fixed precision integer, in which case it should be extremely easy to guarantee no precision changes, and e.g. almost all operations between np.int64 and np.int32 or a Python infinite precision int will preserve the most restrictive type (highest fixed precision) in the operation.

I work in numerical linear algebra and data analytics and have used Python and Cython for years, often caring about precision issues— and have literally never encountered a situation where it was hard to verify what happens with precision.

Unless you’re using some non-numpy custom int64 type that has bizarre lossy semantics, it is quite hard to trigger loss of precision. And even then, a solution using numpy precision-maintaining conventions will be better and easier than some heavy type enforcement.

srean 2860 days ago

I will agree about the 'on the balance' in the context of speed of prototyping and interactive sessions.

When the rubber is about to hit the road, i.e. near deployment with money at stake, I would have loved an option to freeze the types, at least in many places. Cython comes in handy, but it's clunky and its syntax and semantics are not super obvious to a beginner (I am no longer one, but I remember my days of confusion regarding cimporting std headers, python headers, how do you use python arrays (not numpy arrays) etc etc).

I am curious, have you put money at stake supported only by dynamic types?

Regarding int32 vs int64, it's not a precision issue, it's about sparse matrices with more than 1<<31 nonzeros. I am equally surprised that you have not run into this given your practical experience with matrices.
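To make the 1<<31 point concrete, here is a small standard-library sketch of why an index or nonzero count of that size cannot live in a signed 32-bit integer. It does not use numpy or scipy.sparse; `fits_int32` and `nnz` are names invented for the illustration.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold

def fits_int32(n):
    """Return True if n is representable as a signed 32-bit integer."""
    return -2**31 <= n <= INT32_MAX

nnz = 1 << 31  # hypothetical nonzero count, one past INT32_MAX

# ctypes does no overflow checking, so this shows the silent wraparound
# that occurs when such a count is squeezed into 32 bits.
wrapped = ctypes.c_int32(nnz).value

print(fits_int32(nnz))  # False
print(wrapped)          # -2147483648
```

A count that silently turns negative somewhere in a chain of libraries is exactly the kind of bug that is hard to trace without a type check at each boundary.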

My case involves more than just numpy. There's hdf5, scipy.sparse, some memory mapped arrays and of course numpy.

Given the amount of time I spent to debug this, I would have killed for static type checks.

sgt101 2860 days ago

Ohhh… I really disagree; I find that strong typing allows me to get the compiler to check that the work that's going on in the various branches of my code is at least allowed - even if it's wrong. I wish that my test cases really did test all of these corners, but realistically I just don't believe that I am good enough at writing test cases to get everything. From another perspective using strong typing like this saves me lots of time in terms of writing finicky test cases. I'm not saying that dynamic typing is wrong - in fact it is brilliant in terms of not having to write reams and reams of dangerous and maintenance heavy boilerplate!

mlthoughts2018 2860 days ago

I’ve used a lot of functional programming unit test tools, and I’ve never seen any of them live up to the hype of checking corner cases in an automated yet comprehensive way.

The marketing pitch for that is always something like QuickCheck in Haskell, where e.g. reversing an array should be its own inverse function and you can auto-verify this like it is a law across a bunch of cases.
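For readers who have not seen this style of test, the reverse-twice law can be sketched by hand in Python with random inputs. This is a hand-rolled stand-in, not QuickCheck or any real property-testing library; `check_reverse_involution` is a hypothetical helper.

```python
import random

def reverse(xs):
    """Return a reversed copy of the list."""
    return xs[::-1]

def check_reverse_involution(trials=200, seed=0):
    """Try random lists; return a counterexample, or None if the law held."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]
        if reverse(reverse(xs)) != xs:
            return xs  # the property failed for this input
    return None

print(check_reverse_involution())  # None
```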

The problem is in real life unit tests, nothing has any laws like this, and it’s just a bunch of bizarre case-specific business logic and reporting code. The concept of a corner case is a semantic one, and the definition of what inputs are possible to a given function will change and have constraints from the outside world that not even the most expressive statically typed language will easily let you encode into the type system.

Combine it with the fact that your colleagues have variability in their skills too, and often won’t make good choices with type system abstractions to represent business logic, and then all that costly extra boilerplate code for specifying types, creating your own business-logic-specific ADTs, adding privacy modifiers, templated or type class implementations...

...it just becomes a big pile of garbage liabilities for what turns out to seriously be no benefits over dynamic typing.

Even in the static typing case, you’ll end up with tons of runtime errors causing you to frequently revisit assumptions in the unit tests. You’ll just have a harder time refactoring large pieces of code that are wedded to particular type designs and you’ll have to sit and wait on the compiler to try every change (this can be hugely bad when the system has components needed for rapid prototyping, interactive data analysis, or other real-time uses).

I’ve really seen a lot of corners of this debate play out in practice, and static typing beyond extremely simple native types and structs (basically C style), really offers nothing while being a huge productivity drain. The claims that it actually helps productivity because the compiler catches errors and forces more correctness just turns out to be false in real code bases. You get just as many weird runtime errors and just have a harder time debugging or rapidly experimenting with changes.

sgt101 2860 days ago

Many years ago I learned Modula-2 and then Ada. Then the job market moved, and fashion, and I learned C++ and then Java. If you had asked me pre-Java-generics (6?) I'd have agreed with you, but generics reminded me that parametric polymorphism and static types are potent weapons, and suddenly I was writing Ada-type code again. Julia has pushed me further that way. With Ada we were able to use these tools to enforce design decisions across time and teams, stopping mess and mudballing. I can't claim this for Julia yet as I have only used it for three small projects - but I am optimistic.

fusiongyro 2861 days ago

One thing that makes Julia cool is that adding the types improves performance, but you don't have to add them if you don't want to.

dnautics 2861 days ago

Adding types in julia usually does not improve performance for function definitions... it may improve performance for collections (like specifying the type that an array collects), but usually julia is pretty darn good at correctly inferring the collection type

ummonk 2860 days ago

Right. Adding types is a good way to ensure you're writing type-stable code, but when you do write type-stable code, Julia can usually infer that it is type-stable without needing actual specified types.

inamberclad 2861 days ago

Since you seem to have the same quarrels with MATLAB as me, have you figured out a good way to run it in batch mode, without having to open the interactive console?

chrispeel 2860 days ago

Both Matlab and Julia can be run in batch mode, without going to the REPL.

cjhanks 2860 days ago

I empathize with you, but I don't have any tips. Any requests to me to make MATLAB part of a distributed computing system were given a pretty hard 'no'.

thatcat 2861 days ago

What do you mean by production environment in the context of academia / R&D?

ms013 2861 days ago

Usually “production” in those contexts means running on a cluster or supercomputer somewhere. That, or code being run on a machine not controlled by the developer by users who do not write code but just run a pre-existing program with different inputs. That’s how I (and colleagues) typically use the term ‘production’ in the academic/research environment that I work in. The issue with things like MATLAB in production is the lack of a license on the machine you want to run on - often people have it on their workstation, but if you get time at a supercomputing center or department cluster, odds are your local license doesn’t translate over to the system you want to run on.

cjhanks 2860 days ago

There are R&D departments outside of academia.

But in both commercial R&D and academic R&D, some code really needs to live on for future people to benefit from it. It's just not a good use of research time for people to re-write everything from scratch... every time.

jernfrost 2860 days ago

I don’t use Julia for speed but because it is so easy to use and powerful.

Haskell e.g. is very powerful and elegant but time consuming and hard to learn. LISP is easy to learn and powerful but has kind of clunky syntax.

Ruby has quite nice syntax and is quite powerful but also kind of messy. Python is quite clean and easy to use but not as powerful.

Julia I would say has hit a sweet spot between all these languages. It is quick to learn and understand while also allowing you to write clean easy to read code. That may describe python. But with macros and multiple dispatch I would say it is a much more powerful language.

I also find it much nicer than Python to use as a script language as you get way more functionality out of the box.

I am a C++ developer professionally, and write little Julia scripts to help me with various boilerplate coding in C++, processing assets etc. Julia is really quick to drop into when you need it.

With Python I always forget which module some functionality is in, the name of a function etc. Julia has much better naming, and most useful stuff is already included in the automatically loaded Base module.

newen 2860 days ago

Yeah Julia is really easy to pick up if you're a computer science person. I came across Julia when searching for a language with good linear algebra support, then I learned it over a week while implementing a paper I was reading. Two weeks later I implemented an improvement to that paper, which turned out to be very publishable. I basically owe a whole paper to Julia, which almost felt like a free paper lol.

gopalv 2861 days ago

> is there a reason to learn Julia?

JuMP is why I learned whatever I did of Julia and that was mostly because it allowed me to express some things better (not necessarily faster or in parallel).

A good example would be this random problem that came out of an interview discussion.

https://gist.github.com/t3rmin4t0r/44d8e09e17495d1c24908fc0f...

I'm almost sure my python implementation is wrong, but I can't quite prove it - the Julia one is trivial to understand (a dot product + a minimization function).

kevmo314 2861 days ago

Using an optimization library seems like the completely wrong approach to solving this interview question. Was that intentional?

pietjepuk88 2860 days ago

Aside from the special case of "1" always being present, I would say that in general you should just use an optimizer to solve knapsack problems. Whether you should do so for an interview question is up for debate I guess; using libraries shows you can get something done quickly and efficiently, but implementing your own solver might show you understand the underlying complexity.

As far as comparing code complexity of Julia to Python is concerned, I would say that when you use JuMP in Julia, you should use Pyomo/CasADi/PuLP/... in Python. That is not to say that I don't find JuMP to be a more appealing framework overall. It has wide support for all kinds of solvers, and some Julia/JuMP authors even wrote fairly good MIQCP/MICP solvers on top of commercial/open-source MILP/SOCP solvers.

kevmo314 2860 days ago

> I would say that in general you should just use an optimizer to solve knapsack problems.

Why do you say that? Perhaps if the knapsack problem is provably non-polynomial, but being able to distinguish those cases is a rather important skill.

For example, the problem you specified can more easily be solved directly: https://repl.it/repls/RotatingObeseBusinesses

With respect to the interview question context, if I had a candidate that implemented a solver, I would be inclined to say they completely overengineered the solution for a less optimal answer. There are a bunch of assumptions that are required for a solver to be optimal and unless the candidate can enumerate all of them and argue why the problem meets those constraints, the solver is not 100% guaranteed to be correct, whereas a direct solution is.

Not to mention, as you noted, edge cases, and the complete cryptic-ness of the code.

babahoyo 2861 days ago

Julia is increasingly becoming better than R and Stata for data cleaning. Many of its metaprogramming tools beat `dplyr` in syntax and features. So if the data-cleaning to regression stack (which I would guess is different from scientific computing) is your thing, then I would recommend trying Julia out.

zzleeper 2861 days ago

Early on (as a heavy Stata and Python user) I tried Julia and got quite discouraged by its messy treatment of missing values (and weights, etc). I've also tried R but also found lots of inconsistencies, so not enough reason to switch, besides when plotting nice graphs.

But I would say Julia is increasingly getting there. Comparable packages are WAY easier to write in Julia than in Stata/Mata, while being faster, so any gaps will keep disappearing in the next years.

babahoyo 2861 days ago

you were somehow satisfied with the way stata handles missing values???

    gen x = y if z > 4 // headaches abound

Julia's missing value support is great now and is only going to get better. You have to be more careful with how you use them, but you won't get anything like the output above in julia.

* For reference, Stata uses +Inf as the missing value, so any operation with "greater than" is going to assign missing values to something. And yes, there have been research papers retracted due to this behavior.
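The pitfall is easy to reproduce outside Stata. The Python sketch below only models the behavior described above by encoding a missing `z` as +inf; it is not Stata code, and the sample rows and function names are invented for the illustration.

```python
import math

MISSING_AS_INF = math.inf  # assumption: models the "+Inf" encoding described above

rows = [
    {"y": 10, "z": 5},
    {"y": 20, "z": 3},
    {"y": 30, "z": None},  # z was never recorded
]

def select_naive(rows):
    """Encode a missing z as +inf, then keep y where z > 4."""
    out = []
    for r in rows:
        z = MISSING_AS_INF if r["z"] is None else r["z"]
        if z > 4:
            out.append(r["y"])
    return out

def select_explicit(rows):
    """Keep y where z is present and z > 4."""
    return [r["y"] for r in rows if r["z"] is not None and r["z"] > 4]

print(select_naive(rows))     # [10, 30]
print(select_explicit(rows))  # [10]
```

The row with the unrecorded `z` slips through the naive filter because +inf compares greater than 4, which is the surprise being complained about.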

cuchoi 2860 days ago

One of my least favorite quirks of Stata

skybrian 2861 days ago

Did you see how Julia does missing values now?

https://julialang.org/blog/2018/06/missing

kirillseva 2861 days ago

Could you give some examples of how dplyr-based data cleaning code would look in modern julia?

babahoyo 2861 days ago

Check out DataFramesMeta, which unfortunately isn't working on 1.0 yet. They have basically a 1-1 matching of `dplyr` verbs to julia versions.

I don't think a standardized and idiomatic data-cleaning process has been established yet, which is for the best right now. There is `JuliaDBMeta` for metaprogramming with JuliaDB tables, and the `Queryverse` for working with a wide array of objects.

One way that Julia's metaprogramming shines is with the ability to go into the AST and replace symbols, enabling local scopes that are more readable than other scopes. One workflow I'm excited to experiment with is something like this

    @as my_long_dataset d begin # make d = my_long_dataset in this scope
        @with d begin
            t = :x1 + :x2 + :x3 # these symbols are arrays inside this @with scope
            d.new_var = t # assign the variable
        end
    end

Of course, with the `@as` macro you probably don't save that many keystrokes if you are just doing `d.x` or `d[:x1, :x2]`... The ecosystem is still evolving but the point is that I like how you can replicate something like `attach` scoping in R without all the headaches. I think it makes a cleaning script feel more like you are only working with the data you care about.

TallGuyShort 2861 days ago

It seems to be one of several lingua francas in the scientific computing community, and growing in popularity. In that context, its accessibility to people who don't have time to be software experts is another feature.

gnulinux 2861 days ago

Caring about neither parallelism nor speed is queer. What're you optimizing?

benley 2861 days ago

Just to throw a few possibilities out there: Correctness? Ease of writing? Portability?

sgt101 2860 days ago

maintenance, readability, extensibility, reuse :)

gerdesj 2861 days ago

You might contrast the approach here with say an Engineering textbook. This manual on a particular tool (Julia) seems to imply that it is the one way to engage with an entire discipline. An Engineering textbook might mention various tools for a particular job and even endorse one over the others but in general it will start with the problem and not the solution.

That said:

$ aurman -S julia

(rolls up sleeves)

3rdAccount 2861 days ago

Can you explain that last part?

nur0n 2861 days ago

`aurman` is an (unofficial?) package manager for Arch Linux. The standard package manager is called `pacman`. The Arch User Repository (AUR) is (IIRC) a repository of uncurated packages compatible with Arch.

To put it simply: he is implying that he will check out the book.

gerdesj 2860 days ago

Yes, I should have put pacman. The book is a great resource and despite my criticism has introduced me to julia.

3rdAccount 2859 days ago

Thank you for the explanation.

inamberclad 2861 days ago

Never wrapped my head around Julia. I like it, and I've used it for a couple things, but I've never had a use case compelling enough to keep at it.

sgt101 2860 days ago

It's elegant and powerful; there are very few widely used coding constructs that aren't in Julia, and those that aren't (like classes) are missing because the authors of the language don't think that they are useful, as opposed to "it's hard to implement". But YMMV; the downside is that the ecosystem is evolving, and it's just hit 1.0, so expect things to be smooth in 6 months to a year. The upside is that I find that the Julia code I write appears from the keyboard easily, quickly and in a form that I can understand a few weeks or a few months later.

vanderZwan 2860 days ago

> those that [aren't in the language ] (like Classes) aren't there because the authors of the language don't think that they are useful

From what I understand it's more that the combination of other features in the Julia language (like multiple dispatch) makes classes redundant.
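As a rough illustration of that point, here is a toy multiple-dispatch registry in Python. It is not how Julia implements dispatch: it matches exact argument types only (no subtypes), and `dispatch`, `Rock`, `Paper` and `beats` are names made up for the sketch. The behavior lives in free functions selected by the types of all arguments, while the classes are just empty data types.

```python
_registry = {}  # maps (function name, argument types) to an implementation

def dispatch(*types):
    """Register an implementation for an exact tuple of argument types."""
    def register(func):
        _registry[(func.__name__, types)] = func

        def caller(*args):
            # Pick the implementation from the runtime types of ALL arguments.
            key = (func.__name__, tuple(type(a) for a in args))
            try:
                impl = _registry[key]
            except KeyError:
                raise TypeError(f"no method {key[0]} for {key[1]}") from None
            return impl(*args)

        return caller
    return register

class Rock: pass
class Paper: pass

@dispatch(Rock, Paper)
def beats(a, b):
    return False

@dispatch(Paper, Rock)
def beats(a, b):
    return True

print(beats(Paper(), Rock()))  # True
print(beats(Rock(), Paper()))  # False
```

Neither `Rock` nor `Paper` owns the `beats` method, which is the sense in which dispatching on every argument can stand in for class-based method lookup.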

alexeiz 2858 days ago

> In Julia indices start from 1.

Why? I programmed in Lua, which also made such a choice, and I find it rather inconvenient. It makes you always wonder whether you got your indices right, since in all other programming languages indices start from 0.

pwaai 2861 days ago

I'm trying to figure out how to apply this to coding challenges.