by yashap 1913 days ago
Go is a great language, but it seems terribly suited to data science. The popular data science languages are Python, R, Julia, and to a lesser extent Scala. They’re all extremely flexible languages, where you can easily write high level abstractions/DSLs, and they all have very strong functional programming support, because data science tends to be extremely functional. They also tend to be very concise languages.

Go is at the complete opposite end of the spectrum - not flexible at all, it’s purposefully difficult and awkward to write high level abstractions/DSLs, there’s very poor functional programming support, and it’s very verbose. There are great reasons for these restrictions, they’re intentional design decisions, but they also make it a very poor fit for data science IMO.

4 comments

Not trying to start anything, but what's functional about Python? It doesn't have/support tail recursion, a strong type system, pattern matching, immutability-by-default for lists and dictionaries.

From where I'm standing, python has some features that kinda look like functional programming concepts, but overall is an OO imperative language, like Ruby and many others.

My understanding is that the DS community's preference for it is due more to its library support in that domain.

> strong type system, pattern matching, immutability-by-default for lists and dictionaries

As a side note, it's really interesting just how much the popular conception of "functional" has changed. 10 years ago, I don't think anyone would have listed any of those as being important or suggestive of functional programming. Nowadays, "functional" means "like Haskell" instead of "like Lisp." I think we need to be careful when we talk about functional programming because so many ideas have jumped the paradigm and it means so many different things to different people.

Scheme and Standard ML standardized these very features in the 70s and 80s as a part of the “functional programming” paradigm.
Scheme doesn't have a "strong type system", "pattern matching", or "immutability-by-default".

Scheme didn't standardise those very features in the 70s and 80s – it still doesn't have them.

Some of those features are available in add-on libraries or as extensions in some specific Scheme implementations, but they are thus far absent from the standardised language.

That's true, but python isn't really functional in the "like lisp" way either. Things like ifs, loops, etc. are statements, not expressions. Lambdas are pretty limited (they can only be one line). There is no tail recursion.

Functions are first-class objects, and it supports higher-order functions, and has closures (even if in any non-trivial case you need a full nested def), which were less common features when Python was first introduced, and probably why Python was labeled as "functional." But now those are standard features in almost every modern language, so using them as a criterion for "functional" languages is not a very useful distinction.

No, Python's lambdas can have as many lines of code as you please. They are not limited to one line.

I'm not sure where this myth comes from, but I see it a lot. Maybe some people think that "lines of code" == "statements", but these are not remotely the same thing, even if they happen to coincide in simple cases.

Python's lambdas are limited to one expression in the implied return statement, but not allowing multiple statements in lambdas is no real limitation when programming in the functional style, as the true functional languages have no statements to speak of, only expressions, and their lambdas work exactly the same way Python's do. A single expression is all that a functional programming language's lambda needs.

Multiline lambdas are considered poor style in Python ("Why not use a `def`?" they'd say.), so you may not see them much, but they do work. The Hissp compiler, for example, relies on this feature. (I am the author of Hissp BTW.)
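
To make the "one expression, not one line" point concrete, here is a minimal sketch. The names `classify` and `total_of_even_squares` are made up for illustration; the only assumption is that the expression sits inside brackets so it can span lines.

```python
# A lambda is limited to one expression, not one line: inside brackets
# that single expression can be spread over as many lines as needed.
classify = (lambda n:
    "negative" if n < 0 else
    "zero" if n == 0 else
    "positive"
)

# Still one expression: a generator expression passed to sum().
total_of_even_squares = lambda xs: sum(
    x * x
    for x in xs
    if x % 2 == 0
)

print(classify(-3))
print(total_of_even_squares([1, 2, 3, 4]))
```

Most style guides would still prefer a `def` here, as noted above; the sketch only shows that the language allows it.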

Python is not fully a functional programming language but it supports some functional patterns. There's a nice mini-ebook by David Mertz on functional programming in Python, and it used to be freely available but I can't find it at the moment. However, he wrote an article version here: https://developer.ibm.com/languages/python/articles/l-prog/

Also, pattern matching is coming to python in 3.10. You can read about it here: https://www.python.org/dev/peps/pep-0634/

The dry-python returns library gets pretty close to feeling like Scala's Cats. Unproductive diversions and all.
> a strong type system

It's a myth that dynamic languages can't have strong types. Python aborts almost immediately whenever it can. For instance, adding a number to a string? Exception. Accessing undefined properties? Exception.

Furthermore there's a language-standard static type checker, mypy.
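
A quick sketch of the runtime behaviour being described, for anyone who wants to try it. The helpers `try_add` and `try_attr` and the `Point` class are made up for illustration; they just catch the exception and report its name.

```python
def try_add(a, b):
    """Return a + b, or the name of the exception Python raises."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__

class Point:
    def __init__(self, x):
        self.x = x

def try_attr(obj, name):
    """Return the attribute, or the name of the exception Python raises."""
    try:
        return getattr(obj, name)
    except AttributeError as exc:
        return type(exc).__name__

print(try_add(1, 2))            # ints add normally
print(try_add(1, "2"))          # no implicit int/str coercion
print(try_attr(Point(1), "y"))  # undefined attribute is an error, not None
```

Note these checks happen at run time; catching them before the program runs is what a separate checker like mypy is for.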

> pattern matching

We have that in Python 3.10.

> immutability-by-default for lists and dictionaries

We do have tuples and frozendict.

Arguably its implementations of functional features are much weaker than "truly" functional ones such as Lisp, Haskell, OCaml or F#.

> Python aborts almost immediately whenever it can

doesn't sound very strong

Runtime type checking is most definitely not what people mean when they talk about strong type systems.
Strong != static.
I love python, and I don't know if the GP's are good points, but your answer is really disingenuous.

> > pattern matching

> We have that in Python 3.10.

> > immutability-by-default for lists and dictionaries

> We do have tuples and frozendict.

3.10 like the version that is not released yet?

Tuples and frozendicts, so precisely non default list and dicts?

The tools are there but you don't like their names.
Neither tuples nor frozen dicts offer efficient (logarithmic or constant time) updates, like lists or balanced trees or HAMT do. You can't really write a program with only immutable structures in python, unless you accept it will be unbearably slow, even for python. Clojure, erlang, elixir, these are dynamically typed and functional.
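
A small sketch of why that is, assuming plain built-in tuples. `tuple_set` is a hypothetical helper, not a standard function. "Updating" one slot means building a whole new tuple, so the cost grows linearly with the length, whereas a persistent structure would share most of its nodes between versions.

```python
def tuple_set(t, index, value):
    """Return a new tuple with one element replaced.

    Slicing and concatenation copy every element, so this is O(len(t)),
    with no structural sharing between the old and new tuple.
    """
    return t[:index] + (value,) + t[index + 1:]

original = (1, 2, 3, 4)
updated = tuple_set(original, 2, 99)

print(original)  # the old version is untouched
print(updated)   # the new version is a full copy with one slot changed
```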
I think the appeal is with Jupyter [1] notebooks. Python is not about performance. Usually it's numpy (or other libraries) that does the heavy lifting in another language anyway.

But having the Jupyter notebooks allows for interactivity with the data. Make changes, and see how it affects every step after it.

[1] https://jupyter.org/

- map/reduce/filter/for-comps in the standard library. Go doesn't support this style of programming, and because of the lack of generics, you can't write generic data structures with these types of methods either. It's all loops and mutation in Go

- first class functions. Go does have these

- concise lambda syntax, that makes them nice/easy to use. Go has first class functions, but a very verbose/awkward lambda syntax

- can easily create your own generic data structures with functional interfaces (can't do this in Go b/c no generics)

- Python is pretty strongly typed, and if you meant statically typed, there's now optional static type checking in Python, similar to TypeScript (not as robust/well implemented though)

- Python has decent immutability support. For example, dataclasses (https://docs.python.org/3/library/dataclasses.html) with frozen=True are a lot like immutable classes in more purely functional languages (i.e. case classes in Scala). Tuples and named tuples. There are libs out there for frozen (a.k.a. immutable) dicts, lists, etc.

- Python is about to get pattern matching in 3.10

- functools (https://docs.python.org/3/library/functools.html)

- etc.
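
To illustrate a few of the items in the list above, here is a minimal sketch using only the standard library. The `Sample` class and its data are made up; `dataclasses.replace` builds a modified copy instead of mutating.

```python
from dataclasses import dataclass, replace
from functools import reduce

@dataclass(frozen=True)
class Sample:
    """Immutable record, roughly comparable to a Scala case class."""
    name: str
    value: float

samples = [Sample("a", 1.5), Sample("b", -2.0), Sample("c", 4.0)]

# filter/map are lazy iterators; nothing is mutated along the way.
positives = filter(lambda s: s.value > 0, samples)
doubled = map(lambda s: replace(s, value=s.value * 2), positives)

# reduce folds the stream into a single number.
total = reduce(lambda acc, s: acc + s.value, doubled, 0.0)

print(total)
```

Trying `samples[0].value = 0` raises `dataclasses.FrozenInstanceError`, which is the immutability guarantee `frozen=True` gives you.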

You can absolutely use Python in a very mutable-OO style, but it also has pretty good functional programming support. If you look at most Python data science code, it's written pretty functionally.

I'd say most important for data science applications is the ability to create generic data structures with functional interfaces - you can't do this in Go, makes it really awkward to write a lot of the foundational vector, data frame, etc. libraries, that basically all higher level data science libs depend on.
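
As a rough sketch of what "generic data structure with a functional interface" means here: `Series` below is a toy, hypothetical class (not pandas or any real library), using the standard `typing` module for the generic parameter.

```python
from typing import Callable, Generic, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")

class Series(Generic[T]):
    """Toy immutable column of values, generic over its element type."""

    def __init__(self, values: Iterable[T]) -> None:
        self._values: Tuple[T, ...] = tuple(values)

    def map(self, fn: Callable[[T], U]) -> "Series[U]":
        # Returns a new Series; the element type may change from T to U.
        return Series(fn(v) for v in self._values)

    def filter(self, pred: Callable[[T], bool]) -> "Series[T]":
        # Returns a new Series holding only the elements that pass pred.
        return Series(v for v in self._values if pred(v))

    def to_list(self) -> List[T]:
        return list(self._values)

names = Series(["go", "python", "julia"])
lengths = names.filter(lambda s: len(s) > 2).map(len)
print(lengths.to_list())
```

One class serves strings, ints, or anything else, and a static checker can track `Series[str]` turning into `Series[int]` through `map`.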

Functional languages don't need a strong type system
IDK if it's Go's problem honestly. Data modeling is hard. It's hard for a reason. If a language like python makes it seem easy, it's still hard but your perception and attitude towards it has changed because some of the busy work has been taken out of it - possibly in a way that costs you down the road.

Let's be honest programming languages are the punching bags of developers.

There are mainly two types of data scientists, A and B [1].

The type B ones probably want to use Go for building data analytics pipelines similar to Pachyderm[2]. If you want to go the way of a compiled language for data science and numerical analysis, the best bet now is probably Fortran. The fact that the Swift for TensorFlow project was started and terminated recently really shows that there is a need for a proper, modern compiled language for data science and numerical analysis.

There is, however, a dark horse in the data science and numerical analysis programming language race that perhaps can satisfy both type A and B data scientists. The dark horse is the D language. It supports functional and object-oriented programming, a borrow checker, inline assembler, a REPL, metaprogramming, CTFE, and open and multi-methods, just to name several modern features suitable for data science and numerical analysis, but admittedly the ecosystem is rather poor as of now (e.g. no library for Arrow). It is also very fast to compile and run even with GC (the GC is also configurable), and you can selectively opt out of GC inside the same code base if blazing speed is your thing.

But glimpses of what it is capable of are already there, albeit still in their infancy compared to mature languages like Matlab, R or Fortran [3][4]. But hey, Rome was not built in a day.

[1]https://www.quora.com/What-is-data-science/answer/Michael-Ho...

[2]https://www.pachyderm.com/

[3]https://tech.nextroll.com/blog/data/2014/11/17/d-is-for-data...

[4]http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/...

That need is fulfilled by languages like Fortran, which is quite modern with OOP and generics, the age of punch cards is long gone.

Or HPC languages like Chapel.

Not only are they compiled, they offer first class support for distributed HPC and GPGPU computing.

Go is nowhere close to offering such capabilities.

Why not Julia?
Please check this post mortem on Julia[1].

Granted, this is probably a premature assessment of Julia.

Coincidentally, the topmost comments lament Google's missed opportunity with the Swift for TensorFlow project (mentioned in my original comment) and say that if it had been done in Julia, the project would have been a success ¯\_(ツ)_/¯

[1]https://news.ycombinator.com/item?id=26384133

> Go is at the complete opposite end of the spectrum - not flexible at all,

You must be kidding. Go is the most flexible one (not just one of the most flexible) among popular static languages. It is even more flexible than many dynamic languages. It supports function types as first-class citizens, closures, value methods as functions, type methods as functions, type deduction, .... IMHO, the main sell point of Go is not simplicity, but overall balance and flexibility: https://github.com/go101/go101/wiki/The-main-sell-point-of-G...

> there’s very poor functional programming support,

This is true currently, but this is not caused by lack of flexibility, it is caused by lack of custom generics instead.

Fair enough, flexible is an extremely loose term. I was referring mostly to a language being flexible enough to let library/tool authors create their own very high level abstractions and DSLs. In Go, lack of custom generics often makes this very difficult. If you look at the kind of APIs offered by mega-popular data science toolkits like pandas and Spark, it's really hard to offer something similar in Go. You end up with a lot of interface{} types everywhere, vectors/series/whatever carrying their type in a struct field, etc.