by MrPowers 1913 days ago
Go has a ton of potential in the data science space.

A basic DataFrame library would go a long way. Doesn't have to be as full featured as Pandas. Just something that's maintainable and portable.

I wrote a blog post a few months ago on the current Go DataFrame libraries (gota, qframe, dataframe-go): https://mungingdata.com/go/dataframes-gota-qframe/. None of the current offerings are integrated with Arrow.

An Arrow-backed Go DataFrame library that can read / write Parquet files could really jumpstart data science in Go (really data engineering in Go, which is where they should probably focus first).

7 comments

Maybe a high-concurrency experiment runner or data flow engine, but Go would probably be the last "modern" language I think of as being good for data science.

All of the features that make it great for writing high-concurrency web applications would make it painful for writing tabular data processing, array manipulation & linear algebra, and plotting.

Nim seems a lot more practical; it's easy to bind to existing data science libraries, and you can use the macro system to build more expressive DSLs. That said, since Julia already does pretty much anything I would need to do (and will hopefully one day have fast startup times and/or AOT compilation), I'm not sure why you would want to use Nim either. Maybe use it to write some kind of "mid-level" library code that binds to something like Torch, which you could then use from an even higher-level interactive language.

Apart from the incumbents -- Julia, Python (grandfathered in + you can use Hy/Hissp/Coconut), and R -- maybe you could have a good time doing data science in Common Lisp or Racket. Again: good CFFI story, macros for expressive DSLs, flexibility to run in interpreted and compiled modes, dynamic/gradual typing for easy iteration, etc.

Hell, I would sooner take Lua for data science over Go.

That said, I am an "Arrow maximalist", because the beauty of it is that you should be able to use data frames even in Go if you really want to, without reinventing the CSV parsing and memory layout wheels.

> data science in Common Lisp or Racket

Similarly, Chibi or Gambit Scheme.

> I would sooner take Lua for data science

Which provides for a low level language like Terra or a Lisp via Fennel or Urn.

Incidentally, Lua has DS history, as it was used by Yann LeCun for Torch, which was a Lua library.

There were a whole bunch of goodies in the surrounding ecosystem, as I recall.

Then Yann got acquired by FB, and it all got rewritten in Python (hence PyTorch, as opposed to Torch, which was in Lua).

> Go has a ton of potential in the data science space.

Does it? I'm not familiar with Go data science applications, but the design of the language, tooling and runtime (e.g. the low-latency garbage collector, errors thrown for unused imports) does not, to me, seem to fit well with the needs of data science. I'm interested in hearing what advantages Go brings.

I guess the best thing would be Go's lightweight and simple concurrency model when it comes to data science applications. But other than that, I can't really think of a good reason why Go should have so much potential.
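
To make the concurrency point concrete, here is a minimal sketch of per-column aggregation with goroutines. The `sumColumns` helper and the map-of-slices layout are assumptions for illustration only; they are not the API of gota, qframe, dataframe-go, or Arrow.

```go
package main

import (
	"fmt"
	"sync"
)

// sumColumns is a hypothetical helper: it sums every column concurrently,
// starting one goroutine per column. Columns are plain []float64 slices here;
// a real DataFrame library would use its own column type.
func sumColumns(cols map[string][]float64) map[string]float64 {
	var (
		mu   sync.Mutex // guards writes to sums
		wg   sync.WaitGroup
		sums = make(map[string]float64, len(cols))
	)
	for name, values := range cols {
		wg.Add(1)
		go func(name string, values []float64) {
			defer wg.Done()
			total := 0.0
			for _, v := range values {
				total += v
			}
			mu.Lock()
			sums[name] = total
			mu.Unlock()
		}(name, values)
	}
	wg.Wait() // block until every column has been summed
	return sums
}

func main() {
	cols := map[string][]float64{
		"price":    {1.5, 2.5, 3.0},
		"quantity": {10, 20, 30},
	}
	fmt.Println(sumColumns(cols))
}
```

For columns this small the goroutine overhead outweighs any gain; the pattern only starts to pay off when each column holds a lot of data.
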
How do unused imports relate to a language's suitability for data science? Your Python IDE adds and removes imports as you use them. Your Go IDE adds and removes imports as you use them. Unless you're using "ed" as your editor, it shouldn't even be something you see or ever think about.

> errors thrown for unused imports

You're doing something wrong if it doesn't get cleaned up automatically.

Data science is an experimental activity, whereas golang is explicitly a production platform. The amount of friction this will introduce is too high for practical use.

For example, in golang you will get a compilation error if you have an unused variable, leading to significant extra work when exploring code-level alternatives.
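
A small sketch of the friction described above, and the usual workaround of assigning to the blank identifier. The `mean` function and the data are made up for illustration.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	data := []float64{2, 4, 9}
	avg := mean(data)

	// Suppose we are trying an alternative and temporarily stop using avg.
	// Without the next line the compiler rejects the whole program with a
	// "declared and not used" error (exact wording varies by Go version).
	_ = avg

	fmt.Println(len(data))
}
```

The blank assignment keeps the build green during exploration, but it has to be added and removed by hand each time, which is the extra work the comment refers to.
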

I can see a lot of potential in Go for data engineering specifically, yeah. Those would probably be some very stable and performant ETLs. And the concurrency and network primitives would make it easy to develop libraries like Prefect/Airflow.

Yep, agreed. Go is a great language for AWS Lambda type workflows.

Python isn't as great (Python Lambda Layers built on Macs don't always work). AWS Data Wrangler (https://github.com/awslabs/aws-data-wrangler) provides pre-built layers, which is a workaround, but something that's as portable as Go would be the best solution.

Love awswrangler. I use that over boto whenever I have the opportunity.

We use Go for our ETL, with some Python too. We are in the process of transitioning to Argo Workflows from a K8s CronJob/Job setup which has been pretty stable itself.

The biggest hurdle for Go in this realm is honestly the Go->C FFI latency. It severely limits acceleration.

> Go has a ton of potential in the data science space.

I don't think that a language where you can't write generic map/fold/reduce and typed DataFrames (such as Spark's DataSet) has "a ton of potential".

Go is worse than nearly any dynamic or static language I know in that regard. Even Java has way more potential than Go.
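
For reference, Go 1.18 and later support type parameters, which may postdate this comment. Under that assumption, generic map/reduce helpers can be sketched as below. `Map` and `Reduce` are hypothetical names written for this example, not functions from the standard library or any DataFrame package, and this does not address typed DataFrames in the Spark `Dataset` sense.

```go
package main

import "fmt"

// Map applies f to every element of xs and returns the results in order.
func Map[T, U any](xs []T, f func(T) U) []U {
	out := make([]U, 0, len(xs))
	for _, x := range xs {
		out = append(out, f(x))
	}
	return out
}

// Reduce folds xs from the left into a single value, starting from init.
func Reduce[T, A any](xs []T, init A, f func(A, T) A) A {
	acc := init
	for _, x := range xs {
		acc = f(acc, x)
	}
	return acc
}

func main() {
	names := []string{"go", "arrow", "parquet"}
	lengths := Map(names, func(s string) int { return len(s) })
	total := Reduce(lengths, 0, func(acc, n int) int { return acc + n })
	fmt.Println(lengths, total)
}
```
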

In the same GitHub organization, there is https://github.com/goplus/pandas, but it seems to not have progressed past a README.