Hacker News
by y14 3551 days ago
As a Ruby developer who had to move to Python for its data science support, it’s very nice to see people helping Ruby evolve in this direction. Ruby is a beautiful language that should be expanded beyond web development, and these kinds of libraries will make it happen.

But, as encouraging as it is, if you’re thinking about developing a real, production-ready data science project in Ruby - don’t. At least not yet. The libraries around machine learning, neural networks, etc. are old, unmaintained, and usually don’t even work.

2 comments

I love Ruby too. Sadly, Treat and stanford-core-nlp both appear to have been broken by the latest 3.6.0 update to the Stanford NLP lib.

My approach thus far has been to write a simple wrapper script around the library (in its native language) and have it print the result set to standard output as JSON.

Then within Ruby, I just shell out, capture the output, and JSON-parse the response. It's crude, but it works reliably, and I don't have to worry about any bridge libraries supporting the latest language version.
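
For anyone who wants to try the same thing, here is a minimal sketch of that shell-out-and-parse pattern. The helper name `run_json_tool` and the `STAND_IN` command are made up for illustration: the stand-in is just a Ruby one-liner that splits its input on whitespace and prints JSON, so the example runs anywhere. In practice you would swap in whatever wrapper script you wrote around the real library.

```ruby
require "json"
require "open3"
require "rbconfig"

# Run an external command, feed it text on stdin, and parse its stdout as JSON.
# `command` is an array (program plus arguments), so no shell quoting is involved.
def run_json_tool(command, input)
  stdout, status = Open3.capture2(*command, stdin_data: input)
  raise "wrapper exited with status #{status.exitstatus}" unless status.success?
  JSON.parse(stdout)
end

# Hypothetical stand-in for the real wrapper script: reads stdin,
# splits on whitespace, and prints {"tokens": [...]} as JSON.
STAND_IN = [
  RbConfig.ruby, "-rjson", "-e",
  'puts JSON.generate({"tokens" => STDIN.read.split})'
].freeze

result = run_json_tool(STAND_IN, "Ruby is a beautiful language")
puts result["tokens"].inspect
```

Checking the exit status before parsing means a crashed wrapper surfaces as an error rather than as a confusing JSON parse failure.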

> if you’re thinking about developing a real, production-ready data science project in Ruby - don’t.

Why is this? Are there inherent limitations in the language that have Ruby taking a backseat to Python in ML and maths/statistical applications? Or is it just due to neglect by the community?

(I started learning Ruby about ten months ago and have only just started to gain some proficiency, so I do not know much about the workings of the ecosystem - or the innards of Ruby, for that matter.)

I'm not sure if there are inherent language limitations (I'm not that much of an expert), but I know there's more momentum around Python for ML/data science work, mainly as a result of a few good resources specifically for it, which has encouraged more libraries and developer support to focus on it.

So perhaps it's less about neglect from the Ruby community and more about proactiveness from the Python one.

I think the key factor here has been numpy, the scientific library for python. Academics used python because of it, and they are the ones who wrote the neural networks tooling.

We can probably expect to see implementations in all languages at some point. Floating point errors are not even that big a deal, since we're dealing with statistics anyway.

That being said, neural networks are very resource/computation heavy. I wrote one in golang and cut my execution time in half just by encoding my matrices as flat arrays instead of two-dimensional arrays. If Ruby is to be used to build neural networks, it will need to do the heavy work in a native binding, like tensorflow does with its C++ layer.
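
To make the flat-array idea concrete, here is a rough sketch in Ruby of row-major flat storage, where element (r, c) of a matrix lives at index `r * cols + c` of a single array. The `FlatMatrix` class is invented for illustration and is not from any library. The halved execution time mentioned above was measured in Go; this sketch only shows the layout and makes no claim about how much it helps in Ruby.

```ruby
# Row-major flat storage: element (r, c) lives at index r * cols + c.
class FlatMatrix
  attr_reader :rows, :cols, :data

  def initialize(rows, cols, data = nil)
    @rows = rows
    @cols = cols
    @data = data || Array.new(rows * cols, 0.0)
    unless @data.length == rows * cols
      raise ArgumentError, "expected #{rows * cols} values, got #{@data.length}"
    end
  end

  # Read element (r, c) from the single backing array.
  def [](r, c)
    @data[r * @cols + c]
  end

  # Write element (r, c) into the single backing array.
  def []=(r, c, value)
    @data[r * @cols + c] = value
  end

  # Naive O(n^3) matrix product, kept simple to show the indexing.
  def *(other)
    raise ArgumentError, "shape mismatch" unless cols == other.rows
    out = FlatMatrix.new(rows, other.cols)
    rows.times do |i|
      other.cols.times do |j|
        sum = 0.0
        cols.times { |k| sum += self[i, k] * other[k, j] }
        out[i, j] = sum
      end
    end
    out
  end
end

a = FlatMatrix.new(2, 2, [1.0, 2.0, 3.0, 4.0])
b = FlatMatrix.new(2, 2, [5.0, 6.0, 7.0, 8.0])
p((a * b).data)
```

The point of the layout is that the whole matrix sits in one contiguous array instead of an array of separately allocated rows, which is also the shape a native binding would typically expect to receive.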

I doubt that anybody is thinking of actually running the networks in Ruby, or Python, or even C on the CPU. They're all run on the GPU anyway.