by EvanMiller 4229 days ago
For some context, ArrayFire is a product of AccelerEyes, which began life selling a GPU booster for Matlab (a product called Jacket).

This and today's .NET announcement show how hard it is to sell proprietary developer tools. I had considered using ArrayFire for some of my own commercial work, but in the end decided to roll my own OpenCL code in order to have better control. If you require cutting-edge performance (which is the reason you'd consider ArrayFire in the first place), there's just too much risk involved if the vendor doesn't get details like memory access order right on complex matrix problems. Open-sourcing reduces that risk quite a bit; if this decision had been made 3 years ago, I would have given the product a closer look.
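To make the "memory access order" worry concrete, here is a minimal sketch in plain Python (the helper names `visit_offsets` and `strides` are made up for illustration and have nothing to do with ArrayFire's code). It lists the linear offsets touched when a row-major array is walked by rows versus by columns. On a GPU, neighbouring threads reading neighbouring addresses can have their loads coalesced, so the unit-stride walk is the one you generally want.

```python
def visit_offsets(rows, cols, by_rows=True):
    """Linear offsets touched when walking a row-major rows x cols array."""
    if by_rows:
        # Inner loop runs along a row: consecutive addresses.
        return [r * cols + c for r in range(rows) for c in range(cols)]
    # Inner loop runs down a column: jumps of `cols` elements.
    return [r * cols + c for c in range(cols) for r in range(rows)]


def strides(offsets):
    """Differences between successive offsets in a traversal."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    print(strides(visit_offsets(2, 3, by_rows=True)))
    print(strides(visit_offsets(2, 3, by_rows=False)))
```

The same data and the same arithmetic can differ a lot in speed depending on which of these two patterns a kernel ends up using, which is the sort of detail that is hard to audit in a closed-source library.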

From a business perspective, open-sourcing will murder their margins so they're basically gambling on their ability to jump-start volume. I think the product is in a tough position because most of the action these days is going towards "Big Data," where data doesn't fit on a single machine -- let alone a GPU -- or towards heavy number-crunching, where hand-rolled kernels will outperform generic array libraries. They might have luck serving as a kind of backend to NumPy, but then they're two steps removed from the customer so it'll be hard building a relationship that leads to a sale.

As a side note, it seems odd to me that "native CPU" is a target distinct from OpenCL, which already runs on both CPUs and GPUs. I understand that kernels written for GPUs sometimes need to be rewritten for CPUs to take advantage of the different computation and memory architecture, but since their native CPU target isn't vectorized or multi-threaded, it seems like any further effort should be spent adapting the OpenCL kernels for CPU platforms rather than reinventing the wheel with a distinct C or assembler target.

I admire the general goal of making GPU processing more accessible, but it's a problem with a lot of nuance and requires a significant amount of customer education. GPUs are sort of like quantum computers in the limited sense that they're totally awesome at some tasks and totally suck at other tasks, and you need a solid grounding in the theory to distinguish the two sets of cases. Open-sourcing should at least help with the education angle, since ArrayFire now represents a respectable percentage of publicly viewable OpenCL code. (The open-source scene for OpenCL is pretty depressing right now.) In any case, good luck out there.

2 comments

Great thoughts, and interesting to see your thought process along the way. For quite some time, we have made ArrayFire free for single-GPU use, dipping a toe into building a user base. We have already started monetizing that free user base over the last several years and we are good at that already. So from a business perspective, we have no margins that are really at risk. We only have more money to make from this move!

And you are right. Too bad we didn't do this long ago!!! Hindsight is 20-20 as they say. I wrote about some of the internal deliberations we had on this decision here: http://notonlyluck.com/2014/07/31/the-decision-to-open-sourc...

> As a side note, it seems odd to me that "native CPU" is a target distinct from OpenCL, which already runs on both CPUs and GPUs.

We are planning to move towards a single library that dynamically loads the appropriate backend depending on the runtimes / drivers available. If we relied completely on OpenCL, the same binary would not work on machines without the OpenCL SDKs installed.
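As a rough illustration of that idea (not ArrayFire's actual loader; the backend names, library names, and preference order below are assumptions), a front end can probe for each runtime's shared library and fall back to a plain CPU path when none is found. This sketch uses Python's standard `ctypes.util.find_library` only to locate the libraries, and takes the probe as a parameter so the selection logic can be exercised without any GPU runtime installed.

```python
from ctypes.util import find_library

# (backend name, shared-library name to look for), in order of preference.
# These entries are illustrative assumptions, not ArrayFire's real list.
BACKENDS = [("cuda", "cuda"), ("opencl", "OpenCL")]


def pick_backend(probe=find_library):
    """Return the first backend whose runtime library can be located.

    Falls back to "cpu" when no accelerator runtime is found, so the
    same program still runs on a machine without OpenCL or CUDA.
    """
    for name, libname in BACKENDS:
        if probe(libname) is not None:
            return name
    return "cpu"


if __name__ == "__main__":
    print("selected backend:", pick_backend())
```

A real implementation would go on to load the chosen library and resolve its entry points; the point here is only that the decision is made at run time rather than at link time.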

> I think the product is in a tough position because most of the action these days is going towards "Big Data," where data doesn't fit on a single machine -- let alone a GPU -- or towards heavy number-crunching, where hand-rolled kernels will outperform generic array libraries

Well, that is a two-part question. As for hand-rolled kernels, they will obviously be better if you know the problem type. But more often than not, our users are happy to get "X" times the speedup in "Y" hours as opposed to "(1.2 - 1.3)X" speedup in "(3-5)Y" hours.
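To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes the only costs are development hours plus compute hours, and takes the midpoints 1.25X and 4Y of the ranges above; `t` (total unaccelerated compute hours over the life of the code) and the function name are introduced here purely for illustration. The library route costs `y + t/x` hours and the hand-rolled route costs `effort*y + t/(gain*x)`, so hand-rolling only wins once `t` exceeds the value returned below.

```python
def break_even_compute_hours(x, y, gain=1.25, effort=4.0):
    """Baseline compute hours above which hand-rolled kernels pay off.

    library route:      y + t / x
    hand-rolled route:  effort * y + t / (gain * x)
    Setting the two equal and solving for t gives
        t = (effort - 1) * y * x * gain / (gain - 1)
    """
    return (effort - 1.0) * y * x * gain / (gain - 1.0)


if __name__ == "__main__":
    # Hypothetical figures: a 10x library speedup after 8 hours of work.
    print(break_even_compute_hours(x=10, y=8))
```

With the midpoint values this simplifies to `t = 15 * x * y`, so for a 10x speedup obtained in 8 hours, the extra tuning effort is only recovered after about 1200 hours of baseline compute.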

As for Big Data, this is something we are working towards. We have some ideas that will make scaling across multiple GPUs and multiple machines easier. Since we will be doing this publicly, I am sure we will get a lot of valuable feedback from the community.