Hacker News
by tadeegan 1643 days ago
Tensorflow is just such a classic clusterfuck google project. V2 had huge breaking changes (reminiscent of angular) and tons of the apis are clunky and don’t work well together. There are like 3 different ways to save a model. It’s almost like a bunch of teams built features with no oversight.

I’m pretty sure tf is considered in maintenance mode within google as Brain and the tf creators themselves have moved to Jax. I do think Google learned a lot from tensorflow and am excited to see Jax pan out.

Pytorch is a pleasure to debug. I think pytorch jit could close the deployment gap.

5 comments

The article gives credit to TF server and TFLite and so forth as being better for deployment, but leaves out the fact that those systems don't fucking work most of the time, and support is pretty much over at this point. The same goes for model support; even the models in TF's own repository are sometimes broken or don't follow the API conventions set forth in the documentation. I honestly don't know how anyone uses TF in production at this point, unless they are frozen on a specific old version and have figured out an environment that works with their specific models already.
Yeah, TensorFlow's API has definitely gotten convoluted and confusing. I think the shift from TF1 to TF2 and then later wrapping Keras in TF just caused a lot of problems.

TensorFlow seems to be spreading itself pretty thin. The many language bindings, TensorFlow.js, TFLite, Server, etc. all seem like they could use some focus, BUT, and this is a big but, do you think that if they can get each part of their ecosystem to an easily usable point, they'll have cornered the industry sector?

PyTorch is taking a much more targeted approach as seen with PyTorch Live, but I truly think that TFLite + Coral will be a game-changer for a lot of industries (and Google will make a fortune in the process). To me it seems like this is where Google's focus has lain in the AI space for the past couple of years.

What do you think?

> I truly think that TFLite + Coral will be a game-changer for a lot of industries

I'd like to agree. Google was very far ahead of the curve when they released Coral. I was completely stoked when they finally added hardware video encoding to the platform with the release of the Dev Board Mini.

I want them to succeed but I fear if they don't drastically improve their Developer Experience, others will catch up and eat their lunch. TensorFlow has been hard to pick up. A few years ago when I was trying to pick this up to create some edge applications, PyTorch wasn't so much easier that it seemed worth sacrificing EdgeTPU support. But now PyTorch seems much, much easier than it did then, while TensorFlow hasn't seemed to improve in ease-of-use.

Now I'm genuinely considering sacrificing TFLite / EdgeTPU in favor of, say, Jetson-esque solutions just so that I can start doing something.

Note: I am an amateur/hobbyist in this context, I am not doing Edge machine learning professionally.

Yeah, I hear you loud and clear on a lot of those points. I think the most important thing honestly is the fact that most PhDs use PyTorch in academia, so industry will inevitably shift to tailor to this growing supply if possible. Of course, Coral/TFLite are really useful, so a balance will be found, but it'll be interesting to see how it plays out.
> unless they are frozen on a specific old version and have figured out an environment that works with their specific models already

Mostly this I suspect

Totally agree on the debugging. The fact that PyTorch is more pythonic and easier to debug makes it the better choice for a lot of applications.

Are you in research? I think TensorFlow's position in industry puts it in a kind of too-big-to-fail situation at this point. It'll be interesting to see what happens with JAX, but for now TensorFlow really is the option for industry.

Do you think TFLite + Coral devices will help breathe new life into TF?

> Tensorflow ... V2 had huge breaking changes

Meanwhile PyTorch doesn't follow SemVer and always has breaking changes for every minor version increment. There's always a "Backwards Incompatible Changes" section for every minor version release: https://github.com/pytorch/pytorch/releases
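
For anyone unsure what "following SemVer" would mean here, a rough sketch of the rule: backwards-incompatible changes are only supposed to land on a major version bump (or anywhere while the major version is still 0). The function names and version strings below are made up for illustration and say nothing about any specific PyTorch or TensorFlow release.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into integers.

    Assumption: no pre-release or build suffixes (e.g. '-rc1'), which
    the full SemVer grammar allows but this sketch ignores.
    """
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    return major, minor, patch


def breaking_change_allowed(old: str, new: str) -> bool:
    """Return True if SemVer permits a breaking change between old and new."""
    old_major, _, _ = parse(old)
    new_major, _, _ = parse(new)
    if old_major == 0:
        # 0.y.z is "initial development": anything may change at any time.
        return True
    # From 1.0.0 onward, only a major bump may break the public API.
    return new_major > old_major


# Illustrative version strings only.
print(breaking_change_allowed("1.12.0", "1.13.0"))  # minor bump
print(breaking_change_allowed("1.13.1", "2.0.0"))   # major bump
print(breaking_change_allowed("0.3.0", "0.4.0"))    # pre-1.0
```

Under that rule, a "Backwards Incompatible Changes" section in a minor release is exactly the case where the function returns False.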

Even TF 1 was just an extension of Google Brain: the project that took a datacenter of CPUs in Google to distinguish cats and dogs in Youtube videos with very high accuracy. I remember when Jeff Dean was talking about it the first time, it felt like magic (though it still feels like it, it’s just more optimized magic :) ).
Any other deployment issues for PyTorch you’re aware of that would help ‘close the gap’?
I think PyTorch c++ api is less mature and harder to compile into other projects. Tensorflow started with the c++ api exposed which is why the graph format is so stable and favorable to deployment in heterogeneous environments.