Hacker News
by rajatmonga 3142 days ago
TensorFlow Lite is TensorFlow’s lightweight solution for mobile and embedded devices! TensorFlow has always run on many platforms, from racks of servers to tiny devices, but as the adoption of machine learning models has grown over the last few years, so has the need to deploy them on mobile and embedded devices. TensorFlow Lite enables low-latency inference of on-device machine learning models.

Looking forward to your feedback as you try it out.

8 comments

> Looking forward to your feedback as you try it out.

Thanks Rajat. We use typical Cortex-A9/A7 SoCs running plain Linux rather than Android. We would use it for inference.

1. Platform choice

Why make TFL Android/iOS only? TF works on plain Linux. TFL even uses NDK and it would appear the inference part could work on plain Linux.

2. Performance

I did not find any info on the performance of TensorFlow Lite. I'm mainly interested in inference performance. The tag "low-latency inference" caught my eye; how low is low latency here? Milliseconds?

1. The code is standard C/C++ with minimal dependencies so it should be buildable on even non-standard platforms. Linux is easy.

2. The interpreter is optimized for low overhead, and the kernels are currently better optimized, especially for ARM CPUs. Performance varies by model, but we have seen significant improvements on most models going from TensorFlow to TensorFlow Lite. We'll share benchmarks soon.
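Until benchmarks are published, per-inference latency can be measured locally. The following is a minimal sketch, not an official benchmark tool: `run_inference` is a hypothetical zero-argument callable standing in for whatever invokes your model, and `fake_model` is a placeholder workload, not a TensorFlow Lite API.

```python
import statistics
import time


def measure_latency_ms(run_inference, warmup=5, runs=50):
    """Time a callable and return latency statistics in milliseconds.

    `run_inference` is a hypothetical zero-argument callable standing in
    for a real model invocation.
    """
    # Warm-up runs let caches and lazy initialization settle first.
    for _ in range(warmup):
        run_inference()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        # Nearest-rank style 90th percentile over the sorted samples.
        "p90_ms": samples[int(0.9 * (len(samples) - 1))],
        "max_ms": samples[-1],
    }


def fake_model():
    # Placeholder workload, assumed for illustration only.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(measure_latency_ms(fake_model))
```

Reporting the median and a high percentile rather than the mean keeps a single slow run (for example, a scheduler hiccup on a small SoC) from dominating the result.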

> The code is standard C/C++ with minimal dependencies so it should be buildable on even non-standard platforms. Linux is easy.

Glad to hear that Rajat. Since it is easy as you say, I look forward to your upcoming release with Linux as standard. :-)

Also interested in answers to these two questions, as well as OpenCL performance on vanilla Linux (iMX6 and above).
Will CoreML (or any hardware acceleration) on iOS be supported?
We want to provide a great experience across all our supported platforms, and are exploring ways to provide a simpler experience with good acceleration on iOS as well.
Woah this is cool. I’ve been waiting for this since you announced it. I was thinking about benchmarking it against other solutions. What do you think about similar frameworks like CoreML?
What tradeoffs did you make compared to the original?
A few tradeoffs we had to make:

- As mentioned below, FlatBuffers makes startup faster at the cost of some flexibility

- Smaller code size means giving up some library dependencies and broader support, and instead writing more things from scratch, focused on the use cases people care about
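The startup trade-off in the first point can be illustrated with a toy example. This is a rough sketch of the general idea only: the fixed layout below is hypothetical and is not the real FlatBuffers wire format, and JSON is used as a stand-in for any format that must be parsed in full before use.

```python
import json
import struct

# Hypothetical fixed layout, assumed for illustration (NOT the real
# FlatBuffers format): a uint32 count followed by that many float32 values.
HEADER = struct.Struct("<I")
FLOAT = struct.Struct("<f")


def pack_weights(weights):
    """Serialize a list of floats into the toy flat layout."""
    return HEADER.pack(len(weights)) + b"".join(FLOAT.pack(w) for w in weights)


def read_weight(buf, index):
    """Read one weight straight from the buffer by offset, with no full parse."""
    (count,) = HEADER.unpack_from(buf, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    (value,) = FLOAT.unpack_from(buf, HEADER.size + index * FLOAT.size)
    return value


def read_weight_parsed(text, index):
    """Parse-first approach: the whole document is decoded before any access."""
    return json.loads(text)["weights"][index]


if __name__ == "__main__":
    weights = [0.5, -1.25, 3.0]
    flat = pack_weights(weights)
    print(read_weight(memoryview(flat), 1))
    print(read_weight_parsed(json.dumps({"weights": weights}), 1))
```

The flat layout can be used as soon as the bytes are in memory, which is where the faster startup comes from. The cost is flexibility: every reader has to agree on the exact offsets.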

Do you have performance/memory comparisons of FlatBuffers vs protobuf in TF? A quick writeup on how the switch affected performance would be really interesting :)
FlatBuffers also uses less memory.
Using FlatBuffers, for one?
Hi, I have written about this before (https://news.ycombinator.com/item?id=15595689), but are there serialization fixes between cloud training and mobile?

We have had huge issues trying to figure out how to save models (freeze graph, etc.) and load them on Android. My previous thread also mentions bugs, threads and support requests where people are consistently confused.

Agree, that is a big problem that we are working hard to solve. It isn't solved in this release, but it is high up on our task list.
hey, thanks for the reply!

petewarden (https://news.ycombinator.com/item?id=15596990) from Google is also working on this, so I'm really hopeful you guys will have something soon. This is a serious blocker for doing anything reasonable in TF.

Will this leverage the Pixel Visual Core SoC on a Pixel 2 device?
This release of TensorFlow Lite doesn't leverage the Pixel Visual Core. We will explore different hardware options available to us in the future.
TF Lite supports the Android NN API, which allows each phone to accelerate these models using its own custom accelerator.
What about using XLA to compile libraries for mobile deployment fusing only the operations needed by the model?
One nice thing about Lite is that it's a lot easier to just include the operations you need (compared to TensorFlow 'classic'), there's fusion for common patterns, and the base interpreter is only 70KB. That covers a lot of the advantages of using XLA for mobile apps. In return you have the ability to load models separately from the code, and the ops are hand-optimized for ARM.

I'm still a fan of XLA, and I expect the two will grow closer over time, but I think Lite is better for a lot of scenarios on mobile.
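To make the "only include the operations you need" and "load models separately from the code" points concrete, here is a minimal sketch of an interpreter with a selective op registry. All names (`ALL_OPS`, `build_registry`, `run`, and the dict-based model format) are hypothetical and chosen for illustration; they do not reflect TensorFlow Lite's actual internals.

```python
# Hypothetical kernels, assumed for illustration only.
def op_add(a, b):
    return [x + y for x, y in zip(a, b)]


def op_mul(a, b):
    return [x * y for x, y in zip(a, b)]


def op_relu(a):
    return [max(0.0, x) for x in a]


# Everything the "full" runtime could offer.
ALL_OPS = {"ADD": op_add, "MUL": op_mul, "RELU": op_relu}


def build_registry(models):
    """Keep only the kernels that the given models actually reference."""
    needed = set()
    for model in models:
        needed.update(node["op"] for node in model)
    return {name: ALL_OPS[name] for name in sorted(needed)}


def run(model, registry, inputs):
    """Interpret a model (a list of nodes in execution order) as plain data."""
    values = dict(inputs)
    for node in model:
        kernel = registry[node["op"]]
        args = [values[name] for name in node["inputs"]]
        values[node["output"]] = kernel(*args)
    return values[model[-1]["output"]]


if __name__ == "__main__":
    model = [
        {"op": "ADD", "inputs": ["x", "y"], "output": "s"},
        {"op": "RELU", "inputs": ["s"], "output": "out"},
    ]
    registry = build_registry([model])
    print(sorted(registry))
    print(run(model, registry, {"x": [1.0, -3.0], "y": [1.0, 1.0]}))
```

Because the model is just data, the same `run` loop can execute any model whose ops are in the registry, and the `MUL` kernel never has to ship if no model uses it.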

How about quantization? Does TensorFlow Lite perform quantization, or is TensorFlow supposed to do it? Is it an iterative process or a straightforward one? Or are you training quantized models, as the NN API docs say?
The quantization is done with a special training script that is quantization aware. We will be open sourcing a mobilenet quantized training script to show how to do this soon.
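For readers unfamiliar with the term, a common way to make training quantization aware is to simulate 8-bit rounding in the forward pass ("fake quantization"). The sketch below shows that general technique with an affine scale and zero point; whether the upcoming training script works exactly this way is an assumption, and all function names here are hypothetical.

```python
def choose_quant_params(min_val, max_val, num_bits=8):
    """Pick an affine scale and zero point for the range [min_val, max_val]."""
    qmax = (1 << num_bits) - 1
    # Widen the range to include zero so that 0.0 maps to an exact integer.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    if max_val == min_val:
        return 1.0, 0
    scale = (max_val - min_val) / qmax
    zero_point = int(round(-min_val / scale))
    return scale, max(0, min(qmax, zero_point))


def quantize(x, scale, zero_point, num_bits=8):
    """Map a float to an unsigned integer, clamping to the representable range."""
    qmax = (1 << num_bits) - 1
    q = int(round(x / scale)) + zero_point
    return max(0, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to the float it represents."""
    return (q - zero_point) * scale


def fake_quantize(x, scale, zero_point):
    """Quantize then dequantize, so the value carries the rounding error."""
    return dequantize(quantize(x, scale, zero_point), scale, zero_point)


if __name__ == "__main__":
    scale, zero_point = choose_quant_params(-1.0, 3.0)
    for x in (0.0, 0.3, 2.9, 5.0):
        print(x, quantize(x, scale, zero_point), fake_quantize(x, scale, zero_point))
```

Training against the fake-quantized values lets the weights adapt to the rounding and clamping they will see once the model runs in 8-bit arithmetic on the device.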
TensorFlow Lite is an interpreter, in contrast with XLA, which is a compiler. The advantage of TensorFlow Lite is that a single interpreter can handle several models rather than needing specialized code for each model and each target platform. TensorFlow Lite’s core kernels have also been hand-optimized for common machine learning patterns. The advantage of compiler approaches is fusing many operations to reduce memory bandwidth (and thus improve speed). TensorFlow Lite fuses many common patterns in the TensorFlow converter. We are of course excited about the possibility of using JIT techniques and using XLA technology within the TensorFlow Lite interpreter or as part of the TensorFlow Lite converter as a possible future direction.
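As a rough illustration of what fusing a common pattern at conversion time can look like, here is a minimal sketch of a pass that rewrites an `ADD` followed by a `RELU` into a single node. The dict-based node format, the `ADD_RELU` op name, and the `fuse_add_relu` function are all hypothetical; the real converter's patterns and representation are not described in this thread.

```python
from collections import Counter


def fuse_add_relu(model):
    """Rewrite ADD followed directly by RELU into one ADD_RELU node.

    `model` is a list of nodes in execution order, each a dict with
    "op", "inputs" and "output" keys (a format assumed for illustration).
    """
    # Count consumers so an intermediate used elsewhere is never removed.
    uses = Counter(name for node in model for name in node["inputs"])

    fused = []
    i = 0
    while i < len(model):
        node = model[i]
        nxt = model[i + 1] if i + 1 < len(model) else None
        can_fuse = (
            nxt is not None
            and node["op"] == "ADD"
            and nxt["op"] == "RELU"
            and nxt["inputs"] == [node["output"]]
            and uses[node["output"]] == 1
        )
        if can_fuse:
            fused.append({
                "op": "ADD_RELU",
                "inputs": list(node["inputs"]),
                "output": nxt["output"],
            })
            i += 2  # Skip the RELU that was folded in.
        else:
            fused.append(node)
            i += 1
    return fused


if __name__ == "__main__":
    model = [
        {"op": "MUL", "inputs": ["x", "w"], "output": "m"},
        {"op": "ADD", "inputs": ["m", "b"], "output": "s"},
        {"op": "RELU", "inputs": ["s"], "output": "out"},
    ]
    for node in fuse_add_relu(model):
        print(node)
```

The fused node never writes the intermediate `s` to memory, which is the bandwidth saving mentioned above, while the interpreter still only needs one extra hand-written kernel.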
Is it lite enough to compile with Emscripten and use via WebAssembly?
This should be possible, but we haven't tried it. We're likely going to add a simplified target that has minimal dependencies (like no Eigen) that allows building on simple platforms.
Cool. I have something else that uses Eigen in WebAssembly, so that hasn't caused any issues btw.