by antinucleon
2952 days ago
Summary:
Tensor programs can be optimized using machine learning and transfer learning. The numerical program-optimization model is trained on features extracted from the program's low-level AST.
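To make the idea concrete, here is a rough sketch of a learned cost model that ranks candidate loop tilings. Everything in it is an assumption for illustration: the `features` function stands in for features read off a low-level AST, `measure` is a toy substitute for timing a kernel on real hardware, and the least-squares fit is a stand-in for whatever model the paper actually uses. Transfer learning is not shown.

```python
import math

def features(tile_i, tile_j):
    """Hypothetical loop-level features: log tile extents and their squares."""
    x, y = math.log2(tile_i), math.log2(tile_j)
    return [1.0, x, y, x * x, y * y]

def measure(tile_i, tile_j):
    """Toy stand-in for a hardware measurement (lower is faster)."""
    x, y = math.log2(tile_i), math.log2(tile_j)
    return (x - 3) ** 2 + (y - 4) ** 2 + 1.0

def fit_least_squares(rows, targets):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    n = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * t for r, t in zip(rows, targets))]
           for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the row with the largest entry into place.
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def predict(weights, feats):
    return sum(w * f for w, f in zip(weights, feats))

# "Measure" only a coarse grid of tilings, then fit the cost model on them.
train = [(ti, tj) for ti in (1, 4, 16, 64) for tj in (1, 4, 16, 64)]
weights = fit_least_squares([features(*c) for c in train],
                            [measure(*c) for c in train])

# Rank the full search space with the model instead of measuring every point.
candidates = [(2 ** a, 2 ** b) for a in range(7) for b in range(7)]
best = min(candidates, key=lambda c: predict(weights, features(*c)))
print("predicted best tiling:", best)
```

The point of the sketch is only the workflow: measure a few configurations, fit a model on program features, and let the model pick which unmeasured configuration to try next.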
Experiments:
Tasks: ResNet, MobileNet, LSTM LM, DQN
Hardware: CUDA/ARM GPU/ARM CPU
Speedup compared to cuDNN, TensorFlow Lite, and ARMComputeLib: roughly 1.2x to 3.8x faster in end-to-end tests.