Hacker News
Show HN: Speed up model inference on CPU with hand crafted layer implementations (github.com)
2 points by wanderinglight 582 days ago
Kaoken explores the performance of handcrafted implementations of common PyTorch layers.

The results show that for smaller models, using these "baked" layers enables real-time inference without the need for a GPU. More details in the README.

1 comments

Is the main idea to convert the model implementation from Python into C, then hardcode all possible values? Do you do this yourself in the generator code, or could you let the C preprocessor/compiler handle something like this by using macros? (might help with compile time/memory)

"NOTE: Ensure the device you are running on has no form of hardware acceleration like GPU or the results will be skewed"

How much does adding GPUs affect your performance gains? I understand that the point of this optimization is CPU-only machines, but it would be interesting to consider the effect your optimizations have when running on GPUs as well.

We let the generator code hardcode the weights into the generated source.
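To make the idea concrete, here is a minimal sketch of what "hardcoding the weights into the generated source" could look like for a single dense layer. This is not the project's actual generator: the function name `bake_linear`, the emitted C signature, and the fully unrolled layout are all assumptions chosen for illustration.

```python
# Hypothetical sketch, not the project's generator: emit C source for
# y = W x + b with every weight and bias written out as a float literal.

def bake_linear(name, weights, bias):
    """Return C source text for a dense layer with baked-in parameters.

    `weights` is a list of rows (one per output), `bias` has one entry per row.
    """
    rows, cols = len(weights), len(weights[0])
    lines = [f"void {name}(const float x[{cols}], float y[{rows}]) {{"]
    for i, row in enumerate(weights):
        # Each weight becomes a literal, so the C compiler sees constants
        # instead of loads from a weight array.
        terms = [f"{float(w)!r}f * x[{j}]" for j, w in enumerate(row)]
        terms.append(f"{float(bias[i])!r}f")
        lines.append(f"    y[{i}] = " + " + ".join(terms) + ";")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy 2x2 layer; real weights would come from a trained model.
    print(bake_linear("dense0", [[1.0, -2.0], [0.5, 0.0]], [0.25, 1.0]))
```

Unrolling every multiply like this makes the generated file grow with the number of weights, which is presumably where the compile time and memory concern raised above comes from.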

Running on a GPU affects performance significantly, by as much as 20X. This project is only intended for cases where a GPU is not available or not desired due to cost or other constraints.