Hacker News
by tbenst 2190 days ago
First of all this is very cool. Dunno if author is on here, but I’m curious why both Flux and Knet are used rather than just one of them (Flux seems the most Julianic?).

Also, is this really faster than PyTorch/TF? Last time I benchmarked Flux for non-trivial networks, the speed was quite good with small models but memory usage was ~5x higher than pytorch, and I couldn’t fit my models on the GPU for flux. For large models, I had to compromise on batch size in Julia, although maybe with Zygote.jl the memory issues have been resolved?

4 comments

I suspect Flux/Knet are still slightly slower and less memory efficient than PyTorch/TF, although things are moving very fast here!

This is not relevant to understanding AlphaZero.jl's speed, though. It is much faster than Python implementations because tree search is also a bottleneck, and Julia shines here!

Ah, I hadn’t appreciated this. Thanks for making & sharing your code!
Author here. AlphaZero.jl does indeed support both Flux and Knet, and users can choose whichever framework they prefer.

As far as I understand, Flux and Knet have different strengths. I think Knet is a bit more stable and mature for large-scale deep learning, but Flux shines for "scientific-ML" use cases where low AD overhead is crucial.

While some of these problems may already be addressed and others are being worked on, what would really help us is if people filed issues when they don't find performance to be adequate. If you still have the code handy, please do open some issues.
I ran the test 15 months ago using example code from Metalhead vs. the PyTorch examples repo. Unfortunately my test consisted of staring at nvidia-smi, so I don't have the code handy. I believe I benchmarked ResNet.

Edit: I also had a problem with VGG and opened an issue: https://github.com/FluxML/Metalhead.jl/issues/42. Perhaps this has since been resolved.

> 5x higher than pytorch, and I couldn’t fit my models on the GPU for flux. For large models, I had to compromise on batch size in Julia

I had the exact same experience. While I like Julia and Flux, I can't use Flux in this state for my models.

Would you mind opening corresponding issues on the repo? That would help guide the ongoing compiler work.