by version_five
1016 days ago
Thanks for clarifying. I'm interested in how much performance C is leaving on the table. I saw your GitHub implementation; if you haven't already, I'd suggest submitting it as a Show HN. (It looks like you did submit it. Try again with the "Show HN:" prefix in the title and more people may notice.) I noticed it says Mojo is using six threads. Is that spread across cores, or is it something else? Do you know what is running in each thread? I also saw some discussion in the llama2.c issues about using BLAS for the matmul, and I'd be curious to know what speedup that gives.