by version_five 1016 days ago
Thanks for clarifying. I'm interested in how much performance C is leaving on the table. I saw your GitHub implementation; I'd suggest submitting it as a Show HN if you haven't already. (It looks like you did submit it. Try again with a "Show HN:" prefix and maybe more people will notice.)

I noticed that it says Mojo is using six threads. Is that across cores, or is it something else? Do you know what it's running in the different threads?

I also saw some discussion in the llama2.c issues about using BLAS for the matmul. I'd be curious to know what speedup this gives.

1 comment

I don't have much context on BLAS. People are trying to optimize the code as much as possible, but some optimizations aren't being merged because they would make the code too complex to understand.