by cosmojg 807 days ago
There are already some implementations out there that attempt this!

Here's an example: https://github.com/silphendio/sliced_llama

A gist pertaining to said example: https://gist.github.com/silphendio/535cd9c1821aa1290aa10d587...

Here's a discussion about integrating this capability with ExLlama: https://github.com/turboderp/exllamav2/pull/275

And the same kind of discussion, but for llama.cpp: https://github.com/ggerganov/llama.cpp/issues/4718#issuecomm...