Hacker News
by reisse 32 days ago
Nothing special?

I mean, the inference engine might need some tweaks to support whatever compute is available. But then, if you add a few terabytes of disk for swap and replace the RAM with bigger sticks where possible, it should work? Slowly, of course, but there is no reason it shouldn't.


The big difference will be measuring seconds per token instead of tokens per second.
Seconds per token is just fractional tokens per second ;)
> fractional

Reciprocal?
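
To put rough numbers on this, here is a back-of-envelope sketch. It assumes decoding is bound by streaming the weights from swap, that the model is dense so every weight is read once per token, and that nothing stays cached in RAM. The 1 TB model size and 2 GB/s disk bandwidth are made-up placeholders, not measurements of any real setup.

```python
def seconds_per_token(weight_bytes: float, read_bytes_per_sec: float) -> float:
    """Rough lower bound: time to stream all weights from disk once.

    Assumption: dense model, every weight read once per token, no RAM caching.
    """
    return weight_bytes / read_bytes_per_sec


def tokens_per_second(sec_per_token: float) -> float:
    """Tokens per second is the reciprocal of seconds per token."""
    return 1.0 / sec_per_token


# Hypothetical placeholder values, chosen only for illustration.
WEIGHT_BYTES = 1e12    # 1 TB of weights sitting in swap
DISK_BANDWIDTH = 2e9   # 2 GB/s sequential read

spt = seconds_per_token(WEIGHT_BYTES, DISK_BANDWIDTH)
tps = tokens_per_second(spt)
print(f"{spt:.0f} s/token = {tps:.4f} tokens/s")
```

So both replies hold: the two metrics are reciprocals of each other, and once seconds per token goes above 1, tokens per second becomes a fraction.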