by visarga 1953 days ago
Hate to break up the party, but this model only loads a small part of itself into RAM during inference.
1 comment

That's a good thing. Loading less of the model at once leaves more resources for the interesting work, and lower cost means more accessibility.