Hacker News
by rjb7731 1191 days ago
Inference on the Gradio demo seems pretty slow, about 250 seconds per request. Maybe I'm just too used to the 4-bit quant version now, ha!
1 comment

I'm sure it's partially the HN hug of death.