by ychen306
236 days ago
It's orders of magnitude cheaper to serve requests with conventional methods than directly with an LLM. My back-of-envelope calculation says that, optimistically, it takes more than 100 GFLOPs to generate 10 tokens with a 7-billion-parameter LLM. There are better ways to use electricity.
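
For anyone who wants to check the arithmetic, here is a rough sketch of that estimate. It assumes the common rule of thumb of about 2 FLOPs per parameter per generated token (one multiply and one add per weight). That figure is an assumption on my part, not something stated in the comment, and it ignores attention over the context, prompt processing, and batching. The function name is just illustrative.

```python
# Back-of-envelope forward-pass cost for autoregressive generation.
# Assumption: ~2 FLOPs per parameter per generated token (multiply + add).
# Ignores attention cost over the context, prompt processing, and batching.
PARAMS = 7e9                    # 7-billion-parameter model
FLOPS_PER_PARAM_PER_TOKEN = 2   # assumed rule of thumb
TOKENS = 10                     # tokens generated per request


def generation_gflops(params, tokens, flops_per_param=FLOPS_PER_PARAM_PER_TOKEN):
    """Return the estimated cost in GFLOPs (billions of floating-point operations)."""
    return params * flops_per_param * tokens / 1e9


total = generation_gflops(PARAMS, TOKENS)
print(f"~{total:.0f} GFLOPs to generate {TOKENS} tokens")
```

Under those assumptions the estimate comes out above the 100 GFLOPs quoted, which is consistent with calling that figure optimistic.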
I realize it sounds inhuman, but so is working in enterprise IT! :)