by ekelsen
1023 days ago
Can you share details of the build failure on GitHub? We'll try to help. The inference code is shared as a proof of concept; it is not meant to be a production-ready deployment. Also worth noting: not all LLMs are used to produce text that is read by humans.
It’s funny you say production, because all of the errors I ran into suggest the container is expecting your production architecture.

My advice is to stream first, then build synchronous convenience wrappers on top of that. Also, lean on community standards for a PoC. I’m guessing your investors are interested in making this scale as cheaply as possible, but that is probably the least important feature for people evaluating your model’s quality locally.
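To make the "stream first" suggestion concrete, here is a minimal sketch in Python. The names `generate_stream` and `generate` are hypothetical, and the fake token loop stands in for whatever per-step decode a real model would run. The point is only the layering: the streaming generator is the primitive, and the blocking call is a thin wrapper over it.

```python
from typing import Iterator


def generate_stream(prompt: str, max_tokens: int = 5) -> Iterator[str]:
    """Hypothetical streaming entry point: yields one token per step.

    A real implementation would run one decode step per iteration;
    here we emit placeholder tokens so the sketch is runnable.
    """
    for i in range(max_tokens):
        yield f"tok{i} "


def generate(prompt: str, max_tokens: int = 5) -> str:
    """Synchronous convenience wrapper: drain the stream and join it."""
    return "".join(generate_stream(prompt, max_tokens))


if __name__ == "__main__":
    # Streaming consumer: print tokens as they arrive.
    for token in generate_stream("hello", 3):
        print(token, end="", flush=True)
    print()
    # Blocking consumer: get the whole completion at once.
    print(generate("hello", 3))
```

Going the other direction (wrapping a blocking call to fake streaming) can't recover incremental output, which is why the stream is the better base layer.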