Can I ask, are you using a HuggingFace endpoint to hit the LLaMA models, or deploying it yourself? Still new to this and trying to understand how putting the large models in production works...
Hey hey! We have deployed this on our cloud. It’s running on 2 A10Gs on AWS in the background.
We had the tech from our MLOps platform NimbleBox.ai that let us set up a managed service on all major cloud providers, so we just frankenstein-ed it to work for LLMs as well :)
The prompt engineering, especially for web search, is powered by our open-source tool ChainFury (https://chainfury.nbox.ai/)