by he11ow 1064 days ago
This looks great!

Can I ask, are you using a HuggingFace endpoint to hit the LLaMA models, or deploying it yourself? Still new to this and trying to understand how putting the large models in production works...

1 comments

Hey hey! We have deployed this on our cloud. It’s running on 2 A10Gs on AWS in the background.

We had the tech from our MLOps platform NimbleBox.ai that let us set up a managed service on all major cloud providers, so we just frankenstein-ed it to work for LLMs as well :)

The prompt engineering, especially for web search, is powered by our open-source tool ChainFury (https://chainfury.nbox.ai/).

Thanks! Will check out these services. MLOps is definitely where the biggest pain points are right now.