| HN Mirror



	by he11ow 1064 days ago
	This looks great! Can I ask, are you using a HuggingFace endpoint to hit the LLaMA models, or deploying it yourself? Still new to this and trying to understand how putting the large models in production works...

1 comments

namanski 1064 days ago

Hey hey! We have deployed this on our cloud. It’s running on 2 A10Gs on AWS in the background.

We had the tech from our MLOps platform NimbleBox.ai that let us set up a managed service on all major cloud providers, so we just frankenstein-ed it to work for LLMs as well :)

The prompt engineering, especially for web search, is powered by our open-source tool ChainFury (https://chainfury.nbox.ai/)


he11ow 1063 days ago

Thanks! Will check out these services. MLOps is definitely where the biggest pain points are right now.
