Hacker News
by d4rkp4ttern 1171 days ago
Thanks for clarifying. The support for local LLMs seems very interesting. Would a Haystack agent call out to a separately running, self-hosted LLM via an API (REST, etc.), or would it need to load the model itself and query it directly (e.g. model.generate(<prompt>))?

Also, it seems like the functionality of Haystack subsumes that of langchain and llama-index (fka GPT-index)?

2 comments

Haystack Agents are designed so that you can easily use them with different LLM providers. You just need to implement one standardized wrapper class for your model provider of choice (https://github.com/deepset-ai/haystack/blob/7c5f9313ff5eedf2...)

So back to your question: we will enable both ways in Haystack: 1) loading a local model directly via Haystack AND 2) querying self-hosted models via REST (e.g. Hugging Face models running on AWS SageMaker). Our philosophy here: the model provider should be independent of your application logic and easy to switch.

In the current version, we support only option 1 for local models. This works for many of the models provided by Hugging Face, e.g. Flan-T5. We are already working on adding support for more open-source models (e.g. Alpaca), since models like Flan-T5 don't perform well when used in Agents. Support for SageMaker endpoints is also on our list. Any options you'd like to see here?
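
To make the "provider independent of application logic" idea concrete, here is a minimal sketch of the two options behind one interface. It is not Haystack's actual wrapper class: the names ModelProvider, LocalProvider, RestProvider, http_post_json and answer are hypothetical, and the JSON request/response shape ("prompt" in, "text" out) is an assumption rather than any real endpoint's contract.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable
from urllib import request


class ModelProvider(ABC):
    """Hypothetical common interface; not Haystack's actual wrapper class."""

    @abstractmethod
    def invoke(self, prompt: str) -> str:
        ...


class LocalProvider(ModelProvider):
    """Option 1: the model lives in this process and is called directly."""

    def __init__(self, generate: Callable[[str], str]):
        # `generate` stands in for something like model.generate(<prompt>).
        self._generate = generate

    def invoke(self, prompt: str) -> str:
        return self._generate(prompt)


def http_post_json(url: str, payload: dict) -> dict:
    """Send a JSON POST with the standard library and decode the JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class RestProvider(ModelProvider):
    """Option 2: the model runs elsewhere and is queried over REST."""

    def __init__(self, url: str, post: Callable[[str, dict], dict] = http_post_json):
        self._url = url
        self._post = post  # injectable so the sketch can be exercised offline

    def invoke(self, prompt: str) -> str:
        # The payload and response keys ("prompt", "text") are assumptions.
        return self._post(self._url, {"prompt": prompt})["text"]


def answer(provider: ModelProvider, question: str) -> str:
    """Application logic: depends only on the interface, not on the provider."""
    return provider.invoke(f"Question: {question}\nAnswer:")
```

Switching providers then means changing only the object passed to answer(); the application code itself stays the same.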

To be precise: I don't think I said 'local LLMs' above :) It's technically possible, I guess; it just hasn't been part of what's officially available. (There are also licensing issues still.) To answer your question about the APIs: the Agent itself queries OpenAI via REST to break the prompt down into tasks, then works with the underlying tools/pipelines through the Python API (and then, e.g., a Transformer model that's part of a pipeline has to be loaded onto a GPU). Parts of those pipelines might use PromptNode, which can work with hosted LLMs via REST but could also work with a local LLM. Re 'subsume': well, that depends :) But arguably you can build an NLP Python backend with Haystack alone, regardless of how complex your underlying use case is, or whether it's extractive, generative, or both.
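
As a rough illustration of the flow described above (an LLM breaks the query into tasks, then each task is handed to a tool through plain Python calls), here is a simplified plan-then-execute sketch. It is not Haystack's Agent implementation: run_agent, Planner and Tool are hypothetical names, the planner is an ordinary callable standing in for the REST call to a hosted LLM, and a real agent would likely interleave planning and tool calls rather than plan everything up front.

```python
from typing import Callable, Dict, List, Tuple

# A planner maps a user query to (tool_name, tool_input) steps.
# In the setup described above this would be the hosted LLM, reached via REST.
Planner = Callable[[str], List[Tuple[str, str]]]

# A tool is any Python callable, e.g. a pipeline wrapping a locally loaded model.
Tool = Callable[[str], str]


def run_agent(query: str, plan: Planner, tools: Dict[str, Tool]) -> List[str]:
    """Ask the planner for steps, then run each step with the named tool."""
    results = []
    for tool_name, tool_input in plan(query):
        if tool_name not in tools:
            raise KeyError(f"planner asked for unknown tool: {tool_name}")
        results.append(tools[tool_name](tool_input))
    return results
```

The planner and the tools are both just callables here, so either side could be backed by a hosted or a local model without the loop changing.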