by stanbiryukov 809 days ago
I recommend reviewing Stanford's DSPy library. It has great examples of few-shot learning that works by generating and tuning prompts for LLMs, and even of distilling instruction-following tasks to smaller models like T5. Second, as others mentioned, use QLoRA for supervised fine-tuning followed by DPO/KTO for preference optimization. This strategy placed Hugging Face's Zephyr and Intel's Neural Chat on leaderboards for 7B parameter models. I also recommend reviewing the Unsloth library, which has excellent accelerated examples of these methods, along with the axolotl library. Lastly, SkyPilot and Modal both have excellent examples that showcase using axolotl to efficiently fine-tune models on cloud GPUs.

[1] https://github.com/stanfordnlp/dspy
[2] https://github.com/unslothai/unsloth
[3] https://github.com/OpenAccess-AI-Collective/axolotl
[4] https://github.com/skypilot-org/skypilot
[5] https://github.com/modal-labs/llm-finetuning
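To make the DPO step a bit more concrete, here is a minimal sketch of the DPO loss for a single preference pair in plain Python. It is not taken from any of the libraries above; the function name `dpo_loss` and its parameter names are my own, and the inputs are assumed to be summed log-probabilities of each response under the model being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    All arguments except beta are assumed to be summed log-probabilities
    of a full response. The name and signature are hypothetical.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # loss = -log(sigmoid(logits)), written in a numerically stable form.
    if logits > 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

The loss falls as the policy raises the chosen response's log-probability relative to the rejected one, measured against the reference model. In practice you would compute this over batches of tensors with a training library rather than on Python floats.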

I looked at DSPy last week and was trying to wrap my head around how it would be useful for a "fine-tune" style use case, where I would want to give the base model more context rather than use a vector DB and have the model put together a result.

Could you give a high-level way to think about how to use DSPy for something like this?

I think of DSPy as a programmatic way to guide LLMs with information, whether that is context from retrieval or input and output pairs, rather than through traditional low-rank fine-tuning. Their readme has a high-level introduction to using RAG with a user-defined way to pass relevant context. I also found their link to Weaviate's notebooks, where DSPy is used with a vector DB, helpful in understanding an end-to-end workflow:

[1] https://github.com/weaviate/recipes/tree/main/integrations/d...
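As a rough illustration of that idea, here is a plain-Python sketch that builds a prompt from retrieved context plus a few input/output pairs. It deliberately does not use the DSPy API; `Example`, `retrieve`, and `build_prompt` are hypothetical names, and the word-overlap retriever is a toy stand-in for a real vector DB.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One input/output demonstration (hypothetical helper type)."""
    question: str
    answer: str


def retrieve(query, passages, k=2):
    """Toy retriever: rank passages by word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(query_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, demos, context):
    """Assemble instructions, retrieved context, demos, and the question."""
    lines = ["Answer the question using the context."]
    for i, passage in enumerate(context, 1):
        lines.append(f"Context [{i}]: {passage}")
    for demo in demos:
        lines.append(f"Question: {demo.question}")
        lines.append(f"Answer: {demo.answer}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)
```

The model's weights never change here; only the prompt does. As I understand it, what DSPy adds on top of this pattern is automation: it chooses and tunes the instructions and demonstrations against a metric instead of leaving you to hand-write them.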