by lolinder 849 days ago
True! It's important to first understand the fundamentals of what makes an LLM "good" and what makes it fast, but yes, there are lots of techniques you can apply right before and during the inference step that trade speed against capability.

Different prompting techniques like what you're describing are one way, and RAG (retrieval-augmented generation) [0] and ART (automatic reasoning and tool-use) [1] fall into a similar category.
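
To make the "right before the inference step" part concrete, here's a rough toy sketch of the retrieval half of RAG. Everything in it is made up for illustration: the function names (tokenize, retrieve, build_prompt), the sample documents, and the prompt layout are all hypothetical, and plain word overlap is standing in for the embedding search a real setup would more likely use. No model is called; it only shows how the prompt gets built.

```python
import re

def tokenize(text):
    # Lowercase the text and split it into a set of alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=1):
    # Rank documents by how many words they share with the query.
    # Assumption: word overlap is only a stand-in for a real similarity search.
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, documents, k=1):
    # Prepend the retrieved text to the question. This happens before
    # inference, so the model itself is untouched.
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical sample data, purely for illustration.
docs = [
    "Quantization shrinks model weights to make inference faster.",
    "Retrieval adds relevant documents to the prompt before inference.",
    "Bananas are rich in potassium.",
]

prompt = build_prompt("How does retrieval change the prompt?", docs)
print(prompt)
```

Raising k pulls in more context, which makes the prompt longer and gives the model more tokens to process, so that's one place the speed vs. capability tradeoff shows up.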

[0] https://stackoverflow.blog/2023/10/18/retrieval-augmented-ge...

[1] https://www.promptingguide.ai/techniques/art