Hacker News
by DebtDeflation 1094 days ago
> "build domain-specific LLMs using your own data"

It seems to me that the vast majority of these people would be better off just doing semantic search: chunk their documents, run the chunks through an embedding model, store the vectors in a vector database, and then pass the search query and the retrieved results through an LLM at the final step to produce an actual "answer". For applications where this is not practical, I agree that LoRA (low-rank adaptation) fine-tuning of an existing model should be the next approach. I have a hard time believing that the future is everyone training their own domain-specific LLMs from the ground up.
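A minimal sketch of the chunk, embed, store, search, then generate pipeline from the comment above. It assumes a toy bag-of-words vector as a stand-in for a real embedding model and a plain Python list as a stand-in for a vector database. `chunk`, `embed`, `VectorStore`, and `build_prompt` are hypothetical names, and the final LLM call is left out: the sketch only builds the prompt that would be sent to it.

```python
import math
import re
from collections import Counter

# ASSUMPTION: a toy bag-of-words vector stands in for a real embedding model.
def embed(text: str) -> dict[str, float]:
    """Return a unit-length sparse vector of lowercase word counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit-length, so the dot product equals cosine similarity.
    return sum(v * b.get(word, 0.0) for word, v in a.items())

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words (needs size > overlap)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

class VectorStore:
    """ASSUMPTION: an in-memory list stands in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[dict[str, float], str, str]] = []  # (vector, chunk, source)

    def add(self, text: str, source: str) -> None:
        self.items.append((embed(text), text, source))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, str]]:
        q = embed(query)
        scored = [(cosine(q, vec), text, src) for vec, text, src in self.items]
        scored.sort(key=lambda hit: hit[0], reverse=True)
        return scored[:k]

def build_prompt(question: str, hits: list[tuple[float, str, str]]) -> str:
    """Final step: pack the retrieved chunks, tagged with their source, into an LLM prompt."""
    context = "\n".join(f"[{src}] {text}" for _, text, src in hits)
    return (
        "Answer using only the context below and cite the sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

docs = {
    "refunds.md": "Refunds are issued within 14 days of a return being received at our warehouse.",
    "shipping.md": "Standard shipping takes 3 to 5 business days. Express shipping arrives the next day.",
}
store = VectorStore()
for name, body in docs.items():
    for piece in chunk(body):
        store.add(piece, name)

question = "How long do refunds take?"
hits = store.search(question, k=1)
prompt = build_prompt(question, hits)  # send this to the LLM of your choice (not shown)
print(hits[0][2])  # refunds.md
```

Swapping `embed` for a real embedding model and `VectorStore` for an actual vector database would leave the overall flow unchanged; no model weights are touched at any point.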

1 comment

I wholeheartedly agree with this. Vector databases are easy to update, can be filtered by recency, and let you verify where the information came from. Training a custom, frozen LLM for every company seems insane. Each company's data is not that unique; it's just the numbers that matter, and for those you need a vector or traditional database.
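A minimal sketch of the three properties mentioned above (in-place updates, recency filtering, and source attribution). It assumes a toy in-memory store with hand-written 2-D vectors standing in for real embeddings. `Record`, `Store`, `upsert`, and the `since` parameter are hypothetical names, not any particular vector database's API, although many vector databases offer comparable upsert and metadata-filter operations under their own names.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Record:
    vector: list[float]  # ASSUMPTION: hand-written toy vectors instead of real embeddings
    text: str
    source: str          # provenance: returned with every hit so answers can be checked
    updated: date        # metadata used for recency filtering

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

class Store:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.records: dict[str, Record] = {}

    def upsert(self, rec_id: str, record: Record) -> None:
        # Overwriting by id makes new data searchable immediately; nothing is retrained.
        self.records[rec_id] = record

    def delete(self, rec_id: str) -> None:
        self.records.pop(rec_id, None)

    def search(self, query: list[float], k: int = 3,
               since: Optional[date] = None) -> list[tuple[str, str]]:
        # Filter on metadata first, then rank the remaining records by similarity.
        candidates = [r for r in self.records.values() if since is None or r.updated >= since]
        candidates.sort(key=lambda r: cosine(query, r.vector), reverse=True)
        return [(r.source, r.text) for r in candidates[:k]]

store = Store()
store.upsert("widget-price", Record([1.0, 0.0], "Widget price is $10.", "pricing-jan.pdf", date(2023, 1, 5)))
# A later price change simply overwrites the same record.
store.upsert("widget-price", Record([1.0, 0.0], "Widget price is $12.", "pricing-jun.pdf", date(2023, 6, 1)))
store.upsert("support-hours", Record([0.0, 1.0], "Support hours are 9 to 5.", "support.md", date(2022, 3, 1)))

print(store.search([1.0, 0.1], k=1))
# [('pricing-jun.pdf', 'Widget price is $12.')]
print(store.search([0.0, 1.0], since=date(2023, 1, 1)))
# [('pricing-jun.pdf', 'Widget price is $12.')]  (the 2022 support record is filtered out)
```

Updating the price took one overwrite of one record; with a fine-tuned model, the new figure would have to be trained into the weights instead.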