Hacker News
by thorum 458 days ago
In the age of local LLMs I’d like to see a personal recommendation system that doesn’t care about being scalable and efficient. Why can’t I write a prompt that describes exactly what I’m looking for in detail and then let my GPU run for a week until it finds something that matches?
7 comments

You could just run a local LLM over every document and ask it "is this related to this query". I don't think you actually want to wait a week (and holding all the documents you might ever want to search would run to petabytes).

(the reasonable way is embedding search, which runs much faster with some precomputation, but you still have to store things)
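To put a rough number on the brute-force idea, here is a back-of-envelope sketch. The per-document latency is purely an assumed figure for illustration (it depends on the model, the hardware and the document length), and `docs_scanned` is a hypothetical helper name.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def docs_scanned(seconds_per_doc: float, budget_seconds: int = SECONDS_PER_WEEK) -> int:
    """How many documents one GPU can classify within the time budget.

    seconds_per_doc is an assumption supplied by the caller, not a measurement.
    """
    return int(budget_seconds // seconds_per_doc)


if __name__ == "__main__":
    # Assumed latencies for one "is this related to this query" call.
    for latency in (0.5, 1.0, 5.0):
        print(f"{latency:>4} s/doc -> {docs_scanned(latency):,} docs per week")
```

Under these assumed latencies a week of GPU time covers somewhere between a hundred thousand and a million or so documents, which is why some cheaper pre-filtering step usually comes first.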

A better way would be to ask the LLM to generate keywords (or queries). And then use old school techniques to find a set of documents, and then filter those using another LLM.
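A minimal sketch of that three-stage idea, for concreteness. The two LLM steps are passed in as plain callables (`generate_keywords` and `is_relevant` are hypothetical names, stubbed out below), and the "old school" part is a simple inverted index with Boolean OR retrieval. This is an illustration under those assumptions, not a tuned search system.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in set(text.lower().split()):
            index[word].add(doc_id)
    return index


def retrieve(index, keywords):
    """Boolean OR retrieval: ids of documents containing any keyword."""
    hits = set()
    for kw in keywords:
        hits |= index.get(kw.lower(), set())
    return sorted(hits)


def pipeline(docs, prompt, generate_keywords, is_relevant):
    """Stage 1: LLM proposes keywords. Stage 2: index lookup. Stage 3: LLM filter."""
    index = build_inverted_index(docs)
    candidates = retrieve(index, generate_keywords(prompt))
    return [docs[i] for i in candidates if is_relevant(prompt, docs[i])]


if __name__ == "__main__":
    docs = [
        "running a local llm on one gpu",
        "cloud gpu pricing compared",
        "sourdough bread recipes",
    ]
    # Stubs standing in for real model calls (assumptions for the demo).
    fake_keywords = lambda prompt: ["gpu", "llm"]
    fake_filter = lambda prompt, doc: "local" in doc
    print(pipeline(docs, "local inference", fake_keywords, fake_filter))
```

The expensive model only sees the candidates that survive the index lookup, which is the whole point of the middle stage.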
How is that better than embeddings? You’re using embeddings to get a finite list of keywords, throwing out the extra benefits of embeddings (support for every human language, for instance), using a conventional index, and then going back to embeddings space for the final LLM?

That whole thing can be simplified to: compute and store embeddings for docs, compute embeddings for query, find most similar docs.
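As a sketch of that compute, store and compare loop: the `toy_embed` function below is a hypothetical stand-in that just uses normalized word counts, so it has none of the cross-language or semantic benefits of a real embedding model. Only the shape of the pipeline is the point here.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: a unit-length sparse word-count vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    """Cosine similarity; both vectors are already unit length, so a dot product suffices."""
    return sum(v * b.get(word, 0.0) for word, v in a.items())


def build_index(docs):
    """Precompute and store one embedding per document."""
    return [(doc, toy_embed(doc)) for doc in docs]


def search(index, query, k=3):
    """Embed the query and return the k most similar documents."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


if __name__ == "__main__":
    index = build_index([
        "local llm inference on a gpu",
        "recipes for sourdough bread",
        "gardening tips for spring",
    ])
    print(search(index, "gpu inference", k=1))
```

With a real model you would swap `toy_embed` for the model's encoder and, at scale, replace the linear scan with an approximate nearest-neighbor index.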

Yes, you can do the "old school search" part with embeddings.
Ah, I had interpreted “old school search” to mean classic text indexing and Boolean style search. I’d argue that if it’s using embeddings and cosine similarity, it’s not old school. But that’s just semantics.
The entire library of Congress is like 10TB. You don’t need anything near petabytes until you get out of text into rich media.
Common Crawl is petabytes. Anna's Archive is about a petabyte, but it includes PDFs with images.
It's worth pointing out that even with the largest models out there, coherence drops fast over length. In a local home ML setup, until somebody radically improves long-term coherence, fitting a model into < x memory may be directly at odds with having it still say the right thing after > y minutes of search.
Why would it take a week?

Is this because you want it to continuously watch for live data that could match your need?

Because thinking takes time.
This is exactly what I am hoping to get sometimes (but I would say, 1 week is maybe a little long).

If I go through my current tasks and see that for some task I need a set of documents, emails, etc., why can't I just prompt the system to get it in 30-ish minutes? But as someone already stated, Apple Intelligence is supposed to fill this gap.

> maybe a little long

Many of us have ongoing problems pending for years - for just "a week", "where do I sign".

It really depends on the task.

this is sort of like a dream I had https://medium.com/luminasticity/the-county-map-of-the-world...

>The idea was that he could graft queries in this that he did not expect to finish quickly but which he could let run for hours or days and how freeing it was to do more advanced research this way.

Or it keeps monitoring the web and notifies me whenever something that matches my interests shows up -- like a more sophisticated Google alert. I really would love that.
Why can't you?

Just run the biggest model you can find out of swap and wait a long time for it to finish.

You'll obviously see more focus on smaller models, because most people aren't willing to wait weeks for their slop, and also don't have server GPU clusters to run huge models.

> Just run the biggest model you can find out of swap

This kills the SSD