by keefle 813 days ago
Wonderful work!

Is it possible to make it use only a subset of the web (only sites that I trust and consider relevant to producing an accurate answer)? Are there ways to make it work offline on pre-installed websites (Wikipedia, some other wikis, and possibly news sites archived locally)? And what about other kinds of documents, such as books and research papers as PDFs?


Seconded. I tried to do this many years ago for my dissertation and failed, but this would be a dream of mine.
Would it not be possible to create a search engine that only crawls certain sites?
I was most interested in the offline aspect, and I wouldn't even know where to start with it if I were to fork the project.

How do you parse and efficiently store large, unstructured information for arbitrary, unstructured queries?

You put it in a search server, like Elasticsearch or Meilisearch.
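To make the "search server" answer a bit more concrete for the offline case: the core structure inside engines like Elasticsearch is an inverted index, which maps each term to the documents that contain it, so a query only touches documents that share at least one term with it. Below is a toy sketch, assuming the local documents (wiki dumps, archived pages, text extracted from PDFs) have already been turned into plain text. `InvertedIndex`, `tokenize` and the TF-IDF scoring are hypothetical simplifications for illustration, not the internals or the ranking that Elasticsearch or Meilisearch actually use.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """Toy in-memory inverted index with TF-IDF scoring (hypothetical)."""

    def __init__(self) -> None:
        # term -> {doc_id: number of times the term occurs in that doc}
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        # doc_id -> total number of tokens, used to normalise term frequency
        self.doc_lengths: dict[str, int] = {}

    def add(self, doc_id: str, text: str) -> None:
        """Index one document (assumes each doc_id is added only once)."""
        tokens = tokenize(text)
        self.doc_lengths[doc_id] = len(tokens)
        for term, count in Counter(tokens).items():
            self.postings[term][doc_id] = count

    def search(self, query: str, k: int = 5) -> list[tuple[str, float]]:
        """Return up to k (doc_id, score) pairs, best match first."""
        n_docs = len(self.doc_lengths)
        scores: dict[str, float] = defaultdict(float)
        for term in set(tokenize(query)):
            docs = self.postings.get(term)
            if not docs:
                continue  # term never indexed: contributes nothing
            # Terms that appear in fewer documents get a higher weight.
            idf = math.log(1 + n_docs / len(docs))
            for doc_id, count in docs.items():
                scores[doc_id] += (count / self.doc_lengths[doc_id]) * idf
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("wiki/Python", "Python is a programming language.")
    index.add("wiki/Rust", "Rust is a systems programming language.")
    print(index.search("systems language"))  # wiki/Rust ranks first
```

A real search server adds persistence, incremental updates, better tokenization and ranking (Elasticsearch defaults to BM25), but the lookup-by-term idea is what keeps arbitrary queries cheap over a large local corpus.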
LLocalSearch uses SearXNG, which has a feature to blacklist/whitelist sites for various purposes.
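A rough alternative to configuring the allowlist inside SearXNG itself is to post-filter whatever results come back against a set of trusted domains. The sketch below assumes each result is a dict carrying a `url` key; `TRUSTED_DOMAINS`, `is_trusted` and `filter_results` are hypothetical names for illustration, not part of SearXNG or LLocalSearch.

```python
from collections.abc import Collection
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the sites you trust.
TRUSTED_DOMAINS = frozenset({"wikipedia.org", "arxiv.org"})


def is_trusted(url: str, trusted: Collection[str] = TRUSTED_DOMAINS) -> bool:
    """True if the URL's host is a trusted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in trusted)


def filter_results(results: list[dict], trusted: Collection[str] = TRUSTED_DOMAINS) -> list[dict]:
    """Keep only results whose 'url' field points at a trusted domain."""
    return [r for r in results if is_trusted(r.get("url", ""), trusted)]
```

Matching on the parsed hostname and its subdomains, rather than searching for the domain as a substring of the URL, keeps something like notwikipedia.org from slipping through.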
Also a great idea to expose this in the frontend. Thanks :)
Uhhhh, both ideas are great! Would you like to turn them into GitHub issues? I will definitely look into both of them :)