HN Mirror



	by SpriglyElixir12 1069 days ago
	How would something like this work in practice? Would you generate any tags or summaries per site when inserting it into the db?

2 comments

sudobash1 1069 days ago

ArchiveBox can extract text from HTML (and possibly PDFs too). I think it can also be configured to extract subtitles from YouTube videos, so it can do full-text searches. Basically, you could have your own offline, curated search engine.
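
To make the full-text idea concrete, here is a minimal sketch of an inverted index over already-extracted page text. This is not how ArchiveBox implements search; `TinyIndex`, `tokenize`, and the example URLs are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical in-memory inverted index: token -> set of URLs."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.pages = {}

    def add(self, url, text):
        """Store the extracted text and index every token in it."""
        self.pages[url] = text
        for token in tokenize(text):
            self.postings[token].add(url)

    def search(self, query):
        """Return the sorted URLs whose text contains every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        # .get avoids creating empty entries for unknown tokens.
        matches = [self.postings.get(token, set()) for token in tokens]
        return sorted(set.intersection(*matches))


index = TinyIndex()
index.add("https://example.com/a", "Self-hosted web archiving")
index.add("https://example.com/b", "Archiving YouTube subtitles")
print(index.search("youtube archiving"))
```

A real setup would more likely hand this job to a proper search backend; the sketch only shows why extracted text is enough to get keyword search without generating tags up front.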


janalsncm 1069 days ago

You could run a full-text search or search against an auto-generated summary. Or if you want to be fancy, use semantic search like in Retrieval-Augmented Generation (RAG).
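
A rough sketch of the ranking step behind that kind of search: turn the query and each page (or its summary) into a vector, then sort by cosine similarity. The `embed` function below is a toy word-count stand-in, an assumption for illustration only; a real semantic search would swap in a learned embedding model so that related words score as similar. `rank` and the sample documents are hypothetical too.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs, top_k=3):
    """Return up to top_k document keys with nonzero similarity, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), key) for key, text in docs.items()]
    # Highest score first; ties broken by key for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [key for score, key in scored[:top_k] if score > 0]


docs = {
    "a": "how to archive web pages offline",
    "b": "recipe for sourdough bread",
    "c": "offline search engine for archived pages",
}
print(rank("sourdough bread", docs))
```

With word counts this is really just keyword overlap; the retrieval loop stays the same once `embed` returns dense vectors from an actual model.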
