

by asciimoo 24 days ago

Ohi, I'm the author of the open source Searx metasearch engine.

I'm working on a self-hosted search service called Hister with the same goal I had when I started Searx development: reduce dependence on online search engines.

Hister is a full-text indexer for websites and local files which automatically saves all the visited pages as rendered by your browser. It provides a flexible web (and terminal) search interface and query language to explore saved content with ease or quickly fall back to traditional search engines. This is a fundamentally different approach from the one Searx follows, and it solves most of the weaknesses of metasearch engines. Of course it has its own weaknesses as well, but most of these are not conceptual and can be resolved by improving the software (and datasets).

I've been using it for a few months and as my local index is growing I can avoid relying on external search engines - and even websites listed in results - more and more frequently.

The initial reception is overwhelmingly positive with already more than 30 contributors and hundreds of contributions. Currently it can help with "recall" type searches mainly, but I'm planning to provide pre-indexed thematic datasets and I'm drafting a peer-to-peer index sharing concept. Maybe you can find it useful as well (or at least have some constructive criticism =]).

Links:

- https://hister.org/
- https://github.com/asciimoo/hister
- Background/motivation/beginnings: https://hister.org/posts/how-i-cut-my-google-search-dependen...
- Small read-only demo: https://demo.hister.org/

10 comments

ys-matt 24 days ago

Have been using hister for a while now and have found it super useful! There are so many times I find myself trying to remember a website I looked at a couple months ago and can't find it again via a regular search. Hister has saved me there already multiple times.

The only feedback I have is that the initial indexing of my large history was rough. A lot of domains kept blocking me for exceeding rate limits or wouldn't let me index at all. I could see it being useful to import a history file and organize it by domain inside some sort of temporary database to track/distribute attempts and get a more detailed report on complete domain failures.

Regardless though - great work!
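The per-domain import idea above could be sketched roughly as follows. This is only an illustration of the scheduling concept, not how Hister works: the function names `group_by_domain`, `round_robin`, and `failure_report` are hypothetical, and an in-memory dict stands in for the "temporary database".

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


def group_by_domain(urls):
    """Map each hostname to a queue of its URLs, preserving input order."""
    queues = defaultdict(deque)
    for url in urls:
        host = urlsplit(url).hostname
        if host:  # skip entries without a host, e.g. about:blank
            queues[host].append(url)
    return queues


def round_robin(queues):
    """Yield URLs one domain at a time so no single host is hit in a burst."""
    pending = deque(queues.items())
    while pending:
        host, queue = pending.popleft()
        yield queue.popleft()
        if queue:  # re-queue the domain behind the others
            pending.append((host, queue))


def failure_report(results):
    """Return the domains where every attempt failed.

    results: iterable of (url, ok) pairs produced by whatever fetcher is used.
    """
    total, failed = defaultdict(int), defaultdict(int)
    for url, ok in results:
        host = urlsplit(url).hostname
        total[host] += 1
        if not ok:
            failed[host] += 1
    return sorted(h for h in total if failed[h] == total[h])
```

Interleaving domains this way spreads requests out without needing per-host timers; a real importer would likely still add delays and retries per domain.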

asciimoo 24 days ago

Thank you for your feedback, it's super useful to get insights from users.

I agree, browser import has rough edges. The issue you mentioned is known: https://github.com/asciimoo/hister/issues/31 . I'll try to prioritize it and find time to fix it.

GodelNumbering 24 days ago

This looks very promising. Thank you for investing time in this.

Assuming it indexes everything locally and falls back to traditional search engines if none found, how do you feel about adding a shared middle layer? A layer that simply indexes all the canonical data that doesn't have any personal info. This way, the contributors can automatically contribute the pages they index - building a shared search engine over time! The whole thing can work without a crawler of its own (under appropriate license so people can trust it)

asciimoo 24 days ago

This is an awesome idea in theory, I'd love to go in this direction, but it's a surprisingly complex topic. I find it hard to come up with an implementation that can guarantee both result quality (no malicious actors) and user privacy.

I'd appreciate any kind of help designing such a system. We are on IRC/Discord/Github/Codeberg.

bobajeff 24 days ago

This is great news!

Hister sounds like an idea I had years ago but gave up on after running into issues with the index taking up way too much storage.

Long ago I used Searx and really liked it, but at some point I didn't see the point as opposed to using Google directly. Lately, though, in the back of my mind I've been thinking about checking in on it again.

rdmuser 24 days ago

It's great seeing some more varied takes on search engines like this. That's essentially the same reason I use Inoreader's RSS search to find articles when I want to revisit them, and it has been super handy. I know there have been some projects focused on RSS search engines, like OpenOrb, that have some similarities to Hister. Makes me wonder if Hister could seed its history using RSS.

asciimoo 24 days ago

It isn't supported yet, but it's a good idea and quite simple to implement. I've added it to the issue tracker: https://github.com/asciimoo/hister/issues/431 . Thanks for the suggestion.
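As a rough idea of what RSS seeding might involve, here is a minimal sketch that pulls article URLs out of a feed using only the Python standard library. The function name `feed_links` is hypothetical, and how the resulting URLs would be handed to Hister is left open, since the thread does not describe that interface.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_links(xml_text):
    """Return article URLs from an RSS 2.0 or Atom feed given as a string."""
    root = ET.fromstring(xml_text)
    links = []
    # RSS 2.0 layout: <rss><channel><item><link>URL</link></item>
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    # Atom layout: <feed><entry><link href="URL"/></entry>
    for entry in root.iter(ATOM_NS + "entry"):
        for link in entry.findall(ATOM_NS + "link"):
            # A missing rel attribute means "alternate" in Atom
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                links.append(link.get("href"))
    return links
```

The extracted URLs could then be fed to a crawler or importer as a seed list.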

renegat0x0 24 days ago

I use my own domain index to navigate the web.

- If I wanted to use my domain list to start Hister, could it download my preconfigured / liked domains?

- Can I make some pages to rank higher in it?

- Can I assign tags to pages (by which I could later on filter?)

My domain index

- https://github.com/rumca-js/Internet-Places-Database

asciimoo 24 days ago

> If I wanted to use my domain list to start Hister, could it download my preconfigured / liked domains?

Yes, Hister has a built-in crawler which supports standard HTTP and different browser-based backends.

> Can I make some pages to rank higher in it?

You can create priority rules to boost the ranking of matching domains/URLs.

> Can I assign tags to pages (by which I could later on filter?)

It is possible to add a label to indexed documents.

DavideNL 23 days ago

Noob question:

So if I only use the Firefox extension, all pages Hister will "fetch and store" will have gone through my browser's content blocker (uBlock Origin) before being saved?

asciimoo 23 days ago

Yes, the extension extracts the content of the tabs as your browser renders them. This includes all modifications applied by your extensions as well.

vinni2 23 days ago

I use a self-hosted SearXNG to make it work with my local LLM RAG setup. It works amazingly. Thank you for creating this. I will check out Hister as well.

asciimoo 23 days ago

Nice! Hister provides an MCP interface so you can easily integrate it into your RAG setup.

satvikpendem 24 days ago

Thank you for making Searx, I use it as the web search tool MCP for local models and it works very well, so that not only big companies have the power to show or hide results now.

_ache_ 24 days ago

I'm on a self-hosted SearXNG. It's great. I just need a computer up 24h/7d.

A VPS without a blacklisted IP is good. With a simple rootless container, updates are easy.

Configuration takes little time, not much.

I still hate that I have to double the bang to use the same bang as DDG.

Example: "!!wde Ente" to go to the German Wikipedia page about ducks, instead of "!wde Ente" with DDG.

qznc 24 days ago

Happy hister user here. Thank you!