by szszrk
617 days ago
It looks nice. I'm pleasantly surprised by the search results I got on the demo page. The AI response is more or less what I'm used to, the images include YouTube thumbnails from videos that are very closely related to the topic I asked about (one that took me a while to stumble upon, but is a huge knowledge source), and the text results are decent... I've never looked into private/self-hosted search before.
How does a service like this gather data from the web? What's the original source, who does the scraping, and how do you keep it updated?
As for the source of the search results (both text and images), they all come from [SearXNG](https://github.com/searxng/searxng/).
> SearXNG is a free internet metasearch engine which aggregates results from more than 70 search services. Users are neither tracked nor profiled.
SearXNG on its own offers a full-stack platform for running searches privately (you can find public instances at <https://searx.space/>, and you can easily self-host it [via Docker](https://github.com/searxng/searxng-docker)).
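For anyone curious how a client can pull results out of a SearXNG instance, here is a minimal sketch using only Python's standard library. It assumes the instance has JSON output enabled (the `json` entry under `search.formats` in its `settings.yml`); the function names are hypothetical, and this is just one way to consume the `/search?format=json` endpoint, not necessarily how MiniSearch does it.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str) -> str:
    # format=json only works if the instance enables it in settings.yml (search.formats).
    return f"{base_url.rstrip('/')}/search?{urlencode({'q': query, 'format': 'json'})}"


def extract_results(payload: dict) -> list[dict[str, str]]:
    # Keep only the fields a client typically needs from each result entry.
    return [
        {"title": r.get("title", ""), "url": r.get("url", ""), "content": r.get("content", "")}
        for r in payload.get("results", [])
    ]


def search(base_url: str, query: str) -> list[dict[str, str]]:
    # Fetch the JSON response from the instance and reduce it to title/url/content.
    with urlopen(build_search_url(base_url, query), timeout=10) as resp:
        return extract_results(json.load(resp))
```

Against a local instance, something like `search("http://localhost:8080", "private search")` would return a list of `{"title", "url", "content"}` dicts (the base URL is hypothetical; use wherever your instance runs).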
As for how it scrapes other search engines, it's really simple: it makes HTTP requests and parses the returned HTML (for most engines).
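To make the "HTTP requests plus HTML parsing" idea concrete, here is a minimal sketch with Python's standard library. It assumes a hypothetical results page where each hit is an `<a class="result" href="...">` link; every real engine uses its own markup (SearXNG maintains a separate engine module for each), so the class name and structure here are purely illustrative.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ResultLinkParser(HTMLParser):
    """Collects (title, href) pairs from <a class="result" href="..."> tags.

    The "result" class is a hypothetical marker; real engines need their own selectors.
    """

    def __init__(self) -> None:
        super().__init__()
        self.results: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "result" in classes and attr_map.get("href"):
            # Start capturing the link text of a result anchor.
            self._href = attr_map["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None


def parse_results(html: str) -> list[tuple[str, str]]:
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.results


def fetch_results(url: str) -> list[tuple[str, str]]:
    # Plain HTTP GET; a real scraper also deals with rate limits, CAPTCHAs, errors, etc.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return parse_results(resp.read().decode(charset, errors="replace"))
```

Keeping the parser separate from the fetch step means the selector logic can be tested against saved HTML without hitting the network, which matters because engines change their markup often.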
In MiniSearch, I don't need to store the results myself. The scraping is done in real time by SearXNG and the results are passed to MiniSearch, which in turn runs a similarity search and filters out the text results that don't seem relevant enough.
But I'd say the real differentiator of MiniSearch is that it's mobile-first. From the beginning, it was designed to run in mobile browsers (Chrome, Safari, and Firefox Mobile), and [Wllama](https://github.com/ngxson/wllama) together with [Web-LLM](https://github.com/mlc-ai/web-llm), along with LLMs of under 1B parameters, made that possible!
If you're curious, here's the HN post I made about it a year ago: https://news.ycombinator.com/item?id=37885752