| > How do you rank results? There are a ton of factors: https://github.com/MarginaliaSearch/MarginaliaSearch/blob/ma... > Can you give some rough indications of how many pages you index in total? I index around 300 million documents right now, though I crawl something like 1.4 billion (and could index them all). The search engine is pretty judicious about filtering out low-quality results, mostly because this improves the search results. > How many pages do you crawl each day? I don't know if I have a good answer for that. In general, crawling isn't really much of a bottleneck. I try to refresh the index completely every ~8 weeks, and also have some capabilities for discovering recent changes via RSS feeds. > Size of the machine(s) in RAM and HDD? It's a 2x EPYC 7543 SMP machine with 512 GB RAM and something like 90 TB of disk space, all NVMe storage. |