by ColinHayhurst
772 days ago
Not the whole web; LinkedIn and a few others block us, and we fully respect robots.txt, but we have ~8 billion pages. Edit: from the article, "Doing this for a few urls is easy but doing it for billions of urls starts to get tricky and expensive (although not completely out of reach)". Indeed so, but we have now computed embeddings for about half of those ~8 billion pages and are using them for mojeek.com. We have an API with many features including, uniquely, authority and ranking scores. Embeddings could be added. https://www.mojeek.com/services/search/web-search-api/ is used by Kagi, Meta and others. Disclosure: I'm a Mojeek team member.