by IvanHall 1560 days ago
Hello, Ivan here (the other founder).

1. Yes, any structured data could definitely help improve the results; I personally like the Wikidata dataset. It's just a matter of time and resources :)

2. The first step will probably be to handle this in our "post processing". We query several servers when doing a search and often get many more results than we need, so in this step we could quite easily remove identical results.

3. The ranking is currently heavily based on links (same as Google), so we will have similar issues. But hopefully we will find ways to better determine which sites are actually trustworthy, perhaps with more manually verified sites if enough people want to contribute.

4. I think Gigablast and Marginalia Search are really cool, and it's interesting to see how much can be done with a very small team.
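
To make point 2 a bit more concrete, here is a minimal sketch of what removing identical results in post-processing could look like. It is not our actual code: the result shape (dicts with "url" and "score"), the names normalize_url and merge_results, and the choice to treat two results as identical when their normalized URLs match are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Build a comparison key (hypothetical rule): lowercase host, no scheme,
    no port, no fragment, no trailing slash on the path."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"


def merge_results(result_lists, limit=10):
    """Merge the result lists returned by several servers.

    Keeps the highest-scoring result for each normalized URL, then returns
    the top `limit` results ordered by score (highest first).
    """
    best = {}
    for results in result_lists:
        for result in results:
            key = normalize_url(result["url"])
            if key not in best or result["score"] > best[key]["score"]:
                best[key] = result
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)[:limit]


if __name__ == "__main__":
    server_a = [
        {"url": "https://Example.com/page/", "score": 0.9},
        {"url": "https://example.org/", "score": 0.5},
    ]
    server_b = [
        {"url": "http://example.com/page", "score": 0.7},
        {"url": "https://example.net/x?q=1", "score": 0.8},
    ]
    for result in merge_results([server_a, server_b]):
        print(result["score"], result["url"])
```

Exact URL matching only catches the easy cases. Pages with the same content under different URLs would need something like content hashing, which this sketch does not attempt.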

1 comment

> Yes, any structured data could definitely help improve the results

Which syntaxes and vocabularies do you prefer? Microformats, as well as schema.org vocabularies represented as Microdata or JSON-LD, seem to be the most common according to the latest Web Data Commons Extraction Report[0]. The report is also powered by the Common Crawl.

[0]: http://webdatacommons.org/structureddata/2021-12/stats/stats...
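
For the JSON-LD case, a rough sketch of how a crawler might pull the embedded data out of a page using only the Python standard library. The class and function names (JsonLdExtractor, extract_jsonld) and the sample page are made up for illustration; Microdata and microformats live in HTML attributes and would need a different parser.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD document found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    page = (
        '<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Article", "headline": "Hi"}'
        "</script></head><body></body></html>"
    )
    print(extract_jsonld(page))
```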