Hacker News
by adamcanady 3326 days ago
I work on Google Search. Having one search engine per country doesn't seem like the correct approach to the problem.

> Every country has a mammoth collection of valid results for your query.

Having seen the corpus of content available from each language, this is categorically false. Consider Wikipedia, which is a fairly ubiquitous information source on the web that provides answers to tons of searches. English documents: 5.4M, Romanian documents: 376k.

Perhaps the solution to the OP's woes is more tools for filtering. This post conflates language and country. Search typically returns results and search features in the language of the query. Today's solution, where filtering is done through query refinement and query operators, seems to cover a lot of use cases already.

Further, by having an integrated product, ML models can learn behaviors specific to certain locales where those behaviors differ from region to region, and balance them against more universal behaviors that apply to more than one region.

3 comments

A server made out of Lego probably would not have been seen as the correct approach to the problem by an engineer at AltaVista a few years ago. Search engines by country is just a way of expressing the abstractions that are involved in chunking up search in a way that is orthogonal to building a service chunked around advertising revenue, English, and Silicon Valley political philosophies.

Or to put it another way, 376k Wikipedia documents in Romanian is about 50% more than the number of articles in the last print version of Encyclopedia Britannica. The dismissal of their significance may express a worldview bubble that is endemic to Google and its business model.

Just because 376k Romanian documents is not enough to train a data center in how to sell Chia Pets to Bucharestians doesn't mean that it is not a significant repository of information for actual human beings.

Slightly off-topic and a shameless plug:

Are you guys working on removing the websites that somehow manage to be on top when you search for an entire class of queries? They are practically empty, they contain 10+ affiliate JavaScript snippets (with which they supposedly make money from clicks?) and are basically search query aggregators, yet Google hasn't removed them.

Sorry I can't give examples, but these sites are out there and IMO should be outright banned. They bring zero value to customers and I'd argue that a good part of them are bringing zero value to Google as well -- they utilize some "SEO secrets" to get to the first page of Google without paying a penny.

Agree and, as I work for Seznam, especially agree on the ability to train models properly to satisfy local users. Not to mention that the user can choose what should be the language of the results.
I guess I wasn't too far off geographically with Romanian Wikipedia :)

Keep up the great work at https://www.seznam.cz/ ! Lots of interesting search features, maps, etc.