Hacker News new | ask | show | jobs
by T-hawk 4114 days ago
Google handles some constructs with punctuation as atomic tokens as special cases. C# and C++ are examples. A# through G# also return appropriate results, for the musical notes. H# and onward through the alphabet do not.

.NET is another example. Google will ignore a prepended dot on most words, but .NET is handled specially as an atomic token. I would bet this is a product of human curation, not of algorithms that have somehow identified .NET as a semantic token.

Searching for punctuation in a general case is hard, though. You wouldn't want a search for Lisp to fail to match pages with (Lisp). We often forget that the pages are tokenized and indexed, that Google and the other search engines aren't a byte-for-byte scan across the entire web.

I was recently trying to understand the difference between the <%# and <%= server tags in ASP.NET. Google couldn't even interpret those as tokens to search for. It took me a long time to figure out the former's true name as the data-bind operator in order to search for that and find the MS docs.

1 comments

Occasionally it's useful to spell out the names of the characters, both when searching and when writing documentation, blog posts, and SO Q&A. That way, searching for "asp.net less than percent hash" might tell you it's the data-bind operator.