by joshuamorton 2542 days ago
> Do you even need to match Google's robots.txt parsing behavior? With less than 1000 lines you can be pretty sure they are not doing it right and are breaking plenty of people's assumptions about it.

This seems like a weird assertion. The specification isn't particularly complex (ignoring the implicit complexities of Unicode). There are ~5 keywords and about 3 control characters. Why would you expect it to need all that much code?
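To put the "not that complicated" claim in concrete terms, here is a rough Python sketch of a parser for the informal format. It assumes the ~5 keywords are `user-agent`, `allow`, `disallow`, `sitemap` and `crawl-delay`, that the control characters are `#` (comment), `*` (wildcard) and `$` (end anchor), and that conflicts are resolved by "longest matching pattern wins, allow wins ties". The names `parse_robots`, `rule_matches` and `is_allowed` are hypothetical. This is not Google's parser: it skips percent-encoding, Unicode handling, merging of duplicate user-agent groups, and product-token matching of agent names.

```python
import re
from dataclasses import dataclass, field

# Assumed keyword set (a hypothetical choice for this sketch).
KEYWORDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


@dataclass
class Group:
    agents: list = field(default_factory=list)   # lower-cased user-agent values
    rules: list = field(default_factory=list)    # (directive, value) pairs


def parse_robots(text):
    """Split robots.txt text into user-agent groups and a list of sitemap URLs."""
    groups, sitemaps, current = [], [], None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # '#' starts a comment
        if ":" not in line:
            continue                              # blank or malformed line
        key, value = line.split(":", 1)          # split on the first ':' only
        key, value = key.strip().lower(), value.strip()
        if key not in KEYWORDS:
            continue                              # unknown directives are dropped
        if key == "sitemap":
            sitemaps.append(value)                # not tied to any group
        elif key == "user-agent":
            # consecutive user-agent lines share one group
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
        elif current is not None:
            current.rules.append((key, value))    # rules before any user-agent are ignored
    return groups, sitemaps


def rule_matches(pattern, path):
    """'*' matches any run of characters; a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


def is_allowed(groups, agent, path):
    """Longest matching allow/disallow pattern wins; allow wins a tie (assumed rule)."""
    agent = agent.lower()
    chosen = next((g for g in groups if agent in g.agents), None) or next(
        (g for g in groups if "*" in g.agents), None
    )
    if chosen is None:
        return True                               # no applicable group: everything allowed
    best = None                                   # (pattern length, is_allow)
    for directive, pattern in chosen.rules:
        if directive not in ("allow", "disallow") or not pattern:
            continue                              # an empty Disallow: blocks nothing
        if rule_matches(pattern, path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]
```

With a file containing a `*` group (`Disallow: /private/`, `Allow: /private/public$`) and a `foobot` group (`Disallow: /*.pdf$`), `/private/public` is allowed for an unnamed bot while `/private/public/more` is not, and `foobot` does not fall back to the `*` group's rules once it has a group of its own.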


Very few people follow the specification or even know it exists.
I'm not talking about the formal specification, but the implicit specification of what people have been using for decades. That only has 5 keywords and a couple of control characters. The formal spec is based on that informal spec, which, again, isn't that complicated.

To be more direct: what are all these assumptions you think Google's parser is mishandling?

The top comment [1] mentions the noindex directive, for example. Some people definitely expect it to work.
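That mismatch is easy to see mechanically: a parser that only recognizes the common informal keywords will silently skip a `Noindex:` line. The hypothetical helper `ignored_directives` below, assuming the recognized set is `user-agent`, `allow`, `disallow`, `sitemap` and `crawl-delay`, lists the lines such a parser would drop. Whether any particular crawler honors `noindex` is up to that crawler; this sketch doesn't model that.

```python
# Assumed set of directives a minimal parser acts on (hypothetical choice).
KNOWN = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def ignored_directives(text):
    """Return (line number, directive) for each key: value line outside KNOWN."""
    ignored = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # strip '#' comments
        if ":" not in line:
            continue                           # blank or malformed line
        key = line.split(":", 1)[0].strip().lower()
        if key not in KNOWN:
            ignored.append((lineno, key))
    return ignored
```

For `"User-agent: *\nDisallow: /tmp/\nNoindex: /drafts/\n"` the helper reports only line 3, the `noindex` directive.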

[1] https://news.ycombinator.com/item?id=20326098