by jfoster
4170 days ago
Exactly. It's so difficult to get URLs "right", and that's quite non-obvious until you do something like writing a crawler. Another example is whether foo.com/bar is the same as foo.com/BAR. Usually yes, but the path is case-sensitive, so it's entirely possible that they will serve different content. Also, which URL parameters should be disregarded, and which should be considered important? A crawler must do quite a bit of nontrivial page introspection to figure that out on its own. And pages that are essentially the same will often differ slightly: timestamps and time-sensitive data (e.g. listings on a marketplace) will trip you up here.
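
To make that concrete, here's a rough sketch of the kind of normalization a crawler might start with. The `canonicalize` function and the `IGNORED_PARAMS` list are made up for illustration; a real crawler would have to learn per site which parameters matter, and this sketch ignores edge cases such as userinfo in the host part.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: a hand-picked set of parameters that (probably) don't change
# the page content. Which ones are safe to drop really depends on the site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def canonicalize(url, ignored=IGNORED_PARAMS):
    """Return a best-guess canonical form of url for duplicate detection."""
    parts = urlsplit(url)
    # Scheme and host are case-insensitive, so lowercasing them is safe.
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    # The path is left untouched: /bar and /BAR may be different resources.
    path = parts.path or "/"
    # Drop ignored parameters and sort the rest so ordering doesn't matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    )
    # The fragment is never sent to the server, so it is discarded.
    return urlunsplit((scheme, host, path, urlencode(query), ""))


if __name__ == "__main__":
    print(canonicalize("HTTP://Foo.com/bar?utm_source=x&b=2&a=1#top"))
    print(canonicalize("http://foo.com/BAR"))
```

Note that this deliberately keeps foo.com/bar and foo.com/BAR distinct. Deciding that they are "really" the same page, or that two pages differing only by a timestamp are duplicates, means comparing the fetched content, which no amount of URL munging will do for you.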