by ChuckMcM
3261 days ago
As someone who once oversaw the operation of a web crawler, I can tell you it's pretty simple: if it is "okay", then the robots.txt file will tell you it's allowed. If you look at the LinkedIn robots.txt (https://linkedin.com/robots.txt) you will see it is carefully groomed to allow various search engines to look through specific sections of their web site; the rest are disallowed.

Pretty much all of the case law comes down to this: there is a perfectly valid copyright on the 'collection' of a web site regardless of ownership of particular pieces, and the robots.txt is a well known and well understood mechanism for informing 'authorization'.

There is a "value" to LinkedIn in letting Google and other search engines crawl them: you get to see pages in your search results pointed at LinkedIn, so LinkedIn lets them crawl their pages.

At the end of the day this is exactly a question of value. Microsoft knows that the collection of information in LinkedIn is valuable for a number of uses. If you want to pay them some of that value to get access to it, fine; if not, then don't use it.

Here is one possible outcome: Microsoft will tell them what it will cost to use their info, HiQ will probably not be able to meet it because they've built their existing pricing structure around "free" access, and then, as they are going down the drain, Microsoft will buy their assets and technology and LinkedIn will get this new service you can buy from them to help you find and retain people.
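
For anyone who hasn't written a crawler, here is a rough sketch of the robots.txt check a well-behaved bot makes before fetching a page, using Python's standard urllib.robotparser. The rules, the bot names (ExampleSearchBot, SomeOtherBot) and the example.com URLs are all made up for illustration; this is not LinkedIn's actual file, just the general shape of "named search engines get specific sections, everyone else gets nothing".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (an assumption for illustration, NOT LinkedIn's file):
# one named crawler may read /pub/, every other crawler is shut out entirely.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /pub/
Disallow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ExampleSearchBot", "https://example.com/pub/some-profile"),
        ("ExampleSearchBot", "https://example.com/private/data"),
        ("SomeOtherBot", "https://example.com/pub/some-profile"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, agent, url) else "disallowed"
        print(f"{agent} -> {url}: {verdict}")
```

With these made-up rules, only the first check comes back allowed: the named bot may read /pub/, but not anything else, and an unlisted bot falls through to the catch-all block and is disallowed everywhere. A real crawler would fetch the site's live robots.txt (RobotFileParser has set_url() and read() for that) instead of using a hard-coded string.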
|
Interpretation of that factual data would fall under copyright though.