SorteKanin@reddit
"Looking at public posters is legal, court reaffirms."
Eurynom0s@reddit
I think a free candy jar is a better example. If you take one, you're doing what was intended. If you take two or three, probably nobody cares. If you walk up and dump the full bowl into your backpack, the security guard may come over and say something.
Another example is that it's not legal to just walk into someone's house because they didn't lock the door.
I'm not a lawyer but I assume these are the gist of what LinkedIn is claiming. That the site was intended to be authorized "for normal individual use", that a scraper goes beyond normal individual use (taking a candy out of the bowl vs taking the entire candy bowl), and that just because it's possible doesn't mean it's allowed (leaving your door unlocked is not permission for strangers to enter).
Rarelyimportant@reddit
> Another example is that it's not legal to just walk into someone's house because they didn't lock the door.
This is a false analogy. In the case of scraping it's more akin to asking if you can enter someone's home, being told yes, and then having the homeowner say it was illegal for you to enter. Remember, scrapers are not **taking** data, they are **requesting** it. And every byte of data they receive is because some web server agreed to send it to them. If the company doesn't want you to have that data, it's their job to not give it to you, not your job to not ask for it.
bloody-albatross@reddit
Do scrapers need to honor robots.txt though?
JoshYx@reddit
No
Big search engines support and honor robots.txt and that's enough to fulfill their purpose
BenjiStokman@reddit
Search engines don't legally need to honor robots.txt. They do anyway because there are many ways to punish crawlers that don't.
zepperoni-pepperoni@reddit
Also I think that doing that might lead to regulation about it, which they wouldn't want
April1987@reddit
I don't see robots.txt as do not crawl. It is more a matter of what to show or not show in search results. If I were a new search engine, I'd still crawl `disallow` website paths but not include the results in search results.
deleted_by_reddit@reddit
I literally work for a crawler. Under absolutely no circumstances would we ever ignore robots, because we desperately do not want to be blocked.
And crawling something, storing it later, and then trying to use that for anything sounds like a recipe for a news article on how your disallowed website ended up in my machine learning database.
In many cases it can lead to lawsuits, because websites that compete or aggregate data frequently will not allow you to crawl them. If they find you doing so, they will sue you, and you will not be able to prove that you were not using that data for your business and harming theirs.
It’s a big deal.
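For anyone wondering what "never ignore robots" looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, and the example.com URLs are all made up for illustration; a real crawler would download the file from the site first and probably cache it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would fetch this from
# https://<host>/robots.txt before requesting any other path on the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = make_parser(ROBOTS_TXT)

# "ExampleBot" and the URLs below are assumptions, not real endpoints.
print(allowed(parser, "ExampleBot", "https://example.com/jobs/123"))         # True
print(allowed(parser, "ExampleBot", "https://example.com/private/profile"))  # False
```

Note that this only tells the crawler what the site asks for. Nothing in the check itself stops a crawler from requesting the disallowed path anyway, which is the point made upthread.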
colaclanth@reddit
Yet web crawling is legally allowed, so how on earth could an aggregate site say that it's not permitted? I can understand copyright issues that may come up, e.g. if somebody just stole data from another company and presented it as their own, but that's a separate issue to web crawling. It also seems a bit hypocritical for an aggregate site that probably gets some of its data from web crawling to say that you're not allowed to crawl us, even if you're just going to use that data in something like a web archive for example.
deleted_by_reddit@reddit
It’s rather easy (technically) to deny someone else access to your website, and sue them if they use it against your terms and conditions to negatively impact your business. All of the companies that operate real web crawlers also have other businesses that directly compete with, or generally operate in similar spaces with, the people you crawled it from.
As a trivial example, a company that operates online shopping can easily be exposed to liability if they crawl a company that aggregates customer reviews, because it’s nearly impossible to show that you didn’t use that data to change how you present shopping results, materially benefiting from the data, which the aggregation site would sell you as a separate business model. You’ve effectively stolen their product, and it’s easy to see how a civil court can award damages based on that.
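To make the "easy (technically) to deny access" part concrete, here is a rough sketch of how a server might decide whether to answer a request. It is only an illustration under stated assumptions: the blocked agent name, the limit of 5 requests per 60 seconds, and the function name `response_status` are all hypothetical, and real sites typically do this in a proxy or CDN rather than in application code.

```python
import time
from collections import defaultdict, deque

# All three values are assumptions chosen for the example.
BLOCKED_AGENTS = ("ExampleScraper",)  # hypothetical crawler name
MAX_REQUESTS = 5                      # allowed requests per window
WINDOW_SECONDS = 60.0

# Timestamps of recently served requests, keyed by client IP.
_history = defaultdict(deque)


def response_status(client_ip, user_agent, now=None):
    """Return the HTTP status this sketch would answer with."""
    now = time.monotonic() if now is None else now

    # Refuse known crawlers outright: 403 Forbidden.
    if any(name.lower() in user_agent.lower() for name in BLOCKED_AGENTS):
        return 403

    # Forget requests that have fallen out of the sliding window.
    recent = _history[client_ip]
    while recent and now - recent[0] >= WINDOW_SECONDS:
        recent.popleft()

    # Too many requests inside the window: 429 Too Many Requests.
    if len(recent) >= MAX_REQUESTS:
        return 429

    recent.append(now)
    return 200
```

A User-Agent check like this is trivial to evade, since the header is whatever the client chooses to send, so it shows that blocking is cheap to attempt, not that it is reliable.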
Ravek@reddit
I don’t see how terms & conditions are relevant. It’s not like ‘by using this website you accept its terms & conditions’ is legally enforceable, and if you’re accessing public info without an account you didn’t accept any terms.
They’re free to block you of course but suing? On what legal grounds?
cosyrelaxedsetting@reddit
People love to get overexcited about legal things because they've seen it in movies. In the real world, it's much harder to successfully sue than many people realize.
greatgolem66@reddit
Update in 2023, now that the case has concluded: scraping of public profiles is legal, just avoid scraping private profiles with underhanded tactics that are illegal. There's an elaborate piece breaking down the whole development of the hiQ vs LinkedIn case.