Commit b57b638

Integrate suggested changes to copyright section.
1 parent df0c5e7 commit b57b638

6 files changed

+65
-27
lines changed

docs/search/search_index.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

docs/section-5-legal-and-ethical-considerations/index.html

Lines changed: 42 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -337,8 +337,8 @@
337337
</li>
338338

339339
<li class="md-nav__item">
340-
<a href="#dont-steal-copyright-and-fair-use" class="md-nav__link">
341-
Don't steal: Copyright and fair use
340+
<a href="#copyright-respecting-others-intellectual-property" class="md-nav__link">
341+
Copyright: respecting others' intellectual property
342342
</a>
343343

344344
</li>
@@ -348,6 +348,19 @@
348348
Better be safe than sorry
349349
</a>
350350

351+
<nav class="md-nav">
352+
<ul class="md-nav__list">
353+
354+
<li class="md-nav__item">
355+
<a href="#challenge" class="md-nav__link">
356+
Challenge
357+
</a>
358+
359+
</li>
360+
361+
</ul>
362+
</nav>
363+
351364
</li>
352365

353366
<li class="md-nav__item">
@@ -427,8 +440,8 @@
427440
</li>
428441

429442
<li class="md-nav__item">
430-
<a href="#dont-steal-copyright-and-fair-use" class="md-nav__link">
431-
Don't steal: Copyright and fair use
443+
<a href="#copyright-respecting-others-intellectual-property" class="md-nav__link">
444+
Copyright: respecting others' intellectual property
432445
</a>
433446

434447
</li>
@@ -438,6 +451,19 @@
438451
Better be safe than sorry
439452
</a>
440453

454+
<nav class="md-nav">
455+
<ul class="md-nav__list">
456+
457+
<li class="md-nav__item">
458+
<a href="#challenge" class="md-nav__link">
459+
Challenge
460+
</a>
461+
462+
</li>
463+
464+
</ul>
465+
</nav>
466+
441467
</li>
442468

443469
<li class="md-nav__item">
@@ -499,16 +525,21 @@ <h3 id="dont-break-the-web-denial-of-service-attacks">Don't break the web: Denia
499525
<p>In fact, this is such an efficient way to disrupt a web site that hackers are often doing it on purpose. This is called a <a href="https://en.wikipedia.org/wiki/Denial-of-service_attack">Denial of Service (DoS) attack</a>.</p>
500526
<p>Since DoS attacks are unfortunately a common occurrence on the Internet, modern web servers include measures to ward off such illegitimate use of their resources. They are watchful for large amounts of requests appearing to come from a single computer or IP address, and their first line of defense often involves refusing any further requests coming from this IP address.</p>
501527
<p>A web scraper, even one with legitimate purposes and no intent to bring a website down, can exhibit similar behaviour and, if we are not careful, result in our computer being banned from accessing a website.</p>
502-
<h3 id="dont-steal-copyright-and-fair-use">Don't steal: Copyright and fair use<a class="headerlink" href="#dont-steal-copyright-and-fair-use" title="Permanent link">&para;</a></h3>
503-
<p>It is important to recognize that in certain circumstances web scraping can be illegal. If the terms and conditions of the web site we are scraping specifically prohibit downloading and copying its content, then we could be in trouble for scraping it.</p>
504-
<p>In practice, however, web scraping is a tolerated practice, provided reasonable care is taken not to disrupt the “regular” use of a web site, as we have seen above.</p>
505-
<p>In a sense, web scraping is no different than using a web browser to visit a web page, in that it amounts to using computer software (a browser vs a scraper) to acccess data that is publicly available on the web.</p>
506-
<p>In general, if data is publicly available (the content that is being scraped is not behind a password-protected authentication system), then it is OK to scrape it, provided we don’t break the web site doing so. What is potentially problematic is if the scraped data will be shared further. For example, downloading content off one website and posting it on another website (as our own), unless explicitely permitted, would constitute copyright violation and be illegal.</p>
507-
<p>However, most copyright legislations recognize cases in which reusing some, possibly copyrighted, information in an aggregate or derivative format is considered “fair use”. In general, unless the intent is to pass off data as our own, copy it word for word or trying to make money out of it, reusing publicly available content scraped off the internet is OK.</p>
528+
<h3 id="copyright-respecting-others-intellectual-property">Copyright: respecting others' intellectual property<a class="headerlink" href="#copyright-respecting-others-intellectual-property" title="Permanent link">&para;</a></h3>
529+
<p>It is important to recognize that in certain circumstances web scraping can be illegal, and this <strong>differs from country to country</strong>.</p>
530+
<p>If the terms and conditions of the web site we are scraping specifically prohibit downloading and copying its content, then we could be in trouble for scraping it. In practice, however, web scraping is a tolerated practice, provided reasonable care is taken not to disrupt the “regular” use of a web site, as we have seen above. However, you must be aware that without permission from the copyright owner you <em>may</em> be in breach of copyright law.</p>
531+
<p>In a sense, web scraping is no different than using a web browser to visit a web page, in that it amounts to using computer software (a browser vs a scraper) to access data that is publicly available on the web. However, researchers should be aware of the risk since the law views web browsing differently to automated web scraping.</p>
532+
<p>In general, if data is publicly available (the content that is being scraped is not behind a password-protected authentication system), then it may be OK to scrape it, provided we don’t break the web site doing so. What is potentially problematic is if the scraped data will be shared further. For example, downloading content off one website and posting it on another website (as our own), unless explicitly permitted, may constitute a violation of copyright law.</p>
533+
<p>Copyright law in some countries recognises "fair use" (USA) or "fair dealing" (Australia), which may, under very specific circumstances, allow reusing some copyrighted material. However, the scope of these exceptions is narrow and you should not assume they apply to your case.</p>
534+
<p>For an interesting (Australian) copyright case involving web scraping, see <a href="https://www.claytonutz.com/knowledge/2009/april/copyright-in-compilations-under-the-spotlight-in-high-court">IceTV vs Channel Nine</a>.</p>
508535
<h3 id="better-be-safe-than-sorry">Better be safe than sorry<a class="headerlink" href="#better-be-safe-than-sorry" title="Permanent link">&para;</a></h3>
509536
<p>Be aware that copyright and data privacy legislation typically differs from country to country. Be sure to check the laws that apply in your context. For example, in Australia, it can be illegal to scrape and store personal information such as names, phone numbers and email addresses, even if they are publicly available.</p>
510537
<p>If you are looking to scrape data for your own personal use, then the above guidelines should probably be all that you need to worry about. However, if you plan to start harvesting a large amount of data for research or commercial purposes, you should probably seek legal advice first.</p>
511-
<p>If you work in a university, chances are it has a copyright office that will help you sort out the legal aspects of your project. The university library is often the best place to start looking for help on copyright.</p>
538+
<p>If you work in a university, chances are it has a copyright office that will help you sort out the legal aspects of your project. The university library is often the best place to start looking for help on copyright-related queries.</p>
539+
<h4 id="challenge">Challenge<a class="headerlink" href="#challenge" title="Permanent link">&para;</a></h4>
540+
<ul>
541+
<li>What are the contact details for the copyright office (or similar) at your organisation?</li>
542+
</ul>
512543
<h3 id="be-nice-ask-and-share">Be nice: ask and share<a class="headerlink" href="#be-nice-ask-and-share" title="Permanent link">&para;</a></h3>
513544
<p>Depending on the scope of your project, it might be worthwhile to consider asking the owners or curators of the data you are planning to scrape if they have it already available in a structured format that could suit your project. If your aim is to use their data for research, or to use it in a way that could potentially interest them, not only could it save you the trouble of writing a web scraper, but it could also help clarify straight away what you can and cannot do with the data.</p>
514545
<p>On the other hand, when you are publishing your own data, as part of a research project, documentation or a public website, you might want to think about whether someone might be interested in getting your data for their own project. If you can, try to provide others with a way to download your raw data in a structured format, and thus save them the trouble of trying to scrape your own pages!</p>

docs/sitemap.xml

Lines changed: 8 additions & 8 deletions
@@ -2,42 +2,42 @@
22
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
33
<url>
44
<loc>https://monashdatafluency.github.io/python-web-scraping/</loc>
5-
<lastmod>2020-09-04</lastmod>
5+
<lastmod>2022-07-27</lastmod>
66
<changefreq>daily</changefreq>
77
</url>
88
<url>
99
<loc>https://monashdatafluency.github.io/python-web-scraping/section-0-brief-python-refresher/</loc>
10-
<lastmod>2020-09-04</lastmod>
10+
<lastmod>2022-07-27</lastmod>
1111
<changefreq>daily</changefreq>
1212
</url>
1313
<url>
1414
<loc>https://monashdatafluency.github.io/python-web-scraping/section-1-intro-to-web-scraping/</loc>
15-
<lastmod>2020-09-04</lastmod>
15+
<lastmod>2022-07-27</lastmod>
1616
<changefreq>daily</changefreq>
1717
</url>
1818
<url>
1919
<loc>https://monashdatafluency.github.io/python-web-scraping/section-2-HTML-based-scraping/</loc>
20-
<lastmod>2020-09-04</lastmod>
20+
<lastmod>2022-07-27</lastmod>
2121
<changefreq>daily</changefreq>
2222
</url>
2323
<url>
2424
<loc>https://monashdatafluency.github.io/python-web-scraping/section-3-API-based-scraping/</loc>
25-
<lastmod>2020-09-04</lastmod>
25+
<lastmod>2022-07-27</lastmod>
2626
<changefreq>daily</changefreq>
2727
</url>
2828
<url>
2929
<loc>https://monashdatafluency.github.io/python-web-scraping/section-4-wrangling-and-analysis/</loc>
30-
<lastmod>2020-09-04</lastmod>
30+
<lastmod>2022-07-27</lastmod>
3131
<changefreq>daily</changefreq>
3232
</url>
3333
<url>
3434
<loc>https://monashdatafluency.github.io/python-web-scraping/section-5-legal-and-ethical-considerations/</loc>
35-
<lastmod>2020-09-04</lastmod>
35+
<lastmod>2022-07-27</lastmod>
3636
<changefreq>daily</changefreq>
3737
</url>
3838
<url>
3939
<loc>https://monashdatafluency.github.io/python-web-scraping/section-7-references/</loc>
40-
<lastmod>2020-09-04</lastmod>
40+
<lastmod>2022-07-27</lastmod>
4141
<changefreq>daily</changefreq>
4242
</url>
4343
</urlset>

docs/sitemap.xml.gz

0 Bytes
Binary file not shown.

markdowns/section-5-legal-and-ethical-considerations.md

Lines changed: 13 additions & 7 deletions
@@ -27,17 +27,19 @@ Since DoS attacks are unfortunately a common occurence on the Internet, modern w
2727
A web scraper, even one with legitimate purposes and no intent to bring a website down, can exhibit similar behaviour and, if we are not careful, result in our computer being banned from accessing a website.
2828

2929

30-
### Don't steal: Copyright and fair use
30+
### Copyright: respecting others' intellectual property
3131

32-
It is important to recognize that in certain circumstances web scraping can be illegal. If the terms and conditions of the web site we are scraping specifically prohibit downloading and copying its content, then we could be in trouble for scraping it.
32+
It is important to recognize that in certain circumstances web scraping can be illegal, and this **differs from country to country**.
3333

34-
In practice, however, web scraping is a tolerated practice, provided reasonable care is taken not to disrupt the “regular” use of a web site, as we have seen above.
34+
If the terms and conditions of the web site we are scraping specifically prohibit downloading and copying its content, then we could be in trouble for scraping it. In practice, however, web scraping is a tolerated practice, provided reasonable care is taken not to disrupt the “regular” use of a web site, as we have seen above. However, you must be aware that without permission from the copyright owner you _may_ be in breach of copyright law.
3535

36-
In a sense, web scraping is no different than using a web browser to visit a web page, in that it amounts to using computer software (a browser vs a scraper) to acccess data that is publicly available on the web.
36+
In a sense, web scraping is no different than using a web browser to visit a web page, in that it amounts to using computer software (a browser vs a scraper) to access data that is publicly available on the web. However, researchers should be aware of the risk since the law views web browsing differently to automated web scraping.
3737

38-
In general, if data is publicly available (the content that is being scraped is not behind a password-protected authentication system), then it is OK to scrape it, provided we don’t break the web site doing so. What is potentially problematic is if the scraped data will be shared further. For example, downloading content off one website and posting it on another website (as our own), unless explicitely permitted, would constitute copyright violation and be illegal.
38+
In general, if data is publicly available (the content that is being scraped is not behind a password-protected authentication system), then it may be OK to scrape it, provided we don’t break the web site doing so. What is potentially problematic is if the scraped data will be shared further. For example, downloading content off one website and posting it on another website (as our own), unless explicitly permitted, may constitute a violation of copyright law.
3939

40-
However, most copyright legislations recognize cases in which reusing some, possibly copyrighted, information in an aggregate or derivative format is considered “fair use”. In general, unless the intent is to pass off data as our own, copy it word for word or trying to make money out of it, reusing publicly available content scraped off the internet is OK.
40+
Copyright law in some countries recognises "fair use" (USA) or "fair dealing" (Australia), which may, under very specific circumstances, allow reusing some copyrighted material. However, the scope of these exceptions is narrow and you should not assume they apply to your case.
41+
42+
For an interesting (Australian) copyright case involving web scraping, see [IceTV vs Channel Nine](https://www.claytonutz.com/knowledge/2009/april/copyright-in-compilations-under-the-spotlight-in-high-court).
4143

4244

4345
### Better be safe than sorry
@@ -46,7 +48,11 @@ Be aware that copyright and data privacy legislation typically differs from coun
4648

4749
If you are looking to scrape data for your own personal use, then the above guidelines should probably be all that you need to worry about. However, if you plan to start harvesting a large amount of data for research or commercial purposes, you should probably seek legal advice first.
4850

49-
If you work in a university, chances are it has a copyright office that will help you sort out the legal aspects of your project. The university library is often the best place to start looking for help on copyright.
51+
If you work in a university, chances are it has a copyright office that will help you sort out the legal aspects of your project. The university library is often the best place to start looking for help on copyright-related queries.
52+
53+
#### Challenge
54+
55+
- What are the contact details for the copyright office (or similar) at your organisation?
5056

5157
### Be nice: ask and share
5258

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ pytz==2019.3
4343
PyYAML==5.3
4444
pyzmq==18.1.1
4545
requests==2.22.0
46+
setuptools==57.5.0
4647
six==1.14.0
4748
soupsieve==1.9.5
4849
tabulate==0.8.7
