Assignment #4, due Jan 13, 9am: Scrape Wikipedia data for current S&P 500 constituents #16

@joachim-gassen

Description

Your task is to collect and tidy Wikipedia data for the companies that constitute the Standard & Poor’s 500 index. You can find a convenient list here. The idea is to scrape some data from each company’s Wikipedia page and to prepare a tidy dataset containing that data. You can decide for yourself what data you want to collect for each constituent, but things that come to mind are (see the sketch below the list):

• The info in the top right infobox
• The length of the Wikipedia article
• Some info on its revision history
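
To give you a starting point, here is a minimal sketch of the scraping step. It assumes the Wikipedia page “List of S&P 500 companies” as the source and uses pandas, requests, and BeautifulSoup, none of which are mandated; the 3M URL at the end is just an illustrative example.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

LIST_URL = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

# read_html() parses every table on the page; at the time of writing the
# first table holds the constituents (symbol, security name, sector, ...).
constituents = pd.read_html(LIST_URL)[0]

def scrape_infobox(article_url):
    """Parse the top-right infobox of a company article into a flat dict."""
    soup = BeautifulSoup(requests.get(article_url).text, "html.parser")
    box = soup.find("table", class_="infobox")
    if box is None:
        return {}
    rows = {}
    for tr in box.find_all("tr"):
        th, td = tr.find("th"), tr.find("td")
        if th and td:
            rows[th.get_text(" ", strip=True)] = td.get_text(" ", strip=True)
    return rows

# Illustrative call only -- in your code, take the article links from the
# constituent table rather than hard-coding them.
print(scrape_infobox("https://en.wikipedia.org/wiki/3M"))
```

Note that the infobox labels differ across companies, so expect to do some cleaning before the data are tidy.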

Clearly, this is not an exhaustive list. Collect whatever data you find interesting and can obtain in a standardized way for a reasonable subset of firms. The tidy datasets should be stored in the “data” directory. If you feel like it, you can also prepare an informative visual based on your scraped data.
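Article length and revision history are easier to get via the MediaWiki API than by scraping HTML. A minimal sketch, assuming the standard api.php endpoint with action=query and prop=revisions; the example titles and the output file name are placeholders, and the “data” directory must already exist.

```python
import pandas as pd
import requests

API = "https://en.wikipedia.org/w/api.php"

def article_stats(title, n_revisions=10):
    """Return article length (bytes) and recent revision info for a title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|size",
        "rvlimit": n_revisions,
        "format": "json",
    }
    pages = requests.get(API, params=params).json()["query"]["pages"]
    revs = next(iter(pages.values())).get("revisions", [])
    return {
        "title": title,
        "length_bytes": revs[0]["size"] if revs else None,  # newest revision
        "last_edited": revs[0]["timestamp"] if revs else None,
        "n_recent_revisions": len(revs),
    }

# Example titles only -- loop over your scraped constituent list instead.
stats = pd.DataFrame([article_stats(t) for t in ["3M", "Apple Inc."]])
stats.to_csv("data/sp500_article_stats.csv", index=False)
```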

You can use whatever packages or resources you find helpful for the task. As always, please reference all resources you used in your code. Ideally, your code runs in the docker container. For Python users: please submit plain Python code, not Jupyter notebooks.

The deadline for this task is Monday, January 13th, 2020, 9am. Feel free to use this issue to discuss things that need clarification or to help each other.

Please note that I will be offline from Friday, 20th Dec, until Sunday, 5th Jan, 2020. Enjoy the break!
