Indeed Scraper

A simple web scraper for indeed.com, built with Scrapy.

What data does it scrape?

It scrapes the job title, company, location, salary, and short description of each job on the search results page. It does not follow links to the more detailed job pages. Those pages contain more text data, but they are listed in the site's robots.txt file, so the scraper leaves them alone.
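As a rough illustration of the robots.txt point above, the sketch below uses Python's standard `urllib.robotparser` to check whether a URL may be fetched. The rules and URLs in it are hypothetical placeholders, not a copy of the real robots.txt on indeed.com, which may differ and can change at any time.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
EXAMPLE_RULES = """
User-agent: *
Disallow: /viewjob
""".splitlines()


def build_parser(lines):
    """Return a parser loaded with the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


parser = build_parser(EXAMPLE_RULES)

# Placeholder URLs: a search results page and a detailed job page.
search_url = "https://www.example.com/jobs?q=data+science"
detail_url = "https://www.example.com/viewjob?jk=abc123"

print(parser.can_fetch("*", search_url))  # allowed under the example rules
print(parser.can_fetch("*", detail_url))  # disallowed under the example rules
```

Scrapy can also enforce robots.txt itself through its `ROBOTSTXT_OBEY` setting; whether this project enables it is not stated here.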

How can I use it?

First, make sure Python 3 and Scrapy are installed on your computer. Then clone this repository, move to the root directory of the project, and run:

    scrapy crawl jobs -o jobs.json

This writes the scraped data to a JSON file (jobs.json) in the project's root directory.
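Once the file exists, it can be loaded with the standard `json` module. The sketch below assumes the file holds a JSON array of objects, which is what Scrapy's JSON feed export normally produces, and that each object has a `company` key. That field name is a guess based on the list of scraped fields, so check the actual output before relying on it.

```python
import json
from pathlib import Path


def load_jobs(path):
    """Load the list of scraped job records from a JSON feed file."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def count_by_company(jobs):
    """Count records per company; 'company' is an assumed field name."""
    counts = {}
    for job in jobs:
        name = job.get("company") or "unknown"
        counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    feed = Path("jobs.json")
    # Only read the file if a crawl has already produced it.
    if feed.exists():
        jobs = load_jobs(feed)
        print(len(jobs), "jobs loaded")
        print(count_by_company(jobs))
```

Scrapy's `-o` option appends to an existing file, and appending to a JSON file usually leaves it unparseable, so it is safest to delete an old jobs.json before crawling again.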

By default, the scraper searches for the key phrase "data science". You can change this by editing the variable "search_phrase" in indeed_scraper/spiders/indeed_scraper.py.
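To show why changing "search_phrase" changes the results, here is a minimal sketch of how a search phrase could be turned into a search URL. It is not the spider's actual code: the base URL, the `q` parameter name, and the helper `build_search_url` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumed base URL and query parameter name; check the spider for the
# values it really uses.
BASE_URL = "https://www.indeed.com/jobs"

search_phrase = "data science"


def build_search_url(phrase, base_url=BASE_URL):
    """Return a search URL with the phrase URL-encoded as the 'q' parameter."""
    query = urlencode({"q": phrase})
    return f"{base_url}?{query}"


print(build_search_url(search_phrase))
# prints https://www.indeed.com/jobs?q=data+science
```

Encoding the phrase with `urlencode` keeps spaces and special characters from breaking the URL, so a phrase such as "machine learning" can be used without manual escaping.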

felix-datascience/indeed_scraper
