Web-Scrapping-Python

Requisites:

pip install requests
pip install beautifulsoup4 (or pip install bs4)
pip install pandas
pip install lxml (or pip install html5lib)

Note:
1) After running the program, we obtain the extracted text. We can copy and paste it into a text file (I saved mine as Output_Web_scrapping.txt).

2) Creating the JSON file takes some time. Do not kill the process until it finishes (I stored the results in IndeedData.json).

3) IndeedData.json contains very little data because I did not specify attributes in the find() calls. Without attributes, find() returns only the first tag with the given name, so several fields read the same element:

job_post_object = {
    "job_title": div.find(name="a").text.encode('utf-8'),
    "company": div.find(name="span").text.encode('utf-8'),
    "location": div.find(name="span").text.encode('utf-8'),
    "summary": div.find(name='span').text.encode('utf-8'),
    "salary": div.find(name="div").text.encode('utf-8')
}

But if I specify attributes like this,

job_post_object = {
    "job_title": div.find(name="a", attrs={"data-tn-element":"jobTitle"}).text.encode('utf-8'),
    "company": div.find(name="span", attrs={"class":"company"}).text.encode('utf-8'),
    "location": div.find(name="span", attrs={"class": "location"}).text.encode('utf-8'),
    "summary": div.find(name='span').text.encode('utf-8'),
    "salary": div.find(name="div").text.encode('utf-8')
}

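I get the error "object has no attribute 'text'". find() returns None when no tag matches the given name and attributes, and reading .text from None raises this error. In other words, the page at the given URL does not contain the markup these attributes expect, so we can use an alternative URL.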
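
One way to avoid the "no attribute 'text'" error is to check the result of find() before reading its text. The sketch below is only an illustration, not the code used in this repository: safe_text and build_job_post are hypothetical helper names, div is assumed to be a BeautifulSoup Tag as in the snippets above, and the values are kept as str instead of being encoded to bytes (an assumption made here because the json module cannot serialize bytes in Python 3).

```python
def safe_text(tag, name, attrs=None):
    """Return the stripped text of the first matching child tag, or None."""
    # find() returns None when nothing matches; reading .text from that
    # None is what raises "object has no attribute 'text'".
    match = tag.find(name=name, attrs=attrs or {})
    if match is None:
        return None
    return match.get_text(strip=True)


def build_job_post(div):
    """Build one job record from a result <div> (assumed to be a bs4 Tag)."""
    # Attribute names and values are the ones from the snippet above; they
    # may not exist on the page, in which case the field is simply None.
    return {
        "job_title": safe_text(div, "a", {"data-tn-element": "jobTitle"}),
        "company": safe_text(div, "span", {"class": "company"}),
        "location": safe_text(div, "span", {"class": "location"}),
        "summary": safe_text(div, "span"),
        "salary": safe_text(div, "div"),
    }
```

With this approach a missing element produces a None field instead of stopping the whole run, which also makes it easier to see which attributes the page really lacks.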

arthimj/Web-Scrapping-Python
