
Requisites: pip install requests pip install bs4 or pip install beautifulsoup4 pip install pandas pip install lxml or pip install html5lib Store the Results: filename: output.json Note: After running the program, we will obtain the squeezed text. We can copy and paste it in a text file.( I have saved as Output_Web_Scrapping.txt) It will take som…

arthimj/Web-Scrapping-Python

Web-Scrapping-Python

Requisites:

pip install requests
pip install bs4 (or pip install beautifulsoup4)
pip install pandas
pip install lxml (or pip install html5lib)


Note:
1) After running the program, we obtain the squeezed text. We can copy and paste it into a text file (I saved mine as Output_Web_Scrapping.txt).

2) Creating the JSON file takes some time. Do not kill the process until it finishes (I stored the results in IndeedData.json).

3) IndeedData.json contains very little data because I did not specify attributes in the find() calls:
job_post_object = {
    "job_title": div.find(name="a").text.encode('utf-8'),
    "company": div.find(name="span").text.encode('utf-8'),
    "location": div.find(name="span").text.encode('utf-8'),
    "summary": div.find(name='span').text.encode('utf-8'),
    "salary": div.find(name="div").text.encode('utf-8')
}

But if I specify attributes like this:
job_post_object = {
    "job_title": div.find(name="a", attrs={"data-tn-element": "jobTitle"}).text.encode('utf-8'),
    "company": div.find(name="span", attrs={"class": "company"}).text.encode('utf-8'),
    "location": div.find(name="span", attrs={"class": "location"}).text.encode('utf-8'),
    "summary": div.find(name='span').text.encode('utf-8'),
    "salary": div.find(name="div").text.encode('utf-8')
}

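I get the error "object has no attribute text". This happens when find() returns None because no element matches the given attributes, which means the page at the given URL does not contain the markup the scraper expects. So we can use an alternative URL.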
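
The "object has no attribute text" error can also be avoided by checking the result of find() before reading its text. Below is a minimal sketch of that idea, not the program in this repository. The helper safe_text, the SAMPLE_HTML string, and the "row" class are assumptions made up for illustration; the real page markup may differ. In Python 3, .text is already a str and the json module cannot serialize the bytes returned by .encode('utf-8'), so the sketch keeps plain strings.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for one job card; the real page may differ.
SAMPLE_HTML = """
<div class="row">
  <a data-tn-element="jobTitle">Data Engineer</a>
  <span class="company">Acme</span>
</div>
"""


def safe_text(parent, name, attrs=None):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = parent.find(name=name, attrs=attrs or {})
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.find(name="div", attrs={"class": "row"})

# The sample has no <span class="location">, so that field becomes None
# instead of raising AttributeError.
job_post_object = {
    "job_title": safe_text(div, "a", {"data-tn-element": "jobTitle"}),
    "company": safe_text(div, "span", {"class": "company"}),
    "location": safe_text(div, "span", {"class": "location"}),
}

# None values are written as JSON null.
serialized = json.dumps(job_post_object, indent=2)
print(serialized)
```

With this approach a missing element shows up as null in the JSON output, which makes it easier to see which attributes the target page does not provide.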
