ajaygithub2/yellow-pages-scraper
This script scrapes the name, contact, address and link to the yellowpages page for each listing.
When the script starts, it will ask for 5 inputs:
Looking for : [Enter what you are looking for, like : doctors, dermatologist, bike repair, insurance, lawyer, accountant etc.]
City : [City you want data for]
State : [Please enter only in abbreviation like TX or ME or CA etc.]
No of pages to scrape : [Each page includes approx 30 records, so if you enter 3 you will get around 90 records]
Data Format : [Enter 1 for csv, 2 for xlsx, 3 for pdf]
Note: CSV format is recommended.
Note: If you get fewer records than expected, it means no more records are available on the site.
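As a rough illustration of how the first four inputs could map to page requests, here is a minimal sketch. The function name `build_search_urls`, the base URL and the query parameter names (`search_terms`, `geo_location_terms`, `page`) are assumptions made for this example and are not taken from the script itself.

```python
from urllib.parse import urlencode

# Assumed search endpoint and parameter names; check them against the live site.
BASE_URL = "https://www.yellowpages.com/search"


def build_search_urls(looking_for, city, state, pages):
    """Return one search URL per page for the given inputs."""
    state = state.strip().upper()
    if len(state) != 2 or not state.isalpha():
        raise ValueError("State must be a two-letter abbreviation such as TX")
    location = f"{city.strip()}, {state}"
    urls = []
    for page in range(1, pages + 1):
        query = urlencode({
            "search_terms": looking_for.strip(),
            "geo_location_terms": location,
            "page": page,
        })
        urls.append(f"{BASE_URL}?{query}")
    return urls


if __name__ == "__main__":
    for url in build_search_urls("bike repair", "Austin", "TX", 3):
        print(url)
```

Validating the state abbreviation up front avoids sending a request that would return no results.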
Install these libraries:
1. Pandas : pip install pandas OR
conda install pandas
2. Pdfkit : pip install pdfkit OR
conda install -c conda-forge python-pdfkit
3. BeautifulSoup : pip install bs4 OR
conda install -c conda-forge bs4
4. Requests : python -m pip install requests OR
conda install -c anaconda requests
Note: If you want your output in a pdf file, you will have to install an additional tool called 'wkhtmltopdf', which pdfkit uses to create the pdf.
If not, then there is no need to install it.
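The three output formats could be handled along the following lines. This is only a sketch: `output_path`, `save_records` and the default file name are hypothetical, and the script's own saving code may differ. pdfkit is imported only for the pdf branch, so csv and xlsx output work without wkhtmltopdf installed.

```python
# Maps the "Data Format" input to a file extension.
EXTENSIONS = {"1": "csv", "2": "xlsx", "3": "pdf"}


def output_path(choice, basename="yellowpages"):
    """Translate the format choice (1, 2 or 3) into an output file name."""
    try:
        return f"{basename}.{EXTENSIONS[choice.strip()]}"
    except KeyError:
        raise ValueError("Enter 1 for csv, 2 for xlsx, 3 for pdf") from None


def save_records(records, choice, basename="yellowpages"):
    """Write a list of dicts to disk in the chosen format and return the path."""
    path = output_path(choice, basename)  # validate before importing anything
    import pandas as pd

    frame = pd.DataFrame(records)
    if path.endswith(".csv"):
        frame.to_csv(path, index=False)
    elif path.endswith(".xlsx"):
        # pandas needs an Excel writer such as openpyxl for .xlsx files
        frame.to_excel(path, index=False)
    else:
        # pdfkit calls the wkhtmltopdf executable, so it is imported only here
        import pdfkit

        pdfkit.from_string(frame.to_html(index=False), path)
    return path
```

For example, `save_records([{"Name": "Example Clinic", "Contact": "(555) 010-0000"}], "1")` would write `yellowpages.csv`. The record shown is made-up sample data.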
Run the script and enjoy. Thank you.
Yellow_pages_2.0.py :
This is an updated version of 'yellowpages.py'.
Updates:
1. Scrapes emails and websites
2. Shows status of pages scraped and listings scraped in those pages.
3. Doesn't skip a page if a listing has missing info. (Only skips the listing)
These updates involve reading one webpage per listing, which makes this version slower than the previous version.
So if you don't need emails and websites, I recommend using the previous version.
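The "skip the listing, not the page" behaviour and the status output can be sketched as below. `collect_listings` and the `parse_listing` callback are hypothetical names for this example. The sketch assumes that missing info shows up as a `KeyError` or `AttributeError`, which is what typically happens when a BeautifulSoup lookup returns `None` and the code then reads an attribute from it.

```python
def collect_listings(pages, parse_listing):
    """Parse every listing on every page, skipping only the listings that fail.

    pages: iterable of lists of raw listings (one list per page)
    parse_listing: callable that returns a dict or raises on missing info
    """
    records = []
    for page_no, listings in enumerate(pages, start=1):
        kept = 0
        for raw in listings:
            try:
                record = parse_listing(raw)
            except (KeyError, AttributeError):
                continue  # missing info: drop this listing, keep the page
            records.append(record)
            kept += 1
        print(f"Page {page_no}: scraped {kept} of {len(listings)} listings")
    return records


if __name__ == "__main__":
    # Made-up sample data standing in for parsed HTML listings.
    sample_pages = [
        [{"name": "A", "phone": "1"}, {"name": "B"}],
        [{"name": "C", "phone": "3"}],
    ]
    result = collect_listings(
        sample_pages, lambda raw: {"name": raw["name"], "phone": raw["phone"]}
    )
    print(len(result), "records kept")
```

Catching the error per listing rather than per page means one incomplete entry costs a single record instead of the whole page.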
Thank you.