# Jumia Product Scraper
## Overview
This Python script scrapes product information from the Jumia website based on user input. It extracts details such as price, product details, rating, and purchase links for a specified product.
## Prerequisites
- Python 3.x
- Required Python packages: `requests`, `beautifulsoup4`
Install the required packages using:
```bash
pip install requests beautifulsoup4
```
## Usage
- Run the script: `python jumia_scraper.py`
- Enter the name of the product and the desired page limit when prompted.
- The script fetches information from Jumia for each page within the specified limit.
- The extracted details are displayed, including product details, price, rating, and purchase link.
Ensure compliance with Jumia's terms of service and policies. Web scraping may be subject to legal and ethical considerations.
Feel free to customize the script to meet your specific needs. Possible improvements include adding error handling, adjusting the time delay, or enhancing the user interface.
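
As one possible shape for the error handling and time delay mentioned above, here is a minimal sketch of a retry helper. The name `fetch_with_retry`, its parameters, and the fixed pause between attempts are assumptions made for illustration; they are not part of the original script. The `fetch` argument can be any callable that returns a page body, for example a small wrapper around `requests.get`.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=2.0, sleep=time.sleep):
    """Call fetch(url), pausing `delay` seconds between failed attempts.

    Hypothetical helper: `fetch` is any callable that returns the page body
    or raises an exception on failure.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # in real use, narrow this to requests.RequestException
            last_error = error
            if attempt < retries:
                sleep(delay)  # wait before the next attempt
    raise RuntimeError(f'giving up on {url} after {retries} attempts') from last_error
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting; in normal use the default `time.sleep` applies.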
- This script was created for educational purposes and should be used responsibly.
- Special thanks to Jumia for providing product information.
This Python script uses the `requests` and `BeautifulSoup` libraries to scrape product information from the Jumia website. It takes two user inputs: the name of the product to search for and the number of pages of results to scrape.
## Step-by-Step Explanation
### 1. Import Necessary Libraries
```python
import requests
from bs4 import BeautifulSoup
import time
```
The script prompts the user to enter the name of the product they want to search for and the number of pages of results they want to scrape.
```python
product = input('Enter the name of the product: ')
page_limit = int(input('Enter the page limit: '))
```
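Because `int()` raises `ValueError` on non-numeric text, a typo at the second prompt stops the script with a traceback. The following is a minimal sketch of one way to validate the value first. The helper name `parse_page_limit` and the upper bound of 50 pages are illustrative assumptions, not part of the original script.

```python
def parse_page_limit(raw, maximum=50):
    """Convert user input to a page count between 1 and `maximum`.

    Hypothetical helper; the default maximum of 50 is an arbitrary assumption.
    Raises ValueError with a readable message for invalid input.
    """
    try:
        value = int(raw.strip())
    except ValueError:
        raise ValueError(f'page limit must be a whole number, got {raw!r}') from None
    if not 1 <= value <= maximum:
        raise ValueError(f'page limit must be between 1 and {maximum}, got {value}')
    return value
```

It could be used as `page_limit = parse_page_limit(input('Enter the page limit: '))`, with the `ValueError` caught to ask the user again.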
The script sets the `User-Agent` header to mimic a web browser, which is necessary for making requests to the Jumia website.
```python
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
```
The `productsResearch` function takes two arguments: the product name and the page number. It constructs the URL for the Jumia product search page, including the product name and page number.
```python
def productsResearch(product, page):
    url = f'https://www.jumia.com.ng/catalog/?q={product}&page={page}#catalog-listing'
```
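The f-string above inserts the product name verbatim, so a name containing `&` or `#` would be read as URL syntax rather than as part of the search term. A minimal sketch of building the same URL with the standard library's `urlencode` follows. The names `BASE_URL` and `build_search_url` are hypothetical, and the sketch assumes the site accepts standard form-style query encoding, in which spaces become `+`.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.jumia.com.ng/catalog/'  # same endpoint as in the script


def build_search_url(product, page):
    """Return the search URL with the query parameters percent-encoded."""
    query = urlencode({'q': product, 'page': page})
    return f'{BASE_URL}?{query}#catalog-listing'


print(build_search_url('wireless mouse', 2))
# prints https://www.jumia.com.ng/catalog/?q=wireless+mouse&page=2#catalog-listing
```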
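The function makes a GET request to the URL using the `requests` library. If the request is successful, it uses `BeautifulSoup` to parse the HTML response. The `'lxml'` parser comes from the separate `lxml` package, which the install command above does not include; install it with `pip install lxml`, or pass `'html.parser'` to use Python's built-in parser.
```python
    try:
        jumia_url = requests.get(url, headers=headers)
        jumia_url.raise_for_status()  # Check for HTTP errors
        soup = BeautifulSoup(jumia_url.text, 'lxml')
```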
The function then extracts the product information from the parsed HTML. It finds all the product articles on the page