ReviewRadar is a web application that classifies reviews as real or fake based on a user-defined strictness level. It uses crawl4ai to scrape reviews, an OpenAI LLM extraction strategy to structure them as JSON, and a pretrained SVC classifier (88% accuracy) for classification.
Users can submit URLs to review pages or individual reviews for immediate classification. An open API is also available for integrating the fake review detection service into other applications.
- Key Differences Between Fake and Real Reviews
- Features
- API Endpoint
- Model Approach
- How It Works
- Usage Example
- Installation & Setup
- Contributing
- License
- Contact
- Overly Positive Language (Fake): Exaggerated praise with phrases like "Love this!", "Amazing!", and "10 Stars", often without specifics.
- Repetitive Phrasing (Fake): Common phrases like "I love" and "the only problem is" repeated across reviews.
- Personal Pronouns (Fake): Frequent use of "I", "we", or "my" to create a forced personal connection.
- Contradictory/Incomplete Sentences (Fake): Some reviews seem cut off or make contradictory statements.
- Time Mentions (Fake): Fake reviews often mention usage duration to imply long-term experience.
- Specific & Critical Feedback (Real): Real reviews focus on specific product features, highlighting both pros and cons.
- Practicality Focus (Real): Emphasis on product functionality and day-to-day use.
- Balanced Opinions (Real): Mix of positive and negative points.
- Short & Objective (Real): Concise and direct feedback, avoiding unnecessary embellishment.
- Review Page Analysis: Submit URLs containing reviews to be scraped and classified as real or fake based on selected strictness levels.
- Individual Review Testing: Input single reviews and receive immediate authenticity feedback.
- Data Visualization: View insightful visualizations like pie charts and histograms to analyze the distribution of real and fake reviews.
- Open API: Access an API endpoint to integrate fake review detection into your own applications.
```
POST /api/openapi-verify-review
```

Developers can verify reviews through a simple JSON interface. The input includes the review text and a threshold level (`high`, `medium`, or `low`), and the output is a classification of the review as real or fake (a sample call follows the request/response examples below).

Request:

```json
{
  "review": "Great product, highly recommend!",
  "threshold": "high"
}
```

Response:

```json
{
  "analyzed_reviews": {
    "review_text": "Great product, highly recommend!",
    "is_fake": false,
    "confidence": 0.92
  }
}
```
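For example, the endpoint can be called with Python's `requests` library. This is a minimal sketch: it assumes the backend is running locally on its default port (8000) and that no authentication is required.

```python
import requests

# Assumed local deployment; adjust the host/port to match your setup.
API_URL = "http://localhost:8000/api/openapi-verify-review"

payload = {
    "review": "Great product, highly recommend!",
    "threshold": "high",  # one of "high", "medium", or "low"
}

response = requests.post(API_URL, json=payload)
response.raise_for_status()

result = response.json()["analyzed_reviews"]
print(f"Fake: {result['is_fake']} (confidence: {result['confidence']:.2f})")
```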
Objective: Classify reviews as fake or genuine using text-based features.

- Support Vector Classifier (SVC): Chosen for its effectiveness in text classification tasks.
- Data Preparation:
  - Loaded a dataset of reviews and preprocessed the text (removing punctuation, filtering stop words).
  - Split the dataset into training and testing sets.
- Text Processing:
  - Prepared reviews for vectorization using a text processing function.
  - Applied CountVectorizer to convert the text into a bag-of-words model.
- Model Training:
  - Created a pipeline combining:
    - CountVectorizer: Converts text into numerical vectors.
    - TF-IDF Transformer: Scales vectors by term frequency-inverse document frequency.
    - SVC Classifier: Trained on the vectorized data.
  - Trained the pipeline on the preprocessed training data.
- Model Saving:
  - Saved the trained model using joblib for later predictions.
- Achieves 88% accuracy in classifying reviews as real or fake.
- Allows users to analyze new reviews by loading the saved model for inference (see the sketch below).
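The training and saving steps above roughly correspond to the following scikit-learn sketch. It is illustrative only: the dataset path, column names, and SVC settings are assumptions, not the project's actual training script.

```python
import joblib
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical dataset: a CSV with a review text column and a real/fake label.
df = pd.read_csv("reviews.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42
)

# CountVectorizer -> TF-IDF -> SVC, mirroring the pipeline described above.
pipeline = Pipeline([
    ("bow", CountVectorizer(stop_words="english")),  # bag-of-words vectors
    ("tfidf", TfidfTransformer()),                   # rescale by TF-IDF
    ("clf", SVC(probability=True)),                  # the classifier itself
])

pipeline.fit(X_train, y_train)
print(f"Test accuracy: {pipeline.score(X_test, y_test):.2f}")

# Persist the trained pipeline for later inference.
joblib.dump(pipeline, "review_classifier.joblib")

# Later, load the saved model and classify a new review.
model = joblib.load("review_classifier.joblib")
print(model.predict(["Great product, highly recommend!"]))
```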
- User Submission: Submit a URL containing reviews or an individual review.
- Scraping and Parsing: Scrapes reviews from the URL using crawl4ai and processes the content through an OpenAI LLM to generate structured JSON data (see the sketch after this list).
- Review Classification: Passes the structured data through the pretrained SVC classifier to identify fake reviews.
- Visualization: Displays the results with visual insights through pie charts and histograms.
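The scraping-and-parsing step can be sketched with crawl4ai roughly as follows. This is a hedged illustration rather than the project's actual code: the crawl4ai interface differs between versions, and the schema, instruction text, and model choice here are assumptions.

```python
import asyncio
import json
import os

from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import LLMExtractionStrategy
from pydantic import BaseModel


class Review(BaseModel):
    # Hypothetical schema for a single extracted review.
    review_text: str
    rating: float | None = None


async def scrape_reviews(url: str) -> list[dict]:
    # Older-style LLMExtractionStrategy arguments; newer crawl4ai releases
    # configure the LLM differently, so adjust to your installed version.
    strategy = LLMExtractionStrategy(
        provider="openai/gpt-4o-mini",            # assumed model choice
        api_token=os.environ["OPENAI_API_KEY"],
        schema=Review.model_json_schema(),
        extraction_type="schema",
        instruction="Extract every customer review on the page as JSON.",
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url, extraction_strategy=strategy)
    return json.loads(result.extracted_content)


if __name__ == "__main__":
    reviews = asyncio.run(scrape_reviews("https://example.com/product/reviews"))
    print(f"Scraped {len(reviews)} reviews")
```

The list of structured reviews produced this way can then be passed to the saved classifier from the Model Approach section.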
Submit a link to a review page to receive a visual breakdown of real vs. fake reviews in pie chart and histogram formats based on the chosen strictness level (`high`, `medium`, or `low`). Alternatively, manually input a review to check its authenticity.
To reproduce ReviewRadar on your local system, follow the steps below for both the frontend and backend setups.

Frontend:

- Clone the Repository:

  ```bash
  git clone https://github.com/muzzlol/review-radar.git
  ```

- Navigate to the Frontend Directory:

  ```bash
  cd review-radar/frontend
  ```

- Install Dependencies:

  ```bash
  npm install
  ```

- Start the Frontend Application:

  ```bash
  npm start
  ```

  The application will run on `http://localhost:3000` by default.

Backend:

- Navigate to the Backend Directory:

  ```bash
  cd ../backend
  ```

- Set Up Environment Variables:
  - Create a `.env` file in the `backend` directory based on the `.env.example` provided.
  - Add the necessary environment variables, such as your OpenAI API key:

    ```bash
    OPENAI_API_KEY=your_openai_api_key
    ```

- Choose Your Installation Method:
  - Using `pip`:
    - Create a Virtual Environment:

      ```bash
      python3.12 -m venv venv
      ```

    - Activate the Virtual Environment:
      - On macOS/Linux: `source venv/bin/activate`
      - On Windows: `venv\Scripts\activate`
    - Install Dependencies:

      ```bash
      pip install --upgrade pip
      pip install -r requirements.txt
      ```

  - Using `Poetry`:
    - Install Poetry (if not already installed):

      ```bash
      curl -sSL https://install.python-poetry.org | python3 -
      ```

    - Set Python Version:

      ```bash
      poetry env use python3.12
      ```

    - Check Python Version & Environment Path:

      ```bash
      poetry env info
      ```

      Take note of the `Path` field in the output; you can use this path as the Python interpreter in VS Code or your preferred editor's settings.

    - Activate the Virtual Environment:

      ```bash
      poetry shell
      ```

    - Install Dependencies:

      ```bash
      poetry install
      ```

- Run the Backend Server:

  ```bash
  poetry run uvicorn main:app --reload
  ```

  The backend server will run on `http://localhost:8000` by default.
Contributions are welcome! Please open an issue or submit a pull request for any changes or improvements. Future work includes upgrading the model to a deep learning approach that better captures the sequential relationships between words and their positions in the review text.
MIT License. See `LICENSE` for more information.