AI-enhanced Reddit monitoring tool that finds semantically relevant posts across Reddit using natural language similarity.

omarsrepo/AI_reddit_scraper

AI powered Reddit Post Discovery Tool

An AI-enhanced Reddit monitoring tool that finds semantically relevant posts across Reddit. It compares NLP embeddings of each post against your keywords to surface posts that are meaningfully relevant to them. Designed for marketers, researchers, community managers, and developers who want to track the conversations that matter.

Prototype Search Results

Features

  • Searches Reddit for posts from the past 7 days (or configurable window)
  • Accepts custom keyword lists — track any product, topic, brand, or phrase
  • Uses the all-MiniLM-L6-v2 Hugging Face sentence-transformer to compute cosine similarity between each post's content and the keyword list to detect relevance
  • Fast filtering using precomputed embeddings
  • Uses the bart-large-mnli Hugging Face transformer for zero-shot classification of each post's context (is the post a complaint, a question, feedback, etc.)
  • Outputs relevant posts with metadata: title, content preview, date, subreddit, link
  • Uses Ollama and llama3.2 to create a custom LLM through the Modelfile (provided in the project structure) for custom response generation to the posts
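As a rough illustration of the zero-shot classification feature, the sketch below shows one way the result of a Hugging Face zero-shot pipeline could be reduced to a single label. The label set, the helper names (build_classifier, top_label) and the min_score cutoff are assumptions made for this example and are not taken from the project's code.

```python
from typing import Dict, List

# Hypothetical label set; the README only names complaint, question and feedback.
CANDIDATE_LABELS = ["complaint", "question", "feedback"]


def build_classifier():
    """Create a zero-shot classifier (downloads facebook/bart-large-mnli on first use)."""
    from transformers import pipeline  # imported lazily because it is a heavy dependency

    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


def top_label(result: Dict[str, List], min_score: float = 0.0) -> str:
    """Pick the highest-scoring label from a zero-shot pipeline result.

    The pipeline returns parallel 'labels' and 'scores' lists; the best pair is
    selected explicitly here rather than relying on the order they arrive in.
    """
    label, score = max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])
    return label if score >= min_score else "unknown"
```

A call might look like `top_label(build_classifier()(post_text, candidate_labels=CANDIDATE_LABELS))`, where post_text is the post's title and body.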

How It Works

  1. Input: You provide a list of keywords or phrases you'd like to monitor.
  2. Embedding: The tool uses a sentence-transformers model (all-MiniLM-L6-v2) to compute sentence embeddings for both:
    • Your keyword list
    • Each Reddit post's title + body
  3. Semantic Filtering:
    • Calculates cosine similarity between post embeddings and keyword embeddings
    • Only keeps posts with a similarity above a threshold (default: 0.6)
  4. Output: Shows posts in the terminal with key info and a snippet of content.
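The semantic filtering step can be sketched in plain Python as follows. This is a minimal illustration and not the project's actual code: it assumes each post already carries an "embedding" vector (for example from all-MiniLM-L6-v2), and it assumes a post is scored by its best match against any single keyword, which the steps above do not specify. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_posts(
    posts: List[Dict],
    keyword_vectors: List[Sequence[float]],
    threshold: float = 0.6,
) -> List[Dict]:
    """Keep posts whose best keyword similarity is above the threshold."""
    kept = []
    for post in posts:
        # Assumption: score a post by its closest keyword (max), not the average.
        best = max(
            (cosine_similarity(post["embedding"], kv) for kv in keyword_vectors),
            default=0.0,
        )
        if best > threshold:
            kept.append({**post, "similarity": best})
    # Most relevant posts first
    return sorted(kept, key=lambda p: p["similarity"], reverse=True)
```

Raising the threshold trades recall for precision: fewer posts pass, but those that do sit closer to at least one keyword.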

Project Structure

Reddit_discovery_tool/
├── reddit_scraper.py
├── main.py 
├── requirements.txt 
├── Modelfile (For creating custom models. More info here: https://ollama.readthedocs.io/en/modelfile/ )
└── README.md 

Installation and Usage

  1. Clone this repository onto your machine and install the requirements into your virtual environment.
git clone https://github.com/omarsrepo/AI_reddit_scraper.git
cd AI_reddit_scraper
pip install -r requirements.txt
  2. Create a .env file and add your Reddit credentials (can be obtained by creating an app here: https://www.reddit.com/prefs/apps):
REDDIT_CLIENT_ID=your_id
REDDIT_CLIENT_SECRET=your_secret
REDDIT_USER_AGENT=your_agent_string
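For illustration, the credentials could be read and validated along these lines once the .env file has been loaded into the environment (for instance with python-dotenv's load_dotenv(), which is an assumption here). The helper name load_reddit_credentials is hypothetical; the returned keys are chosen to match the keyword arguments a client such as PRAW's praw.Reddit accepts, although this README does not say which Reddit client the project uses.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def load_reddit_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the three Reddit settings, failing early if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Reddit credentials: " + ", ".join(missing))
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }
```

Failing early with the names of the missing variables tends to be easier to debug than an authentication error raised later by the Reddit API.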
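  3. Install Ollama and any Ollama model you want (I'm using llama3.2 for this example). The Ollama application itself is installed separately; pip only installs the Python client library that main.py uses.
pip install ollama # Python client library
ollama serve # Starts the local Ollama server (keep it running in a separate terminal)
ollama pull llama3.2 # Downloads the model
ollama list # Lists the currently installed models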
  4. In the Modelfile, edit the SYSTEM prompt and the FROM parameter to your liking
# Modelfile
# Change this to whatever model you wish to use
FROM llama3.2

# System Prompt (Edit this to suit your personal usecase)
SYSTEM """
  You're a helpful travel advisor on Reddit. Provide honest, casual, and non-salesy suggestions. Write a helpful, human-sounding Reddit comment that:
    - Directly answers the user's question or concern
    - Is friendly and useful 
    - Naturally mention the sim card provider 'BNESIM' if it fits the situation, but without sounding like an ad
    - Uses a casual Redditor tone
    - Do NOT sound robotic or overly polished
    - Write as if you’re replying in the comments thread.
    """

PARAMETER temperature 0.7
PARAMETER top_p 0.95
PARAMETER num_predict 200
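  5. Create a custom model from the Modelfile. main.py calls the model 'redditor', so use that name or update the model argument in main.py to match.
ollama create redditor -f ./Modelfile # or the path to wherever your Modelfile lives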
  6. Open main.py. Edit the KEYWORDS list to describe the kind of posts you are looking for, and set the DAYS_BACK integer to the number of days you want to search back (default is the last 7 days).
# Number of days to look back for posts (default: 7)
DAYS_BACK = 7 
# Set this list of keywords to search for in Reddit posts
KEYWORDS = [
    "esim", "international roaming", "travel connectivity", 
    "bnesim", "airalo", "gigsky", "holafly", "ubigi", "flexiroam",
    "travel SIM recommendations", "roaming charges", "connectivity issue", 
    "sim not working", "bad mobile service abroad"
]
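To show how DAYS_BACK might translate into a filter, here is a small sketch that checks whether a post falls inside the search window. It assumes each post exposes a created_utc Unix timestamp, as the Reddit API does; the function name is_within_window and the optional now parameter are illustrative and not part of main.py.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DAYS_BACK = 7


def is_within_window(
    created_utc: float,
    days_back: int = DAYS_BACK,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the post was created within the last `days_back` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_back)
    created = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return cutoff <= created <= now
```

Passing now explicitly keeps the check deterministic, which is convenient when testing against fixed timestamps.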
  7. Finally, run main.py and watch the results!
  8. Optional: Comment out the following section in main.py if you only want to search for posts and do not want to generate responses to them.
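# Ask the custom Ollama model ('redditor') for a reply to this post
response = ollama.chat(
    model='redditor',
    messages=[{'role': 'user', 'content': post["full_text"]}],
)
# Wrap the reply at 150 characters so it prints cleanly in the terminal
post["response"] = textwrap.fill(response.message.content, width=150)
print(f"Possible Response: {post['response']}\n")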
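As an alternative to commenting the section out, response generation could be put behind a flag. The sketch below is only a suggestion: GENERATE_RESPONSES and attach_response are hypothetical names that do not exist in main.py, and the chat argument stands in for whatever function sends the post text to the model and returns the reply as a string.

```python
import textwrap
from typing import Callable, Dict

# Hypothetical switch; set to False to search for posts without generating replies.
GENERATE_RESPONSES = True


def attach_response(
    post: Dict[str, str],
    chat: Callable[[str], str],
    enabled: bool = GENERATE_RESPONSES,
    width: int = 150,
) -> Dict[str, str]:
    """Return a copy of the post with a wrapped 'response', or the post unchanged if disabled."""
    if not enabled:
        return post
    reply = chat(post["full_text"])
    return {**post, "response": textwrap.fill(reply, width=width)}
```

With the ollama client, chat could be a small wrapper such as `lambda text: ollama.chat(model='redditor', messages=[{'role': 'user', 'content': text}]).message.content`.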

License

MIT
