YRG999/Scraper

Scraper fork

How to Scrape Tweets From Twitter (updated)

Setup

For initial setup, create a virtual environment and activate it:

$ python3 -m venv venv
$ . venv/bin/activate

Then install requirements:

$ pip install -r requirements.txt

After that, you only need to run the second line ($ . venv/bin/activate) to activate the virtual environment.

Run

The original Scraper used snscrape, which can no longer access Twitter.

Scraper2 uses Tweepy to access the Twitter API.

NOTE: Scraper2 is untested, because Twitter API access costs $100/month.

To run it, type python followed by the file name:

$ python Scraper2.py
Archived

NOTE: The Reddit scraper has been broken since 12/14/22.

Scraper uses snscrape to scrape Twitter and Reddit posts. Thanks to Martin Beck's write-up "How to Scrape Tweets With snscrape", which got me started. His files are under /TwitterScraper; they are not needed to run Scraper.py.

To run

  • Type python3 Scraper.py
  • Choose:
    1. Search Twitter. This option accepts the same advanced search operators as the Twitter search box.
    2. Search Reddit.
    3. Search for a subreddit matching the term you entered. If the subreddit exists, the results should show its posts. These results are incomplete: they show each post's URL, but not the post itself.
  • Type the maximum number of results to receive.
  • Type one or more search terms.
  • Type a filename prefix (random numbers and the result count are appended to this name).
  • The output is a .csv file; its full filename is shown in the console.
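The output step above (prefix, then random numbers and the result count, saved as .csv and announced in the console) could look roughly like this minimal sketch. The function names make_output_filename and write_results_csv, the single four-digit random number, the underscore separators, and the sample column names are all assumptions for illustration; Scraper.py's actual naming code may differ.

```python
import csv
import random
from pathlib import Path


def make_output_filename(prefix: str, count: int) -> str:
    """Hypothetical naming scheme: prefix + random number + result count + .csv."""
    # Assumption: one 4-digit random number, joined with underscores.
    random_part = random.randint(1000, 9999)
    return f"{prefix}_{random_part}_{count}.csv"


def write_results_csv(prefix: str, rows: list[dict], directory: str = ".") -> Path:
    """Write result rows (one dict per post) to a CSV file and return its path."""
    path = Path(directory) / make_output_filename(prefix, len(rows))
    # Column names come from the first row; an empty result set gets no columns.
    fieldnames = list(rows[0].keys()) if rows else []
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    # Show the full filename in the console, as the steps above describe.
    print(f"Saved {len(rows)} results to {path}")
    return path
```

For example, write_results_csv("mysearch", rows) with two rows would create a file such as mysearch_4821_2.csv in the current directory; the middle number changes on every run.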

To do

  • make first choice a function
  • make if statements into functions
  • make it so that you can go back and make a different choice
  • Twitter from:username search
  • Twitter from:username since: until: options
  • Twitter search: choose between a plain search and a search by username, since, and until
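Several of these to-do items could be approached along the lines of the sketch below, offered as one possible design rather than the project's code. build_twitter_query appends Twitter's from:, since:, and until: operators to the search terms, and a dictionary maps each menu choice to its own function, replacing if/elif branches, so the loop in main can return to the menu after every search. All names are hypothetical, and the three handlers are placeholders that only echo the query instead of calling snscrape or Tweepy.

```python
from collections.abc import Callable
from datetime import date
from typing import Optional


def build_twitter_query(
    terms: str,
    username: Optional[str] = None,
    since: Optional[date] = None,
    until: Optional[date] = None,
) -> str:
    """Append Twitter's from:, since:, and until: operators to the search terms."""
    parts = [terms.strip()] if terms.strip() else []
    if username:
        # Accept "@name" or "name"; the operator takes the bare handle.
        parts.append(f"from:{username.lstrip('@')}")
    if since:
        parts.append(f"since:{since.isoformat()}")  # formatted as YYYY-MM-DD
    if until:
        parts.append(f"until:{until.isoformat()}")
    return " ".join(parts)


# Placeholder handlers: one function per menu choice instead of if/elif branches.
def search_twitter(query: str) -> str:
    return f"Twitter search: {query}"


def search_reddit(query: str) -> str:
    return f"Reddit search: {query}"


def search_subreddit(query: str) -> str:
    return f"Subreddit search: {query}"


MENU: dict[str, Callable[[str], str]] = {
    "1": search_twitter,
    "2": search_reddit,
    "3": search_subreddit,
}


def run_choice(choice: str, query: str) -> Optional[str]:
    """Dispatch to the handler for a menu choice; return None if it is unknown."""
    handler = MENU.get(choice)
    return handler(query) if handler else None


def main() -> None:
    """Return to the menu after each search so the user can choose again."""
    while True:
        choice = input("Choose 1 (Twitter), 2 (Reddit), 3 (subreddit), or q to quit: ").strip()
        if choice.lower() == "q":
            break
        if choice not in MENU:
            print("Unknown choice, try again.")
            continue
        query = input("Search terms: ")
        print(run_choice(choice, query))
```

Calling main() starts the interactive loop. Adding a new search type then only requires one more handler and one more MENU entry.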

Helpful links

Summary of Reddit search operators:
  • Posts by a specific user: author:name
  • Posts with a specific flair: flair:flairname
  • Text posts only: self:true
  • Text in the body of the post: selftext:term
  • Domain of the submitted URL: site:domain
  • Subreddit of the submission: subreddit:name
  • Text in the submission title: title:term
  • URL of the submission (the website's address): url:address
  • Combined search: author:name subreddit:name searchterm
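The combined-search pattern above can also be assembled in code. This minimal sketch uses a hypothetical build_reddit_query helper that joins whichever operators are supplied, placing author: and subreddit: first and the free-text terms last, as in the combined example. It only builds the query string and does not contact Reddit; values are inserted as-is, without quoting.

```python
from typing import Optional


def build_reddit_query(
    terms: str = "",
    author: Optional[str] = None,
    subreddit: Optional[str] = None,
    flair: Optional[str] = None,
    site: Optional[str] = None,
    title: Optional[str] = None,
    selftext: Optional[str] = None,
    url: Optional[str] = None,
    self_only: bool = False,
) -> str:
    """Join the supplied Reddit search operators, then the free-text terms."""
    operators = [
        ("author", author),
        ("subreddit", subreddit),
        ("flair", flair),
        ("site", site),
        ("title", title),
        ("selftext", selftext),
        ("url", url),
    ]
    # Skip operators that were not given.
    parts = [f"{name}:{value}" for name, value in operators if value]
    if self_only:
        parts.append("self:true")  # restrict results to text posts
    if terms.strip():
        parts.append(terms.strip())
    return " ".join(parts)
```

For example, build_reddit_query("searchterm", author="name", subreddit="name") reproduces the combined search shown above.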

About

Building off TwitterScraper.

Languages

  • Jupyter Notebook 80.5%
  • Python 19.5%