How to skip empty page #50

Open
selihadji opened this issue Jan 2, 2024 · 2 comments

Comments

selihadji commented Jan 2, 2024

Hello, thank you very much for sharing this nice tool.
However, I have an issue. I use Jupyter Notebook with the default settings (e.g., searching for a term using random instances). After a while, I get a log message such as: 02-Jan-24 12:40:55 - Empty page on https://nitter.soopy.moe/

Since I would like to fetch a large number of tweets, I would like to skip this and restart from where it stopped. Is that possible?

@AritzUMA

You can write a loop in Python that checks whether the variable holding the tweets is empty and, if its length is 0, runs the request again.
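The loop suggested above could be sketched roughly as follows. This is only an illustration: `fetch_until_nonempty`, `max_attempts` and `delay_seconds` are hypothetical names, not part of ntscraper, and the helper assumes the fetch callable returns a list of tweets.

```python
import time


def fetch_until_nonempty(fetch, max_attempts=5, delay_seconds=1.0):
    """Call fetch() until it returns a non-empty list or attempts run out.

    fetch is any zero-argument callable returning a list of tweets
    (hypothetical helper, not part of ntscraper).
    """
    for _ in range(max_attempts):
        tweets = fetch()
        if len(tweets) > 0:
            return tweets
        # Empty result (e.g. an "Empty page" instance): wait, then try again
        time.sleep(delay_seconds)
    # Give up after max_attempts empty results
    return []


# Possible use with ntscraper (assumes `scraper = Nitter()` exists):
# tweets = fetch_until_nonempty(
#     lambda: scraper.get_tweets("some term", mode="term")["tweets"]
# )
```

Capping the number of attempts avoids looping forever when the search genuinely has no results.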

BradKML commented Jan 24, 2024

My current hack (also related to #53)

!pip install ntscraper tqdm
from datetime import date, datetime, timedelta

from ntscraper import Nitter

scraper = Nitter(log_level=1, skip_instance_check=False)
user = "visakanv"  # replace with the account you want to scrape
cache = []  # accumulates the tweets from every request
until_date = date.today()  # defaults to the day of the query, please note for debugging; renamed so it does not shadow the imported date class
while True:
    dl_tweets = scraper.get_tweets(user, mode='user', until=str(until_date))['tweets']
    if len(dl_tweets) == 0:
        break  # stop once the end of the history is reached
    cache.extend(dl_tweets)  # cache list concatenation
    # Keep only the date part, dropping the time and timezone; strip() removes the trailing space left by single-digit days
    date_str = dl_tweets[-1]['date'][:12].strip()
    # Parse the date format and restart one day after the last tweet in this batch
    until_date = datetime.strptime(date_str, '%b %d, %Y').date() + timedelta(days=1)

I would convert the output into a DataFrame if I had the time, but at the moment it seems to work.
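Because each request restarts one day after the date of the last tweet fetched, consecutive batches can overlap, so the cache may contain the same tweet more than once. A minimal sketch for removing such duplicates, assuming each tweet dict carries a 'link' key that identifies it (the key name is an assumption about the ntscraper output, and `dedupe_tweets` is a hypothetical helper):

```python
def dedupe_tweets(tweets, key="link"):
    """Return tweets in their original order, dropping repeated identifiers.

    Tweets without the key are kept as-is, since they cannot be compared.
    """
    seen = set()
    unique = []
    for tweet in tweets:
        ident = tweet.get(key)
        if ident is not None:
            if ident in seen:
                continue  # already collected in an earlier batch
            seen.add(ident)
        unique.append(tweet)
    return unique


# Possible use on the accumulated list from the loop above:
# cache = dedupe_tweets(cache)
```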
