Pipelines and NLP for identifying trending crypto topics, by ecosystem, over the last 24-48 hours.
- Clone the repo and open `trending-topics.Rproj`.
- You'll need a Twitter API key in `twitter-secret.txt` and a ChatGPT API key in `chatgpt-secret.txt`. Both files are gitignored and should be placed in the `trending_topics/` directory, where the scheduled R Markdown file (`update_topics_pipeline.Rmd`) lives.
- You'll also need `snowflake-details.json` for the `submitSnowflake` function, in the form:

  ```json
  {
    "driver": "YOUR-LOCAL-SNOWFLAKE-DRIVER-HERE",
    "server_url": "YOUR-URL-HERE.snowflakecomputing.com",
    "username": "PUT-USERNAME-HERE",
    "password": "PUT-PASSWORD-HERE",
    "role": "INTERNAL_DEV",
    "warehouse": "DATA_SCIENCE",
    "database": ""
  }
  ```

- `reprex-for-pulltweets-ai-summarize.R` contains the internal functions and example accounts for the broader pipeline. It sources `trending_topics/source_funcs_and_secrets.R`, which loads the libraries, functions, and secrets required to run the pipeline.
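The setup expects two single-line secret files and a JSON config. The following is a minimal sketch of how such files might be loaded, assuming the `jsonlite` package for JSON parsing and `DBI` + `odbc` for the connection. The helper names `read_secret`, `read_snowflake_details`, and `connect_snowflake` are hypothetical; they are not the repo's `source_funcs_and_secrets.R` or `submitSnowflake`.

```r
# Hypothetical helpers, not the repo's actual source_funcs_and_secrets.R.

# Read a single-line secret file (e.g. twitter-secret.txt), trimming whitespace.
read_secret <- function(path) {
  if (!file.exists(path)) stop("Missing secret file: ", path)
  trimws(readLines(path, n = 1, warn = FALSE))
}

# Parse snowflake-details.json and stop early if any expected key is absent.
read_snowflake_details <- function(path) {
  details <- jsonlite::fromJSON(path)
  required <- c("driver", "server_url", "username", "password",
                "role", "warehouse", "database")
  missing <- setdiff(required, names(details))
  if (length(missing) > 0) {
    stop("snowflake-details.json is missing: ", paste(missing, collapse = ", "))
  }
  details
}

# Open an ODBC connection from the parsed details. Assumes the Snowflake ODBC
# driver named in "driver" is installed locally; role and warehouse are passed
# through as extra connection-string keywords.
connect_snowflake <- function(details) {
  DBI::dbConnect(
    odbc::odbc(),
    driver    = details$driver,
    server    = details$server_url,
    uid       = details$username,
    pwd       = details$password,
    role      = details$role,
    warehouse = details$warehouse,
    database  = details$database
  )
}

# Example (paths assume the layout described above):
# twitter_key <- read_secret("trending_topics/twitter-secret.txt")
# con <- connect_snowflake(read_snowflake_details("snowflake-details.json"))
```

Validating the JSON keys up front surfaces a misconfigured file before any network call is attempted.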
The full pipeline diagram is included as an image and as a PDF.
- Pull `target_twitter_accounts` and call `pull_account_tweets`.
- Dump the tweets, in data frame form, into `raw_tweet_dump`, then call the `add_new_tweets()` procedure to clean them and append them to `processed_tweets`.
- Ingest unused tweets and call `chatgpt_id_topic` at the day-ecosystem level to get the `subjects` and `summaries`.
- Update the used tweets to `used_in_summary = TRUE` and insert the summaries into `ai_summary`.
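The core of these steps is batching unused tweets by day and ecosystem, summarizing each batch, and then flagging those tweets as used. Below is a base-R sketch of that batching logic. It assumes a tweets data frame with hypothetical columns `created_at`, `ecosystem`, `text`, and `used_in_summary`, plus a caller-supplied `summarize_fn` standing in for `chatgpt_id_topic`, whose real signature is not documented here.

```r
# Hypothetical sketch of the "summarize unused tweets per day-ecosystem" step.
# Column names and summarize_fn are assumptions, not the repo's schema or
# the real chatgpt_id_topic.
summarize_unused <- function(tweets, summarize_fn) {
  idx <- !tweets$used_in_summary
  unused <- tweets[idx, , drop = FALSE]
  if (nrow(unused) == 0) {
    return(list(tweets = tweets, ai_summary = data.frame()))
  }
  unused$day <- as.Date(unused$created_at)

  # One batch per (day, ecosystem) pair; drop = TRUE skips empty combinations
  batches <- split(unused, list(unused$day, unused$ecosystem), drop = TRUE)

  rows <- lapply(batches, function(b) {
    res <- summarize_fn(b$text)  # assumed to return list(subject, summary)
    data.frame(day = b$day[1], ecosystem = b$ecosystem[1],
               subject = res$subject, summary = res$summary,
               n_tweets = nrow(b), stringsAsFactors = FALSE)
  })
  ai_summary <- do.call(rbind, rows)
  rownames(ai_summary) <- NULL

  # Flag every tweet that went into a summary so it is not reused
  tweets$used_in_summary[idx] <- TRUE
  list(tweets = tweets, ai_summary = ai_summary)
}

# Usage with a stub summarizer in place of the LLM call
tweets <- data.frame(
  created_at = c("2023-05-01 10:00:00", "2023-05-01 12:00:00", "2023-05-02 09:00:00"),
  ecosystem = c("ethereum", "ethereum", "solana"),
  text = c("eth upgrade", "eth fees", "sol outage"),
  used_in_summary = c(FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)
stub <- function(texts) list(subject = texts[1], summary = paste(texts, collapse = "; "))
out <- summarize_unused(tweets, stub)
```

Tweets are flagged only after their batch has been summarized, which mirrors the ordering of the last two steps above.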
A website (in development) provides a UI over the summaries and uses term frequency to link each summary back to the relevant tweets at the day-ecosystem level.
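Linking a summary back to its tweets by term frequency can be illustrated with a raw-count scoring: tokenize the summary, then rank each tweet by how many times it uses those terms. This base-R sketch is an assumption about the general idea only. The tokenizer, stopword list, and function names are hypothetical, and the website may use a different weighting (for example TF-IDF).

```r
# Hypothetical sketch: rank tweets by raw frequency of a summary's terms.
# Not the website's actual implementation.

# Lowercase, split on non-alphanumerics, drop empty tokens.
tokenize <- function(x) {
  words <- strsplit(tolower(x), "[^a-z0-9]+")
  lapply(words, function(w) w[nzchar(w)])
}

rank_tweets_by_terms <- function(summary_text, tweet_texts,
                                 stopwords = c("the", "a", "and", "of", "to", "in", "is")) {
  terms <- setdiff(unique(tokenize(summary_text)[[1]]), stopwords)
  tweet_tokens <- tokenize(tweet_texts)
  # Score = total occurrences of any summary term in the tweet
  scores <- vapply(tweet_tokens, function(tok) sum(tok %in% terms), integer(1))
  ord <- order(scores, decreasing = TRUE)
  data.frame(tweet = tweet_texts[ord], score = scores[ord], stringsAsFactors = FALSE)
}

# Example
ranked <- rank_tweets_by_terms(
  "Ethereum gas fees spike",
  c("sol outage today",
    "ethereum gas fees are high, gas everywhere",
    "fees on ethereum")
)
```

Raw counts favor longer tweets that repeat a term; normalizing by tweet length or down-weighting common terms are natural refinements.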
