vemshari27/alphastream_py

AlphaStream — alphastream_py

AlphaStream fuses social media signals, news, and market trends to surface the hottest tradeable stock and crypto ideas. This repo is the backend intelligence layer — no UI. It ingests raw data, scores it, ranks symbols, and emits CSVs that a future web/app layer will consume.


Demo (Notebook Output)

Running notebooks/stocks.ipynb aggregates Reddit signals for stock tickers and outputs ranked symbols with scores, mention counts, and supporting post context.

AlphaStream Stocks Ranking

Sample Rankings

| Symbol | Rank Score | Mentions | Top Post |
|--------|------------|----------|----------|
| HERE   | 25.59      | 5        | US imposing 25% tariffs on steel, aluminum, and copper derivatives... |
| NDAQ   | 18.33      | 4        | Michael Burry Flags Structural Manipulation Risk In Nasdaq Rules... |
| CRCW   | 15.63      | 3        | Trump's Stone Age Rhetoric Triggers $440M Crypto Wipeout... |
| QQQ    | 5.13       | 1        | The Nasdaq is being taken over. SpaceX is IPOing... |
| TSLA   | 5.13       | 1        | The Nasdaq is being taken over. SpaceX is IPOing... |

Architecture

AlphaStream  (alphastream_py/main.py)
└── SMTracker  (alphastream_py/sm_tracker/main.py)
    ├── IngestionPipeline  →  Scraper → DataProcessor → DataStorage
    └── Engine v0          →  SymbolExtractor → DataAnalyzer → Ranker

Engine v0 Scoring

post_weight  = log(1 + reddit_score) × 0.5^(age_days / half_life_days)
rank_score   = Σ post_weight  for all posts mentioning a symbol
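
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the code in analyzer.py or ranker.py: the function names, the (symbols, reddit_score, age_days) tuple layout, and the clamping of negative scores to zero are all assumptions made for the example.

```python
import math
from collections import defaultdict


def post_weight(reddit_score, age_days, half_life_days=3.0):
    """log(1 + reddit_score) * 0.5 ** (age_days / half_life_days)."""
    # Assumption: negative scores are clamped to 0 so the log stays defined.
    upvote_term = math.log1p(max(reddit_score, 0))
    recency_term = 0.5 ** (age_days / half_life_days)
    return upvote_term * recency_term


def rank_symbols(posts, half_life_days=3.0, top_k=10):
    """Sum post weights per symbol and return the top_k (symbol, score) pairs.

    posts is an iterable of (symbols, reddit_score, age_days) tuples
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(float)
    for symbols, reddit_score, age_days in posts:
        weight = post_weight(reddit_score, age_days, half_life_days)
        for symbol in set(symbols):  # count each symbol once per post
            totals[symbol] += weight
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top_k]


posts = [
    (["TSLA", "QQQ"], 99, 0.0),  # fresh post mentioning two symbols
    (["TSLA"], 9, 3.0),          # one half-life old, so its weight is halved
]
ranking = rank_symbols(posts)
```

Because a post contributes its full weight to every symbol it mentions, two symbols that appear only in the same post receive identical scores, which matches the QQQ and TSLA rows in the sample rankings.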

Symbols are extracted from post text using three strategies: $TICKER, (TICKER), and standalone uppercase words matched against a 10,000+ ticker universe with company-name aliases. Common noise words (US, AI, BUY, etc.) are blocklisted.
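
A rough sketch of the three extraction strategies, using only the standard library. The tiny UNIVERSE and BLOCKLIST sets are hypothetical stand-ins for data/universe/tickers.csv and the real blocklist, the regular expressions are guesses at reasonable patterns, and company-name alias matching is left out. Whether the blocklist also applies to the explicit $TICKER and (TICKER) forms is not stated above; this sketch assumes it does not.

```python
import re

# Hypothetical stand-ins for the ticker universe and the noise-word blocklist.
UNIVERSE = {"TSLA", "NDAQ", "QQQ", "AI", "BUY"}
BLOCKLIST = {"US", "AI", "BUY"}

CASHTAG = re.compile(r"\$([A-Z]{1,5})\b")  # $TSLA
PAREN = re.compile(r"\(([A-Z]{1,5})\)")    # (NDAQ)
BARE = re.compile(r"\b[A-Z]{2,5}\b")       # standalone uppercase word


def extract_symbols(text, universe=UNIVERSE, blocklist=BLOCKLIST):
    """Return the set of universe tickers mentioned in text."""
    found = set()
    # Explicit forms: accepted whenever the ticker is in the universe.
    for pattern in (CASHTAG, PAREN):
        found.update(t for t in pattern.findall(text) if t in universe)
    # Bare uppercase words are noisier, so the blocklist is applied as well.
    found.update(
        w for w in BARE.findall(text) if w in universe and w not in blocklist
    )
    return found


symbols = extract_symbols(
    "US tariffs hit $TSLA and Nasdaq (NDAQ); BUY QQQ now, AI hype"
)
```

Here US, BUY, and AI are dropped as noise, while TSLA, NDAQ, and QQQ are kept.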


Project Structure

alphastream-py/
├── alphastream_py/                   # Main package
│   ├── __init__.py                   # Exports: Config, AlphaStream
│   ├── main.py                       # AlphaStream — top-level entry point
│   │
│   ├── config/
│   │   ├── config.py                 # Config dataclasses (Global, SMTracker, Ingestion, Engine, Portfolio)
│   │   ├── config.yaml               # Default configuration
│   │   ├── credentials.py            # Credentials loader (.env / secrets.yaml)
│   │   ├── schemas.py                # Validation schemas
│   │   └── setup.py                  # init_credentials / check_credentials helpers
│   │
│   └── sm_tracker/                   # Social-media tracking subsystem
│       ├── main.py                   # SMTracker orchestrator
│       │
│       ├── ingestion/
│       │   ├── ingestion.py          # IngestionPipeline
│       │   ├── scraper.py            # RedditScraper (live), TwitterScraper (stub), NewsScraper (stub)
│       │   ├── data_processor.py     # Normalizes raw posts → standard DataFrame
│       │   ├── data_storage.py       # Saves processed CSVs
│       │   └── sources/              # Reserved for future per-source modules
│       │
│       ├── engine/
│       │   ├── engine.py             # Engine v0 — main ranking orchestrator
│       │   ├── analyzer.py           # DataAnalyzer — upvote + recency weighting
│       │   ├── ranker.py             # Ranker — aggregates weights, returns top-K
│       │   ├── symbols.py            # SymbolExtractor — ticker detection from text
│       │   └── models/               # Future ML models (stubs)
│       │       ├── base_model.py
│       │       ├── stock_model.py
│       │       ├── option_model.py
│       │       └── crypto_model.py
│       │
│       └── utils/
│           ├── logger.py
│           ├── data_utils.py
│           └── validators.py
│
├── notebooks/
│   ├── stocks.ipynb                  # Interactive pipeline runner
│   └── data/                         # Notebook-local data outputs
│
├── data/
│   ├── raw/sm/reddit/                # Cached Reddit pickle files (week_<date>.pkl)
│   ├── processed/                    # Normalized post CSVs
│   ├── outputs/                      # Ranked ticker CSVs
│   └── universe/tickers.csv          # Ticker universe (Symbol + CleanName)
│
├── tests/
├── pyproject.toml
├── setup.py
├── requirements.txt
├── .env / secrets.yaml               # Credentials (not in git)
└── README.md

Installation

# From the cloned repository root, install in editable mode (inside your project conda env)
pip install -e .

# With optional scraping deps
pip install -e ".[scraping]"

Credentials go in .env (copy from .env.example) or secrets.yaml (copy from secrets.example.yaml).


Usage

Notebook (recommended)

Open notebooks/stocks.ipynb and run all cells. It runs the full pipeline and renders a ranked table.

Python API

from alphastream_py import AlphaStream

app = AlphaStream("alphastream_py/config/config.yaml")

# Full pipeline: ingest → rank → save CSV
results = app.run_full_pipeline()

# Ingestion only
data, stats = app.run_ingestion_only()

# Ranking only (from an existing processed CSV)
results = app.run_engine_only(input_csv="data/processed/all_20260517.csv", top_k=10)
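
For ranking-only runs it can be handy to pick the newest processed CSV automatically instead of hard-coding the date. The helper below is a hypothetical convenience, not part of the package; it assumes processed files follow the all_YYYYMMDD.csv naming seen in the example path above.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme, inferred from "all_20260517.csv" in the example above.
_PROCESSED_NAME = re.compile(r"^all_(\d{8})\.csv$")


def latest_processed_csv(processed_dir: str = "data/processed") -> Optional[Path]:
    """Return the all_YYYYMMDD.csv file with the latest date, or None."""
    directory = Path(processed_dir)
    if not directory.is_dir():
        return None
    dated = []
    for path in directory.glob("all_*.csv"):
        match = _PROCESSED_NAME.match(path.name)
        if match:
            dated.append((match.group(1), path))
    if not dated:
        return None
    # YYYYMMDD strings sort chronologically, so the max string is the newest.
    return max(dated, key=lambda item: item[0])[1]
```

The returned path can then be passed on, for example as input_csv=str(latest_processed_csv()) in the run_engine_only call shown above.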

CLI

python -m alphastream_py.sm_tracker.main --stage full
python -m alphastream_py.sm_tracker.main --stage ingestion
python -m alphastream_py.sm_tracker.main --stage engine --input data/processed/all_20260517.csv --top-k 10

Configuration

Key settings in alphastream_py/config/config.yaml:

sm_tracker:
  ingestion:
    reddit_enabled: true
    reddit_subreddits: [stocks, investing, wallstreetbets, cryptocurrency]
    reddit_post_limit: 10
    reddit_use_cached: true      # Use local pickle cache if fresh
    lookback_period: 10          # Days of history

  engine:
    half_life_days: 3.0          # Recency half-life: a post's weight halves every 3 days; lower values favor fresher posts
    top_k: 10                    # Symbols returned
    top_posts_per_symbol: 3      # Supporting posts shown per symbol
    universe_file: data/universe/tickers.csv
    processed_dir: data/processed
    output_dir: data/outputs
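
To get a feel for how half_life_days changes the ranking, the sketch below mirrors three of the engine settings in a small dataclass and computes the recency factor for a post of a given age. EngineSettings and recency_factor are hypothetical names for illustration; the real config dataclasses live in alphastream_py/config/config.py and may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineSettings:
    """Hypothetical mirror of part of the engine block in config.yaml."""

    half_life_days: float = 3.0
    top_k: int = 10
    top_posts_per_symbol: int = 3

    def __post_init__(self):
        # Basic sanity checks; a non-positive half-life makes the decay undefined.
        if self.half_life_days <= 0:
            raise ValueError("half_life_days must be positive")
        if self.top_k < 1 or self.top_posts_per_symbol < 1:
            raise ValueError("top_k and top_posts_per_symbol must be >= 1")

    def recency_factor(self, age_days: float) -> float:
        """Multiplier applied to a post that is age_days old."""
        return 0.5 ** (age_days / self.half_life_days)


default = EngineSettings()
aggressive = EngineSettings(half_life_days=1.5)
```

With the default of 3.0, a three-day-old post keeps half of its weight; with 1.5 the same post keeps only a quarter.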

Data Sources

| Source        | Status | Notes                                          |
|---------------|--------|------------------------------------------------|
| Reddit (PRAW) | Live   | Scrapes top posts per subreddit; weekly cache  |
| Twitter/X     | Stub   | Tweepy wired; needs credentials                |
| News          | Stub   | Structure exists; not yet implemented          |
| Market data   | Stub   | Structure exists; not yet implemented          |
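
One way a weekly Reddit cache could be keyed is sketched below. Everything here is an assumption for illustration: the README only says the files are named week_<date>.pkl under data/raw/sm/reddit/, so the choice of Monday as the week start, the ISO date format, and the function name are guesses rather than a description of scraper.py.

```python
from datetime import date, timedelta
from pathlib import Path


def week_cache_path(day: date, cache_dir: str = "data/raw/sm/reddit") -> Path:
    """Map any day to the cache file for its week.

    Assumptions: weeks start on Monday and <date> is an ISO date.
    """
    monday = day - timedelta(days=day.weekday())  # weekday() is 0 on Monday
    return Path(cache_dir) / f"week_{monday.isoformat()}.pkl"


# Every day in the same Monday-to-Sunday week maps to the same file.
sunday_path = week_cache_path(date(2026, 5, 17))
monday_path = week_cache_path(date(2026, 5, 11))
```

Under this scheme, a cache file counts as fresh when it exists at the path computed for today's date; a new week produces a new path and triggers a fresh scrape.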

Testing

pytest tests/ -v
pytest tests/ --cov=alphastream_py --cov-report=term-missing

Roadmap

Next (Engine v1)

  • Sentiment scoring with VADER or FinBERT
  • News ingestion (NewsAPI / BeautifulSoup)
  • Market data via yfinance (volume spikes as signal)
  • Backtesting harness

Pre-launch

  • FastAPI REST layer
  • Scheduled runs (APScheduler)
  • PostgreSQL/MongoDB storage
  • Web dashboard (Streamlit or React)
