AlphaStream fuses social media signals, news, and market trends to surface the hottest tradeable stock and crypto ideas. This repo is the backend intelligence layer — no UI. It ingests raw data, scores it, ranks symbols, and emits CSVs that a future web/app layer will consume.
Running notebooks/stocks.ipynb aggregates Reddit signals for stock tickers and outputs ranked symbols with scores, mention counts, and supporting post context.
| Symbol | Rank Score | Mentions | Top Post |
|---|---|---|---|
| HERE | 25.59 | 5 | US imposing 25% tariffs on steel, aluminum, and copper derivatives... |
| NDAQ | 18.33 | 4 | Michael Burry Flags Structural Manipulation Risk In Nasdaq Rules... |
| CRCW | 15.63 | 3 | Trump's Stone Age Rhetoric Triggers $440M Crypto Wipeout... |
| QQQ | 5.13 | 1 | The Nasdaq is being taken over. SpaceX is IPOing... |
| TSLA | 5.13 | 1 | The Nasdaq is being taken over. SpaceX is IPOing... |
AlphaStream (alphastream_py/main.py)
└── SMTracker (alphastream_py/sm_tracker/main.py)
    ├── IngestionPipeline → Scraper → DataProcessor → DataStorage
    └── Engine v0 → SymbolExtractor → DataAnalyzer → Ranker
post_weight = log(1 + reddit_score) × 0.5^(age_days / half_life_days)
rank_score = Σ post_weight for all posts mentioning a symbol
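As a rough illustration of the two formulas above, the sketch below computes a per-post weight and sums it per symbol. The function names, the post dictionary layout (`score`, `age_days`, `symbols`), the use of the natural logarithm, the clamping of negative scores to zero, and counting a symbol at most once per post are all assumptions made for this example; it is not the actual DataAnalyzer or Ranker code.

```python
import math
from collections import defaultdict


def post_weight(reddit_score: float, age_days: float, half_life_days: float = 3.0) -> float:
    """log(1 + reddit_score) * 0.5 ** (age_days / half_life_days)."""
    # Assumption: negative (downvoted) scores are clamped to 0 so the log stays defined.
    upvote_term = math.log1p(max(reddit_score, 0))
    # Exponential decay: the weight halves every `half_life_days`.
    recency_term = 0.5 ** (age_days / half_life_days)
    return upvote_term * recency_term


def rank_symbols(posts, half_life_days: float = 3.0, top_k: int = 10):
    """Sum post weights per symbol and return the top_k as (symbol, score, mentions)."""
    totals = defaultdict(float)
    mentions = defaultdict(int)
    for post in posts:
        weight = post_weight(post["score"], post["age_days"], half_life_days)
        # Assumption: a symbol counts once per post, however often it appears in the text.
        for symbol in set(post["symbols"]):
            totals[symbol] += weight
            mentions[symbol] += 1
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(symbol, round(score, 2), mentions[symbol]) for symbol, score in ranked[:top_k]]


# Hypothetical posts, not real data.
sample_posts = [
    {"score": 99, "age_days": 0.0, "symbols": ["TSLA", "QQQ"]},
    {"score": 99, "age_days": 3.0, "symbols": ["TSLA"]},
]
print(rank_symbols(sample_posts))
```

With `half_life_days = 3.0`, the three-day-old post contributes exactly half the weight of the fresh one, so in this toy input TSLA (two mentions) outranks QQQ (one mention).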
Symbols are extracted from post text using three strategies: $TICKER, (TICKER), and standalone uppercase words matched against a 10,000+ ticker universe with company-name aliases. Common noise words (US, AI, BUY, etc.) are blocklisted.
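The three strategies can be sketched with a few regular expressions. This is a minimal illustration, not the SymbolExtractor implementation: the tiny `UNIVERSE`, `ALIASES`, and `BLOCKLIST` values, the function name `extract_symbols`, the 1 to 5 letter ticker length, and the choice to apply the blocklist only to standalone uppercase words are assumptions made for the example.

```python
import re

# Hypothetical miniature stand-ins for the real ticker universe, aliases, and blocklist.
UNIVERSE = {"TSLA", "QQQ", "NDAQ", "AI", "US"}
ALIASES = {"tesla": "TSLA", "nasdaq": "NDAQ"}
BLOCKLIST = {"US", "AI", "BUY"}

CASHTAG = re.compile(r"\$([A-Za-z]{1,5})\b")   # $TICKER
PAREN = re.compile(r"\(([A-Z]{1,5})\)")        # (TICKER)
BARE = re.compile(r"\b[A-Z]{2,5}\b")           # standalone uppercase word


def extract_symbols(text: str) -> set:
    """Return the set of universe tickers mentioned in `text`."""
    found = set()
    # Strategy 1: cashtags are explicit, so only universe membership is checked.
    for match in CASHTAG.findall(text):
        symbol = match.upper()
        if symbol in UNIVERSE:
            found.add(symbol)
    # Strategy 2: tickers in parentheses, e.g. "Nasdaq (NDAQ)".
    for symbol in PAREN.findall(text):
        if symbol in UNIVERSE:
            found.add(symbol)
    # Strategy 3: bare uppercase words are noisy, so the blocklist applies here.
    for symbol in BARE.findall(text):
        if symbol in UNIVERSE and symbol not in BLOCKLIST:
            found.add(symbol)
    # Company-name aliases, matched case-insensitively on word boundaries.
    lowered = text.lower()
    for alias, symbol in ALIASES.items():
        if re.search(rf"\b{re.escape(alias)}\b", lowered):
            found.add(symbol)
    return found


print(extract_symbols("Loading up on $TSLA and QQQ, US tariffs won't matter. BUY AI?"))
```

In this sketch `US` and `AI` are dropped even though they appear in the toy universe, because they only occur as bare uppercase words; whether the real blocklist also filters `$AI` or `(AI)` is not stated here.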
alphastream-py/
├── alphastream_py/                  # Main package
│   ├── __init__.py                  # Exports: Config, AlphaStream
│   ├── main.py                      # AlphaStream: top-level entry point
│   │
│   ├── config/
│   │   ├── config.py                # Config dataclasses (Global, SMTracker, Ingestion, Engine, Portfolio)
│   │   ├── config.yaml              # Default configuration
│   │   ├── credentials.py           # Credentials loader (.env / secrets.yaml)
│   │   ├── schemas.py               # Validation schemas
│   │   └── setup.py                 # init_credentials / check_credentials helpers
│   │
│   └── sm_tracker/                  # Social-media tracking subsystem
│       ├── main.py                  # SMTracker orchestrator
│       │
│       ├── ingestion/
│       │   ├── ingestion.py         # IngestionPipeline
│       │   ├── scraper.py           # RedditScraper (live), TwitterScraper (stub), NewsScraper (stub)
│       │   ├── data_processor.py    # Normalizes raw posts → standard DataFrame
│       │   ├── data_storage.py      # Saves processed CSVs
│       │   └── sources/             # Reserved for future per-source modules
│       │
│       ├── engine/
│       │   ├── engine.py            # Engine v0: main ranking orchestrator
│       │   ├── analyzer.py          # DataAnalyzer: upvote + recency weighting
│       │   ├── ranker.py            # Ranker: aggregates weights, returns top-K
│       │   ├── symbols.py           # SymbolExtractor: ticker detection from text
│       │   └── models/              # Future ML models (stubs)
│       │       ├── base_model.py
│       │       ├── stock_model.py
│       │       ├── option_model.py
│       │       └── crypto_model.py
│       │
│       └── utils/
│           ├── logger.py
│           ├── data_utils.py
│           └── validators.py
│
├── notebooks/
│   ├── stocks.ipynb                 # Interactive pipeline runner
│   └── data/                        # Notebook-local data outputs
│
├── data/
│   ├── raw/sm/reddit/               # Cached Reddit pickle files (week_<date>.pkl)
│   ├── processed/                   # Normalized post CSVs
│   ├── outputs/                     # Ranked ticker CSVs
│   └── universe/tickers.csv         # Ticker universe (Symbol + CleanName)
│
├── tests/
├── pyproject.toml
├── setup.py
├── requirements.txt
├── .env / secrets.yaml              # Credentials (not in git)
└── README.md
# Clone and install in editable mode (use your project conda env)
pip install -e .
# With optional scraping deps
pip install -e ".[scraping]"

Credentials go in .env (copy from .env.example) or secrets.yaml (copy from secrets.example.yaml).
Open notebooks/stocks.ipynb and run all cells. It runs the full pipeline and renders a ranked table.
from alphastream_py import AlphaStream
app = AlphaStream("alphastream_py/config/config.yaml")
# Full pipeline: ingest → rank → save CSV
results = app.run_full_pipeline()
# Ingestion only
data, stats = app.run_ingestion_only()
# Ranking only (from an existing processed CSV)
results = app.run_engine_only(input_csv="data/processed/all_20260517.csv", top_k=10)

python -m alphastream_py.sm_tracker.main --stage full
python -m alphastream_py.sm_tracker.main --stage ingestion
python -m alphastream_py.sm_tracker.main --stage engine --input data/processed/all_20260517.csv --top-k 10

Key settings in alphastream_py/config/config.yaml:
sm_tracker:
  ingestion:
    reddit_enabled: true
    reddit_subreddits: [stocks, investing, wallstreetbets, cryptocurrency]
    reddit_post_limit: 10
    reddit_use_cached: true      # Use local pickle cache if fresh
    lookback_period: 10          # Days of history
  engine:
    half_life_days: 3.0          # Recency decay; lower = fresher posts weighted more
    top_k: 10                    # Symbols returned
    top_posts_per_symbol: 3      # Supporting posts shown per symbol
    universe_file: data/universe/tickers.csv
    processed_dir: data/processed
    output_dir: data/outputs

| Source | Status | Notes |
|---|---|---|
| Reddit (PRAW) | Live | Scrapes top posts per subreddit; weekly cache |
| Twitter/X | Stub | Tweepy wired; needs credentials |
| News | Stub | Structure exists; not yet implemented |
| Market data | Stub | Structure exists; not yet implemented |
pytest tests/ -v
pytest tests/ --cov=alphastream_py --cov-report=term-missing

Next (Engine v1)
- Sentiment scoring with VADER or FinBERT
- News ingestion (NewsAPI / BeautifulSoup)
- Market data via yfinance (volume spikes as signal)
- Backtesting harness
Pre-launch
- FastAPI REST layer
- Scheduled runs (APScheduler)
- PostgreSQL/MongoDB storage
- Web dashboard (Streamlit or React)
