AlphaStream — `alphastream_py`

AlphaStream fuses social media signals, news, and market trends to surface the hottest tradeable stock and crypto ideas. This repo is the backend intelligence layer — no UI. It ingests raw data, scores it, ranks symbols, and emits CSVs that a future web/app layer will consume.

Demo (Notebook Output)

Running notebooks/stocks.ipynb aggregates Reddit signals for stock tickers and outputs ranked symbols with scores, mention counts, and supporting post context.

Sample Rankings

Symbol	Rank Score	Mentions	Top Post
HERE	25.59	5	US imposing 25% tariffs on steel, aluminum, and copper derivatives...
NDAQ	18.33	4	Michael Burry Flags Structural Manipulation Risk In Nasdaq Rules...
CRCW	15.63	3	Trump's Stone Age Rhetoric Triggers $440M Crypto Wipeout...
QQQ	5.13	1	The Nasdaq is being taken over. SpaceX is IPOing...
TSLA	5.13	1	The Nasdaq is being taken over. SpaceX is IPOing...

Architecture

AlphaStream  (alphastream_py/main.py)
└── SMTracker  (alphastream_py/sm_tracker/main.py)
    ├── IngestionPipeline  →  Scraper → DataProcessor → DataStorage
    └── Engine v0          →  SymbolExtractor → DataAnalyzer → Ranker

Engine v0 Scoring

post_weight  = log(1 + reddit_score) × 0.5^(age_days / half_life_days)
rank_score   = Σ post_weight  for all posts mentioning a symbol
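
As a rough illustration of the two formulas above, the sketch below computes post_weight and rank_score in plain Python. The function names, the (symbols, score, age_days) tuple layout, the use of the natural log, and the clamping of negative Reddit scores are assumptions made for this example; it is not a copy of analyzer.py or ranker.py.

```python
import math
from collections import defaultdict


def post_weight(reddit_score: float, age_days: float, half_life_days: float = 3.0) -> float:
    """Upvote term damped by an exponential recency decay."""
    # Assumption: natural log, with negative scores clamped to 0 so log() stays defined.
    upvote_term = math.log(1 + max(reddit_score, 0))
    recency_term = 0.5 ** (age_days / half_life_days)
    return upvote_term * recency_term


def rank_scores(posts, half_life_days: float = 3.0) -> dict:
    """Sum post weights per symbol; `posts` holds (symbols, reddit_score, age_days) tuples."""
    totals = defaultdict(float)
    for symbols, reddit_score, age_days in posts:
        weight = post_weight(reddit_score, age_days, half_life_days)
        for symbol in set(symbols):  # count each post at most once per symbol
            totals[symbol] += weight
    # Highest score first, mirroring a ranked output.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


posts = [
    (["TSLA", "QQQ"], 99, 0.0),  # fresh post with 99 upvotes
    (["TSLA"], 99, 3.0),         # same upvotes, exactly one half-life old
]
scores = rank_scores(posts)
```

With a 3-day half-life, the second post contributes half the weight of the first, so TSLA ranks above QQQ in this toy input.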

Symbols are extracted from post text using three strategies: $TICKER, (TICKER), and standalone uppercase words matched against a 10,000+ ticker universe with company-name aliases. Common noise words (US, AI, BUY, etc.) are blocklisted.
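
The three extraction strategies can be approximated with regular expressions, as in the minimal sketch below. The tiny UNIVERSE and BLOCKLIST sets, the pattern lengths, and the extract_symbols name are hypothetical stand-ins; the real SymbolExtractor loads its universe from data/universe/tickers.csv and also handles company-name aliases, which this example leaves out. Applying the blocklist to all three strategies is likewise an assumption.

```python
import re

# Hypothetical stand-ins for the ticker universe and the noise-word blocklist.
UNIVERSE = {"TSLA", "QQQ", "NDAQ", "AI", "US"}
BLOCKLIST = {"US", "AI", "BUY"}

CASHTAG = re.compile(r"\$([A-Z]{1,5})\b")   # $TICKER
PARENS = re.compile(r"\(([A-Z]{1,5})\)")    # (TICKER)
BARE = re.compile(r"\b([A-Z]{2,5})\b")      # standalone uppercase word


def extract_symbols(text: str) -> set:
    """Return universe tickers mentioned in `text`, minus blocklisted noise words."""
    candidates = set(CASHTAG.findall(text))
    candidates |= set(PARENS.findall(text))
    candidates |= set(BARE.findall(text))
    # Drop noise words first, then keep only symbols that exist in the universe.
    return (candidates - BLOCKLIST) & UNIVERSE


found = extract_symbols("Loading up on $TSLA and Nasdaq (NDAQ) while the US debates AI")
```

Here `found` contains TSLA and NDAQ; US and AI are valid tickers in the toy universe but are removed by the blocklist.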

Project Structure

alphastream-py/
├── alphastream_py/                   # Main package
│   ├── __init__.py                   # Exports: Config, AlphaStream
│   ├── main.py                       # AlphaStream — top-level entry point
│   │
│   ├── config/
│   │   ├── config.py                 # Config dataclasses (Global, SMTracker, Ingestion, Engine, Portfolio)
│   │   ├── config.yaml               # Default configuration
│   │   ├── credentials.py            # Credentials loader (.env / secrets.yaml)
│   │   ├── schemas.py                # Validation schemas
│   │   └── setup.py                  # init_credentials / check_credentials helpers
│   │
│   └── sm_tracker/                   # Social-media tracking subsystem
│       ├── main.py                   # SMTracker orchestrator
│       │
│       ├── ingestion/
│       │   ├── ingestion.py          # IngestionPipeline
│       │   ├── scraper.py            # RedditScraper (live), TwitterScraper (stub), NewsScraper (stub)
│       │   ├── data_processor.py     # Normalizes raw posts → standard DataFrame
│       │   ├── data_storage.py       # Saves processed CSVs
│       │   └── sources/              # Reserved for future per-source modules
│       │
│       ├── engine/
│       │   ├── engine.py             # Engine v0 — main ranking orchestrator
│       │   ├── analyzer.py           # DataAnalyzer — upvote + recency weighting
│       │   ├── ranker.py             # Ranker — aggregates weights, returns top-K
│       │   ├── symbols.py            # SymbolExtractor — ticker detection from text
│       │   └── models/               # Future ML models (stubs)
│       │       ├── base_model.py
│       │       ├── stock_model.py
│       │       ├── option_model.py
│       │       └── crypto_model.py
│       │
│       └── utils/
│           ├── logger.py
│           ├── data_utils.py
│           └── validators.py
│
├── notebooks/
│   ├── stocks.ipynb                  # Interactive pipeline runner
│   └── data/                         # Notebook-local data outputs
│
├── data/
│   ├── raw/sm/reddit/                # Cached Reddit pickle files (week_<date>.pkl)
│   ├── processed/                    # Normalized post CSVs
│   ├── outputs/                      # Ranked ticker CSVs
│   └── universe/tickers.csv          # Ticker universe (Symbol + CleanName)
│
├── tests/
├── pyproject.toml
├── setup.py
├── requirements.txt
├── .env / secrets.yaml               # Credentials (not in git)
└── README.md

Installation

# After cloning the repo, install in editable mode from its root (use your project conda env)
pip install -e .

# With optional scraping deps
pip install -e ".[scraping]"

Credentials go in .env (copy from .env.example) or secrets.yaml (copy from secrets.example.yaml).

Usage

Notebook (recommended)

Open notebooks/stocks.ipynb and run all cells. It runs the full pipeline and renders a ranked table.

Python API

from alphastream_py import AlphaStream

app = AlphaStream("alphastream_py/config/config.yaml")

# Full pipeline: ingest → rank → save CSV
results = app.run_full_pipeline()

# Ingestion only
data, stats = app.run_ingestion_only()

# Ranking only (from an existing processed CSV)
results = app.run_engine_only(input_csv="data/processed/all_20260517.csv", top_k=10)

CLI

python -m alphastream_py.sm_tracker.main --stage full
python -m alphastream_py.sm_tracker.main --stage ingestion
python -m alphastream_py.sm_tracker.main --stage engine --input data/processed/all_20260517.csv --top-k 10
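
For readers who want to script around these commands, the sketch below shows one way the three flags could be parsed with argparse. Only the flag names and the stage values come from the commands above; the defaults, help text, and the build_parser helper are assumptions, and the actual parser in sm_tracker/main.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the --stage, --input, and --top-k flags."""
    parser = argparse.ArgumentParser(prog="alphastream_py.sm_tracker.main")
    parser.add_argument("--stage", choices=["full", "ingestion", "engine"], default="full")
    parser.add_argument("--input", help="processed CSV to rank (used by the engine stage)")
    parser.add_argument("--top-k", type=int, default=10, help="number of symbols to return")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(
    ["--stage", "engine", "--input", "data/processed/all_20260517.csv", "--top-k", "10"]
)
```

argparse exposes `--top-k` as `args.top_k`, which lines up with the `top_k` keyword used in the Python API.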

Configuration

Key settings in alphastream_py/config/config.yaml:

sm_tracker:
  ingestion:
    reddit_enabled: true
    reddit_subreddits: [stocks, investing, wallstreetbets, cryptocurrency]
    reddit_post_limit: 10
    reddit_use_cached: true      # Use local pickle cache if fresh
    lookback_period: 10          # Days of history

  engine:
    half_life_days: 3.0          # Recency half-life in days: a post's weight halves every 3 days; lower values favor fresher posts
    top_k: 10                    # Symbols returned
    top_posts_per_symbol: 3      # Supporting posts shown per symbol
    universe_file: data/universe/tickers.csv
    processed_dir: data/processed
    output_dir: data/outputs
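
To get a feel for how half_life_days interacts with lookback_period, the short sketch below prints the recency factor 0.5^(age_days / half_life_days) for a few post ages. The recency_factor helper and the sampled values are illustrative assumptions, not part of the package.

```python
def recency_factor(age_days: float, half_life_days: float) -> float:
    """Fraction of a post's upvote weight that survives after `age_days`."""
    return 0.5 ** (age_days / half_life_days)


# Compare a few candidate half-lives at ages 0, 3, and 10 days.
for half_life in (1.0, 3.0, 7.0):
    factors = [round(recency_factor(age, half_life), 3) for age in (0, 3, 10)]
    print(f"half_life_days={half_life}: {factors}")
```

With the defaults shown above (half_life_days: 3.0, lookback_period: 10), the oldest posts in the window keep roughly 10% of their upvote weight, so raising the half-life makes the ranking less sensitive to post age.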

Data Sources

Source	Status	Notes
Reddit (PRAW)	Live	Scrapes top posts per subreddit; weekly cache
Twitter/X	Stub	Tweepy wired; needs credentials
News	Stub	Structure exists; not yet implemented
Market data	Stub	Structure exists; not yet implemented

Testing

pytest tests/ -v
pytest tests/ --cov=alphastream_py --cov-report=term-missing

Roadmap

Next (Engine v1)

Sentiment scoring with VADER or FinBERT
News ingestion (NewsAPI / BeautifulSoup)
Market data via yfinance (volume spikes as signal)
Backtesting harness

Pre-launch

FastAPI REST layer
Scheduled runs (APScheduler)
PostgreSQL/MongoDB storage
Web dashboard (Streamlit or React)
