This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is a Python-based financial data extraction and analysis toolkit that scrapes real-time stock data from multiple sources (Yahoo Finance, NASDAQ, MarketWatch, BigCharts, Alpaca) and performs ML/NLP sentiment analysis on financial news articles.
Python Environment:
- Requires Python 3.12+
- Uses virtual environment at /home/dbrace/venv/dave02/
- Always activate the virtual environment before running code:
source /home/dbrace/venv/dave02/bin/activate
Dependencies: Install all required packages:
pip install -r requirements.txt
- y_*.py - Yahoo Finance data extractors
- nasdaq_*.py - NASDAQ.com data extractors
- ml_*.py - Machine learning and NLP modules
- *_md.py - Market data extractors for other sources
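The filename patterns above can be checked mechanically. The following is a minimal sketch, not code from this repository: `classify_module` and the category labels are hypothetical names, and `alpaca_md.py` is an invented filename used only to exercise the `*_md.py` pattern.

```python
from fnmatch import fnmatchcase

# Patterns come from the naming conventions above; the labels are illustrative.
NAMING_PATTERNS = [
    ("y_*.py", "Yahoo Finance data extractor"),
    ("nasdaq_*.py", "NASDAQ.com data extractor"),
    ("ml_*.py", "Machine learning / NLP module"),
    ("*_md.py", "Market data extractor (other sources)"),
]


def classify_module(filename: str) -> str:
    """Return the category of the first matching pattern, or 'unknown'."""
    for pattern, category in NAMING_PATTERNS:
        # fnmatchcase is case-sensitive on every platform.
        if fnmatchcase(filename, pattern):
            return category
    return "unknown"


if __name__ == "__main__":
    # alpaca_md.py is a made-up name used only to show the *_md.py pattern.
    for name in ("y_topgainers.py", "nasdaq_quotes.py", "ml_sentiment.py",
                 "alpaca_md.py", "README.md"):
        print(f"{name}: {classify_module(name)}")
```

Patterns are tried in order, so a file matching more than one is assigned to the earliest entry in the list.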
- Web Scrapers: Extract live data using BeautifulSoup4 and requests_html
- API Integrations: Native APIs for NASDAQ and Alpaca
- Data Processing: Convert to Pandas DataFrames, NumPy arrays, Python dicts
- ML/NLP Pipeline: Sentiment analysis using scikit-learn, NLTK, transformers
- Graph Database: Neo4j integration for relationship mapping
- Yahoo Finance: Top gainers/losers, small caps, tech events, news feeds
- NASDAQ: Real-time quotes, unusual volume detection
- MarketWatch/BigCharts: Live quotes, company details
- Alpaca: Live quotes, 60-second OHLCV candlestick data
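To make "60-second OHLCV candlestick data" concrete, here is a small sketch of what one such candle contains and how candles could be built from raw trades. It is an illustration only: Alpaca delivers bars directly, and the `Candle` class, the `to_candles` helper, and the `(timestamp, price, size)` trade layout are assumptions, not part of this repository or of the Alpaca API.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    start: int     # bucket start time, Unix seconds
    open: float    # first trade price in the bucket
    high: float    # highest trade price in the bucket
    low: float     # lowest trade price in the bucket
    close: float   # last trade price in the bucket
    volume: int    # total shares traded in the bucket


def to_candles(trades, interval=60):
    """Group (timestamp, price, size) trades into fixed-width OHLCV candles."""
    candles = {}
    # Sort by timestamp so open/close reflect the first/last trade in each bucket.
    for ts, price, size in sorted(trades, key=lambda t: t[0]):
        start = ts - ts % interval
        candle = candles.get(start)
        if candle is None:
            candles[start] = Candle(start, price, price, price, price, size)
        else:
            candle.high = max(candle.high, price)
            candle.low = min(candle.low, price)
            candle.close = price
            candle.volume += size
    return [candles[key] for key in sorted(candles)]


if __name__ == "__main__":
    sample = [(0, 10.0, 100), (30, 12.0, 50), (59, 9.0, 25), (60, 11.0, 10)]
    for candle in to_candles(sample):
        print(candle)
```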
- ml_sentiment.py: Core sentiment analysis engine
- ml_cvbow.py: Count vectorizer and bag-of-words processing
- ml_nlpreader.py: Natural language processing pipeline
- ml_urlhinter.py: URL classification and credibility scoring
- ml_yahoofinews.py: Yahoo Finance news processing
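The bag-of-words idea behind the count-vectorizer module can be sketched with the standard library alone. This is a stand-in for illustration, not the repository's implementation: it uses `collections.Counter` rather than scikit-learn's `CountVectorizer`, and the word lists, `bag_of_words`, and `sentiment_score` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative lexicons; they are not taken from the repository.
POSITIVE = {"gain", "gains", "surge", "beat", "upgrade", "growth"}
NEGATIVE = {"loss", "losses", "drop", "miss", "downgrade", "lawsuit"}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def sentiment_score(text: str) -> float:
    """Return (pos - neg) / (pos + neg), or 0.0 when no lexicon word occurs."""
    counts = bag_of_words(text)
    pos = sum(counts[word] for word in POSITIVE)
    neg = sum(counts[word] for word in NEGATIVE)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


if __name__ == "__main__":
    for headline in ("Shares surge after earnings beat",
                     "Stock drop follows downgrade and lawsuit"):
        print(f"{sentiment_score(headline):+.2f}  {headline}")
```

A fixed lexicon is far cruder than a trained classifier, but it shows the two stages: turn text into token counts, then reduce the counts to a score.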
Execute modules directly:
python nasdaq_quotes.py
python y_topgainers.py
python ml_sentiment.py
- Data Schema Volatility: Financial websites frequently update their data schemas, breaking scrapers
- Rate Limiting: Web scrapers may encounter rate limits or anti-bot measures
- API Keys: Alpaca integration requires API credentials
- Real-time Data: Most extractors work with live market data during trading hours
- Cookie Management: Yahoo Finance modules use y_cookiemonster.py for session management
- Add neo4j backend server to the stock pricing system
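One common way to cope with the rate limits noted above is to retry with exponential backoff. The sketch below is a generic pattern, not code from this repository: `fetch_with_backoff` and its parameters are hypothetical, and `fetch` stands for any zero-argument callable that performs a request.

```python
import random
import time


def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            # Broad catch for brevity; narrow it to network errors in real code.
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Waits of 1x, 2x, 4x ... base_delay, plus a small random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1 * base_delay)
            sleep(delay)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky():
        # Simulated scraper call that fails twice before succeeding.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("rate limited")
        return "ok"

    print(fetch_with_backoff(flaky, base_delay=0.01))
```

The `sleep` parameter is injectable so the retry logic can be tested without actually waiting.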