Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
The table of contents is too big for display.
Diff view
Diff view
  •  
  •  
  •  
The diff you're trying to view is too large. We only load the first 3000 changed files.
117 changes: 117 additions & 0 deletions .github/copilot-instructions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
# Qlib AI 编程助手指南

## 项目概述
Qlib 是微软面向AI的量化投资平台,支持基于机器学习的交易策略研究。该平台提供从数据处理到模型训练、回测和部署的完整ML流水线。

## 核心架构

### 数据层 (`qlib/data/`)
- **提供者**: LocalProvider、ClientProvider 处理数据访问模式
- **数据集**: Alpha158(158个技术指标)、Alpha360(360个特征)是主要数据集
- **表达式**: 使用金融领域专用语言,如 `$close`、`Ref($close, 1)`、`Mean($close, 5)`
- **处理器**: `DataHandlerLP` 使用可配置的处理器处理原始数据(标准化、过滤)

### 模型层 (`qlib/model/`, `qlib/contrib/model/`)
- 模型继承自 `Model` 基类,具有 `.fit()` 和 `.predict()` 方法
- 贡献模型包括 LightGBM、神经网络(LSTM、GRU、Transformer变体)
- 模型配置指定 `class`、`module_path` 和初始化的 `kwargs`

### 工作流层 (`qlib/workflow/`)
- **记录器 (R)**: 使用MLflow后端的全局实验跟踪系统
- **QlibRecorder**: 通过 `R.start()`、`R.log_params()`、`R.log_metrics()` 管理实验
- **配置驱动工作流**: YAML配置定义整个ML流水线

### 策略与执行 (`qlib/strategy/`, `qlib/backtest/`)
- **TopkDropoutStrategy**: 选择前k只股票,丢弃后n只以减少换手
- **BacktestTracker**: 包含交易成本、滑点的投资组合模拟
- **NestedExecutor**: 多层策略优化(投资组合+订单执行)

## 开发工作流程

### 运行模型
```bash
# 单一模型执行
cd examples && qrun benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml

# 调试模式
python -m pdb qlib/cli/run.py examples/benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml

# 批量模型比较
python examples/run_all_model.py run 3 lightgbm Alpha158
```

### 测试模式
```bash
# 快速测试(排除慢速标记)
cd tests && python -m pytest . -m "not slow"

# 特定测试类别
python -m pytest tests/model/ -v
python -m pytest tests/data/ -k "test_alpha"
```

### 数据管理
```bash
# 下载公开数据
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

# 健康检查
python scripts/check_data_health.py check_data --qlib_dir ~/.qlib/qlib_data/cn_data
```

## 关键约定

### 配置模式
- **嵌套配置**: 使用 `<MODEL>`, `<DATASET>` 占位符进行交叉引用
- **市场区域**: `REG_CN`(中国)、`REG_US`(美国)影响数据路径和交易规则
- **时间分段**: `segments: {train: [start, end], valid: [...], test: [...]}`

### 数据处理器处理
```python
# 学习vs推理的默认处理器
_DEFAULT_LEARN_PROCESSORS = [
{"class": "DropnaLabel"},
{"class": "CSZScoreNorm", "kwargs": {"fields_group": "label"}},
]
_DEFAULT_INFER_PROCESSORS = [
{"class": "ProcessInf", "kwargs": {}},
{"class": "ZScoreNorm", "kwargs": {}},
{"class": "Fillna", "kwargs": {}},
]
```

### 表达式语法
- `$close`, `$volume`, `$high`, `$low` 用于OHLCV数据
- `Ref($close, 1)` 用于回望(昨日收盘价)
- `Mean($close, 5)` 用于滚动窗口
- `Greater($close, Ref($close, 1))` 用于条件判断

## 开发指南

### 添加新模型
1. 在 `examples/benchmarks/ModelName/` 中创建文件夹
2. 包含 `requirements.txt`、`README.md`、`workflow_config_modelname_Alpha158.yaml`
3. 在 `qlib/contrib/model/` 中按照现有模式实现
4. 模型类需要 `.fit(dataset)` 和 `.predict(dataset)` 方法

### 内存与性能
- 使用 `NUM_USABLE_CPU = max(multiprocessing.cpu_count() - 2, 1)` 进行并行处理
- 缓存设置:`expression_cache`、`dataset_cache` 用于性能优化
- 高频数据需要 `"maxtasksperchild": 1` 以避免内存泄漏

### 错误处理
- 使用 `qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region="cn")` 初始化Qlib
- 实验前检查数据健康状况
- 使用 `R.start()` 上下文管理器正确处理实验生命周期

### 测试集成
- 模型应同时支持Alpha158和Alpha360数据集
- 在适用时验证多个市场区域(cn/us)
- 用不同时间段测试以确保时间稳健性

## 常见问题
- 运行 `qrun` 前务必 `cd examples` 以避免导入冲突
- macOS上的LightGBM需要 `brew install libomp`
- Windows/macOS使用不同的多进程方法 - 检查平台兼容性
- 表达式缓存依赖Redis进行分布式设置
- Pandas版本兼容性:在groupby操作中设置 `group_key=False`
17 changes: 17 additions & 0 deletions TODO_OPT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Optimization TODO List

## 1. Dynamic Shuffling for MPS Compatibility
**Affected Models:**
- `GeneralPTNN` (`qlib/contrib/model/pytorch_general_nn.py`)
- `ADARNN` (`qlib/contrib/model/pytorch_adarnn.py`)
- Any other models where `DataLoader(shuffle=True)` was disabled to fix MPS crashes.

**Issue:**
To resolve segmentation faults on macOS (MPS backend) caused by `DataLoader` shuffling non-contiguous memory, we disabled `shuffle=True` and implemented a one-time manual shuffle before the training loop. This means the training data order is fixed across all epochs, which is suboptimal compared to per-epoch shuffling.

**Optimization Goal:**
Restore the behavior of shuffling data at the beginning of **every epoch** to ensure better model convergence.

**Proposed Solution:**
- Modify the `fit` loop to re-shuffle indices and re-create the `DataLoader` (or use a custom `Sampler`) at the start of each epoch.
- Ensure that the shuffling mechanism respects the C-contiguous memory requirement for MPS.
72 changes: 72 additions & 0 deletions examples/benchmarks/ADARNN/inspect_model.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
import sys
import pickle
import torch
import os

def inspect_params(file_path):
if not os.path.exists(file_path):
print(f"Error: File not found: {file_path}")
return

print(f"Loading {file_path}...")
try:
with open(file_path, "rb") as f:
obj = pickle.load(f)
except Exception as e:
print(f"Pickle load failed: {e}")
try:
import dill
print("Trying dill...")
with open(file_path, "rb") as f:
obj = dill.load(f)
except ImportError:
print("Dill not installed, and pickle failed.")
return
except Exception as e:
print(f"Dill load failed: {e}")
return

print(f"\nType: {type(obj)}")

# Print Qlib Model Hyperparameters
print("\n=== Model Hyperparameters ===")
if hasattr(obj, "__dict__"):
for k, v in obj.__dict__.items():
# Skip private attributes and large objects
if k.startswith("_"):
continue
if isinstance(v, (int, float, str, bool)):
print(f"{k}: {v}")
elif isinstance(v, (list, tuple)) and len(v) < 20:
print(f"{k}: {v}")
elif v is None:
print(f"{k}: None")
else:
# For other objects, just print type to avoid clutter
if k not in ["model", "net"]:
print(f"{k}: <{type(v).__name__}>")

# Inspect PyTorch Structure
print("\n=== PyTorch Architecture ===")
torch_model = None
if hasattr(obj, "model") and isinstance(obj.model, torch.nn.Module):
torch_model = obj.model
elif hasattr(obj, "net") and isinstance(obj.net, torch.nn.Module):
torch_model = obj.net

if torch_model:
print(torch_model)

# Print parameter count
total_params = sum(p.numel() for p in torch_model.parameters())
trainable_params = sum(p.numel() for p in torch_model.parameters() if p.requires_grad)
print(f"\nTotal Parameters: {total_params:,}")
print(f"Trainable Parameters: {trainable_params:,}")
else:
print("No PyTorch model found in 'model' or 'net' attributes.")

if __name__ == "__main__":
if len(sys.argv) < 2:
print("Usage: python inspect_model.py <path_to_params.pkl>")
else:
inspect_params(sys.argv[1])
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ task:
dropout: 0.0
n_epochs: 200
lr: 1e-3
early_stop: 10
early_stop: 20
batch_size: 800
metric: loss
loss: mse
Expand Down
Loading
Loading