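diff --git a/docs/driving_reward_analysis/README.md b/docs/driving_reward_analysis/README.md
new file mode 100644
index 0000000000..421c9182a2
--- /dev/null
+++ b/docs/driving_reward_analysis/README.md
@@ -0,0 +1,105 @@
+# Autonomous Driving Reward Function Analysis System
+
+## 1. Overview
+This project implements a system for modeling and visualizing autonomous-driving reward functions. It defines three core reward terms (speed tracking, distance control, and comfort) as mathematical functions, supports parameterized configuration, batch simulation, and plot output, and runs standalone without any driving simulator.
+
+Typical applications:
+- Reward design for reinforcement-learning-based autonomous driving
+- Evaluation of adaptive cruise control (ACC) strategies
+- Analysis of the safety/comfort trade-off in autonomous-driving decision systems
+
+## 2. Project Rationale
+- **Technical approach**: reward-function modeling and visualization in pure Python with NumPy and Matplotlib
+- **Design idea**: express the core evaluation metrics of autonomous driving as mathematical functions, make reward design flexible through parameterized configuration, and support both single-scenario analysis and batch comparison experiments
+- **Distinctive value**: no heavyweight simulator such as CARLA or AirSim is needed; all analysis plots are generated from code alone
+
+## 3. Development and Runtime Environment
+- **Operating systems**: Windows 10/11, Ubuntu 20.04/22.04, macOS
+- **Programming language**: Python 3.8+
+- **Core dependencies**: NumPy, Matplotlib, PyYAML
+- **Development tools**: Visual Studio Code / PyCharm / Jupyter Notebook
+
+## 4. Module Layout and Entry Point
+- All core code for this module lives in `src/driving_reward_analysis`
+- The main entry point is `main.py`
+- Reward functions are defined in `rewards.py`
+- Plotting utilities are in `visualizer.py`
+
+---
+
+# driving_reward_analysis: Autonomous Driving Reward Function Analysis System
+
+## 1. Features
+This module implements the full modeling and analysis pipeline for autonomous-driving reward functions:
+
+- **Speed tracking reward**: a quadratic penalty on the speed error normalized by the tolerance; the closer the speed is to the target, the higher the reward (maximum 0). Target speed and tolerance are configurable
+- **Distance control reward**: piecewise linear; the reward is maximal (0) at or beyond the safe distance, falls linearly as the gap shrinks, and is a fixed -100 below the critical distance
+- **Comfort reward**: quadratic in acceleration; the reward is highest at zero acceleration, and harsh acceleration or braking is penalized more
+- **Batch simulation**: comparison plots of the reward curves across several parameter values (three target speeds, three safe distances)
+- **Plot output**: reward-function curves are generated automatically and saved as PNG files
+
+## 2. Usage
+
+### Step 1: Install dependencies
+```bash
+pip install numpy matplotlib pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple
+```
+
+### Step 2: Run the main program
+From the repository root, run:
+```bash
+python src/driving_reward_analysis/main.py
+```
+
+The program creates an `outputs/` directory next to `main.py` (`src/driving_reward_analysis/outputs/`) and saves the following plots:
+- `reward_func_analysis.png`: theoretical curves of the three reward functions
+- `reward_comparison.png`: per-step rewards over the simulated scenario
+- `speed_reward_comparison.png`: speed reward for different target speeds
+- `distance_reward_comparison.png`: distance reward for different safe distances
+
+### Step 3: Customize parameters
+Edit `config.yaml` to adjust the reward parameters:
+```yaml
+speed_reward:
+  target_speed: 25.0          # target speed (m/s)
+  tolerance: 5.0              # speed tolerance (m/s)
+
+distance_reward:
+  safe_distance: 15.0         # safe distance (m)
+  critical_distance: 5.0      # critical distance (m)
+
+comfort_reward:
+  max_acceleration: 2.0       # acceleration scale (m/s²); penalty is clipped at -2 * max_acceleration^2
+```
+
+## 3. Module Files
+| File | Purpose |
+|------|---------|
+| `main.py` | Entry point; runs reward modeling and plotting |
+| `rewards.py` | Reward definitions: speed, distance, comfort |
+| `visualizer.py` | Plotting: reward-function curves and comparisons |
+| `config.yaml` | Configuration: reward parameters |
+| `utils.py` | Helpers: config loading, scenario generation, reward computation, output directory |
+
+## 4. Mathematical Definitions
+
+### Speed tracking reward
+$$R_{speed}(v) = \max\left(-\left(\frac{v - v_{target}}{\sigma}\right)^2,\ -100\right)$$
+
+where $v_{target}$ is the target speed and $\sigma$ is the tolerance.
+
+### Distance control reward
+$$R_{dist}(d) = \begin{cases} -100 & d < d_{critical} \\ -10 \times (d_{safe} - d) & d_{critical} \leq d < d_{safe} \\ 0 & d \geq d_{safe} \end{cases}$$
+
+With the defaults ($d_{safe} = 15$, $d_{critical} = 5$) the middle segment reaches exactly $-100$ at $d = d_{critical}$, so the function is continuous; if $d_{safe} - d_{critical} > 10$, the middle segment drops below $-100$ and the reward is no longer monotonic in $d$.
+
+### Comfort reward
+$$R_{comfort}(a) = \max\left(-a^2,\ -2a_{max}^2\right)$$
+
+where $a$ is the acceleration and $a_{max}$ is `max_acceleration`; the reward is highest at zero acceleration.
+
+## 5. References
+- [OpenAI Spinning Up: key concepts in RL (rewards and returns)](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html)
+- [Adaptive cruise control](https://en.wikipedia.org/wiki/Adaptive_cruise_control)
+- [NumPy documentation](https://numpy.org/doc/)
+- [Matplotlib documentation](https://matplotlib.org/stable/)
diff --git a/mkdocs.yml b/mkdocs.yml
index 14c875e981..8256b7ba92 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -51,14 +51,18 @@ nav:
 - 自动驾驶感知系统: 'carla_yolo_detection/README.md'
 - 自动驾驶车辆语义分割: 'auto_drive_seg/README.md'
 - 无人机路径学习: 'drone_path_learning/README.md'
+- 自动驾驶奖励分析: 'driving_reward_analysis/README.md'
 - 双足人形机器人SAC步态仿真: 'mujoco_running/running.md'
 - 人形机器人项目: mujoco_hci_sim/README.md
+- 车辆自动驾驶辅助功能: 'vehicle_autonomous_core/README.md'
 - RL-ACC: 'rl_acc/README.md'
 - CARLA天气鲁棒性测试: 'carla_weather_robustness/README.md'
 - CARLA多传感器自动驾驶仿真平台: 'carla_multisensor_platform/carla_multisensor_platform.md'
+- 无人机飞行控制程序: 'drone_flight_sim/README.md'
+- 交通拥堵仿真与智能调控: 'lidar_project/README.md'
 
 # - mdx_math 用于行内公式显示
 markdown_extensions:
   - admonition
   - mdx_math
-  - tables
+  - tables
\ No newline at end of file
diff --git a/src/driving_reward_analysis/README.md b/src/driving_reward_analysis/README.md
new file mode 100644
index 0000000000..36fc7c0a94
--- /dev/null
+++ b/src/driving_reward_analysis/README.md
@@ -0,0 +1,105 @@
+# Autonomous Driving Reward Function Analysis System
+
+## 1. Overview
+This project implements a system for modeling and visualizing autonomous-driving reward functions. It defines three core reward terms (speed tracking, distance control, and comfort) as mathematical functions, supports parameterized configuration, batch simulation, and plot output, and runs standalone without any driving simulator.
+
+Typical applications:
+- Reward design for reinforcement-learning-based autonomous driving
+- Evaluation of adaptive cruise control (ACC) strategies
+- Analysis of the safety/comfort trade-off in autonomous-driving decision systems
+
+## 2. Project Rationale
+- **Technical approach**: reward-function modeling and visualization in pure Python with NumPy and Matplotlib
+- **Design idea**: express the core evaluation metrics of autonomous driving as mathematical functions, make reward design flexible through parameterized configuration, and support both single-scenario analysis and batch comparison experiments
+- **Distinctive value**: no heavyweight simulator such as CARLA or AirSim is needed; all analysis plots are generated from code alone
+
+## 3. Development and Runtime Environment
+- **Operating systems**: Windows 10/11, Ubuntu 20.04/22.04, macOS
+- **Programming language**: Python 3.8+
+- **Core dependencies**: NumPy, Matplotlib, PyYAML
+- **Development tools**: Visual Studio Code / PyCharm / Jupyter Notebook
+
+## 4. Module Layout and Entry Point
+- All core code for this module lives in `src/driving_reward_analysis`
+- The main entry point is `main.py`
+- Reward functions are defined in `rewards.py`
+- Plotting utilities are in `visualizer.py`
+
+---
+
+# driving_reward_analysis: Autonomous Driving Reward Function Analysis System
+
+## 1. Features
+This module implements the full modeling and analysis pipeline for autonomous-driving reward functions:
+
+- **Speed tracking reward**: a quadratic penalty on the speed error normalized by the tolerance; the closer the speed is to the target, the higher the reward (maximum 0). Target speed and tolerance are configurable
+- **Distance control reward**: piecewise linear; the reward is maximal (0) at or beyond the safe distance, falls linearly as the gap shrinks, and is a fixed -100 below the critical distance
+- **Comfort reward**: quadratic in acceleration; the reward is highest at zero acceleration, and harsh acceleration or braking is penalized more
+- **Batch simulation**: comparison plots of the reward curves across several parameter values (three target speeds, three safe distances)
+- **Plot output**: reward-function curves are generated automatically and saved as PNG files
+
+## 2. Usage
+
+### Step 1: Install dependencies
+```bash
+pip install numpy matplotlib pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple
+```
+
+### Step 2: Run the main program
+From the repository root, run:
+```bash
+python src/driving_reward_analysis/main.py
+```
+
+The program creates an `outputs/` directory next to `main.py` (`src/driving_reward_analysis/outputs/`) and saves the following plots:
+- `reward_func_analysis.png`: theoretical curves of the three reward functions
+- `reward_comparison.png`: per-step rewards over the simulated scenario
+- `speed_reward_comparison.png`: speed reward for different target speeds
+- `distance_reward_comparison.png`: distance reward for different safe distances
+
+### Step 3: Customize parameters
+Edit `config.yaml` to adjust the reward parameters:
+```yaml
+speed_reward:
+  target_speed: 25.0          # target speed (m/s)
+  tolerance: 5.0              # speed tolerance (m/s)
+
+distance_reward:
+  safe_distance: 15.0         # safe distance (m)
+  critical_distance: 5.0      # critical distance (m)
+
+comfort_reward:
+  max_acceleration: 2.0       # acceleration scale (m/s²); penalty is clipped at -2 * max_acceleration^2
+```
+
+## 3. Module Files
+| File | Purpose |
+|------|---------|
+| `main.py` | Entry point; runs reward modeling and plotting |
+| `rewards.py` | Reward definitions: speed, distance, comfort |
+| `visualizer.py` | Plotting: reward-function curves and comparisons |
+| `config.yaml` | Configuration: reward parameters |
+| `utils.py` | Helpers: config loading, scenario generation, reward computation, output directory |
+
+## 4. Mathematical Definitions
+
+### Speed tracking reward
+$$R_{speed}(v) = \max\left(-\left(\frac{v - v_{target}}{\sigma}\right)^2,\ -100\right)$$
+
+where $v_{target}$ is the target speed and $\sigma$ is the tolerance.
+
+### Distance control reward
+$$R_{dist}(d) = \begin{cases} -100 & d < d_{critical} \\ -10 \times (d_{safe} - d) & d_{critical} \leq d < d_{safe} \\ 0 & d \geq d_{safe} \end{cases}$$
+
+With the defaults ($d_{safe} = 15$, $d_{critical} = 5$) the middle segment reaches exactly $-100$ at $d = d_{critical}$, so the function is continuous; if $d_{safe} - d_{critical} > 10$, the middle segment drops below $-100$ and the reward is no longer monotonic in $d$.
+
+### Comfort reward
+$$R_{comfort}(a) = \max\left(-a^2,\ -2a_{max}^2\right)$$
+
+where $a$ is the acceleration and $a_{max}$ is `max_acceleration`; the reward is highest at zero acceleration.
+
+## 5. References
+- [OpenAI Spinning Up: key concepts in RL (rewards and returns)](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html)
+- [Adaptive cruise control](https://en.wikipedia.org/wiki/Adaptive_cruise_control)
+- [NumPy documentation](https://numpy.org/doc/)
+- [Matplotlib documentation](https://matplotlib.org/stable/)
diff --git a/src/driving_reward_analysis/config.yaml b/src/driving_reward_analysis/config.yaml
new file mode 100644
index 0000000000..d6689f0b2d
--- /dev/null
+++ b/src/driving_reward_analysis/config.yaml
@@ -0,0 +1,17 @@
+# Autonomous Driving Reward Function Analysis System - configuration
+
+speed_reward:
+  target_speed: 25.0          # target speed (m/s)
+  tolerance: 5.0              # speed tolerance (m/s)
+
+distance_reward:
+  safe_distance: 15.0         # safe distance (m)
+  critical_distance: 5.0      # critical distance (m)
+
+comfort_reward:
+  max_acceleration: 2.0       # acceleration scale (m/s²); penalty is clipped at -2 * max_acceleration^2
+
+visualization:
+  fig_size: [12, 10]          # figure size (not currently read by visualizer.py)
+  dpi: 100                    # output resolution (not currently read; plots are saved at 150 dpi)
+  output_dir: "outputs"       # output directory (main.py currently passes "outputs" directly)
diff --git a/src/driving_reward_analysis/main.py b/src/driving_reward_analysis/main.py
new file mode 100644
index 0000000000..42ba2ee1ff
--- /dev/null
+++ b/src/driving_reward_analysis/main.py
@@ -0,0 +1,114 @@
+"""Autonomous Driving Reward Function Analysis System - entry point.
+
+Usage:
+    python main.py              # run the full analysis
+    python main.py --mode plot  # plot the theoretical curves only
+    python main.py --mode sim   # run the scenario simulation only
+"""
+
+import argparse
+
+from utils import load_config, generate_scenario_data, compute_rewards, ensure_output_dir
+from visualizer import (
+    plot_reward_curves,
+    plot_scenario_rewards,
+    plot_speed_reward_comparison,
+    plot_distance_reward_comparison,
+)
+
+
+def run_analysis(output_dir: str) -> None:
+    """Run the full analysis pipeline (used by --mode all)."""
+    print("=" * 60)
+    print("  Autonomous Driving Reward Function Analysis System")
+    print("=" * 60)
+
+    # 1. Load the configuration
+    config = load_config()
+    print(f"\n[Config] Target speed: {config['speed_reward']['target_speed']} m/s")
+    print(f"[Config] Safe distance: {config['distance_reward']['safe_distance']} m")
+    print(f"[Config] Critical distance: {config['distance_reward']['critical_distance']} m")
+    print(f"[Config] Max acceleration: {config['comfort_reward']['max_acceleration']} m/s²")
+
+    # 2. Plot the theoretical reward curves
+    print("\n[Plot] Generating theoretical reward curves...")
+    plot_reward_curves(output_dir, config)
+
+    # 3. Generate scenario data
+    print("\n[Sim] Generating simulated scenario data...")
+    scenario = generate_scenario_data(num_steps=200, dt=0.1)
+
+    # 4. Compute rewards
+    print("[Sim] Computing rewards...")
+    rewards = compute_rewards(scenario, config)
+
+    # 5. Print summary statistics
+    print("\n[Stats] Reward statistics:")
+    for name, key in [
+        ("Speed tracking reward", "speed_reward"),
+        ("Distance control reward", "distance_reward"),
+        ("Comfort reward", "comfort_reward"),
+        ("Total reward", "total_reward"),
+    ]:
+        vals = rewards[key]
+        print(f"  {name}: mean={vals.mean():.3f}, std={vals.std():.3f}, "
+              f"min={vals.min():.3f}, max={vals.max():.3f}")
+
+    # 6. Plot the per-step rewards of the scenario
+    print("\n[Plot] Generating scenario reward curves...")
+    plot_scenario_rewards(rewards, output_dir)
+
+    # 7. Parameter comparison plots
+    print("\n[Plot] Generating parameter comparison plots...")
+    plot_speed_reward_comparison(output_dir, config)
+    plot_distance_reward_comparison(output_dir, config)
+
+    print(f"\n{'=' * 60}")
+    print(f"  Analysis complete! Plots saved to: {output_dir}")
+    print(f"{'=' * 60}")
+
+
+def main() -> None:
+    """Parse the command line and run the selected mode."""
+    parser = argparse.ArgumentParser(description="Autonomous driving reward function analysis")
+    parser.add_argument(
+        "--mode",
+        choices=["all", "plot", "sim"],
+        default="all",
+        help="run mode: all=full analysis, plot=plots only, sim=simulation only (default: all)",
+    )
+    args = parser.parse_args()
+
+    output_dir = ensure_output_dir("outputs")
+
+    # "all" runs the complete, verbose pipeline in run_analysis()
+    if args.mode == "all":
+        run_analysis(output_dir)
+        return
+
+    config = load_config()
+
+    if args.mode == "plot":
+        plot_reward_curves(output_dir, config)
+        plot_speed_reward_comparison(output_dir, config)
+        plot_distance_reward_comparison(output_dir, config)
+    else:  # args.mode == "sim"
+        scenario = generate_scenario_data()
+        rewards = compute_rewards(scenario, config)
+        plot_scenario_rewards(rewards, output_dir)
+
+        print("\n[Stats] Reward statistics:")
+        for name, key in [
+            ("Speed tracking reward", "speed_reward"),
+            ("Distance control reward", "distance_reward"),
+            ("Comfort reward", "comfort_reward"),
+            ("Total reward", "total_reward"),
+        ]:
+            vals = rewards[key]
+            print(f"  {name}: mean={vals.mean():.3f}, std={vals.std():.3f}")
+
+    print(f"\nDone! Output directory: {output_dir}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/driving_reward_analysis/rewards.py b/src/driving_reward_analysis/rewards.py
new file mode 100644
index 0000000000..120ecd3073
--- /dev/null
+++ b/src/driving_reward_analysis/rewards.py
@@ -0,0 +1,91 @@
+"""Reward function definitions for autonomous driving.
+
+Implements three core reward terms:
+- Speed tracking reward
+- Distance control reward
+- Comfort reward
+"""
+
+import numpy as np
+
+
+def speed_reward(
+    velocity: np.ndarray,
+    config: dict,
+) -> np.ndarray:
+    """Speed tracking reward.
+
+    Quadratic penalty on the tolerance-normalized speed error; the closer the speed is to the target, the higher the reward.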
+    R_speed(v) = max(-((v - v_target) / sigma)^2, -100)
+
+    Args:
+        velocity: array of current speeds (m/s)
+        config: speed-reward config with target_speed and tolerance
+
+    Returns:
+        Array of speed rewards in [-100, 0]
+    """
+    target = config.get("target_speed", 25.0)
+    sigma = config.get("tolerance", 5.0)
+
+    reward = -((velocity - target) / sigma) ** 2
+    return np.clip(reward, -100, 0)
+
+
+def distance_reward(
+    distance: np.ndarray,
+    config: dict,
+) -> np.ndarray:
+    """Distance control reward.
+
+    Piecewise linear model:
+    - d < critical: severe penalty (-100)
+    - critical <= d < safe: -10 * (safe - d), rising linearly as d grows
+    - d >= safe: maximum reward (0)
+
+    Args:
+        distance: array of current gaps to the lead vehicle (m)
+        config: distance-reward config with safe_distance and critical_distance
+
+    Returns:
+        Array of distance rewards
+    """
+    safe = config.get("safe_distance", 15.0)
+    critical = config.get("critical_distance", 5.0)
+
+    reward = np.zeros_like(distance, dtype=np.float64)
+
+    # d < critical: severe penalty
+    mask_critical = distance < critical
+    reward[mask_critical] = -100.0
+
+    # critical <= d < safe: linear ramp towards 0
+    mask_linear = (distance >= critical) & (distance < safe)
+    reward[mask_linear] = -10.0 * (safe - distance[mask_linear])
+
+    # d >= safe: 0 (maximum reward)
+    # already 0 from np.zeros_like
+
+    return reward
+
+
+def comfort_reward(
+    acceleration: np.ndarray,
+    config: dict,
+) -> np.ndarray:
+    """Comfort reward.
+
+    Quadratic in acceleration; the reward is highest (0) at zero acceleration.
+    R_comfort(a) = max(-a^2, -2 * max_acceleration^2)
+
+    Args:
+        acceleration: array of current accelerations (m/s²)
+        config: comfort-reward config with max_acceleration
+
+    Returns:
+        Array of comfort rewards in [-2 * max_acceleration^2, 0]
+    """
+    max_acc = config.get("max_acceleration", 2.0)
+
+    reward = -(acceleration**2)
+    return np.clip(reward, -(max_acc**2) * 2, 0)
diff --git a/src/driving_reward_analysis/utils.py b/src/driving_reward_analysis/utils.py
new file mode 100644
index 0000000000..0ecc501652
--- /dev/null
+++ b/src/driving_reward_analysis/utils.py
@@ -0,0 +1,107 @@
+"""Utility functions: config loading, scenario generation, reward computation, output directory."""
+
+import os
+import yaml  # provided by the PyYAML package
+import numpy as np
+
+
+def load_config(config_path: str = None) -> dict:
+    """Load the YAML configuration file.
+
+    Args:
+        config_path: path to the config file; defaults to config.yaml next to this module
+
+    Returns:
+        Configuration dictionary
+    """
+    if config_path is None:
+        config_path = os.path.join(os.path.dirname(__file__), "config.yaml")
+
+    with open(config_path, "r", encoding="utf-8") as f:
+        config = yaml.safe_load(f)
+    return config
+
+
+def generate_scenario_data(
+    num_steps: int = 200,
+    dt: float = 0.1,
+    seed: int = 42,
+) -> dict:
+    """Generate synthetic scenario data.
+
+    Simulates a vehicle driving on a highway, producing speed, gap and acceleration traces.
+
+    Args:
+        num_steps: number of time steps
+        dt: time step (s)
+        seed: random seed
+
+    Returns:
+        Dictionary with time, velocity, distance and acceleration arrays
+    """
+    rng = np.random.default_rng(seed)
+    t = np.arange(num_steps) * dt
+
+    # Speed: base speed + slow sinusoidal variation + Gaussian noise
+    velocity = 25.0 + 5.0 * np.sin(0.1 * t) + rng.normal(0, 2.0, num_steps)
+
+    # Gap to the lead vehicle: sinusoidal variation + noise, clipped to [1, 40] m
+    distance = 20.0 - 8.0 * np.sin(0.08 * t) + rng.normal(0, 3.0, num_steps)
+    distance = np.clip(distance, 1.0, 40.0)
+
+    # Acceleration from finite differences of velocity (this amplifies the speed noise)
+    acceleration = np.gradient(velocity, dt)
+
+    return {
+        "time": t,
+        "velocity": velocity,
+        "distance": distance,
+        "acceleration": acceleration,
+    }
+
+
+def compute_rewards(
+    scenario: dict,
+    config: dict,
+) -> dict:
+    """Compute every reward term for a scenario.
+
+    Args:
+        scenario: scenario data dictionary
+        config: full configuration dictionary
+
+    Returns:
+        Dictionary with each reward term and their unweighted sum
+    """
+    from rewards import speed_reward, distance_reward, comfort_reward
+
+    v = scenario["velocity"]
+    d = scenario["distance"]
+    a = scenario["acceleration"]
+
+    # Evaluate each term once; the total reuses these arrays
+    r_speed = speed_reward(v, config["speed_reward"])
+    r_dist = distance_reward(d, config["distance_reward"])
+    r_comfort = comfort_reward(a, config["comfort_reward"])
+    return {
+        "time": scenario["time"],
+        "speed_reward": r_speed,
+        "distance_reward": r_dist,
+        "comfort_reward": r_comfort,
+        "total_reward": r_speed + r_dist + r_comfort,
+    }
+
+
+def ensure_output_dir(output_dir: str) -> str:
+    """Ensure the output directory exists.
+
+    Args:
+        output_dir: output directory; relative paths are resolved against this module's directory
+
+    Returns:
+        Absolute path of the output directory
+    """
+    base = os.path.dirname(os.path.abspath(__file__))
+    full_path = os.path.join(base, output_dir)
+    os.makedirs(full_path, exist_ok=True)
+    return full_path
diff --git a/src/driving_reward_analysis/visualizer.py b/src/driving_reward_analysis/visualizer.py
new file mode 100644
index 0000000000..c1204ac475
--- /dev/null
+++ b/src/driving_reward_analysis/visualizer.py
@@ -0,0 +1,232 @@
+"""Visualization module: generates reward-function analysis plots."""
+
+import os
+import numpy as np
+import matplotlib.pyplot as plt
+
+
+# Font fallback list: CJK fonts first (if installed), then DejaVu Sans
+plt.rcParams["font.sans-serif"] = ["SimHei", "Microsoft YaHei", "DejaVu Sans"]
+plt.rcParams["axes.unicode_minus"] = False
+
+
+def plot_reward_curves(
+    output_dir: str,
+    config: dict,
+    prefix: str = "",
+) -> str:
+    """Plot the theoretical curves of the three reward functions.
+
+    Args:
+        output_dir: output directory path
+        config: configuration dictionary
+
+    Returns:
+        Path of the saved image
+    """
+    speed_cfg = config["speed_reward"]
+    distance_cfg = config["distance_reward"]
+    comfort_cfg = config["comfort_reward"]
+
+    fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))
+
+    # ---- Speed tracking reward (same formula as rewards.speed_reward) ----
+    v = np.linspace(0, 50, 200)
+    speed_r = -((v - speed_cfg["target_speed"]) / speed_cfg["tolerance"]) ** 2
+    speed_r = np.clip(speed_r, -100, 0)
+
+    ax = axes[0]
+    ax.plot(v, speed_r, "b-", linewidth=2)
+    ax.axvline(speed_cfg["target_speed"], color="r", linestyle="--", alpha=0.7,
+               label=f'Target={speed_cfg["target_speed"]} m/s')
+    ax.fill_between(
+        v, -10, speed_r,
+        where=(v > speed_cfg["target_speed"] - speed_cfg["tolerance"])
+        & (v < speed_cfg["target_speed"] + speed_cfg["tolerance"]),
+        alpha=0.15, color="green",
+    )
+    ax.set_xlabel("Speed (m/s)", fontsize=11)
+    ax.set_ylabel("Reward", fontsize=11)
+    ax.set_title("Speed Tracking Reward", fontsize=13, fontweight="bold")
+    ax.legend(fontsize=9)
+    ax.grid(True, alpha=0.3)
+
+    # ---- Distance control reward (same formula as rewards.distance_reward) ----
+    d = np.linspace(0, 30, 200)
+    safe = distance_cfg["safe_distance"]
+    critical = distance_cfg["critical_distance"]
+    dist_r = np.zeros_like(d)
+    dist_r[d < critical] = -100
+    mask = (d >= critical) & (d < safe)
+    dist_r[mask] = -10 * (safe - d[mask])
+
+    ax = axes[1]
+    ax.plot(d, dist_r, "orange", linewidth=2)
+    ax.axvline(critical, color="r", linestyle="--", alpha=0.7,
+               label=f'Critical={critical} m')
+    ax.axvline(safe, color="g", linestyle="--", alpha=0.7,
+               label=f'Safe={safe} m')
+    ax.set_xlabel("Distance (m)", fontsize=11)
+    ax.set_ylabel("Reward", fontsize=11)
+    ax.set_title("Distance Control Reward", fontsize=13, fontweight="bold")
+    ax.legend(fontsize=9)
+    ax.grid(True, alpha=0.3)
+
+    # ---- Comfort reward (same formula as rewards.comfort_reward) ----
+    a = np.linspace(-5, 5, 200)
+    comfort_r = -(a**2)
+    comfort_r = np.clip(comfort_r, -(comfort_cfg["max_acceleration"]**2) * 2, 0)
+
+    ax = axes[2]
+    ax.plot(a, comfort_r, "green", linewidth=2)
+    ax.axvline(0, color="gray", linestyle="--", alpha=0.5)
+    ax.axvspan(
+        -comfort_cfg["max_acceleration"], comfort_cfg["max_acceleration"],
+        alpha=0.15, color="green", label=f'±{comfort_cfg["max_acceleration"]} m/s²'
+    )
+    ax.set_xlabel("Acceleration (m/s²)", fontsize=11)
+    ax.set_ylabel("Reward", fontsize=11)
+    ax.set_title("Comfort Reward", fontsize=13, fontweight="bold")
+    ax.legend(fontsize=9)
+    ax.grid(True, alpha=0.3)
+
+    fig.suptitle("Autonomous Driving Reward Function Analysis",
+                 fontsize=15, fontweight="bold", y=1.02)
+    plt.tight_layout()
+
+    save_path = os.path.join(output_dir, f"{prefix}reward_func_analysis.png")
+    fig.savefig(save_path, dpi=150, bbox_inches="tight")
+    plt.close(fig)
+    print(f"[Saved] {save_path}")
+    return save_path
+
+
+def plot_scenario_rewards(
+    rewards: dict,
+    output_dir: str,
+    prefix: str = "",
+) -> str:
+    """Plot how each reward term evolves over the simulated scenario.
+
+    Args:
+        rewards: dict with time, speed_reward, distance_reward, comfort_reward and total_reward
+        output_dir: output directory path
+
+    Returns:
+        Path of the saved image
+    """
+    t = rewards["time"]
+
+    fig, axes = plt.subplots(4, 1, figsize=(12, 10), sharex=True)
+
+    titles = [
+        "Speed Tracking Reward",
+        "Distance Control Reward",
+        "Comfort Reward",
+        "Total Reward",
+    ]
+    keys = ["speed_reward", "distance_reward", "comfort_reward", "total_reward"]
+    colors = ["blue", "orange", "green", "purple"]
+
+    for ax, title, key, color in zip(axes, titles, keys, colors):
+        ax.plot(t, rewards[key], color=color, linewidth=1.5)
+        ax.set_ylabel("Reward", fontsize=11)
+        ax.set_title(title, fontsize=12, fontweight="bold")
+        ax.grid(True, alpha=0.3)
+        ax.axhline(0, color="gray", linestyle="--", alpha=0.5)
+
+    axes[-1].set_xlabel("Time (s)", fontsize=11)
+
+    fig.suptitle("Reward Evolution During Simulation",
+                 fontsize=14, fontweight="bold", y=1.01)
+    plt.tight_layout()
+
+    save_path = os.path.join(output_dir, f"{prefix}reward_comparison.png")
+    fig.savefig(save_path, dpi=150, bbox_inches="tight")
+    plt.close(fig)
+    print(f"[Saved] {save_path}")
+    return save_path
+
+
+def plot_speed_reward_comparison(
+    output_dir: str,
+    config: dict,
+) -> str:
+    """Plot the speed reward for several target speeds.
+
+    Args:
+        output_dir: output directory path
+        config: configuration dictionary (tolerance is read from speed_reward)
+
+    Returns:
+        Path of the saved image
+    """
+    v = np.linspace(0, 50, 200)
+    tolerance = config["speed_reward"]["tolerance"]
+
+    targets = [20.0, 25.0, 30.0]
+    colors = ["blue", "orange", "green"]
+
+    fig, ax = plt.subplots(figsize=(8, 5))
+
+    for target, color in zip(targets, colors):
+        r = -((v - target) / tolerance) ** 2
+        r = np.clip(r, -100, 0)
+        ax.plot(v, r, color=color, linewidth=2, label=f'Target={target} m/s')
+
+    ax.set_xlabel("Speed (m/s)", fontsize=12)
+    ax.set_ylabel("Reward", fontsize=12)
+    ax.set_title("Speed Reward with Different Target Speeds",
+                 fontsize=14, fontweight="bold")
+    ax.legend(fontsize=10)
+    ax.grid(True, alpha=0.3)
+
+    plt.tight_layout()
+    save_path = os.path.join(output_dir, "speed_reward_comparison.png")
+    fig.savefig(save_path, dpi=150, bbox_inches="tight")
+    plt.close(fig)
+    print(f"[Saved] {save_path}")
+    return save_path
+
+
+def plot_distance_reward_comparison(
+    output_dir: str,
+    config: dict,
+) -> str:
+    """Plot the distance reward for several safe distances.
+
+    Args:
+        output_dir: output directory path
+        config: configuration dictionary (critical_distance is read from distance_reward)
+
+    Returns:
+        Path of the saved image
+    """
+    d = np.linspace(0, 30, 200)
+    critical = config["distance_reward"]["critical_distance"]
+
+    safe_values = [10.0, 15.0, 20.0]
+    colors = ["orange", "red", "brown"]
+
+    fig, ax = plt.subplots(figsize=(8, 5))
+
+    for safe, color in zip(safe_values, colors):
+        r = np.zeros_like(d)
+        r[d < critical] = -100
+        mask = (d >= critical) & (d < safe)
+        r[mask] = -10 * (safe - d[mask])
+        ax.plot(d, r, color=color, linewidth=2, label=f'Safe={safe} m')
+
+    ax.set_xlabel("Distance (m)", fontsize=12)
+    ax.set_ylabel("Reward", fontsize=12)
+    ax.set_title("Distance Reward with Different Safe Distances",
+                 fontsize=14, fontweight="bold")
+    ax.legend(fontsize=10)
+    ax.grid(True, alpha=0.3)
+
+    plt.tight_layout()
+    save_path = os.path.join(output_dir, "distance_reward_comparison.png")
+    fig.savefig(save_path, dpi=150, bbox_inches="tight")
+    plt.close(fig)
+    print(f"[Saved] {save_path}")
+    return save_path
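`speed_reward` in `rewards.py` is a clipped quadratic penalty: it is bounded in [-100, 0] and equals -1 when the speed is one tolerance away from the target, so it is not a Gaussian. If a bounded, Gaussian-shaped speed term is preferred, one option is sketched below. This is a minimal sketch assuming plain NumPy inputs; `gaussian_speed_reward` is a hypothetical helper that is not part of this PR, and `quadratic_speed_reward` only restates the existing formula for comparison.

```python
import numpy as np


def quadratic_speed_reward(v, target=25.0, sigma=5.0):
    """Clipped quadratic penalty, same formula as rewards.speed_reward."""
    return np.clip(-((np.asarray(v, dtype=np.float64) - target) / sigma) ** 2, -100, 0)


def gaussian_speed_reward(v, target=25.0, sigma=5.0):
    """Hypothetical Gaussian-shaped alternative: 1 at the target, tending to 0 far away."""
    return np.exp(-0.5 * ((np.asarray(v, dtype=np.float64) - target) / sigma) ** 2)


if __name__ == "__main__":
    speeds = np.array([25.0, 30.0, 75.0])
    print("quadratic:", quadratic_speed_reward(speeds))
    print("gaussian: ", gaussian_speed_reward(speeds))
```

The Gaussian form saturates near 0 far from the target, so it stops distinguishing between large errors, whereas the quadratic form keeps penalizing until the -100 clip; which behavior is preferable depends on the learning setup.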
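The middle segment of `distance_reward` uses a fixed slope of -10 per metre, so it meets the -100 floor at `critical_distance` only when `safe_distance - critical_distance = 10`. The sketch below restates the piecewise rule in NumPy to check the boundary values for the three safe distances used in `plot_distance_reward_comparison`, and adds `scaled_distance_reward`, a hypothetical variant (not in this PR) whose slope is chosen so the ramp always runs from -100 at the critical distance to 0 at the safe distance.

```python
import numpy as np


def distance_reward(d, safe=15.0, critical=5.0):
    """Same piecewise rule as rewards.distance_reward: -100 / fixed-slope ramp / 0."""
    d = np.asarray(d, dtype=np.float64)
    r = np.zeros_like(d)
    r[d < critical] = -100.0
    mask = (d >= critical) & (d < safe)
    r[mask] = -10.0 * (safe - d[mask])
    return r


def scaled_distance_reward(d, safe=15.0, critical=5.0):
    """Hypothetical variant: slope 100 / (safe - critical), so the ramp is continuous at both ends."""
    d = np.asarray(d, dtype=np.float64)
    slope = 100.0 / (safe - critical)
    ramp = np.where(d < safe, -slope * (safe - d), 0.0)
    return np.where(d < critical, -100.0, ramp)


if __name__ == "__main__":
    for safe in (10.0, 15.0, 20.0):
        fixed = distance_reward([5.0], safe=safe)[0]
        scaled = scaled_distance_reward([5.0], safe=safe)[0]
        print(f"safe={safe}: fixed slope -> {fixed}, scaled slope -> {scaled}")
```

With `critical_distance = 5`, the fixed-slope rule gives -50, -100 and -150 at the critical distance for safe distances of 10, 15 and 20 m, so the 20 m curve is lower just above the critical distance than just below it.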
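`compute_rewards` adds the three terms with equal weight, although their ranges differ: with the default parameters the speed and distance terms can reach -100 while the comfort term is clipped at -2 * 2.0^2 = -8, so comfort has little influence on the total. A minimal sketch of a range-normalized, weighted total follows, assuming the default distance parameters (so -100 is the distance floor); `normalized_total` and its `weights` argument are hypothetical and are not read from `config.yaml`.

```python
import numpy as np


def normalized_total(r_speed, r_dist, r_comfort, max_acceleration=2.0,
                     weights=(1.0, 1.0, 1.0)):
    """Hypothetical: scale each term to [-1, 0] by its floor, then take a weighted sum."""
    # Floors assume safe_distance - critical_distance <= 10, so the distance term bottoms out at -100
    floors = (100.0, 100.0, 2.0 * max_acceleration ** 2)
    terms = (r_speed, r_dist, r_comfort)
    total = np.zeros_like(np.asarray(r_speed, dtype=np.float64))
    for w, r, floor in zip(weights, terms, floors):
        total = total + w * np.asarray(r, dtype=np.float64) / floor
    return total


if __name__ == "__main__":
    print(normalized_total(np.array([-1.0]), np.array([-50.0]), np.array([-4.0])))
```

With unit weights each term contributes at most -1, so the weights become directly comparable across terms.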
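`generate_scenario_data` derives acceleration with `np.gradient` from a speed signal carrying N(0, 2) noise at dt = 0.1 s. Central differences divide the difference of two noisy samples by 0.2 s, so the acceleration noise has a standard deviation of roughly 2 * sqrt(2) / 0.2, about 14 m/s², well above the comfort clip threshold of sqrt(8), about 2.8 m/s² (from -a^2 = -2 * max_acceleration^2 with max_acceleration = 2). The snippet below is a quick diagnostic, not part of the PR: it re-creates the same velocity trace inline instead of importing `utils`, and measures how often the comfort term would sit at its clip.

```python
import numpy as np

# Same generator settings as generate_scenario_data (seed=42, 200 steps, dt=0.1);
# velocity is the first draw from the generator there too, so the trace matches
rng = np.random.default_rng(42)
num_steps, dt = 200, 0.1
t = np.arange(num_steps) * dt
velocity = 25.0 + 5.0 * np.sin(0.1 * t) + rng.normal(0, 2.0, num_steps)
acceleration = np.gradient(velocity, dt)

# |a| above this value hits the comfort clip -2 * max_acceleration^2 (max_acceleration = 2.0)
clip_threshold = np.sqrt(2.0 * 2.0 ** 2)
share_clipped = np.mean(np.abs(acceleration) > clip_threshold)

print(f"acceleration std: {acceleration.std():.1f} m/s^2")
print(f"share of steps at the comfort clip: {share_clipped:.0%}")
```

If a smoother comfort signal is wanted, one option would be to low-pass filter the velocity before differentiating, or to generate acceleration directly; either would change the scenario model, not the reward definition.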