🚀 802SLM - Small Language Model Learning Project

A personal learning repository focused on modern large language model architectures

📋 Project Overview

This is a learning project focused on implementing and understanding the core technology stack of modern large language models. It implements GQA (Grouped-Query Attention) and MLA (Multi-head Latent Attention), with MoE (Mixture of Experts) still in development, and aims to build a deeper understanding of these techniques through hands-on practice.

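🔬 Technical Features

GQA (Grouped-Query Attention): grouped-query attention that balances compute efficiency against model quality
MLA (Multi-head Latent Attention): an efficient multi-head attention implementation based on matrix absorption
MoE (Mixture of Experts): mixture-of-experts architecture (in development)
KV cache optimization: efficient caching of keys and values at inference time
RoPE position encoding: rotary position embeddings, supporting long-sequence modeling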
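To make the RoPE item above concrete, here is a minimal sketch in plain Python rather than PyTorch. It is not the repository's implementation: the function name `rope`, the interleaved pair layout, and the base of 10000 are assumptions chosen for illustration. It shows the property that makes rotary encoding useful: the attention score depends only on the relative offset between two positions.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs (vec[0], vec[1]), (vec[2], vec[3]), ... of a vector.

    Hypothetical helper: the interleaved layout and base are illustrative choices.
    """
    dim = len(vec)
    assert dim % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # rotation frequency falls with the pair index
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * cos_t - y * sin_t, x * sin_t + y * cos_t]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.1]

# The same relative offset (3) at different absolute positions gives the same score.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))
```

Because each pair is only rotated, the vector norm is unchanged, so position information is injected without rescaling queries or keys.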

📚 Code Inheritance

Some code in this project is based on the minimind project, then extended and improved to implement more efficient attention mechanisms and model architectures.

🏗️ Project Structure

802_SLM/
├── 📁 model/                    # Core model implementations
│   ├── SLM_model.py            # Base SLM model (GQA implementation)
│   ├── SLM_MLA_model.py        # SLM model with MLA integrated
│   ├── MLA.py                  # MLA attention module
│   ├── SLM_MLA_MOE.py          # MLA + MoE hybrid architecture (to be developed)
│   └── model_inference.py      # Inference-optimized model
├── 📁 train/                   # Training scripts
│   ├── pretrain.py             # Pretraining script
│   ├── fsft.py                 # Full-parameter fine-tuning script
│   └── pretrain_model_test.py  # Model test
├── 📁 dataset/                 # Dataset handling
│   ├── init_dataset.py         # Dataset initialization
│   ├── pretrain_hq.jsonl       # Pretraining data
│   ├── sft_512.jsonl           # Instruction fine-tuning data
│   └── sft_mini_512.jsonl      # Small fine-tuning data
├── 📁 tokenizer/               # Tokenizers
│   ├── train_tokenizer_native_python.py  # Tokenizer training in native Python
│   ├── optimize_train_tokenizer.py      # Tokenizer optimization
│   └── tokenizer_third_party.py          # Third-party tokenizer support
├── 📁 convert/                 # Model conversion
│   └── convert_to_transformer.py         # Convert to Transformers format
├── 📁 pth_model/               # Trained model weights
├── 📁 802SLM/                  # Model in Hugging Face format
└── 📄 test_eval_*.py           # Evaluation and inference tests

🚀 Quick Start

🛠️ Environment Setup

Clone the repository

git clone https://github.com/SeaTheDestiny/802SLM.git
cd 802SLM

Create a virtual environment

conda create -n minimind python=3.8
conda activate minimind

Install dependencies

pip install -r requirements.txt

📊 Model Training

Pretraining

# Start pretraining
python train/pretrain.py

Full-Parameter Fine-tuning

# Full-parameter fine-tuning
python train/fsft.py

🎯 Model Inference

# Basic inference test
python test_eval_inference.py

# Model evaluation
python test_eval_model.py

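🧠 Core Technologies

GQA (Grouped-Query Attention)

Location: model/SLM_model.py:64-136
Feature: query heads are grouped so that each group shares one key/value head, which shrinks the KV cache and speeds up inference
Advantage: significantly lower memory use while preserving model quality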
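The grouping idea can be sketched for a single decoding step as follows. This is an illustrative plain-Python version, not the code in model/SLM_model.py; the function name `gqa_single_query` and the toy shapes (4 query heads sharing 2 KV heads) are assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gqa_single_query(q_heads, k_cache, v_cache):
    """Attention output for one new token (hypothetical helper).

    q_heads:          [num_q_heads][head_dim]
    k_cache, v_cache: [num_kv_heads][seq_len][head_dim]
    """
    num_q_heads, num_kv_heads = len(q_heads), len(k_cache)
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads
    head_dim = len(q_heads[0])
    outputs = []
    for h, q in enumerate(q_heads):
        kv = h // group_size  # all query heads in a group read the same KV head
        scores = [dot(q, k) / math.sqrt(head_dim) for k in k_cache[kv]]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, v_cache[kv]))
            for d in range(head_dim)
        ])
    return outputs

# Toy example: 4 query heads, 2 KV heads, 3 cached positions, head_dim 2.
q_heads = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
k_cache = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
           [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]]
v_cache = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
           [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]]
out = gqa_single_query(q_heads, k_cache, v_cache)
```

Only `num_kv_heads` sets of keys and values are stored, so in this toy case the cache is half the size it would be with one KV head per query head.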

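MLA (Multi-head Latent Attention)

Location: model/SLM_MLA_model.py:64-224, model/MLA.py:68-228
Features:
- Efficient attention computation based on matrix absorption
- Low-rank (LoRA-style) projections, configured via q_lora_rank and kv_lora_rank
- Integrated RoPE position encoding
- KV cache optimization support
Advantage: lowers the cost of attention at inference time and supports long-sequence modeling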
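Matrix absorption can be illustrated with a few lines of plain Python. The sketch assumes the commonly described MLA formulation, in which a key is reconstructed from a cached low-rank latent `c` as `W_uk @ c`; the matrix `W_uk`, its tiny shape, and the helper names are hypothetical, and the separate RoPE part of the key is left out. The repository's code may differ.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical up-projection from a rank-2 latent to a 3-dimensional key.
W_uk = [[0.5, -1.0],
        [2.0, 0.25],
        [-0.5, 1.5]]
q = [1.0, -2.0, 0.5]
latent_cache = [[0.1, 0.2], [-0.3, 0.4], [1.0, -1.0]]  # one latent per cached token

# Naive: expand every cached latent into a full key, then score it against q.
naive_scores = [dot(q, matvec(W_uk, c)) for c in latent_cache]

# Absorbed: fold W_uk into the query once, then score directly against the latents.
q_absorbed = matvec(transpose(W_uk), q)
absorbed_scores = [dot(q_absorbed, c) for c in latent_cache]
```

Both paths give the same scores because q · (W_uk c) = (W_ukᵀ q) · c, but the absorbed path never materializes full keys, so only the small latents need to be cached.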

MoE (Mixture of Experts)

Status: In Development
Goal: implement sparse mixture-of-experts routing to further increase model capacity and efficiency
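Since this part is not implemented yet, the following is only a sketch of what sparse top-k routing usually looks like, written in plain Python. Everything here is hypothetical: `moe_forward`, `make_scaling_expert`, the choice of `top_k=2`, and the hand-written router logits (a real layer would produce them with a learned gate).

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_logits, experts, top_k=2):
    """Send x to the top_k experts and mix their outputs by renormalized gate weight."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    weights = [probs[i] / norm for i in chosen]
    out = [0.0] * len(x)
    for i, w in zip(chosen, weights):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + w * v for o, v in zip(out, y)]
    return out, chosen, weights

def make_scaling_expert(scale):
    # Stand-in for a feed-forward expert: it just scales its input.
    return lambda x: [scale * v for v in x]

experts = [make_scaling_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
out, chosen, weights = moe_forward(
    [1.0, -1.0], router_logits=[0.1, 2.0, -1.0, 1.0], experts=experts, top_k=2
)
```

With four experts and `top_k=2`, half of the expert parameters are untouched for this token, which is how MoE adds capacity without a matching increase in per-token compute.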

📈 Model Configurations

Basic SLM Config

SLMconfig(
    hidden_size=512,
    num_hidden_layers=8,
    num_attention_heads=8,
    vocab_size=6400,
    max_position_embeddings=32768
)
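As a rough worked example of what this configuration implies for the KV cache, the arithmetic below counts cached values per token. It assumes the usual convention `head_dim = hidden_size // num_attention_heads`; the number of KV heads is not given above, so the value 2 is purely hypothetical, and `kv_cache_values_per_token` is an illustrative helper.

```python
def kv_cache_values_per_token(num_hidden_layers, num_kv_heads, head_dim):
    # Keys and values (factor 2) for every layer and every KV head.
    return 2 * num_hidden_layers * num_kv_heads * head_dim

hidden_size, num_hidden_layers, num_attention_heads = 512, 8, 8
head_dim = hidden_size // num_attention_heads  # assumed convention

# One KV head per query head (standard multi-head attention).
mha_values = kv_cache_values_per_token(num_hidden_layers, num_attention_heads, head_dim)

# Grouped-query attention with a hypothetical 2 KV heads.
gqa_values = kv_cache_values_per_token(num_hidden_layers, 2, head_dim)

print(head_dim, mha_values, gqa_values)  # 64 8192 2048
```

Under these assumptions, sharing each KV head across four query heads cuts the per-token cache by a factor of four.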

MLA Enhanced Config

SLMconfig(
    hidden_size=768,
    num_hidden_layers=14,
    num_heads=8,
    q_lora_rank=24,
    kv_lora_rank=24,
    qk_rope_head_dim=64,
    qk_nope_head_dim=16,
    v_head_dim=64
)
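The numbers in this configuration can be turned into a rough per-token cache estimate. The sketch assumes the commonly described MLA caching scheme, where each layer stores one compressed KV latent plus one shared RoPE key per token; whether the repository caches exactly these tensors is an assumption.

```python
num_hidden_layers, num_heads = 14, 8
kv_lora_rank, qk_rope_head_dim, qk_nope_head_dim, v_head_dim = 24, 64, 16, 64

# Latent caching (assumed scheme): compressed KV latent + shared RoPE key.
mla_per_layer = kv_lora_rank + qk_rope_head_dim

# Uncompressed caching: full keys (nope + rope parts) and values for every head.
full_per_layer = num_heads * (qk_nope_head_dim + qk_rope_head_dim + v_head_dim)

mla_total = num_hidden_layers * mla_per_layer
full_total = num_hidden_layers * full_per_layer

print(mla_per_layer, full_per_layer)  # 88 1152
print(mla_total, full_total)          # 1232 16128
```

Under these assumptions the latent cache is roughly 13 times smaller per token than caching full per-head keys and values.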

🔧 Dependencies

Deep learning frameworks: PyTorch 2.3.0, Transformers 4.48.0
Data processing: datasets, pandas, numpy
Training tools: wandb, trl, peft
Evaluation tools: scikit-learn, sentence-transformers
Web services: Flask, Streamlit

See requirements.txt for the full dependency list.

🤝 Contributing

Issues and Pull Requests are welcome! As this is a learning project, we especially encourage:

Technical discussions and code optimization suggestions
Implementations of new architectures
Documentation improvements and bug fixes
Training tips and shared experience

🙏 Acknowledgments

minimind - provided the base architecture and inspiration
Hugging Face Transformers - provided a powerful model framework
The PyTorch team - for an excellent deep learning framework
