diff --git a/readme.md b/readme.md index 87979f5..211a7fb 100644 --- a/readme.md +++ b/readme.md @@ -1,5 +1,11 @@ # LLM Papers We Recommend to Read +## ๐ŸŒ Language / ์–ธ์–ด +- **English** (Current) +- **[ํ•œ๊ตญ์–ด (Korean)](./translation/ko/readme_ko.md)** + +--- + The past several years has marked the steady rise of large language models (LLMs), largely driven by advancements in computational power, data availability, and algorithmic innovation. LLMs have profoundly shaped the research landscape, introducing new methodologies and paradigms that challenge traditional approaches. We have also expanded our research interests to the field of LLMs. Here are some research papers related to LLMs. We highly recommend beginners to read and thoroughly understand these papers. diff --git a/translation/ko/moe_related_ko.md b/translation/ko/moe_related_ko.md new file mode 100644 index 0000000..3192d25 --- /dev/null +++ b/translation/ko/moe_related_ko.md @@ -0,0 +1,38 @@ +## MoE ์ถ”๋ก  ์ตœ์ ํ™” + +| ์ œ๋ชฉ | ๋งํฌ | +| ------------------------------------------------------------ | ------------------------------------------------------------ | +| Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference | [[paper]](http://arxiv.org/abs/2308.12066) | +| Fast Inference of Mixture-of-Experts Language Models with Offloading | [[paper]](http://arxiv.org/abs/2312.17238) | +| MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving | [[paper]](http://arxiv.org/abs/2401.14361) | +| Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | [[paper]](http://arxiv.org/abs/2402.07033) | +| Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference | [[paper]](http://arxiv.org/abs/2401.08383) | +| SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models | [[paper]](http://arxiv.org/abs/2310.18859) | +| SwapMoE: Efficient Memory-Constrained Serving of Large Sparse MoE Models via Dynamic Expert Pruning and Swapping | [[paper]](http://arxiv.org/abs/2308.15030) | +| Accelerating Distributed MoE Training and Inference with Lina | [[paper]](https://www.usenix.org/conference/atc23/presentation/li-jiamin) | +| Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference | [[paper]](http://arxiv.org/abs/2303.06182) | +| EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models | [[paper]](http://arxiv.org/abs/2308.14352) | +| AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference | [[paper]](http://arxiv.org/abs/2408.10284) | +| ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | [[paper]](http://arxiv.org/abs/2410.17954) | +| ProMoE: Fast MoE-based LLM Serving using Proactive Caching | [[paper]](http://arxiv.org/abs/2410.22134) | +| HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference | [[paper]](http://arxiv.org/abs/2411.01433) | +| Toward Efficient Inference for Mixture of Experts | [[paper]](https://proceedings.neurips.cc/paper_files/paper/2024/hash/98bf3b8505c611ac21055dd9d355c66e-Abstract-Conference.html) | +| A Survey on Inference Optimization Techniques for Mixture of Experts Models | [[paper]](http://arxiv.org/abs/2412.14219) | +| MoESys: A Distributed and Efficient Mixture-of-Experts Training and Inference System for Internet Services | [[paper]](http://arxiv.org/abs/2205.10034) | +| EPS-MoE: Expert Pipeline 
Scheduler for Cost-Efficient MoE Inference | [[paper]](http://arxiv.org/abs/2410.12247) | +| fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | [[paper]](http://arxiv.org/abs/2502.05370) | +| MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing | [[paper]](http://arxiv.org/abs/2502.06643) | +| Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline | [[paper]](http://arxiv.org/abs/2502.06888) | +| Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing | [[paper]](http://arxiv.org/abs/2501.05313) | +| DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference | [[paper]](http://arxiv.org/abs/2501.10375) | +| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | [[paper]](http://arxiv.org/abs/2502.19811) | +| Harnessing Inter-GPU Shared Memory for Seamless MoE Communication-Computation Fusion | [[paper]](https://dl.acm.org/doi/10.1145/3710848.3710868) | +| CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory | [[paper]](http://arxiv.org/abs/2503.02354) | +| eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference | [[paper]](http://arxiv.org/abs/2503.06823) | +| Accelerating MoE Model Inference with Expert Sharding | [[paper]](http://arxiv.org/abs/2503.08467) | +| Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores | [[paper]](http://arxiv.org/abs/2503.10725) | +| MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching | [[paper]](http://arxiv.org/abs/2503.09716) | +| MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism | [[paper]](https://arxiv.org/abs/2504.02263) | +| D$^2$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving | [[paper]](http://arxiv.org/abs/2504.15299) | +| Faster MoE LLM Inference for Extremely Large Models | [[paper]](http://arxiv.org/abs/2505.03531) | +| Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony | [[paper]](http://arxiv.org/abs/2505.08944) | \ No newline at end of file diff --git a/translation/ko/paper.md b/translation/ko/paper.md new file mode 100644 index 0000000..2eee37d --- /dev/null +++ b/translation/ko/paper.md @@ -0,0 +1,2 @@ +# paper + diff --git a/translation/ko/readme_ko.md b/translation/ko/readme_ko.md new file mode 100644 index 0000000..da7e7f5 --- /dev/null +++ b/translation/ko/readme_ko.md @@ -0,0 +1,165 @@ +# ์šฐ๋ฆฌ๊ฐ€ ์ฝ๊ธฐ๋ฅผ ์ถ”์ฒœํ•˜๋Š” LLM ๋…ผ๋ฌธ๋“ค + +## ๐ŸŒ Language / ์–ธ์–ด +- **[English](../../readme.md)** +- **ํ•œ๊ตญ์–ด (Korean)** (Current) + +--- + +์ตœ๊ทผ ๋ช‡ ๋…„ ๋™์•ˆ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ(LLMs)์ด ๊พธ์ค€ํžˆ ๋ฐœ์ „ํ•ด์™”๊ณ  ์ด๋Š” ๊ณ„์‚ฐ ๋Šฅ๋ ฅ, ๋ฐ์ดํ„ฐ ๊ฐ€์šฉ์„ฑ, ์•Œ๊ณ ๋ฆฌ์ฆ˜ ํ˜์‹  ๋•๋ถ„์ž…๋‹ˆ๋‹ค. LLM๋“ค์€ ์—ฐ๊ตฌ ํ™˜๊ฒฝ์„ ๊ทผ๋ณธ์ ์œผ๋กœ ๋ณ€ํ™”์‹œ์ผฐ๊ณ , ์ „ํ†ต์ ์ธ ์ ‘๊ทผ ๋ฐฉ์‹์— ๋„์ „ํ•˜๋Š” ์ƒˆ๋กœ์šด ๋ฐฉ๋ฒ•๋ก ๊ณผ ํŒจ๋Ÿฌ๋‹ค์ž„์„ ๋„์ž…ํ–ˆ์Šต๋‹ˆ๋‹ค. + +์ €ํฌ๋„ ์ด๋Ÿฐ ํ๋ฆ„์— ๋”ฐ๋ผ ์—ฐ๊ตฌ ๊ด€์‹ฌ์‚ฌ๋ฅผ LLM ๋ถ„์•ผ๋กœ ํ™•์žฅํ–ˆ์Šต๋‹ˆ๋‹ค. ์ดํ•˜ ์ œ์‹œ๋˜๋Š” ๋‚ด์šฉ๋“ค์€ LLM๊ณผ ๊ด€๋ จ๋œ ๋ช‡ ๊ฐ€์ง€ ์—ฐ๊ตฌ ๋…ผ๋ฌธ๋“ค์ž…๋‹ˆ๋‹ค. ์šฐ๋ฆฌ๋Š” ์ดˆ๋ณด์ž๋“ค์ด ์ด๋Ÿฌํ•œ ๋…ผ๋ฌธ๋“ค์„ ์ฝ๊ณ  ์ฒ ์ €ํžˆ ์ดํ•ดํ•˜๊ธฐ๋ฅผ ๊ฐ•๋ ฅํžˆ ์ถ”์ฒœํ•ฉ๋‹ˆ๋‹ค. 
+ +:smile: **์šฐ๋ฆฌ๋Š” ๋ชจ๋“  ๊ธฐ์—ฌ๋ฅผ ํ™˜์˜ํ•˜๊ณ  ์†Œ์ค‘ํžˆ ์—ฌ๊น๋‹ˆ๋‹ค.** + +## LLM์˜ ๊ธฐ๋ณธ ๊ตฌ์กฐ(Basic Architectures of LLMs) + +| ์ œ๋ชฉ | ๋งํฌ | +| ------------------------------------------------------------ | ------------------------------------------------------------ | +| Sequence to Sequence Learning with Neural Networks | [[paper]](https://arxiv.org/abs/1409.3215) | +| Transformer: Attention Is All You Need | [[paper]](http://arxiv.org/abs/1706.03762) | +| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | [[paper]](https://arxiv.org/abs/1810.04805) | +| GPT: Improving Language Understanding by Generative Pre-Training | [[paper]](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf) | +| GPT2: Language Models are Unsupervised Multitask Learners | [[paper]](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) | +| GPT3: Language Models are Few-Shot Learners | [[paper]](https://arxiv.org/abs/2005.14165) | +| GPT3.5: Fine-Tuning Language Models from Human Preferences | [[paper]](https://arxiv.org/abs/1909.08593) | +| LLaMA: Open and Efficient Foundation Language Models | [[paper]](http://arxiv.org/abs/2302.13971) | +| Llama 2: Open Foundation and Fine-Tuned Chat Models | [[paper]](http://arxiv.org/abs/2307.09288) | +| Qwen2.5-1M Technical Report | [[paper]](https://arxiv.org/abs/2501.15383) | + +> **์ดํ•˜์˜ ๋ชจ๋“  ๋…ผ๋ฌธ์„ ์ฝ๊ธฐ ์ „์— ๊ธฐ๋ณธ ๊ตฌ์กฐ์— ๋Œ€ํ•œ ๋…ผ๋ฌธ์„ ์ฝ๋Š”๊ฒƒ์„ ๊ฐ•๋ ฅํžˆ ๊ถŒ๊ณ ํ•ฉ๋‹ˆ๋‹ค.** + +### ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ(Multimodal Large Language Models) + +| ์ œ๋ชฉ | ๋งํฌ | +| ------------------------------------------------------------ | ------------------------------------------- | +| Efficient Multimodal Large Language Models: A Survey | [[paper]](http://arxiv.org/abs/2405.10739) | +| CLIP: Learning Transferable Visual Models From Natural Language Supervision | [[paper]](https://arxiv.org/abs/2103.00020) | +| Seed1.5-VL Technical Report | [[paper]](http://arxiv.org/abs/2505.07062) | +| MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining | [[paper]](http://arxiv.org/abs/2505.07608) | + +> ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ์ด๋ž€ ์—ฌ๋Ÿฌ ๊ฐ€์ง€ ์œ ํ˜•์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฒฐํ•ฉํ•˜์—ฌ ์ฒ˜๋ฆฌํ•˜๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. 
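위의 CLIP처럼 멀티모달 모델은 서로 다른 모달리티(예: 이미지와 텍스트)를 하나의 공통 임베딩 공간으로 사상한 뒤 유사도를 비교하는 경우가 많습니다. 아래는 이 아이디어만 보여 주는 최소한의 토이 스케치입니다. 학습된 인코더 대신 임의의 선형 사상(`W_img`, `W_txt`)을 가정했으며, 실제 CLIP 구현이 아닙니다.

```python
# 가정: 학습된 인코더 대신 무작위 선형 사상을 쓰는 토이 예시 (numpy 필요)
import numpy as np

rng = np.random.default_rng(0)
d_img, d_txt, d_joint = 16, 12, 8          # 임의로 정한 차원들

W_img = rng.normal(size=(d_joint, d_img))  # (가상의) 이미지 인코더
W_txt = rng.normal(size=(d_joint, d_txt))  # (가상의) 텍스트 인코더

def embed(x, W):
    z = W @ x
    return z / np.linalg.norm(z)           # 단위 벡터로 정규화 → 내적 = 코사인 유사도

image_feat = rng.normal(size=d_img)        # 이미지 특징 벡터(가정)
text_feats = rng.normal(size=(3, d_txt))   # 후보 텍스트 3개의 특징 벡터(가정)

z_img = embed(image_feat, W_img)
sims = [float(z_img @ embed(t, W_txt)) for t in text_feats]
best = int(np.argmax(sims))                # 유사도가 가장 높은 텍스트를 이미지와 매칭
print(sims, best)
```

실제 CLIP은 이런 유사도가 맞는 (이미지, 캡션) 쌍에서 높아지도록 대규모 대조 학습(contrastive learning)으로 두 인코더를 함께 학습합니다.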
+ +## ๋ณ‘๋ ฌ ํ›ˆ๋ จ ์‹œ์Šคํ…œ(Parallelism Training System) + +| ์ œ๋ชฉ | ๋งํฌ | +| ------------------------------------------------------------ | ------------------------------------------------------------ | +| Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | [[paper]](http://arxiv.org/abs/1909.08053) | +| ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | [[paper]](http://arxiv.org/abs/1910.02054) | +| ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning | [[paper]](http://arxiv.org/abs/2104.07857) | +| ZeRO-Offload: Democratizing Billion-Scale Model Training | [[paper]](https://www.usenix.org/conference/atc21/presentation/ren-jie) | +| PipeDream: generalized pipeline parallelism for DNN training | [[paper]](https://arxiv.org/abs/1806.03377) | +| GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism | [[paper]](http://arxiv.org/abs/1811.06965) | +| TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models | [[paper]](http://arxiv.org/abs/2102.07988) | +| GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding | [[paper]](http://arxiv.org/abs/2006.16668) | +| PanGu-$\Sigma$: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing | [[paper]](http://arxiv.org/abs/2303.10845) | +| DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale | [[paper]](https://proceedings.mlr.press/v162/rajbhandari22a.html) | +| Accelerating Distributed MoE Training and Inference with Lina | [[paper]](https://www.usenix.org/conference/atc23/presentation/li-jiamin) | +| Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism | [[paper]](http://arxiv.org/abs/2211.13878) | +| Alpa: Automating Inter- and {Intra-Operator} Parallelism for Distributed Deep Learning | [[paper]](https://www.usenix.org/conference/osdi22/presentation/zheng-lianmin) | +| Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs | [[paper]](http://arxiv.org/abs/2505.04519) | + +> ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ ํ›ˆ๋ จ ์‹œ์Šคํ…œ(Parallelism Training System)์ด๋ž€ ๋Œ€๊ทœ๋ชจ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ(ํŠนํžˆ LLM)์„ ์—ฌ๋Ÿฌ ๋Œ€์˜ GPU, TPU, ๋˜๋Š” ์ปดํ“จํŠธ ๋…ธ๋“œ์— ๋ถ„์‚ฐ์‹œ์ผœ ๋™์‹œ์— ํ•™์Šต์‹œํ‚ค๋Š” ๊ธฐ์ˆ  ๋ฐ ์‹œ์Šคํ…œ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. + +## LLM ์„œ๋น™ ์‹œ์Šคํ…œ(LLM Serving System) + +LLM ์„œ๋น™ ์ž‘์—…์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ถ„์•ผ๋กœ ๋ถ„๋ฅ˜๋  ์ˆ˜ ์žˆ๋‹ค๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค: *์‹œ์Šคํ…œ ์ตœ์ ํ™”* (์˜ˆ: vLLM), *์Šค์ผ€์ค„๋ง ์ตœ์ ํ™”* (์˜ˆ: DistServe, Llumnix), *์˜คํ”„๋กœ๋”ฉ* (์˜ˆ: FlexGen), *์ ‘๋‘์‚ฌ ๊ณต์œ *, *KV ์บ์‹œ ์••์ถ•/์ œ๊ฑฐ/์„ ํƒ*, ๊ทธ๋ฆฌ๊ณ  *์ถ”์ธก์  ๋””์ฝ”๋”ฉ*. + +ํ–ฅํ›„ ์‹ค์ œ ๋ถ„๋ฅ˜๋ฅผ ์ง„ํ–‰ํ•  ์˜ˆ์ •์ž…๋‹ˆ๋‹ค. 
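위에서 언급한 범주 중 KV 캐시가 왜 중요한지 감을 잡을 수 있도록, 디코딩 단계에서 이전 토큰들의 key/value를 저장해 두고 재사용하는 아이디어만 담은 토이 스케치를 덧붙입니다(이어서 서빙 관련 논문 목록이 나옵니다). 투영 행렬 없이 은닉 상태를 그대로 q, k, v로 쓰는 등 모든 수치는 임의의 가정이며, 실제 서빙 시스템 구현이 아닙니다.

```python
# 가정: 단일 헤드, 단일 쿼리로 단순화한 어텐션과 KV 캐시 (numpy 필요)
import numpy as np

d = 8                                  # 은닉 차원(임의의 값)
rng = np.random.default_rng(0)

def attention(q, K, V):
    scores = K @ q / np.sqrt(d)        # 스케일드 닷-프로덕트
    w = np.exp(scores - scores.max())
    w /= w.sum()                       # softmax
    return w @ V

K_cache, V_cache = [], []              # 디코딩 내내 재사용되는 KV 캐시

for step in range(4):                  # 토큰을 하나씩 생성하는 디코드 단계
    x = rng.normal(size=d)             # 새 토큰의 은닉 상태(가정)
    q, k, v = x, x, x                  # 실제로는 W_q, W_k, W_v 투영을 거침(여기서는 생략)
    K_cache.append(k)
    V_cache.append(v)                  # 새 토큰의 K/V만 추가로 저장
    out = attention(q, np.stack(K_cache), np.stack(V_cache))
    # 캐시 덕분에 이전 토큰들의 K/V를 매 스텝 다시 계산하지 않습니다.
    # vLLM(PagedAttention)은 바로 이 캐시를 페이지 단위로 관리해 메모리 낭비를 줄입니다.
```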
+ +| ์ œ๋ชฉ | ๋งํฌ | +| ------------------------------------------------------------ | ------------------------------------------------------------ | +| Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems | [[paper]](http://arxiv.org/abs/2312.15234) | +| FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | [[paper]](https://arxiv.org/abs/2205.14135) | +| Efficiently Scaling Transformer Inference | [[paper]](http://arxiv.org/abs/2211.05102) | +| vLLM: Efficient Memory Management for Large Language Model Serving with PagedAttention | [[paper]](http://arxiv.org/abs/2309.06180) | +| DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | [[paper]](https://arxiv.org/abs/2207.00032) | +| Orca: A Distributed Serving System for Transformer-Based Generative Models | [[paper]](https://www.usenix.org/conference/osdi22/presentation/yu) | +| FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | [[paper]](http://arxiv.org/abs/2303.06865) | +| S$^{3}$: Increasing GPU Utilization during Generative Inference for Higher Throughput | [[paper]](http://arxiv.org/abs/2306.06000) | +| Splitwise: Efficient generative LLM inference using phase splitting | [[paper]](http://arxiv.org/abs/2311.18677) | +| SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification | [[paper]](https://arxiv.org/abs/2305.09781) | +| Petals: Collaborative Inference and Fine-tuning of Large Models | [[paper]](https://aclanthology.org/2023.acl-demo.54) | +| PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | [[paper]](http://arxiv.org/abs/2312.12456) | +| DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving | [[paper]](http://arxiv.org/abs/2401.09670) | +| LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism | [[paper]](http://arxiv.org/abs/2404.09526) | +| Vidur: A Large-Scale Simulation Framework For LLM Inference | [[paper]](http://arxiv.org/abs/2405.05465) | +| Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers | [[paper]](http://arxiv.org/abs/2405.10480) | +| AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving | [[paper]](http://arxiv.org/abs/2302.11665) | +| SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills | [[paper]](http://arxiv.org/abs/2308.16369) | +| Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | [[paper]](http://arxiv.org/abs/2403.02310) | +| Llumnix: Dynamic Scheduling for Large Language Model Serving | [[paper]](http://arxiv.org/abs/2406.03243) | +| Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | [[paper]](http://arxiv.org/abs/2407.00079) | +| InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management | [[paper]](http://arxiv.org/abs/2406.19707) | +| ServerlessLLM: Low-Latency Serverless Inference for Large Language Models | [[paper]](https://www.usenix.org/conference/osdi24/presentation/fu) | +| Is the GPU Half-Empty or Half-Full? 
Practical Scheduling Techniques for LLMs | [[paper]](http://arxiv.org/abs/2410.17840) | +| NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference | [[paper]](http://arxiv.org/abs/2411.01142) | +| EcoServe: Maximizing Multi-Resource Utilization with SLO Guarantees in LLM Serving | [[paper]](http://arxiv.org/abs/2411.06364) | +| Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation | [[paper]](http://arxiv.org/abs/2503.20552) | +| semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage | [[paper]](http://arxiv.org/abs/2504.19867) | + +> LLM Serving System์€ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ(LLM)์„ **์‹ค์ œ ์„œ๋น„์Šค ํ™˜๊ฒฝ์—์„œ ์‚ฌ์šฉ์ž ์š”์ฒญ์— ๋”ฐ๋ผ ๋น ๋ฅด๊ณ  ํšจ์œจ์ ์œผ๋กœ ์‘๋‹ตํ•˜๋„๋ก ๋ฐฐํฌยท์šด์˜ํ•˜๋Š” ์‹œ์Šคํ…œ์„ ์˜๋ฏธ**ํ•ฉ๋‹ˆ๋‹ค. + +### ๋‹ค์ค‘ LoRA๋ฅผ ์‚ฌ์šฉํ•œ LLM ์„œ๋น™(Serving LLMs with Multiple LoRAs) + +| ์ œ๋ชฉ | ๋งํฌ | +| ------------------------------------------------------------ | ------------------------------------------------------------ | +| PetS: A Unified Framework for Parameter-Efficient Transformers Serving | [[paper]](https://www.usenix.org/conference/atc22/presentation/zhou-zhe) | +| Punica: Multi-Tenant LoRA Serving | [[paper]](http://arxiv.org/abs/2310.18547) | +| S-LoRA: Serving Thousands of Concurrent LoRA Adapters | [[paper]](http://arxiv.org/abs/2311.03285) | +| dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving | [[paper]](https://www.usenix.org/conference/osdi24/presentation/wu-bingyang) | + +> LoRA(Low-Rank Adaptation)๋Š” ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ(LLM)์„ ํšจ์œจ์ ์œผ๋กœ ๋ฏธ์„ธ์กฐ์ •(fine-tuning)ํ•˜๊ธฐ ์œ„ํ•œ ๋Œ€ํ‘œ์ ์ธ ์ตœ์‹  ๊ธฐ๋ฒ•์ž…๋‹ˆ๋‹ค. + +## ๋งค๊ฐœ๋ณ€์ˆ˜ ํšจ์œจ์  ๋ฏธ์„ธ ์กฐ์ • (Parameter-Efficient Fine-Tuning, PEFT) + +| ์ œ๋ชฉ | ๋งํฌ | +| ------------------------------------------------------------ | ------------------------------------------------------------ | +| Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models | [[paper]](http://arxiv.org/abs/2203.06904) | +| Parameter-Efficient Transfer Learning for NLP | [[paper]](https://proceedings.mlr.press/v97/houlsby19a.html) | +| Prefix-Tuning: Optimizing Continuous Prompts for Generation | [[paper]](http://arxiv.org/abs/2101.00190) | +| LoRA: Low-Rank Adaptation of Large Language Models | [[paper]](http://arxiv.org/abs/2106.09685) | +| Towards a Unified View of Parameter-Efficient Transfer Learning | [[paper]](http://arxiv.org/abs/2110.04366) | +| Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning | [[paper]](http://arxiv.org/abs/2303.10512) | +| When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method | [[paper]](http://arxiv.org/abs/2402.17193) | +| Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning | [[paper]](http://arxiv.org/abs/2409.01035) | +| DoRA: Weight-Decomposed Low-Rank Adaptation | [[paper]](http://arxiv.org/abs/2402.09353) | +| GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection | [[paper]](http://arxiv.org/abs/2403.03507) | + +> Parameter-Efficient Fine-Tuning(PEFT)์€ ๊ธฐ์กด์˜ ์ „์ฒด ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๋ชจ๋‘ ์—…๋ฐ์ดํŠธํ•˜๋Š” ๋ฐฉ์‹(Full Fine-Tuning)๊ณผ ๋‹ฌ๋ฆฌ ๋ชจ๋ธ์˜ ์ผ๋ถ€ ํŒŒ๋ผ๋ฏธํ„ฐ๋งŒ ์„ ํƒ์ ์œผ๋กœ ํ•™์Šตํ•˜๊ฑฐ๋‚˜ ์†Œ๊ทœ๋ชจ์˜ ์ถ”๊ฐ€ ํŒŒ๋ผ๋ฏธํ„ฐ(์˜ˆ: ์–ด๋Œ‘ํ„ฐ, LoRA ๋“ฑ)๋ฅผ ๋„์ž…ํ•ด ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ด ํ•ต์‹ฌ์ž…๋‹ˆ๋‹ค. 
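위 설명(특히 LoRA)을 구체적으로 보여 주기 위해, 사전학습 가중치 `W`는 고정한 채 저랭크 행렬 `A`, `B`만 추가로 두는 LoRA의 핵심 수식 `y = Wx + (α/r)·BAx`를 numpy로 옮긴 최소 스케치를 덧붙입니다. 차원과 초기화 값은 임의의 가정이며, 실제 학습 루프나 특정 라이브러리 구현이 아닙니다.

```python
# 가정: LoRA의 순전파 형태만 보여 주는 토이 스케치 (numpy 필요)
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16        # r << d_in, d_out

W = rng.normal(size=(d_out, d_in))           # 고정(frozen)된 사전학습 가중치
A = rng.normal(size=(r, d_in)) * 0.01        # 학습 대상 저랭크 행렬
B = np.zeros((d_out, r))                     # 0으로 초기화 → 학습 시작 시 출력은 W와 동일

def lora_forward(x):
    # 전체 W를 갱신하는 대신 r * (d_in + d_out)개의 파라미터만 학습합니다.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = lora_forward(x)

# 배포 시에는 W' = W + (alpha / r) * (B @ A)로 병합해 추가 지연 없이 서빙할 수 있고,
# 위의 Punica / S-LoRA처럼 하나의 W 위에 여러 (A, B) 어댑터를 얹어 동시에 서빙할 수도 있습니다.
```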
+
+## 압축 (양자화, 희소성) : Compression (Quantization, Sparsity)
+
+| 제목 | 링크 |
+| ------------------------------------------------------------ | ------------------------------------------- |
+| LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | [[paper]](http://arxiv.org/abs/2208.07339) |
+| SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models | [[paper]](http://arxiv.org/abs/2211.10438) |
+| GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | [[paper]](https://arxiv.org/abs/2210.17323) |
+| AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | [[paper]](http://arxiv.org/abs/2306.00978) |
+| QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | [[paper]](http://arxiv.org/abs/2405.04532) |
+| Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time | [[paper]](https://arxiv.org/abs/2310.17157) |
+| Atom: Low-bit Quantization for Efficient and Accurate LLM Serving | [[paper]](http://arxiv.org/abs/2310.19102) |
+| QLoRA: Efficient Finetuning of Quantized LLMs | [[paper]](http://arxiv.org/abs/2305.14314) |
+| QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models | [[paper]](http://arxiv.org/abs/2309.14717) |
+
+> 양자화란 모델의 파라미터를 더 적은 비트 수로 표현하여 메모리 사용량을 줄이고, 연산 속도를 높이는 기법입니다.
+> 희소성이란 모델 내부의 뉴런 활성화나 가중치 등에서 “대부분의 값이 0이거나 거의 사용되지 않고 소수만 의미 있게 작동하는” 특성을 의미합니다.
+
+## 전문가 혼합 (Mixture of Experts) 관련 최적화
+
+MoE 추론 최적화에 대해서는 이 [링크](./moe_related_ko.md)를 참조하세요.
+
+> Mixture of Experts란 한 모델이 모든 입력을 처리하는 대신, 여러 "전문가(Expert)" 네트워크 중 일부만 선택적으로 활성화해서 입력을 처리하는 방식입니다.
+
+## LLM을 위한 강화 학습, 시스템 최적화(Reinforcement Learning for LLMs, System Optimization)
+
+인간 피드백으로부터의 강화 학습 (RLHF) & 검증 가능한 보상을 통한 강화 학습 (RLVR)
+
+| 제목 | 링크 |
+| ------------------------------------------------------------ | ------------------------------------------- |
+| OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework | [[paper]](https://arxiv.org/abs/2405.11143) |
+| HybridFlow: A Flexible and Efficient RLHF Framework | [[paper]](https://arxiv.org/abs/2409.19256) |
+| StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation | [[paper]](http://arxiv.org/abs/2504.15930) |
+
+> RLHF(인간 피드백으로부터의 강화 학습)는 모델이 사람이 매긴 선호(preference) 피드백을 보상 신호로 삼아 학습하도록 하는 방법입니다.
+> RLVR(검증 가능한 보상을 통한 강화 학습)는 수학 정답 채점이나 테스트 통과 여부처럼 자동으로 검증할 수 있는 보상 신호를 통해 모델이 학습하도록 하는 방법입니다. 아래에 검증 가능한 보상의 간단한 스케치를 덧붙입니다.
\ No newline at end of file
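위 RLVR 설명에서 말한 “검증 가능한 보상”이 어떤 것인지 보여 주는 토이 스케치입니다. 함수 이름과 채점 규칙(출력의 마지막 숫자를 정답과 비교)은 설명을 위한 임의의 가정이며, 위에 나열된 특정 프레임워크의 API가 아닙니다.

```python
# 가정: 정답을 자동으로 채점할 수 있는 과제(예: 수학 문제)에 대한 규칙 기반 보상 함수
import re

def verifiable_reward(model_output: str, ground_truth: str) -> float:
    """모델 출력에서 마지막 숫자를 답으로 추출해 정답과 비교합니다 (맞으면 1.0, 틀리면 0.0)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == ground_truth else 0.0

# 사람이 매긴 선호 라벨(RLHF) 대신, 이런 자동 검증 신호가 RLVR의 학습 보상이 됩니다.
print(verifiable_reward("계산 과정을 거치면 답은 42입니다.", "42"))  # 1.0
print(verifiable_reward("답은 41입니다.", "42"))                     # 0.0
```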