feat: Add Korean translation for README with bilingual navigation #2

Open · wants to merge 1 commit into base: main
6 changes: 6 additions & 0 deletions readme.md
@@ -1,5 +1,11 @@
# LLM Papers We Recommend to Read

## 🌐 Language / 언어
- **English** (Current)
- **[한국어 (Korean)](./translation/ko/readme_ko.md)**

---

The past several years have marked the steady rise of large language models (LLMs), largely driven by advancements in computational power, data availability, and algorithmic innovation. LLMs have profoundly shaped the research landscape, introducing new methodologies and paradigms that challenge traditional approaches.

We have also expanded our research interests to the field of LLMs. Here are some research papers related to LLMs; we highly recommend that beginners read and thoroughly understand them.
38 changes: 38 additions & 0 deletions translation/ko/moe_related_ko.md
@@ -0,0 +1,38 @@
## MoE μΆ”λ‘  μ΅œμ ν™”

| 제목 | 링크 |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference | [[paper]](http://arxiv.org/abs/2308.12066) |
| Fast Inference of Mixture-of-Experts Language Models with Offloading | [[paper]](http://arxiv.org/abs/2312.17238) |
| MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving | [[paper]](http://arxiv.org/abs/2401.14361) |
| Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | [[paper]](http://arxiv.org/abs/2402.07033) |
| Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference | [[paper]](http://arxiv.org/abs/2401.08383) |
| SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models | [[paper]](http://arxiv.org/abs/2310.18859) |
| SwapMoE: Efficient Memory-Constrained Serving of Large Sparse MoE Models via Dynamic Expert Pruning and Swapping | [[paper]](http://arxiv.org/abs/2308.15030) |
| Accelerating Distributed MoE Training and Inference with Lina | [[paper]](https://www.usenix.org/conference/atc23/presentation/li-jiamin) |
| Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference | [[paper]](http://arxiv.org/abs/2303.06182) |
| EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models | [[paper]](http://arxiv.org/abs/2308.14352) |
| AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference | [[paper]](http://arxiv.org/abs/2408.10284) |
| ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | [[paper]](http://arxiv.org/abs/2410.17954) |
| ProMoE: Fast MoE-based LLM Serving using Proactive Caching | [[paper]](http://arxiv.org/abs/2410.22134) |
| HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference | [[paper]](http://arxiv.org/abs/2411.01433) |
| Toward Efficient Inference for Mixture of Experts | [[paper]](https://proceedings.neurips.cc/paper_files/paper/2024/hash/98bf3b8505c611ac21055dd9d355c66e-Abstract-Conference.html) |
| A Survey on Inference Optimization Techniques for Mixture of Experts Models | [[paper]](http://arxiv.org/abs/2412.14219) |
| MoESys: A Distributed and Efficient Mixture-of-Experts Training and Inference System for Internet Services | [[paper]](http://arxiv.org/abs/2205.10034) |
| EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference | [[paper]](http://arxiv.org/abs/2410.12247) |
| fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | [[paper]](http://arxiv.org/abs/2502.05370) |
| MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing | [[paper]](http://arxiv.org/abs/2502.06643) |
| Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline | [[paper]](http://arxiv.org/abs/2502.06888) |
| Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing | [[paper]](http://arxiv.org/abs/2501.05313) |
| DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference | [[paper]](http://arxiv.org/abs/2501.10375) |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | [[paper]](http://arxiv.org/abs/2502.19811) |
| Harnessing Inter-GPU Shared Memory for Seamless MoE Communication-Computation Fusion | [[paper]](https://dl.acm.org/doi/10.1145/3710848.3710868) |
| CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory | [[paper]](http://arxiv.org/abs/2503.02354) |
| eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference | [[paper]](http://arxiv.org/abs/2503.06823) |
| Accelerating MoE Model Inference with Expert Sharding | [[paper]](http://arxiv.org/abs/2503.08467) |
| Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores | [[paper]](http://arxiv.org/abs/2503.10725) |
| MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching | [[paper]](http://arxiv.org/abs/2503.09716) |
| MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism | [[paper]](https://arxiv.org/abs/2504.02263) |
| D$^2$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving | [[paper]](http://arxiv.org/abs/2504.15299) |
| Faster MoE LLM Inference for Extremely Large Models | [[paper]](http://arxiv.org/abs/2505.03531) |
| Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony | [[paper]](http://arxiv.org/abs/2505.08944) |
2 changes: 2 additions & 0 deletions translation/ko/paper.md
@@ -0,0 +1,2 @@
# paper
