feat: Add Korean translation for README with bilingual navigation #2

Open · wants to merge 1 commit into base: main
6 changes: 6 additions & 0 deletions readme.md
@@ -1,5 +1,11 @@
# LLM Papers We Recommend to Read

## 🌐 Language / 언어
- **English** (Current)
- **[한국어 (Korean)](./translation/ko/readme_ko.md)**

---

The past several years have marked the steady rise of large language models (LLMs), largely driven by advancements in computational power, data availability, and algorithmic innovation. LLMs have profoundly shaped the research landscape, introducing new methodologies and paradigms that challenge traditional approaches.

We have also expanded our research interests to the field of LLMs. Here are some research papers related to LLMs; we highly recommend that beginners read and thoroughly understand them.
38 changes: 38 additions & 0 deletions translation/ko/moe_related_ko.md
@@ -0,0 +1,38 @@
## MoE μΆ”λ‘  μ΅œμ ν™”

| 제목 | 링크 |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference | [[paper]](http://arxiv.org/abs/2308.12066) |
| Fast Inference of Mixture-of-Experts Language Models with Offloading | [[paper]](http://arxiv.org/abs/2312.17238) |
| MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving | [[paper]](http://arxiv.org/abs/2401.14361) |
| Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | [[paper]](http://arxiv.org/abs/2402.07033) |
| Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference | [[paper]](http://arxiv.org/abs/2401.08383) |
| SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models | [[paper]](http://arxiv.org/abs/2310.18859) |
| SwapMoE: Efficient Memory-Constrained Serving of Large Sparse MoE Models via Dynamic Expert Pruning and Swapping | [[paper]](http://arxiv.org/abs/2308.15030) |
| Accelerating Distributed MoE Training and Inference with Lina | [[paper]](https://www.usenix.org/conference/atc23/presentation/li-jiamin) |
| Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference | [[paper]](http://arxiv.org/abs/2303.06182) |
| EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models | [[paper]](http://arxiv.org/abs/2308.14352) |
| AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference | [[paper]](http://arxiv.org/abs/2408.10284) |
| ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | [[paper]](http://arxiv.org/abs/2410.17954) |
| ProMoE: Fast MoE-based LLM Serving using Proactive Caching | [[paper]](http://arxiv.org/abs/2410.22134) |
| HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference | [[paper]](http://arxiv.org/abs/2411.01433) |
| Toward Efficient Inference for Mixture of Experts | [[paper]](https://proceedings.neurips.cc/paper_files/paper/2024/hash/98bf3b8505c611ac21055dd9d355c66e-Abstract-Conference.html) |
| A Survey on Inference Optimization Techniques for Mixture of Experts Models | [[paper]](http://arxiv.org/abs/2412.14219) |
| MoESys: A Distributed and Efficient Mixture-of-Experts Training and Inference System for Internet Services | [[paper]](http://arxiv.org/abs/2205.10034) |
| EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference | [[paper]](http://arxiv.org/abs/2410.12247) |
| fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | [[paper]](http://arxiv.org/abs/2502.05370) |
| MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing | [[paper]](http://arxiv.org/abs/2502.06643) |
| Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline | [[paper]](http://arxiv.org/abs/2502.06888) |
| Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing | [[paper]](http://arxiv.org/abs/2501.05313) |
| DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference | [[paper]](http://arxiv.org/abs/2501.10375) |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | [[paper]](http://arxiv.org/abs/2502.19811) |
| Harnessing Inter-GPU Shared Memory for Seamless MoE Communication-Computation Fusion | [[paper]](https://dl.acm.org/doi/10.1145/3710848.3710868) |
| CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory | [[paper]](http://arxiv.org/abs/2503.02354) |
| eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference | [[paper]](http://arxiv.org/abs/2503.06823) |
| Accelerating MoE Model Inference with Expert Sharding | [[paper]](http://arxiv.org/abs/2503.08467) |
| Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores | [[paper]](http://arxiv.org/abs/2503.10725) |
| MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching | [[paper]](http://arxiv.org/abs/2503.09716) |
| MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism | [[paper]](https://arxiv.org/abs/2504.02263) |
| D$^2$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving | [[paper]](http://arxiv.org/abs/2504.15299) |
| Faster MoE LLM Inference for Extremely Large Models | [[paper]](http://arxiv.org/abs/2505.03531) |
| Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony | [[paper]](http://arxiv.org/abs/2505.08944) |
2 changes: 2 additions & 0 deletions translation/ko/paper.md
@@ -0,0 +1,2 @@
# paper
