Nateiru
🎋
Focusing



A PyTorch extension: tools for easy mixed-precision and distributed training in PyTorch

Python · 8,594 stars · 1,436 forks · Updated Mar 26, 2025

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ · 1,012 stars · 154 forks · Updated Mar 26, 2025

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ · 1,280 stars · 553 forks · Updated Mar 26, 2025

Material for gpu-mode lectures

Jupyter Notebook · 4,126 stars · 414 forks · Updated Feb 9, 2025

LLM inference in C/C++

C++ · 77,233 stars · 11,213 forks · Updated Mar 26, 2025

The Art of Debugging

C · 862 stars · 39 forks · Updated Aug 3, 2024

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

356 stars · 8 forks · Updated Mar 25, 2025

Magnificent app which corrects your previous console command.

Python · 91,142 stars · 3,657 forks · Updated Jul 19, 2024

🚀🚀 Train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python · 17,154 stars · 1,909 forks · Updated Feb 23, 2025

A faster int-to-int hashmap implemented in C++.

C++ · 41 stars · 7 forks · Updated Jan 6, 2025

Concurrency primitives, safe memory reclamation mechanisms and non-blocking (including lock-free) data structures designed to aid in the research, design and implementation of high performance conc…

C · 2,465 stars · 323 forks · Updated Mar 7, 2025

LLM training in simple, raw C/CUDA

Cuda · 26,149 stars · 3,004 forks · Updated Oct 2, 2024

Systems for GenAI

123 stars · 8 forks · Updated Mar 8, 2025

Development repository for the Triton language and compiler

MLIR · 14,992 stars · 1,887 forks · Updated Mar 26, 2025

Boosting 4-bit inference kernels with 2:4 sparsity

Cuda · 71 stars · 5 forks · Updated Sep 4, 2024

A basic deep learning library, comparable to a very minimal version of PyTorch.

Python · 13 stars · 2 forks · Updated Mar 1, 2023

µcoro

C++ · 131 stars · 17 forks · Updated Jan 16, 2025

QRec: A Python framework for quick implementation of recommender systems (TensorFlow-based)

Python · 1,606 stars · 406 forks · Updated Dec 26, 2023

Practice reproducing classic recommender-system models in PyTorch, such as MF, FM, DeepConn, MMOE, PLE, DeepFM, NFM, DCN, AFM, AutoInt, ONN, FiBiNET, DCN-v2, AFN, and DCAP

Python · 595 stars · 125 forks · Updated Mar 14, 2022

Notes on the course Dive into Deep Learning by Mu Li

Jupyter Notebook · 3,514 stars · 560 forks · Updated Apr 11, 2023

A model compilation solution for various hardware

MLIR · 415 stars · 45 forks · Updated Mar 16, 2025

row-major matmul optimization

C++ · 613 stars · 86 forks · Updated Sep 9, 2023

How to optimize some algorithms in CUDA.

Cuda · 2,038 stars · 182 forks · Updated Mar 26, 2025

C++ library for executors

C++ · 500 stars · 76 forks · Updated Sep 21, 2016

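AISystem mainly refers to AI systems, covering the full stack of low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks

Jupyter Notebook · 13,017 stars · 1,871 forks · Updated Mar 26, 2025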

The most concise PyTorch implementation of the Transformer model, with detailed comments

Jupyter Notebook · 184 stars · 23 forks · Updated Nov 13, 2023

🧡 Follow everything in one place

TypeScript · 24,358 stars · 1,031 forks · Updated Mar 26, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python · 5,930 stars · 515 forks · Updated Mar 26, 2025

PyTorch domain library for recommendation systems

Python · 2,070 stars · 488 forks · Updated Mar 26, 2025