Stars
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Magnificent app which corrects your previous console command.
🚀🚀 "Large models": Train a 26M-parameter GPT completely from scratch in just 2h! 🌏
A faster int-to-int hashmap implemented in C++.
Concurrency primitives, safe memory reclamation mechanisms and non-blocking (including lock-free) data structures designed to aid in the research, design and implementation of high performance conc…
Development repository for the Triton language and compiler
Boosting 4-bit inference kernels with 2:4 Sparsity
A basic deep learning library, comparable to a very minimal version of PyTorch.
QRec: A Python Framework for quick implementation of recommender systems (TensorFlow Based)
Practice reproducing classic recommender system models in PyTorch, such as MF, FM, DeepConn, MMOE, PLE, DeepFM, NFM, DCN, AFM, AutoInt, ONN, FiBiNET, DCN-v2, AFN, DCAP, and others.
Notes on the course Dive into Deep Learning by Mu Li
A model compilation solution for various hardware
How to optimize some algorithms in CUDA.
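AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.
The most concise PyTorch implementation of the Transformer model, with detailed comments.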
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
PyTorch domain library for recommendation systems