Stars
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models.
SGLang is a fast serving framework for large language models and vision language models.
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
Intel® Extension for TensorFlow*
A personal experimental C++ Syntax 2 -> Syntax 1 compiler
A retargetable MLIR-based machine learning compiler and runtime toolkit.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Enabling PyTorch on XLA Devices (e.g. Google TPU)
Development repository for the Triton language and compiler
A list of awesome compiler projects and papers for tensor computation and deep learning.
A machine learning compiler for GPUs, CPUs, and ML accelerators
An implementation of a deep learning recommendation model (DLRM)
Zerocopy makes zero-cost memory manipulation effortless. We write `unsafe` so you don’t have to.
A Zig language server supporting Zig developers with features like autocomplete and goto definition
The build system and package manager for MoonBit
Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
RobustMQ is a next-generation, high-performance, cloud-native, converged message queue that is compatible with multiple mainstream message queuing protocols and has complete Serverless capabilities.
🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…
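AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks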
Efficient platform for inference and serving local LLMs including an OpenAI-compatible API server.
HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models
Tool for safe ergonomic Rust/C++ interop driven from existing C++ headers
Write safer FFI code in Rust without polluting it with unsafe code