
Starred repositories
Supporting PyTorch models with the Google AI Edge TFLite runtime.
LiteRT is the new name for TensorFlow Lite (TFLite). While the name is new, it's still the same trusted, high-performance runtime for on-device AI, now with an expanded vision.
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
🚀🚀 "Large model": train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏
ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.
A minimal GPU design in Verilog to learn how GPUs work from the ground up
ModernBERT model optimized for Apple Neural Engine.
General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
A playbook for systematically maximizing the performance of deep learning models.
Modern C++ Programming Course (C++03/11/14/17/20/23/26)
Making the community's best AI chat models available to everyone.
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Fast and accurate automatic speech recognition (ASR) for edge devices
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction".
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
llama3.np is a pure NumPy implementation of the Llama 3 model.
llama3.cuda is a pure C/CUDA implementation of the Llama 3 model.