[Doc] Reorganize documentation and update contents. (mlc-ai#256)
yzh119 authored May 29, 2023
1 parent 2b0bb21 commit ddb14d4
Showing 10 changed files with 161 additions and 117 deletions.
2 changes: 0 additions & 2 deletions docs/contribute/community.rst
@@ -53,8 +53,6 @@ on github directly. Once your update is complete, you can click the ``contribute
Contribute New Models to MLC-LLM
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Woohoo! Congratulations

* If you compiled a model for an existing model architecture by following our :doc:`../tutorials/compile-models` tutorial,
  please upload your model stored in the ``dist`` directory to Hugging Face (an example can be
  found `here <https://huggingface.co/mlc-ai/mlc-chat-vicuna-v1-7b-q4f32_0/tree/main>`__), and please create a
81 changes: 75 additions & 6 deletions docs/index.rst
@@ -14,23 +14,92 @@ Everything runs locally with no server support and accelerated with local GPUs o

`Join our Discord Server! <https://discord.gg/9Xpy2HGBuD>`_

.. _navigation:

Navigation
----------

Before you start, please select your use case so that we can narrow down the search scope.

.. tabs::

   .. tab:: I want to run models on my device.

      Please select your platform:

      .. tabs::

         .. tab:: Android

            Please check our instructions on the `MLC-LLM Android app <https://mlc.ai/mlc-llm/#android>`__.

         .. tab:: iOS

            Please check our instructions on the `MLC-LLM iOS app <https://mlc.ai/mlc-llm/#iphone>`__.

         .. tab:: WebGPU

            Please check the `Web-LLM project <https://mlc.ai/web-llm/>`__.

         .. tab:: PC

            MLC-LLM already provides a set of prebuilt models that you can deploy directly; check :doc:`model-zoo` for the list of supported models.

            .. tabs::

               .. tab:: Use prebuilt models.

                  Please check :doc:`tutorials/deploy-models` for instructions on preparing models and deploying them with the MLC-LLM CLI.

                  * If you are a macOS user and will run a model compiled with the ``metal`` backend, congratulations! You don't need to install any external drivers/packages, since ``metal`` is natively supported by macOS.
                  * If you will run a model compiled with the ``CUDA`` backend, please install CUDA according to our :ref:`CUDA installation guide <software-dependencies-cuda>`.
                  * If you will run a model compiled with the ``Vulkan`` backend, please install the Vulkan driver according to our :ref:`Vulkan Driver Installation Guide <software-dependencies-vulkan-driver>`.
               .. tab:: Build your own models.

                  There are two different cases where you need to build your own models:

                  * Use your own model weights, or use a different quantization/running data type.

                    * Please check :doc:`tutorials/compile-models` for details on how to build models for existing architectures.

                  * Use a brand-new model architecture which is not supported by MLC-LLM yet.

                    * Please check :doc:`tutorials/bring-your-own-models` for details on how to add new model architectures to the MLC-LLM family.

                  In either case, you are encouraged to contribute to MLC-LLM; see :ref:`contribute-new-models` for guidelines on contributing new models.
   .. tab:: I need to customize MLC-LLM.

      There are lots of interesting ways to further improve and customize MLC-LLM.

      * The performance of MLC-LLM can be improved in many ways, including (but not limited to) fusing multi-head attention with the FlashAttention algorithm, or using more advanced quantization algorithms.
      * We can also add new backends/language bindings with the existing infrastructure.

      Before you start, please check our :doc:`tutorials/customize` to see how you can customize MLC-LLM for your own purposes.
      You are encouraged to contribute to MLC-LLM if you find your customization interesting.

      .. tabs::

         .. tab:: I need to customize TVM-Unity

            In this case, you need to change the TVM-Unity codebase. Please check :ref:`tvm-unity-build-from-source` on how to install TVM-Unity from source.

            * If you want to compile models with the ``CUDA`` backend, please install CUDA according to our :ref:`CUDA installation guide <software-dependencies-cuda>`.
            * If you want to compile models with the ``Vulkan`` backend, please install the Vulkan SDK according to our :ref:`Vulkan SDK Installation Guide <software-dependencies-vulkan-sdk>`.
            * If you want to compile models with the ``OpenCL`` backend, please install the OpenCL SDK according to our :ref:`OpenCL SDK Installation Guide <software-dependencies-opencl-sdk>`.

         .. tab:: Use original TVM-Unity

            In this case, please install the prebuilt TVM-Unity package according to our :ref:`Installation Guidelines <tvm-unity-install-prebuilt-package>`.
.. toctree::
   :maxdepth: 1
   :caption: Tutorials

   install/index.rst
   install/software-dependencies.rst
   tutorials/deploy-models.rst
   tutorials/compile-models.rst
   tutorials/bring-your-own-models.rst
   tutorials/customize.rst

3 changes: 2 additions & 1 deletion docs/install/index.rst
@@ -7,6 +7,7 @@ Install and Setup
.. contents:: Table of Contents
:depth: 3


.. _tvm-unity-install:

Install TVM Unity
@@ -167,7 +168,7 @@ Therefore, it is highly recommended to validate TVM Unity installation before us
USE_ROCM: OFF
.. note::
``GIT_COMMIT_HASH`` indicates the exact commit of the TVM build, and it can be found on GitHub via ``<https://github.com/mlc-ai/relax/commit/$GIT_COMMIT_HASH>``.

**Step 4. Check device detection.** Sometimes it can be helpful to check whether TVM can detect your device at all, using the following commands:

2 changes: 1 addition & 1 deletion docs/install/software-dependencies.rst
@@ -5,7 +5,7 @@ Software Dependencies
:depth: 2
:local:

While we have included most of the dependencies in our pre-built wheels/scripts, there are still some platform-dependent packages that you will need to install on your own. In most cases, you won't need all the packages listed on this page. If you're unsure about which packages are required for your specific use case, please check the :ref:`navigation panel <navigation>` first.

.. _software-dependencies-cuda:

6 changes: 4 additions & 2 deletions docs/misc/faq.rst
@@ -3,6 +3,8 @@ Frequently Asked Questions (FAQ)

This is a list of Frequently Asked Questions (FAQ) about MLC-LLM. Feel free to suggest new entries!

... How can I customize the temperature and repetition penalty of models?
There is a ``mlc-chat-config.json`` file under your model directory, where you can modify the parameters.

... What quantization algorithm is MLC-LLM using?
    MLC-LLM does not impose any restrictions on the choice of quantization algorithm, and the design should be compatible with all quantization algorithms. By default, we utilize the grouping quantization method discussed in the paper `The case for 4-bit precision: k-bit Inference Scaling Laws <https://arxiv.org/abs/2212.09720>`__.
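To illustrate the idea behind grouping quantization, here is a minimal NumPy sketch of symmetric group-wise 4-bit quantization: each group of consecutive weights shares one float scale. This is a simplified illustration of the general technique, not MLC-LLM's actual kernel:

```python
import numpy as np

def quantize_sym_group(weights: np.ndarray, bits: int = 4, group_size: int = 32):
    """Symmetric group-wise quantization: each group of `group_size`
    consecutive weights shares one scale; values map to signed ints
    in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit
    groups = weights.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)   # guard against all-zero groups
    q = np.clip(np.round(groups / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize_sym_group(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.linspace(-1.0, 1.0, 64, dtype=np.float32)
q, s = quantize_sym_group(w, bits=4, group_size=32)
w_hat = dequantize_sym_group(q, s)
print(np.abs(w - w_hat).max())  # reconstruction error, bounded by half a quantization step
```

Smaller group sizes give each scale less dynamic range to cover, reducing error at the cost of storing more scales.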
67 changes: 49 additions & 18 deletions docs/model-zoo.rst
@@ -6,14 +6,16 @@ Model Zoo

MLC-LLM is a universal solution for deploying different language models: any language model that can be described in `TVM Relax <https://mlc.ai/chapter_graph_optimization/index.html>`__ (a general representation for neural networks, which can be imported from models written in PyTorch) should be recognized by MLC-LLM and thus deployed to different backends with the help of TVM Unity.

The community has already supported several LLM architectures, including LLaMA, GPT-NeoX, and GPT-J. We eagerly anticipate further contributions from the community to expand the range of supported model architectures, with the goal of democratizing the deployment of LLMs.

.. _off-the-shelf-models:

Off-the-Shelf Models
--------------------

Below is a list of off-the-shelf prebuilt models compiled by the MLC-LLM community. These prebuilt models are hosted on Hugging Face, eliminating the need for users to compile them on their own. Each model is accompanied by detailed configurations. These models have undergone extensive testing on various devices, and their kernel performance has been fine-tuned by developers with the help of TVM.

.. list-table:: Off-the-Shelf Models
:widths: 15 15 15 15
:header-rows: 1

@@ -24,36 +26,65 @@ Below is a list of models officially supported by MLC-LLM community. Pre-built m
* - `vicuna-v1-7b-q4f32_0`
- `Vicuna <https://lmsys.org/blog/2023-03-30-vicuna/>`__
- * Weight storage data type: int4
* Running data type: float32
* Symmetric quantization
- `link <https://huggingface.co/mlc-ai/mlc-chat-vicuna-v1-7b-q4f32_0>`__
* - `vicuna-v1-7b-q4f16_0`
- `Vicuna <https://lmsys.org/blog/2023-03-30-vicuna/>`__
- * Weight storage data type: int4
* Running data type: float16
* Symmetric quantization
- `link <https://huggingface.co/mlc-ai/mlc-chat-vicuna-v1-7b-q4f16_0>`__
* - `RedPajama-INCITE-Chat-3B-v1-q4f32_0`
- `RedPajama <https://www.together.xyz/blog/redpajama>`__
- * Weight storage data type: int4
* Running data type: float32
* Symmetric quantization
- `link <https://huggingface.co/mlc-ai/mlc-chat-RedPajama-INCITE-Chat-3B-v1-q4f32_0>`__
* - `RedPajama-INCITE-Chat-3B-v1-q4f16_0`
- `RedPajama <https://www.together.xyz/blog/redpajama>`__
- * Weight storage data type: int4
* Running data type: float16
* Symmetric quantization
- `link <https://huggingface.co/mlc-ai/mlc-chat-RedPajama-INCITE-Chat-3B-v1-q4f16_0>`__

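As the table suggests, the suffix of each model name encodes its quantization mode: for instance, ``q4f32_0`` corresponds to int4 weight storage with a float32 running data type. Assuming that naming convention holds in general, a small helper (hypothetical, not part of MLC-LLM) can decode such codes:

```python
import re

def parse_quant_code(code: str) -> dict:
    """Decode a name like 'q4f32_0' under the assumed convention
    q<weight bits> f<running-dtype bits> _<scheme variant>."""
    m = re.fullmatch(r"q(\d+)f(\d+)_(\d+)", code)
    if m is None:
        raise ValueError(f"unrecognized quantization code: {code}")
    wbits, fbits, variant = (int(g) for g in m.groups())
    return {
        "weight_storage": f"int{wbits}",   # e.g. int4
        "running_dtype": f"float{fbits}",  # e.g. float32
        "variant": variant,
    }

print(parse_quant_code("q4f32_0"))
# {'weight_storage': 'int4', 'running_dtype': 'float32', 'variant': 0}
```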
You can check `MLC-LLM pull requests <https://github.com/mlc-ai/mlc-llm/pulls?q=is%3Aopen+is%3Apr+label%3Anew-models>`__ to track the ongoing efforts of new models. We encourage users to upload their compiled models to Hugging Face and share them with the community.

.. _supported-model-architectures:

Supported Model Architectures
-----------------------------

MLC-LLM supports the following model architectures:

.. list-table:: Supported Model Architectures
:widths: 15 15 15 15
:header-rows: 1

* - Category Code
- Series
- Model Definition
- Variants
* - ``llama``
- `LLaMA <https://github.com/facebookresearch/llama>`__
- `Relax Code <https://github.com/mlc-ai/mlc-llm/blob/main/mlc_llm/relax_model/llama.py>`__
- * `Alpaca <https://github.com/tatsu-lab/stanford_alpaca>`__
* `Vicuna <https://lmsys.org/blog/2023-03-30-vicuna/>`__
* `Guanaco <https://github.com/artidoro/qlora>`__
* - ``gpt-neox``
- `GPT-NeoX <https://github.com/EleutherAI/gpt-neox>`__
- `Relax Code <https://github.com/mlc-ai/mlc-llm/blob/main/mlc_llm/relax_model/gpt_neox.py>`__
- * `RedPajama <https://www.together.xyz/blog/redpajama>`__
* `Dolly <https://github.com/databrickslabs/dolly>`__
* `Pythia <https://huggingface.co/EleutherAI/pythia-1.4b>`__
* - ``gptj``
- `GPT-J <https://github.com/kingoflolz/mesh-transformer-jax>`__
- `Relax Code <https://github.com/mlc-ai/mlc-llm/blob/main/mlc_llm/relax_model/gptj.py>`__
- * `MOSS <https://github.com/OpenLMLab/MOSS>`__

For models within these model architectures, you can check :doc:`/tutorials/compile-models` on how to compile models. Please create a new issue if you want to request a new model architecture. Our tutorial :doc:`/tutorials/bring-your-own-models` introduces how to bring a new model architecture to MLC-LLM.

Contribute to MLC-LLM Model Zoo
-------------------------------

Ready to contribute your compiled models/new model architectures? Awesome! Please check our :doc:`/contribute/community` on how to contribute to MLC-LLM.
75 changes: 0 additions & 75 deletions docs/navigation.rst

This file was deleted.

