From ddb14d4743ec4ab32e2a8d4fe898c4630aa4f9c5 Mon Sep 17 00:00:00 2001
From: Zihao Ye
Date: Mon, 29 May 2023 06:16:10 -0700
Subject: [PATCH] [Doc] Reorganize documentation and update contents. (#256)

---
 docs/contribute/community.rst          |  2 -
 docs/index.rst                         | 81 ++++++++++++++++++++++++--
 docs/install/index.rst                 |  3 +-
 docs/install/software-dependencies.rst |  2 +-
 docs/misc/faq.rst                      |  6 +-
 docs/model-zoo.rst                     | 67 +++++++++++++++------
 docs/navigation.rst                    | 75 ------------------------
 docs/tutorials/compile-models.rst      | 12 ++--
 docs/tutorials/deploy-models.rst       | 29 ++++++---
 scripts/prep_deps.sh                   |  1 -
 10 files changed, 161 insertions(+), 117 deletions(-)
 delete mode 100644 docs/navigation.rst

diff --git a/docs/contribute/community.rst b/docs/contribute/community.rst
index f1568de81f..1e92a01050 100644
--- a/docs/contribute/community.rst
+++ b/docs/contribute/community.rst
@@ -53,8 +53,6 @@ on github directly. Once your update is complete, you can click the ``contribute
 Contribute New Models to MLC-LLM
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Woohoo! Congratulations
-
 * If you compiled a model for existing model architecture by following our :doc:`../tutorials/compile-models` tutorial. Please upload your model stored in ``dist`` directory to Hugging Face (an example can be found `here `__), and please create a
diff --git a/docs/index.rst b/docs/index.rst
index 410eb1047c..e145ae896a 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -14,14 +14,83 @@ Everything runs locally with no server support and accelerated with local GPUs o
 
 `Join our Discord Server! `_
 
+.. _navigation:
 
-We prepared a navigation page for you to get started with MLC-LLM.
+Navigation
+----------
 
-.. toctree::
-   :maxdepth: 1
-   :caption: Navigation
+Before you start, please select your use case so that we can narrow down the search scope.
+
+.. tabs ::
+
+   .. tab :: I want to run models on my device.
+
+      Please select your platform:
+
+      .. tabs ::
+
+         .. tab :: Android
+
+            Please check our instructions on the `MLC-LLM Android app `__.
+
+         .. tab :: iOS
+
+            Please check our instructions on the `MLC-LLM iOS app `__.
+
+         .. tab :: WebGPU
+
+            Please check the `Web-LLM project `__.
+
+         .. tab :: PC
+
+            MLC-LLM already provides a set of prebuilt models which you can deploy directly; check :doc:`model-zoo` for the list of supported models.
+
+            .. tabs ::
+
+               .. tab :: Use prebuilt models.
+
+                  Please check :doc:`tutorials/deploy-models` for instructions on preparing models and deploying models with the MLC-LLM CLI.
 
-   navigation.rst
+                  * If you are a macOS user and will run a model compiled with the ``metal`` backend, congratulations! You don't need to install any external drivers/packages, since ``metal`` is natively supported by macOS.
+                  * If you will run a model compiled with the ``CUDA`` backend, please install CUDA according to our :ref:`CUDA installation guide `.
+                  * If you will run a model compiled with the ``Vulkan`` backend, please install the Vulkan driver according to our :ref:`Vulkan Driver Installation Guide `.
+
+               .. tab :: Build your own models.
+
+                  There are two different cases where you need to build your own models:
+
+                  * You use your own model weights, or a different quantization data type/running data type.
+                    Please check :doc:`tutorials/compile-models` for details on how to build models for existing architectures.
+                  * You use a brand new model architecture which is not supported by MLC-LLM yet.
+                    Please check :doc:`tutorials/bring-your-own-models` for details on how to add new model architectures to the MLC-LLM family.
+
+                  In either case, you are encouraged to contribute to MLC-LLM; see :ref:`contribute-new-models` for guidelines on contributing new models.
+
+   .. tab :: I need to customize MLC-LLM.
+
+      There are lots of interesting ways to further improve and customize MLC-LLM.
+
+      * The performance of MLC-LLM can be improved in many ways, including (but not limited to) fusing multi-head attention with the FlashAttention algorithm, or using more advanced quantization algorithms.
+      * We can also add new backends/language bindings on top of the existing infrastructure.
+
+      Before you start, please check our :doc:`tutorials/customize` to see how you can customize MLC-LLM for your own purposes.
+
+      You are encouraged to contribute to MLC-LLM if you find your customization interesting.
+
+      .. tabs ::
+
+         .. tab :: I need to customize TVM-Unity
+
+            In this case, you need to change the TVM-Unity codebase. Please check :ref:`tvm-unity-build-from-source` on how to install TVM-Unity from source.
+
+            * If you want to compile models with the ``CUDA`` backend, please install CUDA according to our :ref:`CUDA installation guide `.
+            * If you want to compile models with the ``Vulkan`` backend, please install the Vulkan SDK according to our :ref:`Vulkan SDK Installation Guide `.
+            * If you want to compile models with the ``OpenCL`` backend, please install the OpenCL SDK according to our :ref:`OpenCL SDK Installation Guide `.
+
+         .. tab :: Use original TVM-Unity
+
+            In this case, please install the prebuilt TVM-Unity package according to our :ref:`Installation Guidelines `.
 
 .. toctree::
    :maxdepth: 1
@@ -29,8 +98,8 @@ We prepared a navigation page for you to get started with MLC-LLM.
 
    install/index.rst
    install/software-dependencies.rst
-   tutorials/compile-models.rst
    tutorials/deploy-models.rst
+   tutorials/compile-models.rst
    tutorials/bring-your-own-models.rst
    tutorials/customize.rst
diff --git a/docs/install/index.rst b/docs/install/index.rst
index fb32480c97..566b2f2891 100644
--- a/docs/install/index.rst
+++ b/docs/install/index.rst
@@ -7,6 +7,7 @@ Install and Setup
 .. contents:: Table of Contents
    :depth: 3
 
+
 .. _tvm-unity-install:
 
 Install TVM Unity
@@ -167,7 +168,7 @@ Therefore, it is highly recommended to validate TVM Unity installation before us
     USE_ROCM: OFF
 
 .. note::
-    ``GIT_COMMIT_HASH`` indicates the exact commit of the TVM build, and it can be found on GitHub via ``_.
+    ``GIT_COMMIT_HASH`` indicates the exact commit of the TVM build, and it can be found on GitHub via ````.
 
 **Step 4. Check device detection.** Sometimes it could be helpful to understand if TVM could detect your device at all with the following commands:
diff --git a/docs/install/software-dependencies.rst b/docs/install/software-dependencies.rst
index 1fe00620e5..95a4b318bb 100644
--- a/docs/install/software-dependencies.rst
+++ b/docs/install/software-dependencies.rst
@@ -5,7 +5,7 @@ Software Dependencies
    :depth: 2
    :local:
 
-While we have included most of the dependencies in our pre-built wheels/scripts, there are still some platform-dependent packages that you will need to install on your own. In most cases, you won't need all the packages listed on this page. If you're unsure about which packages are required for your specific use case, please refer to the :doc:`navigation page <../navigation>` first.
+While we have included most of the dependencies in our pre-built wheels/scripts, there are still some platform-dependent packages that you will need to install on your own. In most cases, you won't need all the packages listed on this page. If you're unsure about which packages are required for your specific use case, please check the :ref:`navigation panel <navigation>` first.
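+
+If you are unsure what your machine can already use, a quick sanity check is to ask TVM which devices it detects. The snippet below is a minimal sketch; it assumes you have already installed TVM Unity following :ref:`tvm-unity-install`:
+
+.. code:: python
+
+   import tvm
+
+   # Each line prints True only if TVM was built with that backend
+   # and a corresponding device/driver is visible on this machine.
+   print("CUDA:  ", tvm.cuda(0).exist)
+   print("Metal: ", tvm.metal(0).exist)
+   print("Vulkan:", tvm.vulkan(0).exist)
+   print("OpenCL:", tvm.opencl(0).exist)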
 
 .. _software-dependencies-cuda:
diff --git a/docs/misc/faq.rst b/docs/misc/faq.rst
index 4b89a6d0f8..7daf8333db 100644
--- a/docs/misc/faq.rst
+++ b/docs/misc/faq.rst
@@ -3,6 +3,8 @@ Frequently Asked Questions (FAQ)
 
 This is a list of Frequently Asked Questions (FAQ) about the MLC-LLM. Feel free to suggest new entries!
 
-How can I customize the temperature, repetition penalty of models?
+... How can I customize the temperature and repetition penalty of models?
+   There is a ``mlc-chat-config.json`` file under your model directory, where you can modify the parameters.
 
-* There is a ``mlc-chat-config.json`` file under your model directory, where you can modity the parameters.
+... What quantization algorithm does MLC-LLM use?
+   MLC-LLM does not impose any restrictions on the choice of the quantization algorithm, and the design should be compatible with all quantization algorithms. By default, we utilize the grouping quantization method discussed in the paper `The case for 4-bit precision: k-bit Inference Scaling Laws `__.
\ No newline at end of file
diff --git a/docs/model-zoo.rst b/docs/model-zoo.rst
index ccdecb4821..8ef661f59f 100644
--- a/docs/model-zoo.rst
+++ b/docs/model-zoo.rst
@@ -6,14 +6,16 @@ Model Zoo
 
 MLC-LLM is a universal solution for deploying different language models, any language models that can be described in `TVM Relax `__ (which is a general representation for Neural Networks, and can be imported from models written in PyTorch) should recognized by MLC-LLM and thus deployed to different backends with the help of TVM Unity.
 
-The community has already supported several LLM architectures, including LLaMA, GPT-NeoX, and MOSS. We eagerly anticipate further contributions from the community to expand the range of supported model architectures, with the goal of democratizing the deployment of LLMs.
+The community has already supported several LLM architectures, including LLaMA, GPT-NeoX, and GPT-J. We eagerly anticipate further contributions from the community to expand the range of supported model architectures, with the goal of democratizing the deployment of LLMs.
 
-List of Officially Supported Models
------------------------------------
+.. _off-the-shelf-models:
+
+Off-the-Shelf Models
+--------------------
 
-Below is a list of models officially supported by MLC-LLM community. Pre-built models are hosted on Hugging Face, eliminating the need for users to compile them on their own. Each model is accompanied by detailed configurations. These models have undergone extensive testing on various devices, and their kernel performance has been fine-tuned by developers with the help of TVM.
+Below is a list of off-the-shelf prebuilt models compiled by the MLC-LLM community. These prebuilt models are hosted on Hugging Face, eliminating the need for users to compile them on their own. Each model is accompanied by detailed configurations. These models have undergone extensive testing on various devices, and their kernel performance has been fine-tuned by developers with the help of TVM.
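+
+The model names below carry a quantization code such as ``q4f16_0``, and the table distinguishes the *weight storage data type* (e.g. int4) from the *running data type* (float16/float32). To make that distinction concrete, here is a minimal numpy sketch of symmetric group quantization in the spirit of the method referenced in our FAQ; the function names and group size are illustrative only, not the actual MLC-LLM implementation:
+
+.. code:: python
+
+   import numpy as np
+
+   def quantize_group(w: np.ndarray, bits: int = 4):
+       """Symmetrically quantize one group of weights to `bits`-bit integers."""
+       qmax = 2 ** (bits - 1) - 1        # e.g. 7 for 4-bit storage
+       scale = np.abs(w).max() / qmax    # one shared scale per group
+       q = np.clip(np.rint(w / scale), -qmax, qmax).astype(np.int8)
+       return q, scale                   # int4 values (held in an int8 array here)
+
+   def dequantize_group(q: np.ndarray, scale: float, dtype=np.float16):
+       # The "running data type": weights are expanded back to float at runtime.
+       return q.astype(dtype) * dtype(scale)
+
+   group = np.random.randn(128).astype(np.float32)  # one group of weights
+   q, scale = quantize_group(group, bits=4)
+   w_approx = dequantize_group(q, scale, np.float16)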
 
-.. list-table:: Supported Models
+.. list-table:: Off-the-Shelf Models
    :widths: 15 15 15 15
    :header-rows: 1
 
@@ -24,36 +26,65 @@ Below is a list of models officially supported by MLC-LLM community. Pre-built m
    * - `vicuna-v1-7b-q4f32_0`
      - `Vicuna `__
      - * Weight storage data type: int4
-       * running data type: float32
-       * symmetric quantization
+       * Running data type: float32
+       * Symmetric quantization
      - `link `__
    * - `vicuna-v1-7b-q4f16_0`
      - `Vicuna `__
      - * Weight storage data type: int4
-       * running data type: float16
-       * symmetric quantization
+       * Running data type: float16
+       * Symmetric quantization
      - `link `__
    * - `RedPajama-INCITE-Chat-3B-v1-q4f32_0`
      - `RedPajama `__
      - * Weight storage data type: int4
-       * running data type: float32
-       * symmetric quantization
+       * Running data type: float32
+       * Symmetric quantization
      - `link `__
    * - `RedPajama-INCITE-Chat-3B-v1-q4f16_0`
      - `RedPajama `__
      - * Weight storage data type: int4
-       * running data type: float16
-       * symmetric quantization
+       * Running data type: float16
+       * Symmetric quantization
      - `link `__
 
-You can see `MLC-LLM pull requests `__ to check the ongoing efforts of new models.
+You can check `MLC-LLM pull requests `__ to track the ongoing efforts of new models. We encourage users to upload their compiled models to Hugging Face and share them with the community.
 
-Try different models
---------------------
+.. _supported-model-architectures:
+
+Supported Model Architectures
+-----------------------------
+
+MLC-LLM supports the following model architectures:
+
+.. list-table:: Supported Model Architectures
+   :widths: 15 15 15 15
+   :header-rows: 1
+
+   * - Category Code
+     - Series
+     - Model Definition
+     - Variants
+   * - ``llama``
+     - `LLaMA `__
+     - `Relax Code `__
+     - * `Alpaca `__
+       * `Vicuna `__
+       * `Guanaco `__
+   * - ``gpt-neox``
+     - `GPT-NeoX `__
+     - `Relax Code `__
+     - * `RedPajama `__
+       * `Dolly `__
+       * `Pythia `__
+   * - ``gptj``
+     - `GPT-J `__
+     - `Relax Code `__
+     - * `MOSS `__
 
-Please check :doc:`/tutorials/compile-models` on how to compile models with supported model architectures, and :doc:`/tutorials/bring-your-own-models` on how to bring a new LLM model architecture.
+For models within these architectures, you can check :doc:`/tutorials/compile-models` on how to compile models. Please create a new issue if you want to request a new model architecture. Our tutorial :doc:`/tutorials/bring-your-own-models` introduces how to bring a new model architecture to MLC-LLM.
 
 Contribute to MLC-LLM Model Zoo
 -------------------------------
 
-Awesome! Please check our :doc:`/contribute/community` on how to contribute to MLC-LLM.
+Ready to contribute your compiled models/new model architectures? Awesome! Please check our :doc:`/contribute/community` on how to contribute to MLC-LLM.
diff --git a/docs/navigation.rst b/docs/navigation.rst
deleted file mode 100644
index c36956ba9d..0000000000
--- a/docs/navigation.rst
+++ /dev/null
@@ -1,75 +0,0 @@
-Navigation
-==========
-
-Before you start, please select your use case so that we can narrow down the search scope.
-
-.. tabs ::
-
-   .. tab :: I want to run models on my device.
-
-
-      Please select your platform:
-
-      .. tabs ::
-
-         .. tab :: Android
-
-            Please check our instructions on `MLC-LLM Android app `__.
-
-         .. tab :: iOS
-
-            Please check our instructions on `MLC-LLM IOS app `__
-
-         .. tab :: WebGPU
-
-            Please check `Web-LLM project `__.
-
-         .. tab :: PC
-
-            MLC-LLM already provided a set of prebuilt models which you can deploy directly, check the :doc:`model-zoo` for the list of supported models.
-
-            .. tabs ::
-
-               .. tab :: Use prebuilt models.
-
-                  Please check :doc:`tutorials/deploy-models` for instructions on preparing models and deploying models with MLC-LLM CLI.
-
-                  * If you are a Mac OS user, and you will run a model compiled with ``metal`` backend, congratulations! You don't need to install any external drivers/packages but ``metal`` is natively supported by Mac OS.
-                  * If you will run a model compiled with ``CUDA`` backend, please install CUDA accordingly to our :ref:`CUDA installation guide `.
-                  * If you will run a model compiled with ``Vulkan`` backend, please install Vulkan Driver accordingly to our :ref:`Vulkan Driver Installation Guide `.
-
-               .. tab :: Build your own models.
-
-                  There are two different cases where you need to build your own models:
-
-                  * Use your own moden weights, or use different quantization data type/running data type.
-                  * Please check :doc:`tutorials/compile-models` for details on how to prepare build models for existing architectures.
-                  * Use a brand new model architecture which is not supported by MLC-LLM yet.
-                  * Please check :doc:`tutorials/bring-your-own-models` for details on how to add new model architectures to the MLC-LLM family.
-
-                  In either cases, you are ecouraged to contribute to the MLC-LLM, see :ref:`contribute-new-models` on guidelines for contributing new models.
-
-   .. tab :: I need to customize MLC-LLM.
-
-      There are lots of interesting ways to further improve and customize MLC-LLM.
-
-      * The performance of MLC-LLM can be improved in a lot of ways, including (but not limited to) fusing multi-head attention with FlashAttention algorithm, or using more advanced quantization algorithms.
-      * We can also add new backends/language binding with the existing infrastructure.
-
-      Before you start, please check our :doc:`tutorials/customize` to see how can you customize MLC-LLM for your own purpose.
-
-      You are encouraged to contribute to the MLC-LLM if your found your customization intersting.
-
-      .. tabs ::
-
-         .. tab :: I need to customize TVM-Unity
-
-            In this case, user need to change TVM-Unity codebase. Please check :ref:`tvm-unity-build-from-source` on how to install TVM-Unity from source.
-
-            * If user want to compile models with ``CUDA`` backend, please install CUDA according to our :ref:`CUDA installation guide `.
-            * If user want to compile models with ``Vulkan`` backend, please install Vulkan-SDK according to our :ref:`Vulkan SDK Installation Guide `.
-            * If user want to compile models with ``OpenCL`` backend, please install OpenCL-SDK according to our :ref:`OpenCL SDK Installation Guide `.
-
-         .. tab :: Use original TVM-Unity
-
-            In this case, please install prebuilt TVM-Unity package according to our :ref:`Installation Guidelines `.
\ No newline at end of file
diff --git a/docs/tutorials/compile-models.rst b/docs/tutorials/compile-models.rst
index b807274ec2..ff22033be2 100644
--- a/docs/tutorials/compile-models.rst
+++ b/docs/tutorials/compile-models.rst
@@ -3,11 +3,13 @@
 How to Compile Models
 =====================
 
-In this tutorial, we will guide you on how to **build** LLM whose architectures are already supported by MLC LLM to different backends. Before diving into this tutorial, you should first finish the :ref:`Installation and Setup`. In the following content, we assume you have already installed them and will not cover the installation part. After finish building, you can checkout the tutorial `deploy-build-applications `_.
+In this tutorial, we will guide you on how to build an LLM with architectures that are already supported by MLC LLM for different backends. Before proceeding with this tutorial, make sure you have completed the :doc:`/install/index` tutorial. In the following content, we assume that you have already installed the necessary components and will not cover the installation steps.
 
-.. note::
-    At this moment, MLC LLM officially supports two model architectures: `LLaMA `_ and `GPT-NeoX `_.
-    To build models with other model architectures, please refer to the tutorial `model-architecture-variant `_.
+We have provided a list of off-the-shelf prebuilt models (refer to the :ref:`off-the-shelf-models` section) that you can use directly without the need for building. To learn how to deploy these models, refer to the tutorial :doc:`deploy-models`.
+
+If your model is not included in the list of off-the-shelf models, but its architecture falls within the supported model architectures (see the :ref:`supported-model-architectures` section), you can follow this tutorial to build your model for different backends.
+
+In the event that your model architecture is not supported, you can refer to the tutorial :doc:`bring-your-own-models` to learn how to introduce new model architectures.
 
 This tutorial contains the following sections in order:
@@ -254,6 +256,8 @@ Here are some notes on the build commands above:
 
 After running the build script successfully, you can proceed to the next tutorial on `how to deploy models to different backends `_.
 
+.. warning::
+   In certain cases, using 3-bit quantization for compiling can be overly aggressive and may result in the compiled model generating meaningless text. If you encounter issues where the compiled model does not perform as expected, consider utilizing a higher number of bits for quantization (e.g., 4-bit quantization).
 
 Why Need Build?
diff --git a/docs/tutorials/deploy-models.rst b/docs/tutorials/deploy-models.rst
index ef8273700e..80f17b0b04 100644
--- a/docs/tutorials/deploy-models.rst
+++ b/docs/tutorials/deploy-models.rst
@@ -13,6 +13,19 @@ We first introduce how to prepare the (pre)built model libraries and weights, an
    :depth: 1
    :local:
 
+
+.. _clone_repo:
+
+Clone the MLC LLM Repository
+----------------------------
+
+If you haven't already cloned the MLC-LLM repository, now is the perfect time to do so.
+
+.. code:: shell
+
+   git clone git@github.com:mlc-ai/mlc-llm.git --recursive
+   cd mlc-llm
+
 .. _knowing-local-id:
 
 Get to Know Model's Local ID
@@ -274,14 +287,21 @@ If you are a MLC-LLM developer and you add some functionalities to the CLI, you
 
 .. code:: shell
 
+   # create build directory
    mkdir -p build
+   # prepare dependencies (installs Rust via rustup if needed)
+   bash scripts/prep_deps.sh
+   source "$HOME/.cargo/env"
+   # generate cmake config
    python3 cmake/gen_cmake_config.py
-   cp cmake/config.cmake build
+   cp config.cmake build
+   # build and install the CLI
    cd build
    cmake ..
   make -j$(nproc)
   sudo make install
-  ldconfig # Refresh shared library cache
+  # Refresh shared library cache
+  ldconfig
   cd -
 
 .. note::
@@ -327,14 +347,9 @@ After confirming the local id, we can run the model in CLI by
 
 .. code:: shell
 
-   # If CLI is installed from Conda:
    mlc_chat_cli --local-id LOCAL_ID
    # example: mlc_chat_cli --local-id RedPajama-INCITE-Chat-3B-v1-q4f16_0
-
-   # If CLI is built from source:
-   mlc_chat_cli --local-id LOCAL_ID
-   # example: mlc_chat_cli --local-id vicuna-v1-7b-q3f16_0
diff --git a/scripts/prep_deps.sh b/scripts/prep_deps.sh
index ddfdb11f42..be3726ed64 100755
--- a/scripts/prep_deps.sh
+++ b/scripts/prep_deps.sh
@@ -12,7 +12,6 @@ else
     read answer
     if [ "$answer" != "${answer#[Yy]}" ] ;then
         curl https://sh.rustup.rs -sSf | sh
-        source "$HOME/.cargo/env"
     else
         echo "Failed installation: the dependency cargo not installed."
         exit 1