.. _installation:

============
Installation
============

Xinference can be installed with ``docker`` on Nvidia, NPU, GCU, and DCU devices. To run models using Xinference, you will need to pull the image corresponding to the type of device you intend to serve on.


Nvidia
-------------------

To pull the Nvidia image, run the following command:

.. code-block:: bash

   docker login [email protected] registry.cn-hangzhou.aliyuncs.com
   Password: cre.uwd3nyn4UDM6fzm
   docker pull registry.cn-hangzhou.aliyuncs.com/xinference-prod/xinference-prod:0.0.10-nvidia

Run Command Example
^^^^^^^^^^^^^^^^^^^

To run the container, use the following command:

.. code-block:: bash

   docker run -it \
     --name Xinf \
     --network host \
     --gpus all \
     --restart unless-stopped \
     -v </your/home/path>/.xinference:/root/.xinference \
     -v </your/home/path>/.cache/huggingface:/root/.cache/huggingface \
     -v </your/home/path>/.cache/modelscope:/root/.cache/modelscope \
     registry.cn-hangzhou.aliyuncs.com/xinference-prod/xinference-prod:0.0.10-nvidia /bin/bash
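
After the container starts, you can optionally confirm that the GPUs are visible inside it (this assumes the NVIDIA Container Toolkit is installed on the host):

.. code-block:: bash

   # Run inside the container; lists the GPUs passed through by --gpus all
   nvidia-smi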

Start Xinference
^^^^^^^^^^^^^^^^^^^

After starting the container, navigate to the ``/opt/projects`` directory inside the container and run the following command:

.. code-block:: bash

   ./xinf-enterprise.sh --host 192.168.10.197 --port 9997 && \
   XINFERENCE_MODEL_SRC=modelscope xinference-local --host 192.168.10.197 --port 9997 --log-level debug

The ``./xinf-enterprise.sh`` script starts the Nginx service and writes the Xinference service address to the Nginx configuration file.

Adjust the ``--host`` and ``--port`` arguments of the startup command to match your device's network configuration.
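
The script's exact contents ship with the image; purely as a hypothetical sketch, it conceptually writes a reverse-proxy rule like the following, which is why the WebUI is later reachable on port 8000:

.. code-block:: bash

   # Hypothetical illustration only; the real script is provided in the image.
   # It points an Nginx server block at the address passed via --host/--port.
   cat > /etc/nginx/conf.d/xinf.conf <<'EOF'
   server {
       listen 8000;
       location / {
           proxy_pass http://192.168.10.197:9997;
       }
   }
   EOF
   nginx -s reload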

Once the Xinference service has started, you can access the Xinf WebUI by visiting port 8000.
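
To verify that the service is up (assuming the host and port from the example above), you can query both the Nginx proxy and the Xinference API directly:

.. code-block:: bash

   # WebUI through the Nginx proxy
   curl http://192.168.10.197:8000
   # OpenAI-compatible model listing served by Xinference itself
   curl http://192.168.10.197:9997/v1/models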

MindIE Series
-------------------

Version Information
^^^^^^^^^^^^^^^^^^^

- Python Version: 3.10
- CANN Version: 8.0.RC2
- Operating System: Ubuntu 22.04
- MindIE Version: 1.0.RC2


Dependencies
^^^^^^^^^^^^^^^^^^^

For 310I DUO:

- Driver: Ascend-hdk-310p-npu-driver_24.1.rc2_linux-aarch64.run - `Download <https://obs-whaicc-fae-public.obs.cn-central-221.ovaijisuan.com/cann/mindie/1.0.RC2/310p/Ascend-hdk-310p-npu-driver_24.1.rc2_linux-aarch64.run>`_
- Firmware: Ascend-hdk-310p-npu-firmware_7.3.0.1.231.run - `Download <https://obs-whaicc-fae-public.obs.cn-central-221.ovaijisuan.com/cann/mindie/1.0.RC2/310p/Ascend-hdk-310p-npu-firmware_7.3.0.1.231.run>`_

For 910B:

- Driver: Ascend-hdk-910b-npu-driver_24.1.rc3_linux-aarch64.run - `Download <https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Ascend%20HDK/Ascend%20HDK%2024.1.RC3/Ascend-hdk-910b-npu-driver_24.1.rc3_linux-aarch64.run?response-content-type=application/octet-stream>`_
- Firmware: Ascend-hdk-910b-npu-firmware_7.5.0.1.129.run - `Download <https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Ascend%20HDK/Ascend%20HDK%2024.1.RC3/Ascend-hdk-910b-npu-firmware_7.5.0.1.129.run?response-content-type=application/octet-stream>`_

Download the ``.run`` packages to the host machine, and then run the following commands to install the driver:

.. code-block:: bash

   chmod +x Ascend-hdk-910b-npu-driver_24.1.rc3_linux-aarch64.run
   ./Ascend-hdk-910b-npu-driver_24.1.rc3_linux-aarch64.run --full

Once the installation is complete, the output should indicate "successfully", confirming the installation. The firmware is installed using the same method.

If MindIE does not start properly, verify that the driver and firmware versions match. Both the driver and firmware must be installed on the host machine and made available to the Docker container via volume mounts.

For version upgrades, install the firmware first, then the driver.
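
To check what is currently installed on the host, you can use the ``npu-smi`` tool that ships with the driver (the version file below is at the typical default install path; adjust it if you installed elsewhere):

.. code-block:: bash

   # Show NPU health and driver status
   npu-smi info
   # Print the installed driver version (default install path)
   cat /usr/local/Ascend/driver/version.info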

Pull the Image
^^^^^^^^^^^^^^^^^^^

For 310I DUO:

.. code-block:: bash

   docker login [email protected] registry.cn-hangzhou.aliyuncs.com
   Password: cre.uwd3nyn4UDM6fzm
   docker pull registry.cn-hangzhou.aliyuncs.com/xinference-prod/xinference-prod:0.0.10-310p

For 910B:

.. code-block:: bash

   docker login [email protected] registry.cn-hangzhou.aliyuncs.com
   Password: cre.uwd3nyn4UDM6fzm
   docker pull registry.cn-hangzhou.aliyuncs.com/xinference-prod/xinference-prod:0.0.10-910b

Run Command Example
^^^^^^^^^^^^^^^^^^^

To run the container, use the following command:

.. code-block:: bash

   docker run --name MindIE-Xinf -it \
     -d \
     --net=host \
     --shm-size=500g \
     --privileged=true \
     -w /opt/projects \
     --device=/dev/davinci_manager \
     --device=/dev/hisi_hdc \
     --device=/dev/devmm_svm \
     --entrypoint=bash \
     -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
     -v /usr/local/dcmi:/usr/local/dcmi \
     -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
     -v /usr/local/sbin:/usr/local/sbin \
     -v /home:/home \
     -v /root:/root/model \
     -v /tmp:/tmp \
     -v </your/home/path>/.xinference:/root/.xinference \
     -v </your/home/path>/.cache/huggingface:/root/.cache/huggingface \
     -v </your/home/path>/.cache/modelscope:/root/.cache/modelscope \
     -e http_proxy=$http_proxy \
     -e https_proxy=$https_proxy \
     registry.cn-hangzhou.aliyuncs.com/xinference-prod/xinference-prod:0.0.10-910b
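
Because the container starts detached (``-d``) with ``bash`` as its entrypoint, attach to it before continuing; once inside, you can confirm that the NPUs are visible through the mounted driver:

.. code-block:: bash

   # Attach an interactive shell to the running container
   docker exec -it MindIE-Xinf bash
   # Inside the container: list the NPUs exposed by the mounted driver
   npu-smi info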

Start Xinference
^^^^^^^^^^^^^^^^^^^

After starting the container, navigate to the ``/opt/projects`` directory inside the container and run the following command:

.. code-block:: bash

   ./xinf-enterprise.sh --host 192.168.10.197 --port 9997 && \
   XINFERENCE_MODEL_SRC=modelscope xinference-local --host 192.168.10.197 --port 9997 --log-level debug

The ``./xinf-enterprise.sh`` script starts the Nginx service and writes the Xinference service address to the Nginx configuration file.

Adjust the ``--host`` and ``--port`` arguments of the startup command to match your device's network configuration.

Once the Xinference service has started, you can access the Xinf WebUI by visiting port 8000.

Supported Models
^^^^^^^^^^^^^^^^^^^

When selecting a model execution engine, we recommend the MindIE engine for faster inference; other engines may be considerably slower and are not recommended (see the launch example at the end of this section).

Currently, MindIE supports the following large language models:

- baichuan-chat
- baichuan-2-chat
- chatglm3
- deepseek-chat
- deepseek-coder-instruct
- llama-3-instruct
- mistral-instruct-v0.3
- telechat
- Yi-chat
- Yi-1.5-chat
- qwen-chat
- qwen1.5-chat
- codeqwen1.5-chat
- qwen2-instruct
- csg-wukong-chat-v0.1
- qwen2.5 series (qwen2.5-instruct, qwen2.5-coder-instruct, etc.)

Embedding Models:

- bge-large-zh-v1.5

Rerank Models:

- bge-reranker-large
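
As referenced above, here is a sketch of launching one of the listed models on the MindIE engine through the Xinference command line. The endpoint, model size, and the ``mindie`` engine value are illustrative assumptions; adjust them to your deployment:

.. code-block:: bash

   # Launch qwen2-instruct on the MindIE engine (engine value assumed)
   xinference launch \
     --endpoint http://192.168.10.197:9997 \
     --model-engine mindie \
     --model-name qwen2-instruct \
     --size-in-billions 7 \
     --model-format pytorch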