Commit 0d5d3ab

Merge branch 'huggingface:main' into kontext
2 parents: c160e7f + 3c8e4ba · commit 0d5d3ab


46 files changed: +3100, -987 lines

.github/workflows/build_documentation.yml (9 additions, 7 deletions)

@@ -8,6 +8,10 @@ on:
       - doc-builder*
       - v*-release

+env:
+  UV_SYSTEM_PYTHON: 1
+  UV_TORCH_BACKEND: auto
+
 jobs:
   build_documentation:
     runs-on: ubuntu-22.04
@@ -21,13 +25,13 @@ jobs:
       - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
         with:
-          node-version: '18'
+          node-version: "18"
           cache-dependency-path: "kit/package-lock.json"

       - name: Set up Python
         uses: actions/setup-python@v4
         with:
-          python-version: '3.11'
+          python-version: "3.11"

       - name: Set environment variables
         run: |
@@ -45,11 +49,9 @@ jobs:

       - name: Setup environment
         run: |
-          python -m pip install --upgrade pip
-          python -m pip install --upgrade setuptools
-          python -m pip install git+https://github.com/huggingface/doc-builder
-          python -m pip install .[quality]
-          python -m pip install openvino nncf neural-compressor[pt] diffusers accelerate
+          pip install --upgrade pip uv
+          uv pip install git+https://github.com/huggingface/doc-builder
+          uv pip install .[quality] nncf openvino neural-compressor[pt]>3.4 diffusers accelerate

       - name: Make documentation
         shell: bash

.github/workflows/build_pr_documentation.yml (22 additions, 27 deletions)

@@ -9,10 +9,13 @@ concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
   cancel-in-progress: true

+env:
+  UV_SYSTEM_PYTHON: 1
+  UV_TORCH_BACKEND: auto
+
 jobs:
   build_documentation:
     runs-on: ubuntu-22.04
-
     env:
       COMMIT_SHA: ${{ github.event.pull_request.head.sha }}
       PR_NUMBER: ${{ github.event.number }}
@@ -21,42 +24,34 @@ jobs:

     steps:
       - uses: actions/checkout@v4
+      - uses: actions/setup-node@v4
         with:
-          repository: "huggingface/doc-builder"
-          path: doc-builder
+          node-version: "18"
+          cache-dependency-path: "kit/package-lock.json"

-      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v4
         with:
-          repository: "huggingface/optimum-intel"
-          path: optimum-intel
-
-      - name: Setup Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: 3.9
+          python-version: "3.11"

       - name: Setup environment
         run: |
-          pip install --upgrade pip
-          pip uninstall -y doc-builder
-          cd doc-builder
-          git pull origin main
-          pip install .
-          pip install black
-          cd ..
+          pip install --upgrade pip uv
+          uv pip install git+https://github.com/huggingface/doc-builder
+          uv pip install .[quality] nncf openvino neural-compressor[pt]>3.4 diffusers accelerate

       - name: Make documentation
+        shell: bash
         run: |
-          cd optimum-intel
-          make doc BUILD_DIR=intel-doc-build VERSION=pr_$PR_NUMBER COMMIT_SHA_SUBPACKAGE=$COMMIT_SHA CLONE_URL=$PR_CLONE_URL
-          cd ..
-
-      - name: Save commit_sha & pr_number
-        run: |
-          cd optimum-intel
-          sudo chmod -R ugo+rwx intel-doc-build
+          doc-builder build optimum.intel docs/source/ \
+            --repo_name optimum-intel \
+            --build_dir intel-doc-build/ \
+            --version pr_${{ env.PR_NUMBER }} \
+            --version_tag_suffix "" \
+            --html \
+            --clean
           cd intel-doc-build
-          sudo mv optimum.intel optimum-intel
+          mv optimum.intel optimum-intel
           echo ${{ env.COMMIT_SHA }} > ./commit_sha
           echo ${{ env.PR_NUMBER }} > ./pr_number

.github/workflows/test_inc.yml (1 addition, 2 deletions)

@@ -38,8 +38,7 @@ jobs:
         run: |
           pip install --upgrade pip
           pip install torch==${{ matrix.torch-version }} torchaudio torchvision --index-url https://download.pytorch.org/whl/cpu
-          pip install .[neural-compressor,tests] intel-extension-for-pytorch==${{ matrix.torch-version }}
-          pip install diffusers==0.32.2
+          pip install .[tests,neural-compressor] intel-extension-for-pytorch==${{ matrix.torch-version }} diffusers==0.32.2

       - name: Assert versions
         run: |

.github/workflows/test_openvino.yml (1 addition, 1 deletion)

@@ -41,7 +41,7 @@ jobs:
         run: |
           pip install --upgrade pip
           pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
-          pip install .[openvino,openvino-tokenizers,diffusers,tests]
+          pip install .[openvino,diffusers,tests]

       - if: ${{ matrix.transformers-version != 'latest' }}
         name: Install specific dependencies and versions required for older transformers

.github/workflows/test_openvino_full.yml (15 additions, 27 deletions)

@@ -23,43 +23,31 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        include:
-          - python-version: "3.9"
-            os: "ubuntu-22.04"
-            transformers-version: "latest"
-            openvino: "ov-stable"
-            nncf: "nncf-stable"
-          - python-version: "3.9"
-            os: "ubuntu-22.04"
-            transformers-version: "latest"
-            openvino: "ov-nightly"
-            nncf: "nncf-stable"
-          - python-version: "3.9"
-            os: "ubuntu-22.04"
-            transformers-version: "latest"
-            openvino: "ov-stable"
-            nncf: "nncf-develop"
-          - python-version: "3.9"
-            os: "ubuntu-22.04"
-            transformers-version: "latest"
-            openvino: "ov-nightly"
-            nncf: "nncf-develop"
-
-    runs-on: ${{ matrix.os }}
+        nncf: ["nncf-stable", "nncf-develop"]
+        openvino: ["ov-stable", "ov-nightly"]
+        transformers-version: ["latest"]
+
+    runs-on: ubuntu-22.04

     steps:
-      - uses: actions/checkout@v4
-      - name: Setup Python ${{ matrix.python-version }}
+      - name: Free Disk Space (Ubuntu)
+        uses: jlumbroso/free-disk-space@main
+
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Setup Python
         uses: actions/setup-python@v5
         with:
-          python-version: ${{ matrix.python-version }}
+          python-version: 3.9

       - name: Install dependencies
         run: |
-          python -m pip install --upgrade pip
+          pip install --upgrade pip
           # Install PyTorch CPU to prevent unnecessary downloading/installing of CUDA packages
           pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
           pip install .[tests,diffusers]
+          pip uninstall opencv-python -y && pip install opencv-python-headless

       - name: Install openvino-nightly
         if: ${{ matrix.openvino == 'ov-nightly' }}

.github/workflows/test_openvino_notebooks.yml (2 additions, 0 deletions)

@@ -29,6 +29,7 @@ jobs:
             "optimum_openvino_inference.ipynb",
             "question_answering_quantization.ipynb",
             "sentence_transformer_quantization.ipynb",
+            "vision_language_quantization.ipynb",
             # "stable_diffusion_hybrid_quantization.ipynb", TODO: update and ran on a powerful cpu
           ]

@@ -45,6 +46,7 @@ jobs:

       - name: Install packages
         run: |
+          sudo apt-get update
           sudo apt-get install -y ffmpeg

       - name: Install dependencies

.github/workflows/test_openvino_slow.yml (4 additions, 0 deletions)

@@ -40,6 +40,10 @@ jobs:
     runs-on: ${{ matrix.os }}

     steps:
+      - name: Free Disk Space (Ubuntu)
+        if: matrix.runs-on == 'ubuntu-22.04'
+        uses: jlumbroso/free-disk-space@main
+
       - name: Checkout code
         uses: actions/checkout@v4

Makefile (9 additions, 0 deletions)

@@ -59,3 +59,12 @@ doc: build_doc_docker_image
 		--version_tag_suffix "" \
 		--html \
 		--clean
+
+clean:
+	rm -rf build
+	rm -rf dist
+	rm -rf .pytest_cache
+	rm -rf .ruff_cache
+	rm -rf .mypy_cache
+	rm -rf optimum_intel.egg-info
+	rm -rf *__pycache__

README.md (16 additions, 54 deletions)

@@ -40,31 +40,22 @@ or to install from source including dependencies:
 python -m pip install "optimum-intel[extras]"@git+https://github.com/huggingface/optimum-intel.git
 ```

-where `extras` can be one or more of `ipex`, `neural-compressor`, `openvino`, `nncf`.
+where `extras` can be one or more of `ipex`, `neural-compressor`, `openvino`.

 # Quick tour

 ## Neural Compressor

-Dynamic quantization can be used through the Optimum command-line interface:
+Dynamic quantization can be used through the Optimum CLI:

 ```bash
 optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output ./quantized_distilbert
 ```
 Note that quantization is currently only supported for CPUs (only CPU backends are available), so we will not be utilizing GPUs / CUDA in this example.

-To load a quantized model hosted locally or on the 🤗 hub, you can do as follows :
-```python
-from optimum.intel import INCModelForSequenceClassification
-
-model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
-model = INCModelForSequenceClassification.from_pretrained(model_id)
-```
-
 You can load many more quantized models hosted on the hub under the Intel organization [`here`](https://huggingface.co/Intel).

-For more details on the supported compression techniques, please refer to the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_inc).
-
+For more details on the supported compression techniques, please refer to the [documentation](https://huggingface.co/docs/optimum-intel/en/neural_compressor/optimization).

 ## OpenVINO

@@ -75,28 +66,27 @@ Below are examples of how to use OpenVINO and its [NNCF](https://docs.openvino.a
 It is also possible to export your model to the [OpenVINO IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) format with the CLI :

 ```plain
-optimum-cli export openvino --model gpt2 ov_model
+optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B ov_llama/
 ```

 You can also apply 8-bit weight-only quantization when exporting your model : the model linear, embedding and convolution weights will be quantized to INT8, the activations will be kept in floating point precision.

 ```plain
-optimum-cli export openvino --model gpt2 --weight-format int8 ov_model
+optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B --weight-format int8 ov_llama_int8/
 ```

 Quantization in hybrid mode can be applied to Stable Diffusion pipeline during model export. This involves applying hybrid post-training quantization to the UNet model and weight-only quantization for the rest of the pipeline components. In the hybrid mode, weights in MatMul and Embedding layers are quantized, as well as activations of other layers.

 ```plain
-optimum-cli export openvino --model stabilityai/stable-diffusion-2-1 --dataset conceptual_captions --weight-format int8 ov_model
+optimum-cli export openvino --model stabilityai/stable-diffusion-2-1 --dataset conceptual_captions --weight-format int8 ov_model_sd/
 ```

-To apply quantization on both weights and activations, you can find more information in the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov).
+To apply quantization on both weights and activations, you can find more information in the [documentation](https://huggingface.co/docs/optimum-intel/en/openvino/optimization).

 #### Inference:

 To load a model and run inference with OpenVINO Runtime, you can just replace your `AutoModelForXxx` class with the corresponding `OVModelForXxx` class.

-
 ```diff
 - from transformers import AutoModelForSeq2SeqLM
 + from optimum.intel import OVModelForSeq2SeqLM
@@ -112,50 +102,22 @@ To load a model and run inference with OpenVINO Runtime, you can just replace yo
 [{'translation_text': "Il n'est jamais sorti sans un livre sous son bras, et il est souvent revenu avec deux."}]
 ```

-If you want to load a PyTorch checkpoint, set `export=True` to convert your model to the OpenVINO IR.
+#### Quantization:

-```python
-from optimum.intel import OVModelForCausalLM
-
-model = OVModelForCausalLM.from_pretrained("gpt2", export=True)
-model.save_pretrained("./ov_model")
-```
+Post-training static quantization can also be applied. Here is an example on how to apply static quantization on a Whisper model using the [LibriSpeech](https://huggingface.co/datasets/openslr/librispeech_asr) dataset for the calibration step.

+```python
+from optimum.intel import OVModelForSpeechSeq2Seq, OVQuantizationConfig

-#### Post-training static quantization:
-
-Post-training static quantization introduces an additional calibration step where data is fed through the network in order to compute the activations quantization parameters. Here is an example on how to apply static quantization on a fine-tuned DistilBERT.
+model_id = "openai/whisper-tiny"
+q_config = OVQuantizationConfig(dtype="int8", dataset="librispeech", num_samples=50)
+q_model = OVModelForSpeechSeq2Seq.from_pretrained(model_id, quantization_config=q_config)

-```python
-from functools import partial
-from optimum.intel import OVQuantizer, OVModelForSequenceClassification, OVConfig, OVQuantizationConfig
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-
-model_id = "distilbert-base-uncased-finetuned-sst-2-english"
-model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-def preprocess_fn(examples, tokenizer):
-    return tokenizer(
-        examples["sentence"], padding=True, truncation=True, max_length=128
-    )
-
-quantizer = OVQuantizer.from_pretrained(model)
-calibration_dataset = quantizer.get_calibration_dataset(
-    "glue",
-    dataset_config_name="sst2",
-    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
-    num_samples=100,
-    dataset_split="train",
-    preprocess_batch=True,
-)
 # The directory where the quantized model will be saved
 save_dir = "nncf_results"
-# Apply static quantization and save the resulting model in the OpenVINO IR format
-ov_config = OVConfig(quantization_config=OVQuantizationConfig())
-quantizer.quantize(ov_config=ov_config, calibration_dataset=calibration_dataset, save_directory=save_dir)
-# Load the quantized model
-optimized_model = OVModelForSequenceClassification.from_pretrained(save_dir)
+q_model.save_pretrained(save_dir)
 ```
+You can find more information in the [documentation](https://huggingface.co/docs/optimum-intel/en/openvino/optimization).


 ## IPEX
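
The 8-bit weight-only export shown in the README hunk above also has a Python-API counterpart, which this commit does not touch. A minimal sketch, assuming `optimum-intel` is installed with the `openvino` extra; the model choice mirrors the CLI example and is illustrative only:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Quantize linear, embedding and convolution weights to INT8 at load time;
# activations stay in floating point, matching the CLI `--weight-format int8` flow.
model = OVModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=OVWeightQuantizationConfig(bits=8),
)
model.save_pretrained("ov_llama_int8")
```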
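Likewise, the new Whisper static-quantization snippet ends at `q_model.save_pretrained(save_dir)`. A sketch of loading that model back for a quick smoke test, assuming the `datasets` package is available; the `hf-internal-testing/librispeech_asr_dummy` split and sample choice are illustrative:

```python
from datasets import load_dataset
from transformers import AutoProcessor
from optimum.intel import OVModelForSpeechSeq2Seq

# Reload the statically quantized Whisper model saved as "nncf_results" above.
model = OVModelForSpeechSeq2Seq.from_pretrained("nncf_results")
processor = AutoProcessor.from_pretrained("openai/whisper-tiny")

# Transcribe one short LibriSpeech utterance.
sample = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")[0]
inputs = processor(sample["audio"]["array"], sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```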

docs/Dockerfile (0 additions, 28 deletions)

This file was deleted.
