Commit 104d29f

Automate pypi publishing (#4239)
> [!NOTE]
> **Medium Risk**
> Introduces a new automated publishing workflow and modifies dependency-install semantics in CI/Docker, which could cause release or build failures if credentials, tags, or lockfile expectations are misconfigured.
>
> **Overview**
> Adds an automated release pipeline: a new `release.yml` workflow triggers on published GitHub releases, validates that the tag matches `unstructured.__version__`, builds via `uv build`, publishes to PyPI using trusted publishing, and *best-effort* uploads the same artifacts to Azure Artifacts via `twine`.
>
> Across CI, Docker, and Make targets, replaces `uv sync --frozen` with `uv sync --locked` and adds `uv run --no-sync` where `uv sync` already ran, to avoid implicit re-syncing; introduces a new `release` dependency group (adds `twine`), bumps the version to `0.20.2`, and updates `uv.lock` accordingly.
1 parent 78e21ca commit 104d29f

File tree: 11 files changed, +376 −94 lines changed
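The `--frozen` → `--locked` swap is the core semantic change across CI, Docker, and Make. As a rough sketch of the assumed uv behavior (`--locked` verifies `uv.lock` against `pyproject.toml` and fails if the lockfile is stale, while `--frozen` installs from the lockfile without checking), the difference can be mimicked with a stand-in function — this is not uv itself, just a mock:

```shell
#!/usr/bin/env bash
# Stand-in for `uv sync` illustrating the assumed flag semantics. The
# second argument simulates whether uv.lock is out of date.
uv_sync() {
  local mode="$1" lock_state="$2"
  if [[ "$mode" == "--locked" && "$lock_state" == "stale" ]]; then
    echo "error: uv.lock is out of date" >&2
    return 1
  fi
  echo "installed from uv.lock ($mode)"
}

uv_sync --locked stale || echo "CI fails fast on a stale lockfile"
uv_sync --frozen stale   # installs as-is; no staleness check
```

With `--locked`, a drifted lockfile surfaces as an immediate CI failure instead of silently installing stale pins.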

.github/actions/base-cache/action.yml
Lines changed: 1 addition & 1 deletion

```diff
@@ -19,5 +19,5 @@ runs:
     - name: Install dependencies
       shell: bash
       run: |
-        uv sync --frozen --all-extras --all-groups
+        uv sync --locked --all-extras --all-groups
         make install-nltk-models
```

.github/workflows/ci.yml
Lines changed: 5 additions & 5 deletions

```diff
@@ -122,7 +122,7 @@ jobs:
       env:
         UNS_API_KEY: ${{ secrets.UNS_API_KEY }}
       run: |
-        uv sync --frozen --group test
+        uv sync --locked --group test
         make install-nltk-models
         make test-no-extras CI=true
@@ -162,7 +162,7 @@ jobs:
           python-version: ${{ matrix.python-version }}
       - name: Install extra dependencies
         run: |
-          uv sync --frozen ${{ matrix.uv-extras }} --group test
+          uv sync --locked ${{ matrix.uv-extras }} --group test
           make install-nltk-models
       - name: Install system dependencies
         run: |
@@ -250,7 +250,7 @@ jobs:
           sudo apt-get install -y tesseract-ocr-kor
           sudo apt-get install diffstat
           tesseract --version
-          uv run ./test_unstructured_ingest/test-ingest-src.sh
+          uv run --no-sync ./test_unstructured_ingest/test-ingest-src.sh

   test_json_to_html:
     strategy:
@@ -268,7 +268,7 @@ jobs:
           OVERWRITE_FIXTURES: "false"
         run: |
           sudo apt-get install diffstat
-          uv run ./test_unstructured_ingest/check-diff-expected-output-html.sh
+          uv run --no-sync ./test_unstructured_ingest/check-diff-expected-output-html.sh

   test_json_to_markdown:
     strategy:
@@ -286,7 +286,7 @@ jobs:
           OVERWRITE_FIXTURES: "false"
         run: |
           sudo apt-get install diffstat
-          uv run ./test_unstructured_ingest/check-diff-expected-output-markdown.sh
+          uv run --no-sync ./test_unstructured_ingest/check-diff-expected-output-markdown.sh

   changelog:
     runs-on: ubuntu-latest
```

.github/workflows/ingest-test-fixtures-update-pr.yml
Lines changed: 1 addition & 1 deletion

```diff
@@ -89,7 +89,7 @@ jobs:
           sudo apt-get install -y tesseract-ocr-kor
           sudo apt-get install diffstat
           tesseract --version
-          uv run ./test_unstructured_ingest/test-ingest-src.sh
+          uv run --no-sync ./test_unstructured_ingest/test-ingest-src.sh
       - name: Update HTML fixtures
         run: make html-fixtures-update
       - name: Update markdown fixtures
```

.github/workflows/release.yml
Lines changed: 81 additions & 0 deletions

```diff
@@ -0,0 +1,81 @@
+name: Pypi Release
+
+on:
+  release:
+    types:
+      - published
+
+permissions:
+  contents: read
+  id-token: write # Required for PyPI trusted publishing / attestations
+
+concurrency:
+  group: release
+  cancel-in-progress: false
+
+env:
+  PYTHON_VERSION: "3.12"
+
+jobs:
+  release:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
+        with:
+          python-version: ${{ env.PYTHON_VERSION }}
+
+      - name: Set up Python
+        run: uv python install
+
+      - name: Install dependencies
+        run: uv sync --locked --only-group release --no-install-project
+
+      - name: Validate version matches release tag
+        env:
+          TAG: ${{ github.event.release.tag_name }}
+        run: |
+          PKG_VERSION=$(uv run --no-sync python -c "from unstructured.__version__ import __version__; print(__version__)")
+          if [[ "$TAG" != "$PKG_VERSION" && "$TAG" != "v$PKG_VERSION" ]]; then
+            echo "Tag '$TAG' does not match package version '$PKG_VERSION'"
+            exit 1
+          fi
+
+      - name: Build artifact
+        id: build
+        run: uv build
+
+      - name: Publish package
+        uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e # release/v1
+
+      # Best-effort: attempt Azure upload even if PyPI fails, but only if build succeeded.
+      - name: Create .pypirc for Azure Artifacts
+        if: always() && steps.build.outcome == 'success'
+        run: |
+          cat <<EOF > ~/.pypirc
+          [distutils]
+          index-servers =
+              azure
+
+          [azure]
+          repository: https://pkgs.dev.azure.com/${{ secrets.AZURE_ARTIFACTS_FEED }}/_packaging/${{ secrets.AZURE_ARTIFACTS_FEED }}/pypi/upload/
+          username: ${{ secrets.AZURE_ARTIFACTS_USERNAME }}
+          password: ${{ secrets.AZURE_ARTIFACTS_PAT }}
+          EOF
+
+      - name: Publish package to Azure Artifacts
+        if: always() && steps.build.outcome == 'success'
+        run: |
+          EXIT_CODE=0
+          uv run --no-sync twine upload -r azure dist/* || EXIT_CODE=$?
+          if [[ $EXIT_CODE -eq 0 ]]; then
+            echo "Successfully published to Azure Artifacts (or already existed)"
+          else
+            echo "Azure Artifacts upload failed (exit code: $EXIT_CODE)"
+            if [[ $EXIT_CODE -eq 1 ]]; then
+              echo "This may be due to version conflicts or connectivity issues"
+            fi
+            echo "Azure Artifacts upload is non-critical - skipping failure"
+          fi
```
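The tag-validation step can be exercised locally with plain bash. The version and tag values below are illustrative; the check mirrors the workflow's accept-either-form logic (`0.20.2` or `v0.20.2`):

```shell
#!/usr/bin/env bash
# Mirrors the workflow's validation: a release tag may match the package
# version either exactly or with a leading "v"; anything else is rejected.
check_tag() {
  local tag="$1" pkg_version="$2"
  if [[ "$tag" != "$pkg_version" && "$tag" != "v$pkg_version" ]]; then
    echo "Tag '$tag' does not match package version '$pkg_version'"
    return 1
  fi
  echo "Tag '$tag' matches version '$pkg_version'"
}

check_tag "v0.20.2" "0.20.2"   # accepted: v-prefixed form
check_tag "0.20.2" "0.20.2"    # accepted: exact form
```

Because the workflow exits non-zero on mismatch, a release whose tag drifts from `unstructured.__version__` fails before anything is built or published.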

CHANGELOG.md
Lines changed: 7 additions & 0 deletions

```diff
@@ -1,3 +1,10 @@
+## 0.20.2
+
+### Enhancements
+- Add automated PyPI publishing: new `release.yml` GitHub Actions workflow triggers on GitHub release, builds the package with `uv build`, publishes to PyPI via `pypa/gh-action-pypi-publish`, and uploads to Azure Artifacts via `twine`
+- Replace `uv sync --frozen` with `uv sync --locked` across all CI workflows, Dockerfile, and Makefile to fail fast on stale lockfiles
+- Add `--no-sync` to all `uv run` and `uv build` commands that follow a prior `uv sync` step to prevent implicit re-syncing
+
 ## 0.20.1

 ### Fixes
```

Dockerfile
Lines changed: 4 additions & 4 deletions

```diff
@@ -73,11 +73,11 @@ ENV UV_COMPILE_BYTECODE=1
 ENV UV_PYTHON_DOWNLOADS=never

 # Install Python dependencies via uv and download required NLTK packages
-RUN uv sync --frozen --all-extras --no-group dev --no-group lint --no-group test && \
+RUN uv sync --locked --all-extras --no-group dev --no-group lint --no-group test --no-group release && \
     mkdir -p ${NLTK_DATA} && \
-    uv run $PYTHON -m nltk.downloader -d ${NLTK_DATA} punkt_tab averaged_perceptron_tagger_eng && \
-    uv run $PYTHON -c "from unstructured.partition.model_init import initialize; initialize()" && \
-    uv run $PYTHON -c "from unstructured_inference.models.tables import UnstructuredTableTransformerModel; model = UnstructuredTableTransformerModel(); model.initialize('microsoft/table-transformer-structure-recognition')"
+    uv run --no-sync $PYTHON -m nltk.downloader -d ${NLTK_DATA} punkt_tab averaged_perceptron_tagger_eng && \
+    uv run --no-sync $PYTHON -c "from unstructured.partition.model_init import initialize; initialize()" && \
+    uv run --no-sync $PYTHON -c "from unstructured_inference.models.tables import UnstructuredTableTransformerModel; model = UnstructuredTableTransformerModel(); model.initialize('microsoft/table-transformer-structure-recognition')"

 ENV PATH="/app/.venv/bin:${PATH}"
 ENV HF_HUB_OFFLINE=1
```

Makefile
Lines changed: 23 additions & 23 deletions

```diff
@@ -13,7 +13,7 @@ help: Makefile
 ## install: install all dependencies via uv
 .PHONY: install
 install:
-	@uv sync --frozen --all-extras --all-groups
+	@uv sync --locked --all-extras --all-groups
	@$(MAKE) install-nltk-models

 ## lock: update and lock all dependencies
@@ -23,7 +23,7 @@ lock:

 .PHONY: install-nltk-models
 install-nltk-models:
-	uv run --frozen python -c "from unstructured.nlp.tokenize import download_nltk_packages; download_nltk_packages()"
+	uv run --no-sync python -c "from unstructured.nlp.tokenize import download_nltk_packages; download_nltk_packages()"


 #################
@@ -38,62 +38,62 @@ export UNSTRUCTURED_INCLUDE_DEBUG_METADATA ?= false
 test:
	CI=$(CI) \
	UNSTRUCTURED_INCLUDE_DEBUG_METADATA=$(UNSTRUCTURED_INCLUDE_DEBUG_METADATA) \
-	uv run --frozen --no-sync pytest -n auto test_${PACKAGE_NAME} --cov=${PACKAGE_NAME} --cov-report term-missing --durations=40
+	uv run --no-sync pytest -n auto test_${PACKAGE_NAME} --cov=${PACKAGE_NAME} --cov-report term-missing --durations=40

 .PHONY: test-no-extras
 test-no-extras:
	CI=$(CI) \
	UNSTRUCTURED_INCLUDE_DEBUG_METADATA=$(UNSTRUCTURED_INCLUDE_DEBUG_METADATA) \
-	uv run --frozen --no-sync pytest -n auto \
+	uv run --no-sync pytest -n auto \
		test_${PACKAGE_NAME}/partition/test_text.py \
		test_${PACKAGE_NAME}/partition/test_email.py \
		test_${PACKAGE_NAME}/partition/html/test_partition.py \
		test_${PACKAGE_NAME}/partition/test_xml.py

 .PHONY: test-extra-csv
 test-extra-csv:
-	CI=$(CI) uv run --frozen --no-sync pytest -n auto \
+	CI=$(CI) uv run --no-sync pytest -n auto \
		test_unstructured/partition/test_csv.py \
		test_unstructured/partition/test_tsv.py

 .PHONY: test-extra-docx
 test-extra-docx:
-	CI=$(CI) uv run --frozen --no-sync pytest -n auto \
+	CI=$(CI) uv run --no-sync pytest -n auto \
		test_unstructured/partition/test_doc.py \
		test_unstructured/partition/test_docx.py

 .PHONY: test-extra-epub
 test-extra-epub:
-	CI=$(CI) uv run --frozen --no-sync pytest -n auto test_unstructured/partition/test_epub.py
+	CI=$(CI) uv run --no-sync pytest -n auto test_unstructured/partition/test_epub.py

 .PHONY: test-extra-markdown
 test-extra-markdown:
-	CI=$(CI) uv run --frozen --no-sync pytest -n auto test_unstructured/partition/test_md.py
+	CI=$(CI) uv run --no-sync pytest -n auto test_unstructured/partition/test_md.py

 .PHONY: test-extra-odt
 test-extra-odt:
-	CI=$(CI) uv run --frozen --no-sync pytest -n auto test_unstructured/partition/test_odt.py
+	CI=$(CI) uv run --no-sync pytest -n auto test_unstructured/partition/test_odt.py

 .PHONY: test-extra-pdf-image
 test-extra-pdf-image:
-	CI=$(CI) uv run --frozen --no-sync pytest -n auto test_unstructured/partition/pdf_image
+	CI=$(CI) uv run --no-sync pytest -n auto test_unstructured/partition/pdf_image

 .PHONY: test-extra-pptx
 test-extra-pptx:
-	CI=$(CI) uv run --frozen --no-sync pytest -n auto \
+	CI=$(CI) uv run --no-sync pytest -n auto \
		test_unstructured/partition/test_ppt.py \
		test_unstructured/partition/test_pptx.py

 .PHONY: test-extra-pypandoc
 test-extra-pypandoc:
-	CI=$(CI) uv run --frozen --no-sync pytest -n auto \
+	CI=$(CI) uv run --no-sync pytest -n auto \
		test_unstructured/partition/test_org.py \
		test_unstructured/partition/test_rst.py \
		test_unstructured/partition/test_rtf.py

 .PHONY: test-extra-xlsx
 test-extra-xlsx:
-	CI=$(CI) uv run --frozen --no-sync pytest -n auto test_unstructured/partition/test_xlsx.py
+	CI=$(CI) uv run --no-sync pytest -n auto test_unstructured/partition/test_xlsx.py

 ## check: runs all linters and checks
 .PHONY: check
@@ -102,8 +102,8 @@ check: check-ruff check-version
 ## check-ruff: runs ruff linter and formatter check
 .PHONY: check-ruff
 check-ruff:
-	uv run --frozen --no-sync ruff check .
-	uv run --frozen --no-sync ruff format --check .
+	uv run --no-sync ruff check .
+	uv run --no-sync ruff format --check .

 .PHONY: check-licenses
 check-licenses:
@@ -119,8 +119,8 @@ check-version:
 ## tidy: auto-format and fix lint issues
 .PHONY: tidy
 tidy:
-	uv run --frozen --no-sync ruff format .
-	uv run --frozen --no-sync ruff check --fix-only --show-fixes .
+	uv run --no-sync ruff format .
+	uv run --no-sync ruff check --fix-only --show-fixes .

 .PHONY: tidy-shell
 tidy-shell:
@@ -135,7 +135,7 @@ version-sync:
 ## check-coverage: check test coverage meets threshold
 .PHONY: check-coverage
 check-coverage:
-	uv run --frozen --no-sync coverage report --fail-under=90
+	uv run --no-sync coverage report --fail-under=90

 ##########
 # Docker #
@@ -166,10 +166,10 @@ docker-test:
		-v ${CURRENT_DIR}/test_unstructured_ingest:/home/notebook-user/test_unstructured_ingest \
		$(if $(wildcard uns_test_env_file),--env-file uns_test_env_file,) \
		$(DOCKER_IMAGE) \
-		bash -c "uv sync --frozen --all-extras --group test --no-install-project && \
+		bash -c "uv sync --locked --all-extras --group test --no-install-project && \
			CI=$(CI) \
			UNSTRUCTURED_INCLUDE_DEBUG_METADATA=$(UNSTRUCTURED_INCLUDE_DEBUG_METADATA) \
-			uv run pytest -n auto $(if $(TEST_FILE),$(TEST_FILE),test_unstructured)"
+			uv run --no-sync pytest -n auto $(if $(TEST_FILE),$(TEST_FILE),test_unstructured)"

 .PHONY: docker-smoke-test
 docker-smoke-test:
@@ -187,7 +187,7 @@ docker-jupyter-notebook:

 .PHONY: run-jupyter
 run-jupyter:
-	uv run jupyter-notebook --NotebookApp.token='' --NotebookApp.password=''
+	uv run --no-sync jupyter-notebook --NotebookApp.token='' --NotebookApp.password=''


 ###########
@@ -197,9 +197,9 @@ run-jupyter:
 .PHONY: html-fixtures-update
 html-fixtures-update:
	rm -r test_unstructured_ingest/expected-structured-output-html && \
-	uv run test_unstructured_ingest/structured-json-to-html.sh test_unstructured_ingest/expected-structured-output-html
+	uv run --no-sync test_unstructured_ingest/structured-json-to-html.sh test_unstructured_ingest/expected-structured-output-html

 .PHONY: markdown-fixtures-update
 markdown-fixtures-update:
	rm -r test_unstructured_ingest/expected-structured-output-markdown && \
-	uv run test_unstructured_ingest/structured-json-to-markdown.sh test_unstructured_ingest/expected-structured-output-markdown
+	uv run --no-sync test_unstructured_ingest/structured-json-to-markdown.sh test_unstructured_ingest/expected-structured-output-markdown
```

README.md
Lines changed: 1 addition & 1 deletion

````diff
@@ -147,7 +147,7 @@ Then install all dependencies (base, extras, dev, test, and lint groups):
 make install
 ```

-This runs `uv sync --frozen --all-extras --all-groups`, which creates a virtual environment
+This runs `uv sync --locked --all-extras --all-groups`, which creates a virtual environment
 and installs everything in one step. No need to manually create or activate a virtualenv.

 To install only specific document-type extras:
````

pyproject.toml
Lines changed: 3 additions & 0 deletions

```diff
@@ -166,6 +166,9 @@ dev = [
 lint = [
     "ruff>=0.15.0, <1.0.0",
 ]
+release = [
+    "twine>=6.0.0, <7.0.0",
+]

 [tool.uv]
 required-environments = [
```

unstructured/__version__.py
Lines changed: 1 addition & 1 deletion

```diff
@@ -1 +1 @@
-__version__ = "0.20.1" # pragma: no cover
+__version__ = "0.20.2" # pragma: no cover
```
