Description
Hi,
When trying to build tokenizers for Python 3.13t, I think I found a problem in xet-core.
tokenizers depends on huggingface_hub, which in a recent release introduced a dependency on xet-core (the hf-xet package). When building tokenizers from source (pip install git+https://github.com/huggingface/tokenizers.git#subdirectory=bindings/python), the build fails because this second-order dependency (xet-core) can't be built. I believe this is because the build system expects pyproject.toml to be in the root directory. I also believe the problem occurs only on Python 3.13t: that build does not support the Limited API (abi3), so pip can't use a prebuilt hf-xet wheel and instead tries to build the package from its source distribution.
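To confirm that the interpreter in use is actually a free-threaded build (and therefore, on CPython 3.13, unable to install abi3 wheels), a minimal standard-library sketch is below. The helper names is_free_threaded_build and can_use_abi3_wheels are hypothetical; the compatibility rule is a simplification that assumes only what CPython 3.13 documents, namely that free-threaded builds do not support the Limited C API / stable ABI.

```python
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    """Return True if this CPython was built with the GIL disabled (e.g. 3.13t)."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def can_use_abi3_wheels() -> bool:
    """Simplified rule (assumption): abi3 wheels need CPython with the GIL enabled."""
    if sys.implementation.name != "cpython":
        return False
    # Free-threaded 3.13 builds do not support the stable ABI, so pip
    # skips abi3 wheels and falls back to building from an sdist.
    return not is_free_threaded_build()


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    print(f"free-threaded build: {is_free_threaded_build()}")
    print(f"abi3 wheels usable:  {can_use_abi3_wheels()}")
```

On a 3.13t interpreter, the first check is expected to be true and the second false, which would match the behaviour seen in the log (pip downloading hf_xet-1.1.0.tar.gz instead of a wheel).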
Issue tracking Python 3.13t support in tokenizers: huggingface/tokenizers#1767
Could you confirm whether this is a real problem and whether it should be addressed?
Steps to reproduce:
- Install Python 3.13t (e.g. using pyenv):
$ curl https://pyenv.run | bash
$ pyenv install 3.13t
$ pyenv global 3.13t
- Build tokenizers:
$ pip install git+https://github.com/huggingface/tokenizers.git#subdirectory=bindings/python
- Error:
Collecting hf-xet<2.0.0,>=1.1.0 (from huggingface-hub<1.0,>=0.16.4->tokenizers==0.21.2.dev0)
Downloading hf_xet-1.1.0.tar.gz (263 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... error
error: subprocess-exited-with-error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [343 lines of output]
[...]
💥 maturin failed
Caused by: Failed to normalize python source path `python`
Caused by: No such file or directory (os error 2)
Error running maturin: Command '['maturin', 'pep517', 'write-dist-info', '--metadata-directory', '/tmp/pip-modern-metadata-u7qvyddu', '--interpreter', '/usr/bin/python']' returned non-zero exit status 1.
Checking for Rust toolchain....
Running `maturin pep517 write-dist-info --metadata-directory /tmp/pip-modern-metadata-u7qvyddu --interpreter /usr/bin/python`