mhabedank (Collaborator) commented Nov 17, 2024

This PR tries to solve the dependency problems we run into when moving to modern Python versions.

Things currently done:

  • moved from the outdated build system and requirements files to hatch and pyproject.toml
  • removed torchtext and rewrote much of the tokenizer code
  • removed importlib
  • bumped numpy, torch, scikit-learn and several other dependencies
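
To illustrate the torchtext removal, here is a minimal sketch of what a tokenizer with no third-party dependency can look like. It assumes a simple regex split is acceptable for the use case. The names `space_punct_tokenize` and `lowercase_tokenize` are hypothetical and are not taken from this PR's code.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not from this PR): a token is either
# a run of word characters or a single non-space, non-word symbol.
_TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def space_punct_tokenize(text: str) -> List[str]:
    """Split text into word and punctuation tokens using only the standard library."""
    return _TOKEN_PATTERN.findall(text)


def lowercase_tokenize(text: str) -> List[str]:
    """Lowercase the text first, then apply the same regex split."""
    return space_punct_tokenize(text.lower())


if __name__ == "__main__":
    print(space_punct_tokenize("Hello, world!"))
```

Because the sketch relies only on `re`, it does not tie the tokenizer to any particular torch release, which is the kind of coupling that removing torchtext is meant to avoid.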

github-actions bot commented Nov 17, 2024

Unit Test Results

0 tests   0 ✔️  0s ⏱️
0 suites  0 💤
0 files    0

Results for commit 063fe3b.

♻️ This comment has been updated with latest results.

thc1006 commented Nov 1, 2025

Dear @mhabedank and @ethanreidel,

First of all, thank you both for the excellent work on PR #4041! I've carefully studied your technical approach, and I'm impressed by how well-designed the tokenizer refactoring is.

I've recently tested these tokenizer changes in my environment and found they work beautifully. I was wondering if I could contribute some test validation results to help move this PR forward.

I'd be happy to assist in whatever way works best for you:

Option A: If it's convenient, you could add me as a collaborator

  • This way I can directly push test-related improvements
  • All work would stay within this PR
  • You would, of course, remain the primary authors—I'd just be assisting with testing and validation

Option B: I can submit PRs to your branch

  • You could review and decide what to merge
  • This would be more structured, though it might add some review overhead

Here's what I've already completed:

  • ✅ Tested all 5 tokenizers in Docker (Python 3.10)
  • ✅ Verified backward compatibility
  • ✅ Created test results documentation (TEST_RESULTS.md)
  • ✅ All tests passing (5/5)

I noticed the CI shows "0 tests executed"—I'm not sure if it's a configuration issue, but if needed, I'd be happy to help with test-related setup as well.

My intention is purely to help this PR move forward—I have absolutely no desire to take credit or compete. If you don't need additional help or have other considerations, I completely understand and respect that.

Thank you again for your hard work! I look forward to potentially collaborating.

Best regards,
