Skip to content

WIP Fix pangolin #7676

Draft
bernt-matthias wants to merge 4 commits intogalaxyproject:mainfrom
bernt-matthias:fix-pangolin
Draft

WIP Fix pangolin #7676
bernt-matthias wants to merge 4 commits intogalaxyproject:mainfrom
bernt-matthias:fix-pangolin

Conversation

@bernt-matthias
Copy link
Contributor

@bernt-matthias bernt-matthias commented Feb 12, 2026

@wm75 @pvanheus One of the pangolin tests is failing in the weekly tests. Problem is that the download of up-to-date data fails (in conda) with:

        File "/tmp/pip-build-env-md227tqc/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 333, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-md227tqc/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 301, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-md227tqc/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 520, in run_setup
          super().run_setup(setup_script=setup_script)
        File "/tmp/pip-build-env-md227tqc/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 317, in run_setup
          exec(code, locals())
        File "<string>", line 4, in <module>
      ModuleNotFoundError: No module named 'pkg_resources'
      [end of output]

In order to fix this we would need to patch pangolin to add --no-build-isolation to the pip call.

IMO it might be better to remove the download functionality completely

  • data should be in the data table (we have a data manager)
  • each job downloads at least 140M

What do you think? Would this be acceptable?

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

There are two labels that allow to ignore specific (false positive) tool linter errors:

  • skip-version-check: Use it if only a subset of the tools has been updated in a suite.
  • skip-url-check: Use it if github CI sees 403 errors, but the URLs work.

@bernt-matthias bernt-matthias marked this pull request as draft February 12, 2026 17:55
@bernt-matthias bernt-matthias mentioned this pull request Feb 12, 2026
49 tasks
@wm75
Copy link
Member

wm75 commented Feb 12, 2026

No objections against removing the latest data download. We decided to include the functionality during the pandemic because we thought it was just too much burden for instance admins to keep up with then frequent upstream updates and we did not want users to get outdated lineage classification results because of that.
Now that these hectic times are over, the reproducibility aspect is more important.

Interestingly, this might be one of those rare cases where not bumping the wrapper version might be good.
The latest version is the only one (on eu at least that has the issue) so not bumping would allow us to have only fully functional versions. A minor thing though and I doubt many people would run an old version with the latest download option.

@wm75
Copy link
Member

wm75 commented Feb 12, 2026

Oh, and one more thing. Please update the pangolin-data dependency to 1.36, which was the latest when this version of pangolin got released. Something to consider for the auto-updates in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this one here and the requirement got unlinked. So the current tool ships with 1.36, but reports 1.26 in the interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for @CONSTELLATIONS_VERSION@, the autoupdate bot just seems to overwrite ("bump") these tokens in the requirements.

@pvanheus
Copy link
Contributor

I think @wm75 is basically correct - the pace of database updates has slowed. That said, I'm trying to find this pip call that you mentioned - is this the one in the build of the conda recipe or somewhere else?

Other groups have taken the approach of making a Docker container with pangolin with its latest data and updating from time to time. Since the most recently pangolin-data releases have been released on 14 January 2026, 10 November 2025, 11 September 2025 and 16 June 2025, it might now be reasonable to envision a "pangolin+pangolin-data" build recipe upstream. I'm not sure how to do that on the bioconda side, but the end result would be a tool that gets updated every few months and doesn't need data table maintenance or the download option.

(P.S. if the download option is removed, it would require an update to the sars-cov-2-pe-illumina-artic-ivar-analysis workflow in IWC)

@bernt-matthias
Copy link
Contributor Author

That said, I'm trying to find this pip call that you mentioned - is this the one in the build of the conda recipe or somewhere else?

https://github.com/cov-lineages/pangolin/blob/88e8128a003f7b96ecce5f7d320572547a303e16/pangolin/utils/update.py#L147

Other groups have taken the approach of making a Docker container with pangolin ...

Distributing reference data via pip / conda is possible, but maybe not the best approach: it's only data and no dependencies.

In the end the package only contains https://github.com/cov-lineages/pangolin-data .. if I'm not wrong?

@wm75
Copy link
Member

wm75 commented Feb 13, 2026

Other groups have taken the approach of making a Docker container with pangolin ...

Distributing reference data via pip / conda is possible, but maybe not the best approach: it's only data and no dependencies.

In the end the package only contains https://github.com/cov-lineages/pangolin-data .. if I'm not wrong?

I would also prefer to keep the system modular. We've done all the hard work to make it modular and tying data together with code when we don't have to seems wrong.

@wm75
Copy link
Member

wm75 commented Feb 13, 2026

(P.S. if the download option is removed, it would require an update to the sars-cov-2-pe-illumina-artic-ivar-analysis workflow in IWC)

This I can do for you.

Copy link
Member

@wm75 wm75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @bernt-matthias for addressing this - just minor improvements to comment lines.

Comment on lines +52 to +53
<requirement type="package" version="@PANGOLIN_DATA_VERSION@">pangolin-data</requirement>
<requirement type="package" version="@CONSTELLATIONS_VERSION@">constellations</requirement>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<requirement type="package" version="@PANGOLIN_DATA_VERSION@">pangolin-data</requirement>
<requirement type="package" version="@CONSTELLATIONS_VERSION@">constellations</requirement>
<!-- Important: keep the following two versions tokenized since the tokens are reused in the inputs section!>
<requirement type="package" version="@PANGOLIN_DATA_VERSION@">pangolin-data</requirement>
<requirement type="package" version="@CONSTELLATIONS_VERSION@">constellations</requirement>

#end if
## Handle data components to be taken from data tables
## The folder structure pointed to by the data tables can be used
## as is except that cannot symlink the folders themselves since
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## as is except that cannot symlink the folders themselves since
## as is except that we cannot symlink the folders themselves since

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants