Skip to content

Add refresh v3 implementation #241

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 29 commits into
base: main
Choose a base branch
from
Open

Conversation

carlcsaposs-canonical
Copy link
Contributor

@carlcsaposs-canonical carlcsaposs-canonical commented Apr 2, 2025

Uses charm-refresh Python package: https://github.com/canonical/charm-refresh

@carlcsaposs-canonical carlcsaposs-canonical force-pushed the refresh-v3-draft branch 12 times, most recently from 9a114c0 to 9de758b Compare April 3, 2025 15:01
Avoid passing refresh object since we know reconcile will be called later (no need to explicitly call it & have it run twice)
@carlcsaposs-canonical carlcsaposs-canonical force-pushed the refresh-v3-draft branch 7 times, most recently from 3fddbf4 to d9bc860 Compare April 7, 2025 07:08
@carlcsaposs-canonical carlcsaposs-canonical force-pushed the refresh-v3-draft branch 5 times, most recently from d7d94fc to db4ec41 Compare April 10, 2025 07:04
@carlcsaposs-canonical
Copy link
Contributor Author

carlcsaposs-canonical commented Apr 10, 2025

You can test the refresh with these charmhub branches (built on #242):

  • dpe/edge/test-refresh-v3-8.0.40
  • dpe/edge/test-refresh-v3-8.0.41
  • dpe/edge/test-refresh-v3-8.0.41-2 - use to test charm-only refresh from 8.0.41
  • dpe/edge/test-refresh-v3-incompat - simulates incompatible workload or charm version
  • dpe/edge/test-refresh-v3-precheckfail - simulates failed pre check
  • dpe/edge/test-refresh-v3-uncaught - simulates uncaught exception (e.g. bug, incorrect arch, etc.)
  • dpe/edge/test-refresh-v3-uncaught-before-snap-refresh - simulates uncaught exception before snap is refreshed
  • dpe/edge/test-refresh-v3-uncaught-except-upgrade-charm-event - simulates uncaught exception, but doesn't raise on upgrade-charm event (juju will get stuck on rollback if exception raised during upgrade charm event [unless --force-units used on refresh])

or modify the branch locally & re-pack the charm

@sinclert-canonical sinclert-canonical force-pushed the refresh-v3-draft branch 2 times, most recently from 697ae63 to ab25c33 Compare June 5, 2025 14:30
@sinclert-canonical
Copy link
Contributor

Seems like the charm-refresh-build-version package is forcing us to have some v<WORKLOAD_VERSION>/<CHARM_VERSION> tags in the GitHub repository, similarly to what PostgreSQL VM operator has (reference).

Is this right @carlcsaposs-canonical?

@carlcsaposs-canonical
Copy link
Contributor Author

Seems like the charm-refresh-build-version package is forcing us to have some v<WORKLOAD_VERSION>/<CHARM_VERSION> tags in the GitHub repository, similarly to what PostgreSQL VM operator has (reference).

Is this right @carlcsaposs-canonical?

yes, see https://docs.google.com/document/d/1Jv1jhWLl8ejK3iJn7Q3VbCIM9GIhp8926bgXpdtx-Sg/edit?tab=t.0 for more info

@sinclert-canonical
Copy link
Contributor

yes, see https://docs.google.com/document/d/1Jv1jhWLl8ejK3iJn7Q3VbCIM9GIhp8926bgXpdtx-Sg/edit?tab=t.0

Judging by PostgreSQL 16 tags, as well as the proposed contents in the release.yaml workflow, we are in a chicken-and-egg situation here, where the build CI steps will not pass until this PR is merged, and the first v8.0/1.0.0 tag created.

Should we temporarily comment the new charmcraft.yaml contents to verify that integration tests pass?

@carlcsaposs-canonical
Copy link
Contributor Author

Judging by PostgreSQL 16 tags, as well as the proposed contents in the release.yaml workflow, we are in a chicken-and-egg situation here, where the build CI steps will not pass until this PR is merged, and the first v8.0/1.0.0 tag created.

Should we temporarily comment the new charmcraft.yaml contents to verify that integration tests pass?

yes, temporarily reverting charmcraft.yaml changes & temporarily hard-coding charm version in refresh_versions.toml is okay for testing as long as we revert before merge

@sinclert-canonical
Copy link
Contributor

Given this comment on the PostgreSQL refresh v3 implementation, my understanding is that the errors in the last CI run, are due to the refresh-v3 incompatibility to refresh from non refresh-v3 charms. Correct @carlcsaposs-canonical?

Traceback:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-router-0/charm/venv/lib/python3.10/site-packages/charm_refresh/_main.py", line 807, in from_app_databag
    workload=databag["original_workload_version"],
  File "/var/lib/juju/agents/unit-mysql-router-0/charm/venv/lib/python3.10/site-packages/charm_json/_main.py", line 211, in __getitem__
    return _load(parent=self, parent_key=key, data=json.loads(self._databag[key]))
  File "/var/lib/juju/agents/unit-mysql-router-0/charm/venv/lib/python3.10/site-packages/charm_/_main.py", line 117, in __getitem__
    raise KeyError(key)
KeyError: 'original_workload_version'

@carlcsaposs-canonical
Copy link
Contributor Author

due to the refresh-v3 incompatibility to refresh from non refresh-v3 charms

Yes, (without a migration) refresh v3 does not support refreshing from non-refresh v3 charms

The full traceback shows

ValueError: Refresh failed. Automatic recovery not possible. Original versions in app databag are missing or invalid

@sinclert-canonical sinclert-canonical force-pushed the refresh-v3-draft branch 7 times, most recently from 6503990 to a89e90a Compare June 11, 2025 07:38
@sinclert-canonical
Copy link
Contributor

📢 This PR is now ready for review + merge

DISCLAIMERS:

  • I do not advice for nor advice against the use of refresh-v3.
  • I am not the author of this PR nor I indeed to be (god forbid).
  • Any questions about its design, architecture or support must be redirected to the original author.

I made unit + integration tests pass, by relying on a a non-obvious setup, required by how refresh-v3 is implemented:

How to make unit tests pass?

  1. Move every interaction with charm_refresh and snap_lib within the class / function that use them. This is needed in order to properly mock their behavior, following the approach of PostgreSQL VM operator PR.
  2. Mock the contents of the refresh_versions.toml. To this end, I tried to follow the approach taken in the PostgreSQL VM operator, where a session level fixture replaces the contents of the file during the tests duration (reference). However that does not work with pytest-xdist, as session level fixtures are not guaranteed to be run just once. Alternatives:
    • A) Rely on filelock: officially recommended, but impractical due to absurdly high number of tests.
    • B) Mock tomli read: effective, but not recommended nor consistent with PostgreSQL VM operator fixture.

See commits:

How to make integration tests pass?

  1. Adapt every assertion rely on on resume-upgrade language to assert on resume-refresh.
  2. Run the resume-refresh action not in the leader, but on the non-leader unit with the smallest identifier.
  3. Remove the injection of +testupgrade and +testrollback suffixes to the workload version, due to new parsing logic.

See commits:

Temporal changes

Despite the described changes, refresh v3 can only be tested when upgrading from another refresh v3 charm (reference), in a repository which already started publishing charm versions as tags (in the case of MySQL Router: 8.0/X.Y.Z). As this is not the case, we needed to "fake" several behaviors:

  1. The build process will not write the charm version (as there are no tags published yet).
  2. The charm version needs to be hardcoded in the refresh_versions.toml file.
  3. An existing refresh-v3 charm needs to be already available at the channel used for the integration tests. To this end, I published an AMD64 locally pack charm, with the version 8.0/1.0.0 to the dpe/latest/test-refresh-v3 channel.

See commits:


Follow up

Now the whole test suite passes, so this is ready to be merged + a refresh-v3 version of the charm published. WHenever that happens, we must create a follow-up PR in order to revert the [Temp] prefixed commits.

@sinclert-canonical sinclert-canonical marked this pull request as ready for review June 11, 2025 08:46
Copy link
Contributor Author

@carlcsaposs-canonical carlcsaposs-canonical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!

(reviewed diff from fb9d59a)

Comment on lines +4 to +5
charm = "8.0/1.1.0"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
charm = "8.0/1.1.0"

this needs to be removed before merging

also, I believe this will affect the integration test—we will need to run force-refresh-start with check-compatibility=false on PR CI (but not after merging) since development builds will be marked as incompatible (so that if a user ever accidentally deploys a development version of the charm, it's marked as incompatible). see postgresql-operator 16/edge for an example test that implements this


logger.info("Running resume-upgrade on the mysql router leader unit")
await run_action(mysql_router_leader_unit, "resume-upgrade")
logger.info("Running resume-refresh on the mysql router leader unit")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.info("Running resume-refresh on the mysql router leader unit")
logger.info("Running resume-refresh")

not necessarily leader

@@ -39,6 +35,7 @@
TEST_APP_NAME = APPLICATION_DEFAULT_APP_NAME


@markers.amd64_only
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be helpful for me to rebase #242 so that we can remove the amd64 only marker before merge?

Comment on lines +201 to +202
# charm needs to refresh snap to be able to avoid no-op when upgrading.
# set an old revision of the snap
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider also setting verisons["workload"] to avoid confusion in juju status output of test

Comment on lines +215 to +218
old_version = Version(versions["workload"])
new_version = Version(f"{old_version.major - 1}.{old_version.minor}.{old_version.micro}")
versions["workload"] = str(new_version)
versions["charm"] = "8.0/0.0.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there's no need to update the workload version, setting charm version will be enough—which means we can get rid of the packaging import/dependency in integration tests

Comment on lines +92 to +96
# Create venv in `..` so that git working tree is not dirty
# python3 -m venv ../refresh-version-venv
# source ../refresh-version-venv/bin/activate
# poetry install --only build-refresh-version
# write-charm-version
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Create venv in `..` so that git working tree is not dirty
# python3 -m venv ../refresh-version-venv
# source ../refresh-version-venv/bin/activate
# poetry install --only build-refresh-version
# write-charm-version
# Create venv in `..` so that git working tree is not dirty
python3 -m venv ../refresh-version-venv
source ../refresh-version-venv/bin/activate
poetry install --only build-refresh-version
write-charm-version

this needs to be applied before merge

Comment on lines +57 to +69
def _tomli_load(*args, **kwargs) -> dict:
return {
"charm_major": 1,
"workload": "8.0.0",
"charm": "v8.0/1.0.0",
"snap": {
"name": "charmed-mysql",
"revisions": {
"x86_64": "1",
"aarch64": "1",
},
},
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would patching charm_refresh.snap_name work instead? if so, I think we can write our mocks to not depend on charm_refresh's private api

Assumes snap is installed
"""
snap_name = charm_refresh.snap_name()
snap_unit_path = pathlib.Path(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
snap_unit_path = pathlib.Path(
installed_by_unit = pathlib.Path(

nit

(same for other instances of snap_unit_path)

@paulomach
Copy link
Contributor

@carlcsaposs-canonical , do you mind making the remaining changes? we can streamline the pr execution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature, UI change, or workload upgrade Libraries: Out of sync
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants