perf!: testing yaml optimisations #2361

Draft
james-garner-canonical wants to merge 16 commits into canonical:main from james-garner-canonical:26-03+feat+testing-yaml-optimisations

james-garner-canonical commented Mar 3, 2026

This PR makes the most drastic changes possible to address the performance concerns raised in #1498. My thinking is that we can see what this breaks and what the maximum possible performance gains are, and then decide what changes (if any) we should actually make.

Changes

  • Whenever a Context object is created, we load the metadata, trying charmcraft.yaml first and falling back to the legacy files if needed (so 1x open and 1x YAML load).

This PR caches the file loads performed by _CharmSpec.autoload, so we only read from disk once per file per testing process instead of once per Context creation (e.g. once per test).
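
The once-per-process caching described above can be sketched roughly as follows. This is not the code from this PR: `_load_file_once`, `load_metadata`, and `READ_COUNT` are hypothetical names, and `json` stands in for the YAML parser so that the sketch needs only the standard library. The idea is simply to memoise the parsed file by path and hand out copies.

```python
import copy
import functools
import json
import pathlib
import tempfile

READ_COUNT = 0  # hypothetical counter, only here to show how often we hit the disk


@functools.lru_cache(maxsize=None)
def _load_file_once(path: str) -> dict:
    """Read and parse a metadata file at most once per process (per path)."""
    global READ_COUNT
    READ_COUNT += 1
    # json stands in for a YAML load to keep this sketch stdlib-only.
    return json.loads(pathlib.Path(path).read_text())


def load_metadata(path: str) -> dict:
    """Return a private copy so callers cannot mutate the cached data."""
    return copy.deepcopy(_load_file_once(path))


with tempfile.TemporaryDirectory() as tmp:
    meta_path = pathlib.Path(tmp) / "metadata.json"
    meta_path.write_text(json.dumps({"name": "my-charm"}))

    first = load_metadata(str(meta_path))   # reads and parses the file
    first["name"] = "mutated"               # does not affect the cache
    second = load_metadata(str(meta_path))  # served from the cache
```

One trade-off of a per-process cache like this is that changes made to the file on disk during the test session are not picked up, which would only matter for tests that rewrite the charm's metadata files.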

  • With each run, we create an ops.CharmMeta object, reading the text of the file written above from metadata.yaml and actions.yaml (and probably in the future config.yaml) and doing a YAML load.

This PR eliminates these extra file loads entirely by creating the CharmMeta object from the already created _CharmSpec object. We still create the temporary directory, as it's where container writes occur.

  • With each run, we create a temporary directory and write the three files there (this simulates the Juju state), using the (dictionary) data from the charm spec (so 3x open and 3x YAML dump). This directory is discarded after the run.

This PR drops these file writes. It seems they were only used for creating CharmMeta objects. In principle, charms could assume that these files exist and try to read them, but apparently this doesn't come up in any of the tests for the charms we check.

Running super-tox results in 70 of the 153 charms passing on both main and on this branch.

Two of our tests that explicitly assert on behaviour changed in this PR have been updated.

Benchmarking

Benchmark results look promising. The fast benchmarks don't seem to have changed much, while the slowest one (test_many_tests_autoload_meta) goes from a median of ~9.87 ms to ~4.53 ms. That said, these numbers indicate that there don't really seem to be major performance gains on the table here.

Benchmark results on main
--------------------------------------------------------------------------------------------------------- benchmark: 10 tests ---------------------------------------------------------------------------------------------------------
Name (time in us)                                    Min                     Max                   Mean                 StdDev                Median                 IQR            Outliers          OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_context_explicit_meta_config_actions        10.2490 (1.0)        2,441.0980 (23.90)        12.8702 (1.05)         26.6113 (9.39)        11.6620 (1.0)        0.9920 (1.02)      57;1020  77,699.0609 (0.95)      14458           1
test_context_explicit_meta                       10.4990 (1.02)         298.6600 (2.92)         12.2551 (1.0)           7.0336 (2.48)        11.7120 (1.00)       0.9720 (1.0)        26;173  81,598.7972 (1.0)        2986           1
test_full_state                                  42.5590 (4.15)         102.1420 (1.0)          45.2166 (3.69)          2.8327 (1.0)         44.7840 (3.84)       1.0720 (1.10)      146;276  22,115.7564 (0.27)       4332           1
test_context_autoload_meta                      890.4900 (86.89)      1,532.4740 (15.00)       923.1925 (75.33)        39.5865 (13.97)      917.5610 (78.68)     19.9470 (20.52)       22;24   1,083.1977 (0.01)        864           1
test_run_no_observer                          1,939.4570 (189.23)     5,220.7090 (51.11)     2,186.3798 (178.41)      450.4242 (159.01)   2,072.2960 (177.70)    94.9130 (97.65)       12;15     457.3771 (0.01)        147           1
test_run_observed                             2,098.2141 (204.72)     7,426.1350 (72.70)     2,338.8252 (190.85)      539.9316 (190.61)   2,199.7551 (188.63)    97.3194 (100.12)      16;24     427.5651 (0.01)        235           1
test_deferred_events                          2,197.5910 (214.42)     6,072.2670 (59.45)     2,439.1141 (199.03)      564.1207 (199.15)   2,312.8470 (198.32)   104.4535 (107.46)      10;16     409.9849 (0.01)        189           1
test_lots_of_logs                             3,725.1250 (363.46)   170,508.0220 (>1000.0)   5,195.1735 (423.92)   13,039.0260 (>1000.0)  3,971.1170 (340.52)   275.9275 (283.87)       1;24     192.4864 (0.00)        163           1
test_many_tests_explicit_meta                 4,435.6780 (432.79)    77,010.1980 (753.95)    5,682.7027 (463.70)    6,314.0333 (>1000.0)  4,983.3650 (427.32)   334.6597 (344.30)       1;13     175.9726 (0.00)        131           1
test_many_tests_autoload_meta                 9,500.5250 (926.97)    16,799.5220 (164.47)   10,245.4877 (836.02)    1,166.4413 (411.78)   9,867.5980 (846.14)   267.6975 (275.41)       9;16      97.6039 (0.00)         88           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark results on this branch
-------------------------------------------------------------------------------------------------------- benchmark: 10 tests ---------------------------------------------------------------------------------------------------------
Name (time in us)                                    Min                     Max                  Mean                 StdDev                Median                 IQR            Outliers          OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_context_explicit_meta_config_actions        10.3290 (1.0)        2,262.8430 (6.69)        12.4424 (1.0)          21.8495 (2.74)        11.5820 (1.0)        0.7020 (1.0)       40;1393  80,370.4021 (1.0)       13371           1
test_context_explicit_meta                       10.3390 (1.00)         338.3950 (1.0)         13.1573 (1.06)          7.9888 (1.0)         12.1420 (1.05)       1.9240 (2.74)       26;330  76,003.6978 (0.95)       2832           1
test_context_autoload_meta                       21.0900 (2.04)       1,410.2950 (4.17)        26.2959 (2.11)         20.7144 (2.59)        23.0830 (1.99)       1.3230 (1.88)     406;1323  38,028.7317 (0.47)       9708           1
test_full_state                                  43.2210 (4.18)         614.7830 (1.82)        46.5991 (3.75)         14.9854 (1.88)        45.6150 (3.94)       1.3230 (1.88)       26;147  21,459.6533 (0.27)       3636           1
test_run_no_observer                            938.7000 (90.88)      4,585.6890 (13.55)    1,134.1832 (91.15)       390.6378 (48.90)    1,073.5930 (92.69)     95.5240 (136.08)       4;10     881.6917 (0.01)        156           1
test_run_observed                             1,081.8780 (104.74)    77,891.7310 (230.18)   1,376.8241 (110.66)    2,717.4299 (340.16)   1,180.6090 (101.93)    81.1320 (115.58)       3;84     726.3092 (0.01)        814           1
test_deferred_events                          1,170.1640 (113.29)   308,802.2290 (912.55)   1,966.3871 (158.04)   13,236.6985 (>1000.0)  1,290.1495 (111.39)    93.6805 (133.45)       1;41     508.5469 (0.01)        540           1
test_lots_of_logs                             2,709.6710 (262.34)   270,928.1870 (800.63)   3,983.4068 (320.15)   15,818.0428 (>1000.0)  2,916.2080 (251.79)   162.9275 (232.10)       1;39     251.0414 (0.00)        287           1
test_many_tests_explicit_meta                 3,411.3160 (330.27)    14,443.7620 (42.68)    4,206.8873 (338.11)    1,626.4021 (203.59)   3,732.5740 (322.27)   501.5410 (714.46)       8;17     237.7054 (0.00)        130           1
test_many_tests_autoload_meta                 4,092.3530 (396.20)   201,823.7970 (596.41)   5,941.3242 (477.51)   14,641.9881 (>1000.0)  4,527.3335 (390.89)   533.8110 (760.43)       1;23     168.3126 (0.00)        182           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
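
To put the two tables side by side, the median columns can be compared directly. The sketch below divides the median on main by the median on this branch for a few rows. The figures are copied from the tables above (in microseconds); the dictionary and function names are made up for illustration.

```python
# Median times in microseconds, copied from the two benchmark tables:
# name -> (median on main, median on this branch)
MEDIANS_US = {
    "test_context_explicit_meta": (11.7120, 12.1420),
    "test_context_autoload_meta": (917.5610, 23.0830),
    "test_run_no_observer": (2072.2960, 1073.5930),
    "test_many_tests_explicit_meta": (4983.3650, 3732.5740),
    "test_many_tests_autoload_meta": (9867.5980, 4527.3335),
}


def speedup(main_us: float, branch_us: float) -> float:
    """How many times faster the branch is than main (>1 means faster)."""
    return main_us / branch_us


SPEEDUPS = {name: speedup(*pair) for name, pair in MEDIANS_US.items()}

for name, factor in SPEEDUPS.items():
    print(f"{name:<32} {factor:6.2f}x")
```

On these medians, Context creation with autoloaded metadata is roughly 40x faster, the run-based benchmarks listed here improve by somewhere between about 1.3x and 2.2x, and the explicit-meta Context benchmark is essentially unchanged.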

Resolves: #1498

@tonyandrewmeyer tonyandrewmeyer left a comment

I like most of these changes - need to review the performance numbers a bit. I'm a bit concerned about charms not being able to read the YAML files. I assume that's going to come up in the discussion today.

dimaqq commented Mar 4, 2026

stand-up comments: a more useful metric would be whether a super-tox run is faster and by how much, assuming that the same number of tests are run and pass.

Merge branch 'main' into 26-03+feat+testing-yaml-optimisations

Development

Successfully merging this pull request may close these issues.

Reduce the amount of YAML load/dump calls present in Scenario tests