
ALEPH-615 Expose resources for executions #833

Open
olethanh wants to merge 1191 commits into main from
ol-aleph-615-executions-resources

Conversation

@olethanh
Contributor

@olethanh olethanh commented Sep 16, 2025

Add a new `resources` field on the existing endpoint `/v2/about/executions/resources` to expose the resources allocated to instances.

Related ClickUp, GitHub or Jira tickets : ALEPH-615

Opening this PR so we can discuss what info we need and the format.

Example of what this endpoint will return; it is basically the same endpoint response with an added `resources` field.

{
  "decadecadecadecadecadecadecadecadecadecadecadecadecadecadecadeca": {
    "networking": {
      "ipv4_network": "172.16.4.0/24",
      "host_ipv4": "64.227.122.196",
      "ipv6_network": "2a01:e0a:8b1:95e1:3:deca:deca:dec0/124",
      "ipv6_ip": "2a01:e0a:8b1:95e1:3:deca:deca:dec1",
      "ipv4_ip": "172.16.4.2",
      "mapped_ports": {
        "22": {
          "host": 24001,
          "tcp": true,
          "udp": false
        }
      }
    },
    "resources": {
      "vcpus": 1,
      "memory": 512,
      "disk_mib": 2048
    },
    "status": {
      "defined_at": "2025-09-16 13:35:18.983564+00:00",
      "preparing_at": "2025-09-16 13:35:19.012594+00:00",
      "prepared_at": "2025-07-29 11:11:35.853427",
      "starting_at": "2025-07-29 11:11:35.891511",
      "started_at": "2025-09-16 13:35:19.075488+00:00",
      "stopping_at": null,
      "stopped_at": null
    }
  }
}
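To illustrate how a client might consume this format, here is a minimal sketch that parses the example payload above and summarizes the `resources` field per execution. The `summarize_resources` helper is hypothetical, not part of aleph-vm:

```python
import json

# Sample payload mirroring the example above; the top-level key is the
# item hash of the execution.
sample = json.loads("""
{
  "decadecadecadecadecadecadecadecadecadecadecadecadecadecadecadeca": {
    "resources": {"vcpus": 1, "memory": 512, "disk_mib": 2048}
  }
}
""")

def summarize_resources(executions: dict) -> list[str]:
    """Return one human-readable line per execution in the response."""
    lines = []
    for item_hash, info in executions.items():
        res = info.get("resources", {})
        lines.append(
            f"{item_hash[:12]} vcpus={res.get('vcpus')} "
            f"memory={res.get('memory')}MiB disk={res.get('disk_mib')}MiB"
        )
    return lines

print(summarize_resources(sample)[0])
```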

gdelfino and others added 30 commits February 17, 2025 14:56
Update README.md

Minor typo fixed
* Garbage collector: free disk space from inactive VMs

Add a script to manually list and remove volumes linked to inactive VMs.
It fetches data from the scheduler and the pyaleph main node to gather information on the status of each VM,
then displays it to the user so they can determine whether the volumes can be removed safely.

JIRA ticket ALEPH-37

* Add diagnostic vm to ignore list
* Feature: Added options to get GPU compatibilities from a settings aggregate.

* Fix: Refactored to also return the model name from the aggregate and use the same device_id format.

* Fix: Include GPU list and move the VM egress IPv6 check on the connectivity check to start notifying the users about the next requirement.

* Fix: Solved code quality issues.

* Fix: Put definitive settings aggregate address

* Fix: Solved issue with type casting and moved the aggregate check.

* Check community payment flow (#751)

* Implement community payment check WIP

* isort

* Check community flow at allocation

* Community flow  : fix after testing

* mod Use singleton for the Setting Aggregate

* fix test

* Implement community wallet start time

---------

Co-authored-by: Olivier Le Thanh Duong <olivier@lethanh.be>
Feature: Upgrade to new `aleph-message` version
The ipv6 egress check was showing OK even while the test was failing and returning result: False
Feature: Create Ubuntu 24.04 QEMU runtime
CRNs with a bad IPv6 config were getting bad metrics because /status/check/fastapi was too slow. Looking further into it, the slowness came from /vm/63faf8b5db1cf8d965e6a464a0cb8062af8e7df131729e48738342d956f29ace/internet always taking 5 seconds, even though it ultimately returned a positive result.

Coincidentally 5 seconds is the timeout configured in check_url inside the diagnostic vm (example_fastapi/main.py)

To reproduce the issue, set the DNS servers via an environment variable, with the IPv6 server being unreachable:
```env
ALEPH_VM_DNS_NAMESERVERS = '[2001:41d0:3:163::1, 9.9.9.9]'
```
and set the IP pool to the default one:
```
ALEPH_VM_IPV6_ADDRESS_POOL = fc00:1:2:3::/64
```

Solution:

Further analysis showed that the diagnostic VM tried to connect to the IPv6 DNS server, timed out after 5 seconds, then proceeded normally with the IPv4 DNS.

Put the IPv4 DNS server BEFORE the IPv6 one.
Also disable the IPv6 server if IPv6 is not enabled for the VM. That modification had been done for instances but had not been reproduced for programs.
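The ordering rule described above can be sketched as follows. This is an illustrative helper, not the actual aleph-vm implementation; the function name and signature are assumptions:

```python
import ipaddress

def order_nameservers(servers: list[str], ipv6_enabled: bool) -> list[str]:
    """Place IPv4 DNS servers before IPv6 ones, and drop IPv6 servers
    entirely when the VM has no IPv6 connectivity, so an unreachable
    IPv6 resolver can no longer add a 5-second timeout to each lookup."""
    v4 = [s for s in servers if ipaddress.ip_address(s).version == 4]
    v6 = [s for s in servers if ipaddress.ip_address(s).version == 6]
    return v4 + v6 if ipv6_enabled else v4

# With the settings from the reproduction above, the IPv4 server now comes first
print(order_nameservers(["2001:41d0:3:163::1", "9.9.9.9"], ipv6_enabled=True))
```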
* Fix start error in load_persistent_executions

Fix a parse error that prevented the Supervisor from starting when loading
persistent executions, as it could not parse the gpu field:

    Traceback (most recent call last):
      File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/opt/aleph-vm/aleph/vm/orchestrator/__main__.py", line 4, in <module>
        main()
      File "/opt/aleph-vm/aleph/vm/orchestrator/cli.py", line 379, in main
        supervisor.run()
      File "/opt/aleph-vm/aleph/vm/orchestrator/supervisor.py", line 184, in run
        asyncio.run(pool.load_persistent_executions())
      File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
        return loop.run_until_complete(main)
      File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
        return future.result()
      File "/opt/aleph-vm/aleph/vm/pool.py", line 252, in load_persistent_executions
        execution.gpus = parse_raw_as(List[HostGPU], saved_execution.gpus)
      File "pydantic/tools.py", line 74, in pydantic.tools.parse_raw_as
        obj = load_str_bytes(
      File "pydantic/parse.py", line 37, in pydantic.parse.load_str_bytes
        return json_loads(b)
      File "/usr/lib/python3.10/json/__init__.py", line 339, in loads
        raise TypeError(f'the JSON object must be str, bytes or bytearray, '
    TypeError: the JSON object must be str, bytes or bytearray, not NoneType
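The failure mode is that `parse_raw_as` (which ultimately calls `json.loads`) is handed `None` when no GPUs were saved for the execution. A minimal sketch of the guard, using `json.loads` in place of pydantic's `parse_raw_as` to stay self-contained; the helper name is an assumption:

```python
import json
from typing import Optional

def load_gpus(raw: Optional[str]) -> list[dict]:
    """Skip parsing when no GPUs were saved (raw is None), since
    json.loads -- like pydantic's parse_raw_as -- raises TypeError
    on a None input, which is the crash in the traceback above."""
    if raw is None:
        return []
    return json.loads(raw)
```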

* Fix packaging issue caused by sevctl
… notify endpoint rejects the VM allocation because of a floating point calculation on the community flow percentage.

Solution: Apply a floor rule on the price calculation, matching the frontend/CLI price calculation.
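A sketch of such a floor rule: round the computed flow price down at a fixed precision so the node-side check cannot reject an allocation over binary floating-point noise. The function name and the 6-decimal precision are assumptions for illustration, not the actual aleph-vm values:

```python
import math

def floor_price(value: float, decimals: int = 6) -> float:
    """Round the computed flow price DOWN at a fixed precision so the
    node-side check agrees with the frontend/CLI calculation instead of
    failing on floating-point representation error."""
    factor = 10 ** decimals
    return math.floor(value * factor) / factor

# Classic example: 0.1 + 0.2 == 0.30000000000000004 in binary floating point
print(floor_price(0.1 + 0.2))  # 0.3
```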
This fixes an issue reported by node owners where, if they enable IPv6 and it is not working, it also breaks the diagnostic /internet check.
Error in CI https://github.com/aleph-im/aleph-vm/actions/runs/13458482852/job/37788353003

 pip3 install --progress-bar off --target ./aleph-vm/opt/aleph-vm/ 'aleph-message==0.6' 'eth-account==0.10' 'sentry-sdk==1.31.0' 'qmp==1.1.0' 'aleph-superfluid~=0.2.1' 'sqlalchemy[asyncio]>=2.0' 'aiosqlite==0.19.0' 'alembic==1.13.1' 'aiohttp_cors==0.7.0' 'pyroute2==0.7.12' 'python-cpuid==0.1.0' 'solathon==1.0.2' 'protobuf==5.28.3'
Collecting aleph-message==0.6
  Downloading aleph_message-0.6.0-py3-none-any.whl (17 kB)
Collecting eth-account==0.10
  Downloading eth_account-0.10.0-py3-none-any.whl (109 kB)
Collecting sentry-sdk==1.31.0
  Downloading sentry_sdk-1.31.0-py2.py3-none-any.whl (224 kB)
Collecting qmp==1.1.0
  Downloading qmp-1.1.0-py3-none-any.whl (11 kB)
Collecting aleph-superfluid~=0.2.1
  Downloading aleph_superfluid-0.2.1.tar.gz (20 kB)
  Preparing metadata (setup.py) ... done
Collecting sqlalchemy[asyncio]>=2.0
  Downloading SQLAlchemy-2.0.38-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Collecting aiosqlite==0.19.0
  Downloading aiosqlite-0.19.0-py3-none-any.whl (15 kB)
Collecting alembic==1.13.1
  Downloading alembic-1.13.1-py3-none-any.whl (233 kB)
Collecting aiohttp_cors==0.7.0
  Downloading aiohttp_cors-0.7.0-py3-none-any.whl (27 kB)
Collecting pyroute2==0.7.12
  Downloading pyroute2-0.7.12-py3-none-any.whl (460 kB)
Collecting python-cpuid==0.1.0
  Downloading python-cpuid-0.1.0.tar.gz (21 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [40 lines of output]

      An error occurred while building the project, please ensure you have the most updated version of setuptools, setuptools_scm and wheel with:
         pip install -U setuptools setuptools_scm wheel

      Traceback (most recent call last):
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 130, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 162, in get_requires_for_build_wheel
          return self._get_build_requires(
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 143, in _get_build_requires
          self.run_setup()
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 158, in run_setup
          exec(compile(code, __file__, 'exec'), locals())
        File "setup.py", line 18, in <module>
          setup(
        File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/core.py", line 109, in setup
          _setup_distribution = dist = klass(attrs)
        File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 459, in __init__
          _Distribution.__init__(
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 293, in __init__
          self.finalize_options()
        File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 836, in finalize_options
          for ep in sorted(loaded, key=by_order):
        File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 835, in <lambda>
          loaded = map(lambda e: e.load(), filtered)
        File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2464, in load
          self.require(*args, **kwargs)
        File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2487, in require
          items = working_set.resolve(reqs, env, installer, extras=self.extras)
        File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 782, in resolve
          raise VersionConflict(dist, req).with_context(dependent_req)
      pkg_resources.VersionConflict: (setuptools 59.6.0 (/usr/lib/python3/dist-packages), Requirement.parse('setuptools>=61'))
      [end of output]
Since Python code runs asynchronously in the same process, sharing the global sys.stdout, prints from an
individual call cannot be isolated from other calls.
This was causing an issue when instances failed to start and there was no
persistent VM.

Jira ticket ALEPH-436 (part of ALEPH-436)
Wants=

    Configures (weak) requirement dependencies on other units.

Requires=

    Similar to Wants=, but declares a stronger requirement dependency.

    If this unit gets activated, the units listed will be activated as well. If one of the other units fails to activate, and an ordering dependency After= on the failing unit is set, this unit will not be started. Besides, with or without specifying After=, this unit will be stopped (or restarted) if one of the other units is explicitly stopped (or restarted).

Documentation: https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html
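To illustrate the difference between the two directives quoted above, here is a minimal hypothetical unit fragment; the unit and service names are placeholders, not the actual aleph-vm units:

```ini
# illustrative fragment of a systemd unit file
[Unit]
# Weak dependency: activate ipfs.service with this unit,
# but keep running even if it fails to start
Wants=ipfs.service
# Strong dependency: this unit is stopped/restarted when
# network-online.target is explicitly stopped/restarted
Requires=network-online.target
# Ordering: only start once the listed units have been started
After=network-online.target ipfs.service
```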
Server api1.aleph.im is very slow and outdated
(Core i7 from 2018, up to 40 seconds to respond
to `/metrics` in the monitoring).

We suspect that this causes issues in the
monitoring and performance of the network.

This branch removes all references to api1 and
replaces them with api3 where relevant.

Co-authored-by: Bram <cortex@worlddomination.be>
Co-authored-by: Olivier Le Thanh Duong <olivier@lethanh.be>
Bumps [sentry-sdk](https://github.com/getsentry/sentry-python) from 1.31 to 2.8.0.
- [Release notes](https://github.com/getsentry/sentry-python/releases)
- [Changelog](https://github.com/getsentry/sentry-python/blob/master/CHANGELOG.md)
- [Commits](getsentry/sentry-python@1.31.0...2.8.0)

---
updated-dependencies:
- dependency-name: sentry-sdk
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Fix: Put the correct aleph-message version on the package generation script.
Adding type annotations for better code clarity and safety.
This was done on the pydantic migration branch, but it is added first because it should not
break the code and will facilitate the merge of the pydantic PR.
This bumps their versions to
- Kubo 0.23.0 -> 0.33.1

The list of changes in Kubo is too large to be
mentioned here, but I mostly expect performance
improvements, as the main API has not changed much.
Due to an error reading the denylist.

The error was:
```
Error: constructing the node (see log for full detail):
error walking /home/ipfs/.config/ipfs/denylists:
lstat /home/ipfs/.config/ipfs/denylists: permission denied
```

The [documentation](https://specs.ipfs.tech/compact-denylist-format/)
mentions that:

> Implementations SHOULD look in /etc/ipfs/denylists/ and
> $XDG_CONFIG_HOME/ipfs/denylists/ (default: ~/.config/ipfs/denylists)
> for denylist files.

I am not sure why this only failed on Ubuntu 22.04
and not Debian 12 or Ubuntu 24.04. My first assumption
would be a difference in Systemd.
@codecov

codecov bot commented Sep 18, 2025

Codecov Report

❌ Patch coverage is 55.55556% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.76%. Comparing base (105c7f0) to head (23087df).

Files with missing lines Patch % Lines
src/aleph/vm/orchestrator/utils.py 55.55% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #833      +/-   ##
==========================================
+ Coverage   64.68%   64.76%   +0.08%     
==========================================
  Files          88       88              
  Lines        8160     8169       +9     
  Branches      734      737       +3     
==========================================
+ Hits         5278     5291      +13     
+ Misses       2653     2647       -6     
- Partials      229      231       +2     


@olethanh olethanh force-pushed the ol-aleph-615-executions-resources branch from d76df62 to c515807 on September 18, 2025
@nesitor nesitor self-assigned this Oct 21, 2025
@nesitor nesitor requested a review from odesenfans October 21, 2025 10:54
@nesitor nesitor force-pushed the ol-aleph-615-executions-resources branch from fee8c98 to 3e33743 on October 21, 2025 10:57
@nesitor nesitor force-pushed the ol-aleph-615-executions-resources branch from 6713f9b to 23087df on October 21, 2025 11:34
@nesitor nesitor marked this pull request as ready for review October 21, 2025 14:21
@nesitor nesitor changed the title from "WIP ALEPH-615 Expose resources for executions" to "ALEPH-615 Expose resources for executions" on Oct 21, 2025
if getattr(volume, "size_mib", None):
disk_size_mib += volume.size_mib

return disk_size_mib
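For context, the quoted fragment can be read as part of a function like the following sketch. The function name, the `Volume` type, and the `rootfs_size_mib` parameter are assumptions for illustration; only the `getattr` guard and the accumulation come from the diff above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Volume:
    size_mib: Optional[int] = None

def total_disk_size_mib(volumes: list[Volume], rootfs_size_mib: int = 0) -> int:
    """Sum the declared sizes of all volumes, skipping any volume
    that has no size_mib attribute or a falsy one."""
    disk_size_mib = rootfs_size_mib
    for volume in volumes:
        if getattr(volume, "size_mib", None):
            disk_size_mib += volume.size_mib
    return disk_size_mib
```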
Contributor

Why do we need this if it's coming from the message itself? I don't like that the node can report whatever it wants.

Contributor

Up

Member

Because in the message we don't have the real sizes used by the VMs, such as the volumes, runtime and code. The idea of this PR is to show the real resources used by the VMs. Indeed, the PR needs to be finished by solving the TODO tasks; I have put it up for review just to check the format.

Contributor

From our conversation earlier, the goal of this PR is unclear. Let's not merge it until its purpose is clarified.


9 participants