Conversation
Update README.md: minor typo fixed
* Garbage collector: free disk space from inactive VMs. Add a script to manually list and remove volumes linked to inactive VMs. It fetches data from the scheduler and the pyaleph main node to gather information on the status of each VM, then displays it to the user to determine whether the volumes can be removed safely. JIRA ticket ALEPH-37
* Add diagnostic VM to the ignore list
* Feature: added options to get GPU compatibilities from a settings aggregate.
* Fix: refactored to also return the model name from the aggregate and use the same device_id format.
* Fix: include the GPU list and move the VM egress IPv6 check into the connectivity check, to start notifying users about the upcoming requirement.
* Fix: solved code quality issues.
* Fix: put the definitive settings aggregate address.
* Fix: solved an issue with type casting and moved the aggregate check.
* Check community payment flow (#751)
  * Implement community payment check (WIP)
  * isort
  * Check community flow at allocation
  * Community flow: fix after testing
  * Use a singleton for the Settings Aggregate
  * Fix test
  * Implement community wallet start time

Co-authored-by: Olivier Le Thanh Duong <olivier@lethanh.be>
The IPv6 egress check was showing OK even while the test was failing and returning `result: False`.
Feature: Create Ubuntu 24.04 QEMU runtime
CRNs with a bad IPv6 config were getting bad metrics, as /status/check/fastapi was too slow. Looking further into it, the slowness came from /vm/63faf8b5db1cf8d965e6a464a0cb8062af8e7df131729e48738342d956f29ace/internet always taking 5 seconds, even if it ultimately returned a positive result. Coincidentally, 5 seconds is the timeout configured in check_url inside the diagnostic VM (example_fastapi/main.py).

To reproduce the issue, set the DNS servers via an environment variable:

```env
ALEPH_VM_DNS_NAMESERVERS = '[2001:41d0:3:163::1, 9.9.9.9]'
```

(that IPv6 server being unreachable) and set the IP pool to the default one:

```
ALEPH_VM_IPV6_ADDRESS_POOL = fc00:1:2:3::/64
```

Solution: further analysis showed that the diagnostic VM tried to connect to the IPv6 DNS server, timed out after 5 seconds, then proceeded normally with the IPv4 DNS. Put the IPv4 DNS server BEFORE the IPv6 one. Also disable the IPv6 server if IPv6 is not enabled for the VM. That modification was done for instances but was not replicated for programs.
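The reordering described above can be sketched as follows. This is a minimal illustration, not the actual aleph-vm implementation; `order_nameservers` is a hypothetical helper name.

```python
from ipaddress import ip_address

def order_nameservers(nameservers: list[str], ipv6_enabled: bool) -> list[str]:
    """Return DNS servers with IPv4 resolvers first, dropping IPv6
    resolvers entirely when the VM has no IPv6 connectivity."""
    ipv4 = [ns for ns in nameservers if ip_address(ns).version == 4]
    ipv6 = [ns for ns in nameservers if ip_address(ns).version == 6]
    return ipv4 + ipv6 if ipv6_enabled else ipv4

# With the problematic configuration from the issue, the IPv4
# resolver now comes first, avoiding the 5-second timeout:
print(order_nameservers(["2001:41d0:3:163::1", "9.9.9.9"], ipv6_enabled=True))
# → ['9.9.9.9', '2001:41d0:3:163::1']
```

When IPv6 is disabled for the VM, the unreachable IPv6 resolver is removed instead of merely demoted.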
* Fix start error in load_persistent_executions
Fix a parse error that prevented the Supervisor from starting when loading
persistent executions, as it could not parse the GPU field:
```
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/aleph-vm/aleph/vm/orchestrator/__main__.py", line 4, in <module>
    main()
  File "/opt/aleph-vm/aleph/vm/orchestrator/cli.py", line 379, in main
    supervisor.run()
  File "/opt/aleph-vm/aleph/vm/orchestrator/supervisor.py", line 184, in run
    asyncio.run(pool.load_persistent_executions())
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/aleph-vm/aleph/vm/pool.py", line 252, in load_persistent_executions
    execution.gpus = parse_raw_as(List[HostGPU], saved_execution.gpus)
  File "pydantic/tools.py", line 74, in pydantic.tools.parse_raw_as
    obj = load_str_bytes(
  File "pydantic/parse.py", line 37, in pydantic.parse.load_str_bytes
    return json_loads(b)
  File "/usr/lib/python3.10/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not NoneType
```
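The crash happens because executions saved before the GPU column existed have `gpus = None`, which `parse_raw_as` passes straight to `json.loads`. A minimal sketch of the guard (using plain `json` for illustration; the real code parses into pydantic `HostGPU` models):

```python
import json
from typing import Optional

def parse_gpus(raw: Optional[str]) -> list[dict]:
    """Parse the saved GPU list, treating a missing value as
    'no GPUs attached' instead of crashing."""
    # json.loads(None) raises TypeError, mirroring the traceback above
    if not raw:
        return []
    return json.loads(raw)

print(parse_gpus(None))                           # → []
print(parse_gpus('[{"pci_host": "01:00.0"}]'))    # → [{'pci_host': '01:00.0'}]
```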
* Fix packaging issue caused by sevctl
… notify endpoint rejects the VM allocation because of a floating-point calculation error on the community flow percentage. Solution: apply a floor rule to the price calculation, matching the frontend/CLI price calculation.
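A sketch of such a floor rule, assuming the price is floored at a fixed number of decimals so the node-side check agrees with the frontend/CLI; `decimals=6` and the function name are illustrative choices, not the production setting:

```python
from decimal import Decimal, ROUND_FLOOR

def floor_price(value: float, decimals: int = 6) -> Decimal:
    """Floor a computed price to a fixed precision, so a tiny
    floating-point excess no longer rejects the allocation."""
    quantum = Decimal(1).scaleb(-decimals)  # e.g. 0.000001
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_FLOOR)

print(floor_price(0.0500000001))  # → 0.050000
```

Both sides flooring (rather than one rounding and the other truncating) is what keeps the comparison stable.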
This fixes an issue reported by a node owner where, if they enable IPv6 and it is not working, it also breaks the diagnostic /internet check.
Error in CI https://github.com/aleph-im/aleph-vm/actions/runs/13458482852/job/37788353003

```
pip3 install --progress-bar off --target ./aleph-vm/opt/aleph-vm/ 'aleph-message==0.6' 'eth-account==0.10' 'sentry-sdk==1.31.0' 'qmp==1.1.0' 'aleph-superfluid~=0.2.1' 'sqlalchemy[asyncio]>=2.0' 'aiosqlite==0.19.0' 'alembic==1.13.1' 'aiohttp_cors==0.7.0' 'pyroute2==0.7.12' 'python-cpuid==0.1.0' 'solathon==1.0.2' 'protobuf==5.28.3'
Collecting aleph-message==0.6
  Downloading aleph_message-0.6.0-py3-none-any.whl (17 kB)
Collecting eth-account==0.10
  Downloading eth_account-0.10.0-py3-none-any.whl (109 kB)
Collecting sentry-sdk==1.31.0
  Downloading sentry_sdk-1.31.0-py2.py3-none-any.whl (224 kB)
Collecting qmp==1.1.0
  Downloading qmp-1.1.0-py3-none-any.whl (11 kB)
Collecting aleph-superfluid~=0.2.1
  Downloading aleph_superfluid-0.2.1.tar.gz (20 kB)
  Preparing metadata (setup.py) ... done
Collecting sqlalchemy[asyncio]>=2.0
  Downloading SQLAlchemy-2.0.38-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Collecting aiosqlite==0.19.0
  Downloading aiosqlite-0.19.0-py3-none-any.whl (15 kB)
Collecting alembic==1.13.1
  Downloading alembic-1.13.1-py3-none-any.whl (233 kB)
Collecting aiohttp_cors==0.7.0
  Downloading aiohttp_cors-0.7.0-py3-none-any.whl (27 kB)
Collecting pyroute2==0.7.12
  Downloading pyroute2-0.7.12-py3-none-any.whl (460 kB)
Collecting python-cpuid==0.1.0
  Downloading python-cpuid-0.1.0.tar.gz (21 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [40 lines of output]
      An error occurred while building the project, please ensure you have
      the most updated version of setuptools, setuptools_scm and wheel with:
          pip install -U setuptools setuptools_scm wheel
      Traceback (most recent call last):
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 130, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 162, in get_requires_for_build_wheel
          return self._get_build_requires(
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 143, in _get_build_requires
          self.run_setup()
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 158, in run_setup
          exec(compile(code, __file__, 'exec'), locals())
        File "setup.py", line 18, in <module>
          setup(
        File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/core.py", line 109, in setup
          _setup_distribution = dist = klass(attrs)
        File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 459, in __init__
          _Distribution.__init__(
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 293, in __init__
          self.finalize_options()
        File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 836, in finalize_options
          for ep in sorted(loaded, key=by_order):
        File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 835, in <lambda>
          loaded = map(lambda e: e.load(), filtered)
        File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2464, in load
          self.require(*args, **kwargs)
        File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2487, in require
          items = working_set.resolve(reqs, env, installer, extras=self.extras)
        File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 782, in resolve
          raise VersionConflict(dist, req).with_context(dependent_req)
      pkg_resources.VersionConflict: (setuptools 59.6.0 (/usr/lib/python3/dist-packages), Requirement.parse('setuptools>=61'))
      [end of output]
```
Since Python code runs asynchronously in the same process, sharing the global sys.stdout, prints from an individual call cannot be isolated from other calls.
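A minimal demonstration of that limitation: redirecting stdout around one coroutine also captures every concurrent call, because all tasks share the same process-global stream (names are illustrative):

```python
import asyncio
import contextlib
import io

async def chatty(name: str):
    for _ in range(2):
        print(f"{name} output")
        await asyncio.sleep(0)  # yield so the tasks interleave

async def main() -> str:
    buf = io.StringIO()
    # Intending to capture only call-A also captures call-B, since
    # redirect_stdout swaps the single global sys.stdout:
    with contextlib.redirect_stdout(buf):
        await asyncio.gather(chatty("call-A"), chatty("call-B"))
    return buf.getvalue()

captured = asyncio.run(main())
print("call-A output" in captured and "call-B output" in captured)  # → True
```

Per-call isolation would require a different channel entirely (e.g. a logger or queue keyed by the call), not stream redirection.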
This was causing an issue when instances failed to start and there was no persistent VM. Jira ticket: ALEPH-436.
Wants=
Configures (weak) requirement dependencies on other units.
Requires=
Similar to Wants=, but declares a stronger requirement dependency.
If this unit gets activated, the units listed will be activated as well. If one of the other units fails to activate, and an ordering dependency After= on the failing unit is set, this unit will not be started. Besides, with or without specifying After=, this unit will be stopped (or restarted) if one of the other units is explicitly stopped (or restarted).
Documentation: https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html
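The difference between the two directives can be illustrated with a minimal unit file; the unit and service names below are hypothetical, not taken from the aleph-vm packaging:

```ini
# /etc/systemd/system/example.service (illustrative)
[Unit]
# Weak dependency: also start network-online.target, but keep this
# unit running even if that target fails to activate.
Wants=network-online.target
# Strong dependency: if other.service is explicitly stopped or
# restarted, this unit is stopped (or restarted) as well.
Requires=other.service
# With After= on a Requires= unit, a failed activation of
# other.service also prevents this unit from starting.
After=network-online.target other.service

[Service]
ExecStart=/usr/local/bin/example
```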
Server api1.aleph.im is very slow and outdated (Core i7 from 2018, up to 40 seconds to respond to `/metrics` in the monitoring). We suspect that this causes issues in the monitoring and performance of the network. This branch removes all references to api1 and replaces them with api3 where relevant. Co-authored-by: Bram <cortex@worlddomination.be> Co-authored-by: Olivier Le Thanh Duong <olivier@lethanh.be>
Bumps [sentry-sdk](https://github.com/getsentry/sentry-python) from 1.31 to 2.8.0. - [Release notes](https://github.com/getsentry/sentry-python/releases) - [Changelog](https://github.com/getsentry/sentry-python/blob/master/CHANGELOG.md) - [Commits](getsentry/sentry-python@1.31.0...2.8.0) --- updated-dependencies: - dependency-name: sentry-sdk dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>
Fix: Put the correct aleph-message version on the package generation script.
Adding type annotations for better code clarity and safety. This has already been done on the pydantic migration branch, but adding it first here because it should not break the code and will facilitate the merge of the pydantic PR.
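A small example of the kind of clarity this brings; the function and names are illustrative, not from the actual diff:

```python
from typing import Optional

# Without annotations, a reader cannot tell whether the lookup may
# return None. With them, the contract is explicit and a type checker
# (e.g. mypy) can flag callers that forget the None case.
def get_gpu_model(device_id: str, gpus: dict[str, str]) -> Optional[str]:
    return gpus.get(device_id)

print(get_gpu_model("01:00.0", {"01:00.0": "RTX 4000"}))  # → RTX 4000
```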
This bumps their versions to - Kubo 0.23.0 -> 0.33.1 The list of changes regarding Kubo too large to be mentioned here, but I mostly expect performance improvements as the main API has not changed much.
Due to an error reading the denylist. The error was: ``` Error: constructing the node (see log for full detail): error walking /home/ipfs/.config/ipfs/denylists: lstat /home/ipfs/.config/ipfs/denylists: permission denied ``` The [documentation](https://specs.ipfs.tech/compact-denylist-format/) mentions that: > Implementations SHOULD look in /etc/ipfs/denylists/ and > $XDG_CONFIG_HOME/ipfs/denylists/ (default: ~/.config/ipfs/denylists) > for denylist files. I am not sure why this only failed on Ubuntu 22.04 and not Debian 12 or Ubuntu 24.04. My first assumption would be a difference in Systemd.
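One plausible remediation, sketched below, is to pre-create the directory the daemon tries to walk with permissions it can read; whether this matches the packaging here is an assumption, and the `ipfs` user/paths come from the log above:

```shell
# Create the per-user denylist directory the daemon walks on startup.
# $HOME stands for the ipfs user's home (/home/ipfs in the error log).
mkdir -p "$HOME/.config/ipfs/denylists"
chmod 0755 "$HOME/.config/ipfs/denylists"
# When provisioning as root for a dedicated ipfs user, ownership must
# match the daemon's user, e.g.:
#   chown -R ipfs:ipfs /home/ipfs/.config/ipfs/denylists
test -d "$HOME/.config/ipfs/denylists" && echo "denylists dir ready"
```

If the difference really is in systemd, sandboxing directives such as `ProtectHome=` in the unit file would also be worth comparing across the distributions.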
Needed for some CRNs.
Codecov Report ❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #833      +/-   ##
==========================================
+ Coverage   64.68%   64.76%   +0.08%
==========================================
  Files          88       88
  Lines        8160     8169       +9
  Branches      734      737       +3
==========================================
+ Hits         5278     5291      +13
+ Misses       2653     2647       -6
- Partials      229      231       +2
```
Co-authored-by: nesitor <amolinsdiaz@yahoo.es>
…nt ones, to avoid getting Out-Of-Memory on the CRNs.
```python
if getattr(volume, "size_mib", None):
    disk_size_mib += volume.size_mib

return disk_size_mib
```
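For context, the accumulation under review can be sketched as a self-contained function; `Volume` here is a simple stand-in, not the real message schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Volume:
    """Stand-in for the real volume types; some kinds (e.g. immutable
    refs) carry no size_mib, hence the getattr/None guard."""
    size_mib: Optional[int] = None

def total_disk_size_mib(volumes: list[Volume], rootfs_size_mib: int = 0) -> int:
    disk_size_mib = rootfs_size_mib
    for volume in volumes:
        if getattr(volume, "size_mib", None):
            disk_size_mib += volume.size_mib
    return disk_size_mib

print(total_disk_size_mib([Volume(1024), Volume(), Volume(512)]))  # → 1536
```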
Why do we need this if it's coming from the message itself? I don't like that the node can report whatever it wants.
Because the message does not contain the real sizes used by the VMs, such as the volumes, runtime, and code. The idea of this PR is to show the real resources used by the VMs. Indeed, the PR needs to be finished by solving the TODO tasks; I have put it in review just to check the format.
From our conversation earlier, the goal of this PR is unclear. Let's not merge it until its purpose is clarified.
Add a new `resources` field on the existing endpoint `/v2/about/executions/resources` to expose resources for instances.

Related ClickUp, GitHub or Jira tickets: ALEPH-615
Opening this PR so we can discuss what info we need and the format.
Examples of what this endpoint will return: this is basically the same endpoint with an added `resource` field.