This repository has been archived by the owner on Aug 19, 2024. It is now read-only.

Generate libtorrent typing stubs #40

Closed
qstokkink opened this issue Apr 30, 2024 · 6 comments · Fixed by #73

Comments

@qstokkink
Owner

With the introduction of type checking in Tribler Experimental, it is becoming painfully obvious that Python does not understand libtorrent's Boost types.

If we continue like this, long-term we will need to (project-wide!) either ignore typing 😞, cast everything 😞, or provide type stubs manually 😞. This would make me a very sad developer.

It would be much more convenient if we had a tool to analyze and convert libtorrent's Boost types (this information is already available). The result should be Python type stub files (.pyi).

I'll start with a proof-of-concept and report back.

@qstokkink qstokkink added the enhancement New feature or request label Apr 30, 2024
@qstokkink qstokkink self-assigned this Apr 30, 2024
@qstokkink
Owner Author

Alright. There is a way but it's not pretty. All of the "easy" ways using Python introspection seem to drop the return values of functions.

My POC is for Ubuntu and requires a Python installation (the POC uses 3.8) and castxml (apt install castxml).

Setup

In a fresh working directory, I fetched the following files:

  • The GitHub repo arvidn/libtorrent. I extracted libtorrent/bindings/python/src/ into the working dir ./ and libtorrent/include/ into ./libtorrent/.
  • The GitHub repo boostorg/python. I extracted include/boost/ into ./boost/.
  • The Boost 1.85.0 distribution from the Boost website.

Generation

For each .cpp file I called castxml, for example for create_torrent.cpp:

castxml -cxx-isystem . -cxx-isystem /usr/include/python3.8 -cxx-isystem ./boost_1_85_0 --castxml-gccxml create_torrent.cpp
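
Repeating that call for every binding source file could be scripted roughly as follows. This is only a sketch: the helper names (castxml_command, run_all) are made up for illustration, the include directories are copied from the command above, and it assumes castxml is on the PATH and the .cpp files sit in the working directory.

```python
import subprocess
from pathlib import Path

# Include directories taken from the castxml call above.
INCLUDE_DIRS = [".", "/usr/include/python3.8", "./boost_1_85_0"]


def castxml_command(cpp_file):
    """Build the castxml argument list for a single .cpp file."""
    command = ["castxml"]
    for include_dir in INCLUDE_DIRS:
        command += ["-cxx-isystem", include_dir]
    command += ["--castxml-gccxml", str(cpp_file)]
    return command


def run_all(source_dir="."):
    """Run castxml once per .cpp file in source_dir (hypothetical helper)."""
    for cpp_file in sorted(Path(source_dir).glob("*.cpp")):
        # check=True raises CalledProcessError if castxml exits non-zero.
        subprocess.run(castxml_command(cpp_file), check=True)
```

Calling run_all() in the working directory would then issue one castxml call per source file.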

This creates some huuuuuge xml files (dozens of megabytes each). However, they describe all of the methods in the libtorrent module quite verbosely. For example, I can fetch the constructors with the name create_torrent:

<Constructor id="_125014" name="create_torrent" context="_26737" access="public" location="f456:229" file="f456" line="229" explicit="1">
    <Argument name="fs" type="_125071" location="f456:229" file="f456" line="229"/>
    <Argument name="piece_size" type="_22825" location="f456:229" file="f456" line="229" default="0"/>
    <Argument name="flags" type="_26736" location="f456:230" file="f456" line="230" default="{}"/>
</Constructor>
<Constructor id="_125015" name="create_torrent" context="_26737" access="public" location="f456:231" file="f456" line="231" explicit="1">
    <Argument name="ti" type="_154775" location="f456:231" file="f456" line="231"/>
</Constructor>
<Constructor id="_125016" name="create_torrent" context="_26737" access="public" location="f456:235" file="f456" line="235" explicit="1" inline="1" attributes="deprecated">
    <Argument name="fs" type="_125071" location="f456:235" file="f456" line="235"/>
    <Argument name="piece_size" type="_22825" location="f456:235" file="f456" line="235"/>
    <Argument type="_22825" location="f456:236" file="f456" line="236"/>
    <Argument name="flags" type="_26736" location="f456:236" file="f456" line="236" default="{}"/>
    <Argument type="_22825" location="f456:236" file="f456" line="236" default="-1"/>
</Constructor>
<Constructor id="_125070" name="create_torrent" context="_26737" access="public" location="f456:111" file="f456" line="111" inline="1" artificial="1">
    <Argument type="_197759" location="f456:111" file="f456" line="111"/>
</Constructor>

The reference type for name="fs" above is _125071, corresponding to this entry:

<ReferenceType id="_125071" type="_26898" size="64" align="64"/>

The type attribute of that ReferenceType, _26898, in turn points to:

<Class id="_26898" name="file_storage" context="_4766" location="f471:220" file="f471" line="220" members="_125602 _125603 _125604 _125605 _125606 _125607 _125608 _125609 _125610 _125611 _125612 _125613 _125614 _125615 _125616 _125617 _125618 _125619 _125620 _125621 _125622 _125623 _125624 _125625 _125626 _125627 _125628 _125629 _125630 _125631 _125632 _125633 _125634 _125635 _125636 _125637 _125638 _125639 _125640 _125641 _125642 _125643 _125644 _125645 _125646 _125647 _125648 _125649 _125650 _125651 _125652 _125653 _125654 _125655 _125656 _125657 _125658 _125659 _125660 _125661 _125662 _125663 _125664 _125665 _125666 _125667 _125668 _125669 _125670 _125671 _125672 _125673 _125674 _125675 _125676 _125677 _125678 _125679 _125680 _125681 _125682 _125683 _125684 _125685 _125686 _125687 _125688 _125689 _125690 _125691 _125692 _125693 _125694 _125695 _125696 _125697 _125698 _125699 _125700 _125701 _125702 _125703 _125704 _125705 _125706 _125707 _125708 _125709 _125710 _125711 _125712 _125713 _125714 _125715 _125716 _125717 _125718" size="1408" align="64"/>

In other words, the file_storage class.
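
The lookup chain above (Argument, then ReferenceType, then Class) can be followed mechanically. Below is a minimal Python sketch using xml.etree.ElementTree on a trimmed-down copy of the entries shown above. The helper names are hypothetical, the sample keeps only the attributes the code reads, and the wrapping root element is an assumption that the code does not rely on.

```python
import xml.etree.ElementTree as ET

# Trimmed sample of the castxml output quoted above (most attributes dropped).
SAMPLE = """
<GCC_XML>
  <Constructor id="_125014" name="create_torrent" context="_26737">
    <Argument name="fs" type="_125071"/>
  </Constructor>
  <ReferenceType id="_125071" type="_26898"/>
  <Class id="_26898" name="file_storage" context="_4766"/>
</GCC_XML>
"""


def index_by_id(root):
    """Map every element's id attribute to the element itself."""
    return {el.get("id"): el for el in root.iter() if el.get("id")}


def resolve_type_name(type_id, by_id):
    """Follow 'type' references until an element with a name is reached."""
    seen = set()  # guards against reference cycles
    while type_id in by_id and type_id not in seen:
        seen.add(type_id)
        element = by_id[type_id]
        if element.get("name"):
            return element.get("name")
        type_id = element.get("type")
    return None


def constructor_arguments(root, class_name):
    """Return one list of (argument name, resolved type name) per overload."""
    by_id = index_by_id(root)
    overloads = []
    for ctor in root.iter("Constructor"):
        if ctor.get("name") != class_name:
            continue
        overloads.append([
            (arg.get("name"), resolve_type_name(arg.get("type"), by_id))
            for arg in ctor.findall("Argument")
        ])
    return overloads


root = ET.fromstring(SAMPLE)
overloads = constructor_arguments(root, "create_torrent")
print(overloads)
```

On the multi-megabyte real files, the same idea should apply, although building the id index once per file and reusing it would likely be necessary.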

@qstokkink qstokkink removed their assignment Apr 30, 2024
@qstokkink
Owner Author

This seems like a lot of fun but should probably not be given high priority.

@qstokkink
Owner Author

Actually, considering that this would be helpful in the short to mid term, this might be a good side-project next to the ruff changes of #38.

@qstokkink qstokkink self-assigned this May 1, 2024
@qstokkink
Owner Author

I got pretty far but, ultimately, my POC approach could not deal with converters.cpp. Manually correcting the current approach would (almost) be no different from creating the stubs by hand.

The next POC approach would be to use ANTLR4 to generate an AST of the libtorrent source: https://github.com/antlr/grammars-v4/blob/master/cpp/Python3/transformGrammar.py

@qstokkink qstokkink removed their assignment May 14, 2024
@qstokkink
Owner Author

qstokkink commented May 29, 2024

I attempted to use the CPP14 ANTLR4 grammar to parse the source files. My observation is that the grammar does not understand the macros in the source files. This implies that I would first have to run the preprocessor (g++ -E) to get rid of the "non-pure" cpp code before parsing the source files with the grammar.

Having to actually run the compiler on the code is something I was trying to avoid. However, if there is no way around this, another option is to wrap, decorate, stub, or mock Boost.Python to output types directly. In an initial attempt, I want to add an __annotations__ field*, next to the existing __doc__ that contains the signature, here: https://github.com/boostorg/python/blob/0474de0f6cc9c6e7230aeb7164af2f7e4ccf74bf/src/object/function.cpp#L690 Similarly to __doc__, we can generate the function signature (mostly) using function_doc_signature.

*Actually, we should be able to set __signature__ to a Signature object, according to https://docs.python.org/3/library/inspect.html#inspect.signature

CPython implementation detail: If the passed object has a __signature__ attribute, we may use it to create the signature. The exact semantics are an implementation detail and are subject to unannounced changes. Consult the source code for current semantics.
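
To illustrate the __signature__ idea in pure Python: the sketch below attaches an inspect.Signature to a function that only accepts *args and **kwargs, roughly the situation of an extension function without introspectable parameters. The function add_file and its parameters are hypothetical stand-ins, not libtorrent's actual API, and whether Boost.Python function objects can be given the attribute in the same way is exactly what still needs to be tried.

```python
import inspect


def add_file(*args, **kwargs):
    """Hypothetical stand-in for a function exposed by an extension module."""
    return None


# Attach an explicit signature; inspect.signature() picks up __signature__.
add_file.__signature__ = inspect.Signature(
    parameters=[
        inspect.Parameter("path", inspect.Parameter.POSITIONAL_OR_KEYWORD,
                          annotation=str),
        inspect.Parameter("size", inspect.Parameter.POSITIONAL_OR_KEYWORD,
                          annotation=int),
    ],
    return_annotation=None,
)

sig = inspect.signature(add_file)
print(sig)
```

A stub generator could then read parameter names, annotations, and the return annotation from the Signature object instead of parsing __doc__.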

