[GR-67626] Optimize host inlining of attribute lookup in GraalPy #524

graalvmbot · 2025-07-25T13:22:12Z

No description provided.

* GetAttributeNode was just an extra unnecessary wrapper around GetFixedAttributeNode.

* Seems host inlining was somehow already removing those as the node cost does not change before and after, when looking at e.g. com.oracle.graal.python.nodes.object.GetClassNodeGen.execute com.oracle.graal.python.nodes.object.GetClassNodeGen$Inlined.execute com.oracle.graal.python.builtins.objects.object.ObjectBuiltinsFactory$ClassNodeFactory$ClassNodeGen.execute

* The generated GetClassNodeGen.fallbackGuard_ had a node cost of 316. * Much better for host inlining: com.oracle.graal.python.nodes.object.GetClassNodeGen#execute 688 -> 556 com.oracle.graal.python.nodes.object.GetClassNodeGen.Inlined#execute 1366 -> 1230 com.oracle.graal.python.builtins.objects.object.ObjectBuiltinsFactory.ClassNodeFactory.ClassNodeGen#execute 579 -> 485 Truffle host inlining completed after 2 rounds. Graph cost changed from 439 to 688 after inlining: Root[com.oracle.graal.python.nodes.object.GetClassNodeGen.execute] INLINE com.oracle.graal.python.nodes.object.GetClassNodeGen$GetPythonObjectClassNodeGen.executeImpl(Node, PythonAbstractObject) cost 213, subTreeCost 201, treeInvokes 2, reason within budget] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode$GetPythonObjectClassNode.getPythonObject(Node, PythonObject, HiddenAttr$ReadNode) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode$GetPythonObjectClassNode.getNativeObject(...) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff] INLINE com.oracle.graal.python.nodes.object.GetClassNodeGen.fallbackGuard_(int, Node, Object) cost 316, subTreeCost 316, treeInvokes 0, reason within budget] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode.getForeign(Object, GetRegisteredClassNode) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff] -> Truffle host inlining completed after 2 rounds. Graph cost changed from 426 to 556 after inlining: Root[com.oracle.graal.python.nodes.object.GetClassNodeGen.execute] INLINE com.oracle.graal.python.nodes.object.GetClassNodeGen$GetPythonObjectClassNodeGen.executeImpl(Node, PythonAbstractObject) cost 213, subTreeCost 201, treeInvokes 2, reason within budget] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode$GetPythonObjectClassNode.getPythonObject(Node, PythonObject, HiddenAttr$ReadNode) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode$GetPythonObjectClassNode.getNativeObject(...) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode.getForeign(Object, GetRegisteredClassNode) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff] * Using `@Cached IsForeignObjectNode isForeignObjectNode` + `isForeignObjectNode.execute(inliningTarget, object)` is a lot worse: com.oracle.graal.python.nodes.object.GetClassNodeGen#execute 688 -> 698 com.oracle.graal.python.nodes.object.GetClassNodeGen.Inlined#execute 1366 -> 1637 com.oracle.graal.python.builtins.objects.object.ObjectBuiltinsFactory.ClassNodeFactory.ClassNodeGen#execute 579 -> 622 Truffle host inlining completed after 2 rounds. Graph cost changed from 461 to 698 after inlining: Root[com.oracle.graal.python.nodes.object.GetClassNodeGen.execute] INLINE com.oracle.graal.python.nodes.object.GetClassNodeGen$GetPythonObjectClassNodeGen.executeImpl(Node, PythonAbstractObject) cost 213, subTreeCost 201, treeInvokes 2, reason within budget] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode$GetPythonObjectClassNode.getPythonObject(Node, PythonObject, HiddenAttr$ReadNode) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode$GetPythonObjectClassNode.getNativeObject(...) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff] INLINE com.oracle.graal.python.nodes.object.IsForeignObjectNodeGen$Inlined.execute(Node, Object) cost 253, subTreeCost 223, treeInvokes 0, reason within budget] INLINE com.oracle.graal.python.nodes.object.IsForeignObjectNodeGen$Inlined.fallbackGuard_(int, Node, Object) cost 140, subTreeCost -, treeInvokes -, reason null] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode.getForeign(Node, Object, IsForeignObjectNode, GetRegisteredClassNode) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff]

* It makes no difference for graalpy --experimental-options --engine.Compilation=false graalpython/com.oracle.graal.python.benchmarks/python/harness.py graalpython/com.oracle.graal.python.benchmarks/python/micro/attribute-access.py --live-results -i 10 50

… to TypeBuiltins.GetattributeNode * Faster on graalpy --experimental-options --engine.Compilation=false graalpython/com.oracle.graal.python.benchmarks/python/harness.py graalpython/com.oracle.graal.python.benchmarks/python/micro/attribute-access.py --live-results -i 10 50 AVG (all runs): 1.237s -> 1.218s

…avoid extra Object[] allocations in interpreter

* CastToTruffleStringChecked1Node is little overhead.

* Much better for host inlining as CastToTruffleStringChecked*Node has a 1 @InliningCutoff but CastToTruffleStringNode has 3 @InliningCutoff.

…ingCutoff * And use it in LookupAttributeInMRONode.Dynamic. * Much better for host inlining, TruffleString.EqualNode is huge for host inlining.

…ributeNode

…t-inline it * GetPythonObjectClassNode needs 3 references, so it might be best to not DSL-inline to save footprint. OTOH, this case is frequent and not DSL-inlining it also means not host inlining it due to GR-66471. So we DSL-inline it in order to host-inline it. * Actually the DSL is smart and creates a Specialization data class here, so there is no footprint overhead and this is win-win.

* It generates more than 300 method calls with host inlining and blows up the budget. * The common case by far is to not call this node at all, as the attribute is typically available directly on the module, and there is no need to call __getattr__.

* These methods all lead to unwind, better to cutoff early and spend the host inlining budget on the non-exception cases. * Host inlining already CUTOFF some method because of "leads to unwind" but it does not realize all of this logic leads to unwind.

* Some usages used the uncached DynamicObjectLibrary which has a @TruffleBoundary on getShapeFlags(). Actually it seems to not have the boundary due to having a copy in the code, but that's brittle and wouldn't work anymore with a @GenerateUncached node instead. * The library is used on various classes which might have different shapes, so the caching on the Shape seems not very useful.

…ject#pythonClass * This simplifies GetClassNode a lot, and it still folds in single-context mode. * In multi-context mode it would basically never fold as the object wouldn't be constant. * This saves 2 reads (getShape().getFlags() & CLASS_CHANGED_FLAG) in multi-context mode. * This saves a DynamicObject read (1-2 reads: getShape() + Location#get potentially in extension array) when the class has been changed. * Use Shape#getDynamicType() instead of a constant property and flags: just a field read from the Shape instead of a property map lookup + Location#get. Also easier to keep in sync as everything is in a single field. * Do not make the cached reference to the class weak in GetClassNode, the Shape through the constant property already referenced it strongly and the Shape still does with dynamicType. * Much better for host inlining of GetClassNode and notably 2 @InliningCutoff vs 3 before: com.oracle.graal.python.nodes.object.GetClassNodeGen#execute 598 -> 482 com.oracle.graal.python.nodes.object.GetClassNodeGen.Inlined#execute 2151 -> 2035 com.oracle.graal.python.builtins.objects.object.ObjectBuiltinsFactory.ClassNodeFactory.ClassNodeGen#execute 657 -> 554 Truffle host inlining completed after 2 rounds. Graph cost changed from 426 to 598 after inlining: Root[com.oracle.graal.python.nodes.object.GetClassNodeGen.execute] INLINE com.oracle.graal.python.nodes.object.GetClassNodeGen$GetPythonObjectClassNodeGen$Inlined.executeImpl(Node, PythonAbstractObject) cost 376, subTreeCost 243, treeInvokes 2, reason within budget] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode$GetPythonObjectClassNode.doReadHiddenAttrNode(Node, PythonObject, HiddenAttr$ReadNode) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode$GetPythonObjectClassNode.doNativeObject(...) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode.getForeign(Object, GetRegisteredClassNode) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff] -> Truffle host inlining completed after 2 rounds. Graph cost changed from 363 to 482 after inlining: Root[com.oracle.graal.python.nodes.object.GetClassNodeGen.execute] INLINE com.oracle.graal.python.nodes.object.GetClassNodeGen$GetPythonObjectClassNodeGen$Inlined.executeImpl(Node, PythonAbstractObject) cost 319, subTreeCost 186, treeInvokes 1, reason within budget] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode$GetPythonObjectClassNode.doNativeObject(...) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff] CUTOFF com.oracle.graal.python.nodes.object.GetClassNode.getForeign(Object, GetRegisteredClassNode) cost -, subTreeCost -, treeInvokes -, reason method annotated with @InliningCutoff]

…cType() * DynamicObjectLibrary#setDynamicType() does not allow null, and also has a non-null default (ObjectType.DEFAULT).

…ngle @InliningCutoff

…nMRONode in a different node * Better for host inlining of GetFixedAttributeNodeGen.execute: Before: Root[com.oracle.graal.python.nodes.attributes.GetFixedAttributeNodeGen.execute] INLINE com.oracle.graal.python.nodes.attributes.GetFixedAttributeNode.doIt(...) cost 59, subTreeCost 6333, treeInvokes 42, reason within budget] After: Root[com.oracle.graal.python.nodes.attributes.GetFixedAttributeNodeGen.execute] INLINE com.oracle.graal.python.nodes.attributes.GetFixedAttributeNode.doIt(...) cost 59, subTreeCost 4355, treeInvokes 24, reason within budget]

* Avoiding a LookupAttributeInMRONode.Dynamic is not worth this duplication.

* Instead of having to do this optimization in callers of GetDictIfExistsNode.

…ObjectNode * So the correct semantics and the native type optimization is always used, reviewing usages revealed several incorrect usages. * Subclasses make host inlining unable to see through any `@Cached ReadAttributeFromObjectNode`. * Add ReadAttributeFromModuleNode for the simple case of reading from a PythonModule. * Remove extra specialization for PythonModule in ReadAttributeFromObjectNode: * The specialization below readObjectAttribute() does exactly the same optimization, so it is redundant. * Also checked by running micro/function-call-sized.py on jvm-ce-libgraal with and without --engine.Compilation=false that the performance is the same. * Add execute() overloads for ReadAttributeFromObjectNode, more efficent and also clearer which kind of input is passed.

* Using the correct subclass is prone to errors. * Subclasses prevent host inlining.

eregon added 22 commits July 25, 2025 14:01

Replace GetAttributeNode by GetFixedAttributeNode

41828cc

* GetAttributeNode was just an extra unnecessary wrapper around GetFixedAttributeNode.

Add env files to conveniently dump host inlining decisions

1ee811c

Use CompilerAsserts.neverPartOfCompilation() instead of comment

934b69f

Replace CastToTruffleStringCheckedNode by arity-specific variants to …

cd4adc4

…avoid extra Object[] allocations in interpreter

Fix docs

2c45520

Remove duplicate specialization in ModuleBuiltins.ModuleGetattributeNode

6af4f17

* CastToTruffleStringChecked1Node is little overhead.

Use CastToTruffleStringChecked*Node instead of CastToTruffleStringNode

cb57430

* Much better for host inlining as CastToTruffleStringChecked*Node has a 1 @InliningCutoff but CastToTruffleStringNode has 3 @InliningCutoff.

Remove redundant GetFixedAttributeNode#executeObject() method

aff2ed9

Add StringUtils.EqualNode to wrap TruffleString.EqualNode with @Inlin…

b70a37a

…ingCutoff * And use it in LookupAttributeInMRONode.Dynamic. * Much better for host inlining, TruffleString.EqualNode is huge for host inlining.

Align ThreadLocalBuiltins.GetAttributeNode with ObjectBuiltins.GetAtt…

ad5150e

…ributeNode

Add MergedObjectTypeModuleGetAttributeNode

f985f1b

Rename GetPythonObjectClassNode for clarity in host inlining decisions

9eab536

Add @InliningCutoff in PRaiseNode

91fe7dc

* These methods all lead to unwind, better to cutoff early and spend the host inlining budget on the non-exception cases. * Host inlining already CUTOFF some method because of "leads to unwind" but it does not realize all of this logic leads to unwind.

GetPythonObjectClassNode#doInitialClass should replace doConstantClass

381032a

oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jul 25, 2025

eregon added 7 commits July 25, 2025 17:50

Use PNone.NO_VALUE instead of null for DynamicObjectLibrary#setDynami…

6f6cd13

…cType() * DynamicObjectLibrary#setDynamicType() does not allow null, and also has a non-null default (ObjectType.DEFAULT).

Merged generic cases in LookupAttributeInMRONode.Dynamic to have a si…

6ffc113

…ngle @InliningCutoff

Merge the first two specializations of LookupAttributeInMRONode

f8d5de0

Remove duplicated specialization in PyObjectGetMethod

1508e7c

* Avoiding a LookupAttributeInMRONode.Dynamic is not worth this duplication.

Use @NonIdempotent instead of passing an extra dynamic argument

7afc211

Remove duplicate specializations in GenericSetAttrNode and callers

d4f6fb8

eregon added 5 commits July 25, 2025 17:50

Optimize IsTypeNode by avoiding native call for native objects

f9d738f

Optimize GetDictIfExistsNode for native types

c85a6a9

* Instead of having to do this optimization in callers of GetDictIfExistsNode.

Avoid subclasses in WriteAttributeToObjectNode

e9e79b3

* Using the correct subclass is prone to errors. * Subclasses prevent host inlining.

Document StgDictObject

8b10695

graalvmbot force-pushed the bd/host-inlining-getattr branch from 7671234 to 8b10695 Compare July 25, 2025 16:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[GR-67626] Optimize host inlining of attribute lookup in GraalPy #524

[GR-67626] Optimize host inlining of attribute lookup in GraalPy #524

Uh oh!

graalvmbot commented Jul 25, 2025

Uh oh!

Uh oh!

[GR-67626] Optimize host inlining of attribute lookup in GraalPy #524

Are you sure you want to change the base?

[GR-67626] Optimize host inlining of attribute lookup in GraalPy #524

Uh oh!

Conversation

graalvmbot commented Jul 25, 2025

Uh oh!

Uh oh!