
@j2kun (Collaborator) commented Dec 1, 2025

This PR reduces the runtime overhead of the OpenFHE C++ interpreter. The main issues were:

  • Excessive copying of ciphertexts and plaintexts:
    • when reading from or storing to the environment
    • when constructing a TypedCppValue or extracting a specific variant alternative from a TypedCppValue
    • whenever handling a tensor op
  • The overhead of large variants with many alternatives

This change addresses these issues by:

  • Using std::shared_ptr to store values (including inside TypedCppValue), which avoids copying
  • Using std::move when calling insert_or_assign
  • Using a separate, type-specific DenseMap for each supported type (e.g., llvm::DenseMap<Value, std::shared_ptr<std::vector<int>>> intVectors) instead of a single map that stores all values
  • Creating a static operationDispatchTable, keyed on the TypeID of each Operation subclass, which replaces type switching across all ops at visit time with a hash lookup
  • Checking whether a tensor op's destination tensor is unused after the op and, if so, mutating the tensor in place instead of creating a copy. This was a large part of the overhead of the packing loops, which run an ISL loop and insert a single element in the innermost body (creating n^2 or more copies to pack a single plaintext)
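The storage and dispatch changes above can be illustrated with a small, self-contained C++ sketch. It is not the PR's implementation: Value, Op, InsertOp, AddOp, and the destDeadAfter flag are hypothetical stand-ins, std::unordered_map stands in for llvm::DenseMap, and std::type_index stands in for MLIR's TypeID. It shows per-type maps holding shared_ptrs, a static table keyed on the op's dynamic type, and an insert that mutates the destination in place only when the destination is dead afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for mlir::Value and mlir::Operation subclasses.
using Value = int;

struct Op {
  virtual ~Op() = default;
};

struct InsertOp : Op {
  InsertOp(Value d, Value r, std::size_t i, int s, bool dead)
      : dest(d), result(r), index(i), scalar(s), destDeadAfter(dead) {}
  Value dest, result;
  std::size_t index;
  int scalar;
  bool destDeadAfter;  // assumed to be supplied by a liveness check
};

struct AddOp : Op {
  AddOp(Value l, Value r, Value res) : lhs(l), rhs(r), result(res) {}
  Value lhs, rhs, result;
};

class Interpreter {
 public:
  // One map per C++ type (std::unordered_map standing in for llvm::DenseMap).
  // Tensors are held by shared_ptr so lookups and stores never deep-copy.
  std::unordered_map<Value, std::shared_ptr<std::vector<int>>> intVectors;
  std::unordered_map<Value, int> ints;  // scalars are cheap, stored by value

  void visit(const Op& op) {
    // One hash lookup on the op's dynamic type instead of a long type switch.
    const auto& table = dispatchTable();
    auto it = table.find(std::type_index(typeid(op)));
    if (it == table.end()) throw std::runtime_error("no handler for op");
    it->second(*this, op);
  }

 private:
  using Handler = std::function<void(Interpreter&, const Op&)>;

  // Built once, on first use.
  static const std::unordered_map<std::type_index, Handler>& dispatchTable() {
    static const std::unordered_map<std::type_index, Handler> table = {
        {typeid(InsertOp),
         [](Interpreter& interp, const Op& op) {
           interp.visitInsert(static_cast<const InsertOp&>(op));
         }},
        {typeid(AddOp),
         [](Interpreter& interp, const Op& op) {
           interp.visitAdd(static_cast<const AddOp&>(op));
         }},
    };
    return table;
  }

  void visitInsert(const InsertOp& op) {
    std::shared_ptr<std::vector<int>> tensor = intVectors.at(op.dest);
    if (!op.destDeadAfter) {
      // The destination is still used later: copy before mutating.
      tensor = std::make_shared<std::vector<int>>(*tensor);
    }
    // Otherwise mutate in place; the result aliases the dead destination.
    tensor->at(op.index) = op.scalar;
    // Move the shared_ptr into the map to avoid an extra refcount bump.
    intVectors.insert_or_assign(op.result, std::move(tensor));
  }

  void visitAdd(const AddOp& op) {
    ints.insert_or_assign(op.result, ints.at(op.lhs) + ints.at(op.rhs));
  }
};
```

In a packing loop that inserts one element per iteration into a tensor that is never reused, every iteration takes the in-place path, so the loop touches a single buffer instead of allocating a fresh copy per element.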

This also includes a patch of #2421 in order to get the lenet example, but changes lenet_main.cpp to take any file as input (so I could pass in truncated lenet OpenFHE IR for faster iteration).

To do the profiling:

# build in opt mode with debug symbols enabled
# (note for smaller examples I disable openmp to make the report cleaner,
# but the full lenet example takes a long time in this mode)
bazel build \
-c opt --copt=-g --linkopt=-lprofiler \
tests/Examples/openfhe/ckks/lenet:lenet_binary

# Run the binary and generate the profile
CPUPROFILE=prof.out bazel-bin/tests/Examples/openfhe/ckks/lenet/lenet_binary \
bazel-bin/tests/Examples/openfhe/ckks/lenet/lenet_heir_opt.mlir

# Convert the profile dump to a readable format
pprof --list=mlir::heir::openfhe::Interpreter::visit \
./bazel-bin/tests/Examples/openfhe/ckks/lenet/lenet_binary \
prof.out > profile.txt

This produces a per-line runtime breakdown of the interpreter's visit methods.

Another view is:

pprof --text --lines --focus=mlir::heir::openfhe::Interpreter \
./bazel-bin/tests/Examples/openfhe/ckks/lenet/lenet_binary \
prof.out > interpreter_focus.txt

This gives a table of runtime data for all functions in the Interpreter namespace.

Running the above, this PR produces:

Interpreting function: lenet__generate_crypto_context ...
lenet__generate_crypto_context time: 0 seconds
Interpreting function: lenet__configure_crypto_context ...
Function returned 1 values
lenet__configure_crypto_context time: 18 seconds
Interpreting function: lenet__encrypt__arg0 ...
lenet__encrypt__arg0 time: 1 seconds
Interpreting function: lenet ...
lenet time: 91 seconds
Interpreting function: lenet__decrypt__result0 ...
lenet__decrypt__result0 time: 0 seconds
PROFILE: interrupts/evictions/bytes = 39553/8234/1086088

Profile: https://gist.github.com/j2kun/4ff4c2c9740b7f6cf3f3f5f370fda5b5
Line-by-line annotation: https://gist.github.com/j2kun/7110641e3eaec18b3b8e6ec948e63c07

The profiles above show that the dispatching and copying overhead is eliminated; the only remaining source of overhead I can see is liveness checking, which takes only 10ms.

@j2kun j2kun marked this pull request as draft December 1, 2025 17:11
@j2kun j2kun force-pushed the overhead-investigation branch 6 times, most recently from 3016e59 to 564b32c December 2, 2025 17:29
@j2kun j2kun marked this pull request as ready for review December 2, 2025 17:36
@j2kun j2kun requested a review from asraa December 2, 2025 17:36
@asraa (Collaborator) left a comment

Quick comment before I start reviewing: I agree liveness checking is only a small portion of the runtime. But if it becomes a problem at scale, we could also annotate the IR with liveness info before interpreting it (e.g., for each operation, add a list attribute recording, for each operand, whether it is dead after that operation).
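A minimal sketch of that suggestion, assuming a single straight-line block with no nested regions or loop back-edges: one backward scan over hypothetical SimpleOp records marks, for each operand, whether any later op in the block still uses it. The resulting flags are the kind of per-operand data such a list attribute could store; SimpleOp and computeDeadAfter are illustrative names, not HEIR APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

using Value = int;  // hypothetical stand-in for mlir::Value

// Hypothetical op reduced to its operand list.
struct SimpleOp {
  std::vector<Value> operands;
};

// deadAfter[i][j] is true when operand j of op i has no use in any later op,
// i.e. op i is its last user. Assumes one straight-line block: no nested
// regions and no loop back-edges.
std::vector<std::vector<bool>> computeDeadAfter(const std::vector<SimpleOp>& ops) {
  std::vector<std::vector<bool>> deadAfter(ops.size());
  std::unordered_set<Value> usedLater;  // values used by ops after op i
  for (std::size_t i = ops.size(); i-- > 0;) {
    const SimpleOp& op = ops[i];
    // Decide every operand before recording this op's uses, so a value used
    // twice by the same op is still marked dead after it.
    for (Value v : op.operands) deadAfter[i].push_back(usedLater.count(v) == 0);
    for (Value v : op.operands) usedLater.insert(v);
  }
  return deadAfter;
}
```

The scan is linear in the number of operands, so the flags could be computed once before interpretation and looked up per op instead of re-querying liveness at visit time. Values used inside nested regions or carried across loop iterations would need a real liveness analysis rather than this single pass.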

@asraa (Collaborator) left a comment

thank you!


// Only used for type-agnostic block arguments (func args, iter args, etc.)
void Interpreter::storeTypedValue(Value v, const TypedCppValue& typedVal) {
std::visit(
Collaborator

I know you mentioned type switching on the MLIR value is faster. Would it also be faster here to do an MLIR TypeSwitch to retrieve the right map and then assign it to the TypedCppValue variant?

Collaborator Author

The use of if constexpr here means the type checks are resolved at compile time, so the only overhead comes from the variant dispatch (not the if statement). I'm not sure how this compares to a TypeSwitch.
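To illustrate the if constexpr point with a simplified example: in a sketch like the following, each instantiation of the generic lambda passed to std::visit keeps only the branch for its own alternative, so the remaining runtime cost is std::visit's dispatch on the variant's active index. Env, the map names, and this two-alternative TypedCppValue are hypothetical simplifications, not the PR's actual types.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <unordered_map>
#include <variant>
#include <vector>

using Value = int;  // hypothetical stand-in for mlir::Value

// Hypothetical two-alternative TypedCppValue holding shared_ptrs, so copying
// the variant only bumps a reference count.
using TypedCppValue =
    std::variant<std::shared_ptr<int>, std::shared_ptr<std::vector<int>>>;

struct Env {
  // One map per C++ type, mirroring the type-specific DenseMaps.
  std::unordered_map<Value, std::shared_ptr<int>> ints;
  std::unordered_map<Value, std::shared_ptr<std::vector<int>>> intVectors;

  // Route a type-erased value into the map for its concrete type.
  void storeTypedValue(Value v, const TypedCppValue& typedVal) {
    std::visit(
        [&](const auto& ptr) {
          using T = std::decay_t<decltype(ptr)>;
          // Resolved at compile time: each lambda instantiation contains
          // only one of these branches.
          if constexpr (std::is_same_v<T, std::shared_ptr<int>>) {
            ints.insert_or_assign(v, ptr);
          } else {
            // Only other alternative; adding a new alternative to the
            // variant would make this line fail to compile.
            intVectors.insert_or_assign(v, ptr);
          }
        },
        typedVal);
  }
};
```

A TypeSwitch would instead branch on the Value's MLIR type at runtime before choosing a map; which approach is faster in practice would need to be measured.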

@j2kun j2kun force-pushed the overhead-investigation branch from 564b32c to 65083e0 December 4, 2025 04:45
@j2kun j2kun added the pull_ready Indicates whether a PR is ready to pull. The copybara worker will import for internal testing label Dec 4, 2025
@copybara-service copybara-service bot merged commit 681aed9 into google:main Dec 4, 2025
8 checks passed