OpenFHE interpreter overhead reduction (lenet example) #2445
This PR improves the OpenFHE C++ interpreter by reducing the interpreter's overhead. It introduces a few changes to address the main sources of that overhead:
- Uses `shared_ptr` to store values (and in `TypedCppValue`), which avoids copying on `insert_or_assign`.
- Uses a `DenseMap` for each type considered (e.g., `llvm::DenseMap<Value, std::shared_ptr<std::vector<int>>> intVectors`) instead of having a single map that stores all values.
- Adds an `operationDispatchTable` that avoids the overhead of type switching across all ops at visit time (in favor of hashing the TypeID associated with an Operation subclass).

This also includes a patch of #2421 in order to get the
`lenet` example, but changes `lenet_main.cpp` to take any file as input (so I could pass in truncated lenet OpenFHE IR for faster iteration).

To do the profiling:
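The exact commands are not reproduced here, but a typical gperftools-based workflow looks roughly like the following sketch. The target label, binary path, and input filename are illustrative assumptions, and the binary is assumed to be linked against `-lprofiler`:

```shell
# Illustrative sketch: target names and paths are assumptions.
# Build the lenet example binary.
bazel build //tests/Examples/openfhe/ckks/lenet:lenet_binary

# Run under the gperftools CPU profiler to collect prof.out.
CPUPROFILE=prof.out \
  ./bazel-bin/tests/Examples/openfhe/ckks/lenet/lenet_binary lenet.mlir

# Annotate the profile line by line.
pprof --text --lines \
  ./bazel-bin/tests/Examples/openfhe/ckks/lenet/lenet_binary \
  prof.out
```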
This produces a per-line runtime breakdown of the interpreter's visit operations.
Another view is:

```shell
pprof --text --lines --focus=mlir::heir::openfhe::Interpreter \
  ./bazel-bin/tests/Examples/openfhe/ckks/lenet/lenet_binary \
  prof.out > interpreter_focus.txt
```

which gives a table of all runtime data for functions in the `Interpreter` namespace.
Running the above, this PR produces:
Profile: https://gist.github.com/j2kun/4ff4c2c9740b7f6cf3f3f5f370fda5b5
line-by-line annotation: https://gist.github.com/j2kun/7110641e3eaec18b3b8e6ec948e63c07
From the above profiles, it's clear that the dispatching and copying overhead is eliminated; the only remaining source of overhead I can see is the liveness checking, which is only 10ms.
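The two main techniques above (pointer-backed per-type value maps and a dispatch table keyed by the operation's dynamic type) can be sketched in isolation. This is a simplified, self-contained model, not the PR's actual code: it substitutes `std::unordered_map`/`std::type_index` for `llvm::DenseMap`/`mlir::TypeID`, and plain structs for `Operation` subclasses:

```cpp
#include <functional>
#include <memory>
#include <typeindex>
#include <unordered_map>
#include <vector>

// Stand-ins for MLIR op classes (assumptions; the real interpreter
// dispatches on mlir::Operation subclasses via their TypeID).
struct Op { virtual ~Op() = default; };
struct AddOp : Op { int lhs, rhs, result; };
struct MulOp : Op { int lhs, rhs, result; };

struct Interpreter {
  // One map per stored value type. Values are held by shared_ptr, so
  // insert_or_assign moves a pointer instead of copying a vector.
  std::unordered_map<int, std::shared_ptr<std::vector<int>>> intVectors;

  // Dispatch table built once at construction, keyed by the concrete
  // op type. Visiting hashes the type instead of running a long
  // if/else type switch across every op kind.
  std::unordered_map<std::type_index, std::function<void(Op&)>> dispatch;

  Interpreter() {
    dispatch[typeid(AddOp)] = [this](Op& op) {
      auto& add = static_cast<AddOp&>(op);
      auto& a = *intVectors.at(add.lhs);
      auto& b = *intVectors.at(add.rhs);
      auto out = std::make_shared<std::vector<int>>();
      for (size_t i = 0; i < a.size(); ++i) out->push_back(a[i] + b[i]);
      intVectors.insert_or_assign(add.result, std::move(out));
    };
    dispatch[typeid(MulOp)] = [this](Op& op) {
      auto& mul = static_cast<MulOp&>(op);
      auto& a = *intVectors.at(mul.lhs);
      auto& b = *intVectors.at(mul.rhs);
      auto out = std::make_shared<std::vector<int>>();
      for (size_t i = 0; i < a.size(); ++i) out->push_back(a[i] * b[i]);
      intVectors.insert_or_assign(mul.result, std::move(out));
    };
  }

  // typeid(op) resolves to the dynamic type because Op is polymorphic.
  void visit(Op& op) { dispatch.at(typeid(op))(op); }
};
```

The key design point is that the table lookup cost is constant in the number of op kinds, whereas a `TypeSwitch`-style chain grows linearly.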