replace old patching method by more neat one #2
+25
−30
Hi, thanks for this package! The current patching approach uses the `rax` register to store the `hooked_PyEval_EvalFrameEx` address. To preserve state, `rax` is pushed right before it is loaded with the `hooked_PyEval_EvalFrameEx` address. Additionally, a `nop` is placed inside `hooked_PyEval_EvalFrameEx`, acting as a placeholder for popping `rax`. All instructions from the beginning of `hooked_PyEval_EvalFrameEx` up to the address of the `nop` are shifted down by 1 to insert `pop rax` as the very first instruction, which restores state right after the jump.

Instead of that, we can skip using registers entirely by manipulating the stack. The steps are:

1. Push the lower 32 bits of the `hooked_PyEval_EvalFrameEx` address. Since the pushed value is 64 bits wide, only its first 32 bits are correct. The remaining bits are garbage so far.
2. Overwrite the upper 32 bits of the pushed value with the upper 32 bits of the `hooked_PyEval_EvalFrameEx` address (previously garbage).
3. Return. The 64-bit `hooked_PyEval_EvalFrameEx` address on top of the stack is popped by the CPU, which then jumps to that address.

As a result, no state saving/restoration is needed and the `nop` approach can be dropped. It is done this way because you cannot jump to an absolute 64-bit immediate address on x86_64, and you cannot push a 64-bit immediate value either.
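As an illustration of the stack-based jump described above, here is a minimal sketch that assembles the three instructions into raw bytes. It is not the code from this PR: the function name `build_push_ret_trampoline` and the sample address are hypothetical, and the encodings assumed are the standard x86_64 ones (`68 id` for `push imm32`, `C7 44 24 04 id` for `mov dword ptr [rsp+4], imm32`, `C3` for `ret`).

```python
import struct


def build_push_ret_trampoline(target: int) -> bytes:
    """Encode `push lo32; mov dword ptr [rsp+4], hi32; ret` for a 64-bit target.

    Hypothetical helper for illustration only; it just builds the bytes,
    it does not patch or execute anything.
    """
    if not 0 <= target < 2**64:
        raise ValueError("target must fit in 64 bits")

    lo = target & 0xFFFFFFFF  # lower 32 bits of the address
    hi = target >> 32         # upper 32 bits of the address

    # push imm32: the CPU pushes 64 bits, so only the low half is correct yet.
    code = b"\x68" + struct.pack("<I", lo)
    # mov dword ptr [rsp+4], imm32: overwrite the upper half on the stack.
    code += b"\xC7\x44\x24\x04" + struct.pack("<I", hi)
    # ret: pop the full 64-bit address and jump to it.
    code += b"\xC3"
    return code


if __name__ == "__main__":
    # Made-up address, used only to show the byte layout.
    print(build_push_ret_trampoline(0x00007F1234567890).hex(" "))
```

Under these assumptions the whole sequence takes 14 bytes and touches no general-purpose register, which is why nothing has to be saved or restored around the jump.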
You can also push four 16-bit words that together represent the function address and then perform a return. You cannot push 2x32 bits, however. Whether you do it that way or as I did is a matter of preference, but my way uses less memory.
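For comparison, the four-word alternative can be sketched the same way. This is only an illustration: the name `build_word_push_trampoline` and the sample address are hypothetical, and it assumes the standard encoding `66 68 iw` for `push imm16` in 64-bit mode. The words are pushed from most significant to least significant so that the complete address ends up at the top of the stack.

```python
import struct


def build_word_push_trampoline(target: int) -> bytes:
    """Encode four `push imm16` instructions followed by `ret`.

    Hypothetical helper for illustration only; it just builds the bytes.
    """
    if not 0 <= target < 2**64:
        raise ValueError("target must fit in 64 bits")

    code = b""
    # The stack grows downward, so the highest word is pushed first and the
    # lowest word lands at [rsp], giving a little-endian 64-bit address.
    for shift in (48, 32, 16, 0):
        word = (target >> shift) & 0xFFFF
        code += b"\x66\x68" + struct.pack("<H", word)  # push imm16
    return code + b"\xC3"  # ret: pop the assembled address and jump to it


if __name__ == "__main__":
    # Made-up address, used only to show the byte layout.
    code = build_word_push_trampoline(0x00007F1234567890)
    print(len(code), code.hex(" "))
```

With these encodings the sequence is 17 bytes (four 4-byte pushes plus `ret`), against 14 bytes for a `push imm32`, `mov dword ptr [rsp+4], imm32`, `ret` sequence, which is consistent with the remark about memory use.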