-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More optimizations #101
base: master
Are you sure you want to change the base?
More optimizations #101
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3813,22 +3813,27 @@ NdGetEffectiveAddrAndOpMode( | |
static const ND_UINT8 szLut[3] = { ND_SIZE_16BIT, ND_SIZE_32BIT, ND_SIZE_64BIT }; | ||
ND_BOOL w64, f64, d64, has66; | ||
|
||
if ((ND_CODE_64 != Instrux->DefCode) && !!(Instrux->Attributes & ND_FLAG_IWO64)) | ||
// Branchless form of (ND_CODE_64 != Instrux->DefCode) && !!(Instrux->Attributes & ND_FLAG_IWO64) | ||
if (((ND_CODE_64 ^ Instrux->DefCode) * (Instrux->Attributes & ND_FLAG_IWO64)) != 0) | ||
{ | ||
// Some instructions ignore VEX/EVEX.W field outside 64 bit mode, and treat it as 0. | ||
Instrux->Exs.w = 0; | ||
} | ||
|
||
// Extract the flags. | ||
w64 = (0 != Instrux->Exs.w) && !(Instrux->Attributes & ND_FLAG_WIG); | ||
// Branchless form of (0 != Instrux->Exs.w) && !(Instrux->Attributes & ND_FLAG_WIG) | ||
w64 = (0 != Instrux->Exs.w) * !(Instrux->Attributes & ND_FLAG_WIG); | ||
|
||
// In 64 bit mode, the operand is forced to 64 bit. Size-changing prefixes are ignored. | ||
f64 = 0 != (Instrux->Attributes & ND_FLAG_F64) && (ND_VEND_AMD != Instrux->VendMode); | ||
// Branchless form of 0 != (Instrux->Attributes & ND_FLAG_F64) && (ND_VEND_AMD != Instrux->VendMode) | ||
f64 = 0 != ((Instrux->Attributes & ND_FLAG_F64) * (ND_VEND_AMD ^ Instrux->VendMode)); | ||
|
||
// In 64 bit mode, the operand defaults to 64 bit. No 32 bit form of the instruction exists. Note that on AMD, | ||
// only default 64 bit operands exist, even for branches - no operand is forced to 64 bit. | ||
d64 = (0 != (Instrux->Attributes & ND_FLAG_D64)) || | ||
(0 != (Instrux->Attributes & ND_FLAG_F64) && (ND_VEND_AMD == Instrux->VendMode)); | ||
// Branchless form of (0 != (Instrux->Attributes & ND_FLAG_D64)) || | ||
// (0 != (Instrux->Attributes & ND_FLAG_F64) && (ND_VEND_AMD == Instrux->VendMode)); | ||
d64 = (Instrux->Attributes & ND_FLAG_D64) + | ||
((Instrux->Attributes & ND_FLAG_F64) * (ND_VEND_AMD == Instrux->VendMode)); | ||
|
||
// Check if 0x66 is indeed interpreted as a size changing prefix. Note that if 0x66 is a mandatory prefix, | ||
// then it won't be interpreted as a size changing prefix. However, there is an exception: MOVBE and CRC32 | ||
|
@@ -4427,11 +4432,19 @@ NdDecodeWithContext( | |
Instrux->EncMode = ND_ENCM_LEGACY; // Assume legacy encoding by default. | ||
|
||
// Fetch the instruction bytes. | ||
for (opIndex = 0; | ||
opIndex < ((Size < ND_MAX_INSTRUCTION_LENGTH) ? Size : ND_MAX_INSTRUCTION_LENGTH); | ||
opIndex++) | ||
if (Size < ND_MAX_INSTRUCTION_LENGTH) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This just moves the condition from the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is very likely GCC doing something stupid. I'll test clang too and see what happens. |
||
{ | ||
Instrux->InstructionBytes[opIndex] = Code[opIndex]; | ||
for (opIndex = 0; opIndex < Size; opIndex++) | ||
{ | ||
Instrux->InstructionBytes[opIndex] = Code[opIndex]; | ||
} | ||
} | ||
else | ||
{ | ||
for (opIndex = 0; opIndex < ND_MAX_INSTRUCTION_LENGTH; opIndex++) | ||
{ | ||
Instrux->InstructionBytes[opIndex] = Code[opIndex]; | ||
} | ||
} | ||
|
||
if (gPrefixesMap[Instrux->InstructionBytes[0]] != ND_PREF_CODE_NONE) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the branch predictors in modern CPUs to be quite competent at predicting the branches here, since they operate on
INSTRUX
state that will very often be the same, allowing for a very high prediction rate.I understand that these changes may remove some conditional moves or branches, but is it really worth it? At some point, one has to decide whether the (small) increase in performance is worth the less readable code. I don't think this is the case here. In addition, depending on the architecture, the multiplication may be a new source of overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiplication pipelines easier than branches. I wasn't entirely happy with this commit either but it did make it faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler may even ditch the multiplication altogether, and rely on conditional instructions (such as
CMOVcc
orSETcc
), but this is not the point. The point is that it shouldn't really matter, since modern branch prediction should easily handle these branches with minimal overhead. These statements are not a performance bottleneck.