Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEX: Implements new sampling based stats #4291
base: main
Are you sure you want to change the base?
FEX: Implements new sampling based stats #4291
Changes from all commits
5845474
a1b800a
6c66636
f6d9d3d
13ebda4
3bdc69d
46dca8a
832f9c2
a9b28f9
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you adding a new set of files here instead of keeping it to a single set of files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is frontend code that is shared between FEXInterpreter, wow64, and arm64ec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this frontend code but the existing Profiler infrastructure in FEXCore not, despite also being Linux specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my problem is that the 6 files (if I'm counting correctly) are placed in seemingly random locations: The existing Linux-specific code probably shouldn't be in FEXCore since the Windows-specific equivalent isn't there either.
(Not sure if moving it makes more sense than moving the Windows code to FEXCore.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That code should also get moved to the frontend at some point as long as the
TraceObject
handlers can still get called from FEXCore.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some heavy logic in here, so this should explain what it's doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you preferring a custom-implemented linked list over flat memory? This seems neither more convenient for the implementation nor like it would improve performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sparsity in the allocation buffer is a concern, so this is how I'm keeping the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate what exactly the problem is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reader side would need to walk the entire allocated buffer space to ensure it doesn't miss any slots, instead of the average case of a hundred or so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What scenario exactly would trigger this? I don't know what your reader side looks like, but it sounds like a bitmask and/or active slot count would easily resolve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just going to leave it as a linked list. We can rev the version number and change it to a bitmask in the future if someone is feeling up to writing that code in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the choice between linked lists and flat memory doesn't seem to be so important, why is the focus on weakly-ordered atomics considered critical?
(I don't agree using a custom data structure here is the way forward since it makes up a good chunk of the complexity in this PR, but on top of that the performance targets don't seem consistent here as-is.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the atomics I don't want the writer side to be affected by cacheline sharing problems at all. Semantically these are two independent things. The containing data which is
FEXCore::Profiler::ThreadStats
and the way to walk the list of data.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand they are different things, but it seems arbitrary to want this (with no explicit motivation) while using a data structure with poor performance characteristics in the same module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, which is why we leave some of the "poor performance" problems on the reader side which has something like 500ms to sample the data, even though this is going to be hundreds of nanoseconds regardless.