Commit f690d99

htyu authored and memfrob committed
[CSSPGO] Top-down processing order based on full profile.
Use profiled call edges to augment the top-down order. There are cases where the top-down order computed from the static call graph doesn't reflect the real execution order. For example:

1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller to be processed before them.

2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, and thus may cause potential inlining to be overlooked. The function order within an SCC is adjusted to a top-down order based on the profile to favor more inlining.

3. Transitive indirect call edges due to inlining. When a callee function is inlined into a caller function in LTO prelink, every call edge originating from the callee is transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink, since the inlined callee is gone from the static call graph.

4. Case 3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times, but only one definition survives due to ODR. Therefore, the LTO prelink inlining done on those dropped definitions can be useless based on a local file scope. More importantly, the inlinee, once fully inlined into a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function, which may be called from external modules. While this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the surviving copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions.

Considering those cases, a profiled call graph completely independent of the static call graph is constructed based on profile data, where function objects are not even needed to handle cases 3 and 4.

I'm seeing an average 0.4% perf win on SPEC2017. For certain benchmarks such as Xalancbmk and GCC, the win is bigger, above 2%.

The change is an enhancement to https://reviews.llvm.org/D95988.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D99351
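To illustrate ordering functions from profiled call edges alone, here is a minimal sketch; it is not the code from this commit. It assumes a hypothetical `Graph` keyed by function name (`buildGraph` and `topDownSCCs` are illustrative names, not LLVM APIs) and uses Tarjan's algorithm, which emits SCCs callees first, so reversing its output gives a callers-first order. It does not reorder functions inside an SCC, which the commit message describes as a separate adjustment.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a profiled call graph: nodes are function names,
// and caller -> callee edges come from profile data only (no IR needed).
using Graph = std::map<std::string, std::set<std::string>>;

Graph buildGraph(
    const std::vector<std::pair<std::string, std::string>> &Edges) {
  Graph G;
  for (const auto &E : Edges) {
    G[E.first].insert(E.second);
    G[E.second]; // Make sure the callee has a node even if it calls nothing.
  }
  return G;
}

// Returns the SCCs of G with callers before callees (top-down).
std::vector<std::vector<std::string>> topDownSCCs(const Graph &G) {
  std::map<std::string, int> Index, LowLink;
  std::set<std::string> OnStack;
  std::vector<std::string> Stack;
  std::vector<std::vector<std::string>> SCCs;
  int Next = 0;

  std::function<void(const std::string &)> Visit = [&](const std::string &V) {
    Index[V] = LowLink[V] = Next++;
    Stack.push_back(V);
    OnStack.insert(V);
    for (const auto &W : G.at(V)) {
      if (!Index.count(W)) {
        Visit(W);
        LowLink[V] = std::min(LowLink[V], LowLink[W]);
      } else if (OnStack.count(W)) {
        LowLink[V] = std::min(LowLink[V], Index[W]);
      }
    }
    if (LowLink[V] == Index[V]) {
      // V is the root of an SCC: pop its members off the stack.
      std::vector<std::string> SCC;
      std::string W;
      do {
        W = Stack.back();
        Stack.pop_back();
        OnStack.erase(W);
        SCC.push_back(W);
      } while (W != V);
      SCCs.push_back(SCC);
    }
  };

  for (const auto &Node : G)
    if (!Index.count(Node.first))
      Visit(Node.first);

  // Tarjan emits callees first; reverse to get a top-down order.
  std::reverse(SCCs.begin(), SCCs.end());
  return SCCs;
}
```

For example, with the edges main -> dispatch and dispatch -> handler, where the second edge stands for an indirect call visible only in the profile, `topDownSCCs` places main first, then dispatch, then handler, so the caller is processed before its indirect call target.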
1 parent 51fe854 commit f690d99

10 files changed: +170 -201 lines changed


llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h

Lines changed: 47 additions & 25 deletions
@@ -13,6 +13,7 @@
 #include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ProfileData/SampleProf.h"
+#include "llvm/ProfileData/SampleProfReader.h"
 #include "llvm/Transforms/IPO/SampleContextTracker.h"
 #include <queue>
 #include <set>
@@ -40,40 +41,41 @@ struct ProfiledCallGraphNode {
 class ProfiledCallGraph {
 public:
   using iterator = std::set<ProfiledCallGraphNode *>::iterator;
-  ProfiledCallGraph(StringMap<FunctionSamples> &ProfileMap,
-                    SampleContextTracker &ContextTracker) {
-    // Add all profiled functions into profiled call graph.
-    // We only add function with actual context profile
-    for (auto &FuncSample : ProfileMap) {
-      FunctionSamples *FSamples = &FuncSample.second;
-      addProfiledFunction(FSamples->getName());
+
+  // Constructor for non-CS profile.
+  ProfiledCallGraph(StringMap<FunctionSamples> &ProfileMap) {
+    assert(!FunctionSamples::ProfileIsCS && "CS profile is not handled here");
+    for (const auto &Samples : ProfileMap) {
+      addProfiledCalls(Samples.second);
     }
+  }

-    // BFS traverse the context profile trie to add call edges for
-    // both samples calls as well as calls shown in context.
+  // Constructor for CS profile.
+  ProfiledCallGraph(SampleContextTracker &ContextTracker) {
+    // BFS traverse the context profile trie to add call edges for calls shown
+    // in context.
     std::queue<ContextTrieNode *> Queue;
-    Queue.push(&ContextTracker.getRootContext());
+    for (auto &Child : ContextTracker.getRootContext().getAllChildContext()) {
+      ContextTrieNode *Callee = &Child.second;
+      addProfiledFunction(Callee->getFuncName());
+      Queue.push(Callee);
+    }
+
     while (!Queue.empty()) {
       ContextTrieNode *Caller = Queue.front();
       Queue.pop();
-      FunctionSamples *CallerSamples = Caller->getFunctionSamples();
-
-      // Add calls for context, if both caller and callee has context profile.
+      // Add calls for context. When AddNodeWithSamplesOnly is true, both caller
+      // and callee need to have context profile.
+      // Note that callsite target samples are completely ignored since they can
+      // conflict with the context edges, which are formed by context
+      // compression during profile generation, for cyclic SCCs. This may
+      // further result in an SCC order incompatible with the purely
+      // context-based one, which may in turn block context-based inlining.
       for (auto &Child : Caller->getAllChildContext()) {
         ContextTrieNode *Callee = &Child.second;
+        addProfiledFunction(Callee->getFuncName());
         Queue.push(Callee);
-        if (CallerSamples && Callee->getFunctionSamples()) {
-          addProfiledCall(Caller->getFuncName(), Callee->getFuncName());
-        }
-      }
-
-      // Add calls from call site samples
-      if (CallerSamples) {
-        for (auto &LocCallSite : CallerSamples->getBodySamples()) {
-          for (auto &NameCallSite : LocCallSite.second.getCallTargets()) {
-            addProfiledCall(Caller->getFuncName(), NameCallSite.first());
-          }
-        }
+        addProfiledCall(Caller->getFuncName(), Callee->getFuncName());
       }
     }
   }
@@ -89,6 +91,7 @@ class ProfiledCallGraph {
       ProfiledFunctions[Name] = ProfiledCallGraphNode(Name);
     }
   }
+
   void addProfiledCall(StringRef CallerName, StringRef CalleeName) {
     assert(ProfiledFunctions.count(CallerName));
     auto CalleeIt = ProfiledFunctions.find(CalleeName);
@@ -98,6 +101,25 @@ class ProfiledCallGraph {
     ProfiledFunctions[CallerName].Callees.insert(&CalleeIt->second);
   }

+  void addProfiledCalls(const FunctionSamples &Samples) {
+    addProfiledFunction(Samples.getFuncName());
+
+    for (const auto &Sample : Samples.getBodySamples()) {
+      for (const auto &Target : Sample.second.getCallTargets()) {
+        addProfiledFunction(Target.first());
+        addProfiledCall(Samples.getFuncName(), Target.first());
+      }
+    }
+
+    for (const auto &CallsiteSamples : Samples.getCallsiteSamples()) {
+      for (const auto &InlinedSamples : CallsiteSamples.second) {
+        addProfiledFunction(InlinedSamples.first);
+        addProfiledCall(Samples.getFuncName(), InlinedSamples.first);
+        addProfiledCalls(InlinedSamples.second);
+      }
+    }
+  }
+
 private:
   ProfiledCallGraphNode Root;
   StringMap<ProfiledCallGraphNode> ProfiledFunctions;
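
The recursive edge collection added for non-CS profiles can be pictured with a small standalone sketch. This is an assumption-laden simplification, not LLVM code: `Samples`, `EdgeMap`, and `collectEdges` are hypothetical names, and the real `FunctionSamples` keys call targets and inlined callees by source location rather than storing plain lists.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a nested function profile.
struct Samples {
  std::string Name;
  // Callees recorded at call sites that were not inlined.
  std::vector<std::string> CallTargets;
  // Nested profiles of callees that were inlined into this function.
  std::vector<Samples> Inlined;
};

// Caller name -> set of callee names; every function seen gets a node.
using EdgeMap = std::map<std::string, std::set<std::string>>;

void collectEdges(const Samples &S, EdgeMap &Edges) {
  Edges[S.Name]; // Register the function itself.

  // Edges to call targets that remained real calls.
  for (const auto &Target : S.CallTargets) {
    Edges[Target];
    Edges[S.Name].insert(Target);
  }

  // Edges to inlined callees, then recurse into their nested profiles so
  // calls made from inlined code are recorded too.
  for (const auto &Callee : S.Inlined) {
    Edges[S.Name].insert(Callee.Name);
    collectEdges(Callee, Edges);
  }
}
```

In this sketch, a profile for foo that calls qux and has bar inlined into it, where bar in turn calls baz, yields the edges foo -> qux, foo -> bar, and bar -> baz, without consulting any IR.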

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h

Lines changed: 0 additions & 2 deletions
@@ -18,7 +18,6 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
-#include "llvm/Analysis/CallGraph.h"
 #include "llvm/IR/DebugInfoMetadata.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/ProfileData/SampleProf.h"
@@ -124,7 +123,6 @@ class SampleContextTracker {
   ContextTrieNode &getRootContext();
   void promoteMergeContextSamplesTree(const Instruction &Inst,
                                       StringRef CalleeName);
-  void addCallGraphEdges(CallGraph &CG, StringMap<Function *> &SymbolMap);
   // Dump the internal context profile trie.
   void dump();

llvm/lib/Transforms/IPO/SampleContextTracker.cpp

Lines changed: 0 additions & 22 deletions
@@ -568,26 +568,4 @@ ContextTrieNode &SampleContextTracker::promoteMergeContextSamplesTree(

   return *ToNode;
 }
-
-// Replace call graph edges with dynamic call edges from the profile.
-void SampleContextTracker::addCallGraphEdges(CallGraph &CG,
-                                             StringMap<Function *> &SymbolMap) {
-  // Add profile call edges to the call graph.
-  std::queue<ContextTrieNode *> NodeQueue;
-  NodeQueue.push(&RootContext);
-  while (!NodeQueue.empty()) {
-    ContextTrieNode *Node = NodeQueue.front();
-    NodeQueue.pop();
-    Function *F = SymbolMap.lookup(Node->getFuncName());
-    for (auto &I : Node->getAllChildContext()) {
-      ContextTrieNode *ChildNode = &I.second;
-      NodeQueue.push(ChildNode);
-      if (F && !F->isDeclaration()) {
-        Function *Callee = SymbolMap.lookup(ChildNode->getFuncName());
-        if (Callee && !Callee->isDeclaration())
-          CG[F]->addCalledFunction(nullptr, CG[Callee]);
-      }
-    }
-  }
-}
 } // namespace llvm
