llvm-project/llvm/lib/Transforms/Scalar
Adam Nemet c520822dbf [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set).  However we could still have
regions of the CFG that do not have branch weights collected (e.g. a
cold region).  In this case we'd use static estimates.  Since static
estimates for branches are determined independently, they are
inconsistent.  Updating them can "randomly" inflate block frequencies.

I've run into this in a completely cold loop of h264ref from
SPEC.  -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).

The new testcase demonstrates the problem.  We check array elements
against 1, 2 and 3 in a loop.  The check against 3 is the loop-exiting
check.  The block names should be self-explanatory.

In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).

There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated.  These are the resulting branch and block
frequencies for the loop body:

                check_1 (16)
            (8) /  |
            eq_1   | (8)
                \  |
                check_2 (16)
            (8) /  |
            eq_2   | (8)
                \  |
                check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)
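
The numbers in the diagram can be reproduced with a small sketch.  This is
only an illustration, not LLVM code: edge_frequencies is a hypothetical
helper, and the probabilities (1/2 for the equality checks, 1/16 for the
loop exit) are inferred from the frequencies shown above.

```python
from fractions import Fraction

def edge_frequencies(block_freq, probs):
    """Split a block's frequency across its outgoing edges by probability."""
    return {succ: block_freq * p for succ, p in probs.items()}

# Probabilities are assumptions inferred from the diagram, not LLVM output.
check_1 = edge_frequencies(16, {"eq_1": Fraction(1, 2), "check_2": Fraction(1, 2)})
check_3 = edge_frequencies(16, {"loop_exit": Fraction(1, 16), "back_edge": Fraction(15, 16)})
```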

First we thread eq_1 -> check_2 to check_3.  Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2.  Changed frequencies are highlighted with * *:

                check_1 (16)
            (8) /  |
           eq_1~   | (8)
           /       |
          /     check_2 (*8*)
         /  (8) /  |
         \  eq_2   | (*0*)
          \     \  |
           ` --- check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)
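
The update in this first step can be modelled roughly as follows.  This is
a simplified sketch, not the pass's implementation: thread_edge and the
dictionary layout are hypothetical, and only the subtraction described
above is modelled.

```python
def thread_edge(block_freq, edge_freq, pred, bb, succ):
    """Redirect pred -> bb -> succ so that pred jumps straight to succ."""
    moved = edge_freq.pop((pred, bb))       # frequency carried by the threaded edge
    block_freq[bb] -= moved                 # bb no longer sees that flow
    edge_freq[(bb, succ)] -= moved          # nor does its edge to succ
    edge_freq[(pred, succ)] = edge_freq.get((pred, succ), 0) + moved
    return moved

block_freq = {"check_1": 16, "eq_1": 8, "check_2": 16, "eq_2": 8, "check_3": 16}
edge_freq = {
    ("check_1", "eq_1"): 8, ("check_1", "check_2"): 8,
    ("eq_1", "check_2"): 8,
    ("check_2", "eq_2"): 8, ("check_2", "check_3"): 8,
    ("eq_2", "check_3"): 8,
    ("check_3", "exit"): 1, ("check_3", "check_1"): 15,
}

# Thread eq_1 -> check_2 to check_3.
thread_edge(block_freq, edge_freq, "eq_1", "check_2", "check_3")
```

After the call, check_2 drops to 8 and its false edge to 0, while check_3
keeps its frequency of 16 because the flow from eq_1 now reaches it directly.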

Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges.  Frequencies are updated to remove the frequency of eq_1 and
eq_2 from check_3 and then from the false edge leaving check_3 (changed
frequencies are highlighted with * *):

                  check_1 (16)
              (8) /  |
             eq_1~   | (8)
             /       |
            /     check_2 (*8*)
           /  (8) /  |
          /-- eq_2~  | (*0*)
  (back edge)        |
                  check_3 (*0*)
            (*0*) /  |
         (loop exit) | (*0*)
                     |
                (back edge)

As a result, the loop exit edge ends up with 0 frequency, which in turn
gives the loop header the maximum frequency.
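
Why a zero exit frequency inflates the header can be seen in a simple
geometric model, where the header runs 1 / exit_probability times per loop
entry.  The sketch below is an assumption-laden illustration: loop_scale
and its cap value are hypothetical and do not reflect how LLVM computes or
saturates frequencies.

```python
from fractions import Fraction

def loop_scale(exit_prob, cap=Fraction(2**12)):
    """Header executions per loop entry in a geometric model, saturated at cap."""
    if exit_prob == 0:
        return cap                  # the loop never exits: saturate
    return min(1 / exit_prob, cap)

before = loop_scale(Fraction(1, 16))   # exit probability from the first diagram
after = loop_scale(Fraction(0))        # exit edge weight rewritten to 0
```

With an exit probability of 1/16 the model gives the header frequency of 16
from the first diagram; with an exit probability of 0 the result is whatever
the cap is.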

There are a few potential problems here:

1. The profile data seems odd.  There is a single profile sample of the
loop being entered.  On the other hand, there are no weights inside the
loop.

2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.

3. We shouldn't create profile metadata that is calculated from static
estimation.  I am not sure what the policy is but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling.  Estimated probabilities should only be reflected in BPI/BFI.

Any one of these would probably fix the immediate problem.  I went for 3
because I think it's a good policy to have and added a FIXME about 2.
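
The policy in 3 amounts to a guard on the write-back.  A minimal sketch,
using a hypothetical dictionary-based block representation rather than the
real IR classes:

```python
def maybe_write_branch_weights(block, new_weights):
    """Write weights back only if the block already carried profile metadata."""
    if block.get("branch_weights") is None:
        return False                # estimated block: keep estimates out of the metadata
    block["branch_weights"] = list(new_weights)
    return True

profiled = {"name": "hot_check", "branch_weights": [10, 90]}
estimated = {"name": "check_3", "branch_weights": None}

wrote_profiled = maybe_write_branch_weights(profiled, [5, 95])
wrote_estimated = maybe_write_branch_weights(estimated, [0, 16])
```

The estimated block is left without weights, so later passes fall back to
static estimation instead of reading a fabricated 0 weight.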

Differential Revision: https://reviews.llvm.org/D24118

llvm-svn: 280713
2016-09-06 16:08:33 +00:00
..
ADCE.cpp [ADCE] Add control dependence computation 2016-08-24 00:10:06 +00:00
AlignmentFromAssumptions.cpp Add some comments linking back to PR28400. 2016-08-08 07:03:49 +00:00
BDCE.cpp [PM] Normalize FIXMEs for missing PreserveCFG to have the same wording. 2016-06-28 00:54:12 +00:00
CMakeLists.txt code hoisting pass based on GVN 2016-07-15 13:45:20 +00:00
ConstantHoisting.cpp This implements a more optimal algorithm for selecting a base constant in 2016-07-14 07:44:20 +00:00
ConstantProp.cpp Don't remove side effecting instructions due to ConstantFoldInstruction 2016-07-22 04:54:44 +00:00
CorrelatedValuePropagation.cpp CVP. Turn marking adds as no wrap (introduced by r278107) off by default 2016-08-18 16:08:35 +00:00
DCE.cpp Consistently use FunctionAnalysisManager 2016-08-09 00:28:15 +00:00
DeadStoreElimination.cpp limit the number of instructions per block examined by dead store elimination 2016-08-26 16:34:27 +00:00
EarlyCSE.cpp [EarlyCSE] Optionally use MemorySSA. NFC. 2016-08-31 19:24:10 +00:00
FlattenCFGPass.cpp Scalar: Remove some implicit ilist iterator conversions, NFC 2015-10-13 18:26:00 +00:00
Float2Int.cpp [PM] Normalize FIXMEs for missing PreserveCFG to have the same wording. 2016-06-28 00:54:12 +00:00
GVN.cpp Refactor replaceDominatedUsesWith to have a flag to control whether to replace uses in BB itself. 2016-09-01 23:26:48 +00:00
GVNHoist.cpp GVN-hoist: invalidate MD cache (PR29144) 2016-08-27 02:48:41 +00:00
GuardWidening.cpp Consistently use FunctionAnalysisManager 2016-08-09 00:28:15 +00:00
IndVarSimplify.cpp Revert -r278269 [IndVarSimplify] Eliminate zext of a signed IV when the IV is known to be non-negative 2016-08-22 13:12:07 +00:00
InductiveRangeCheckElimination.cpp [IRCE] Switch over to LLVM_DUMP_METHOD. NFCI. 2016-08-18 15:55:49 +00:00
JumpThreading.cpp [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info 2016-09-06 16:08:33 +00:00
LICM.cpp revert r280429 and r280425: 2016-09-02 01:59:27 +00:00
LLVMBuild.txt Update libdeps in LLVMipo and LLVMScalarOpts, corresponding to r245940. 2015-08-25 17:11:17 +00:00
LoadCombine.cpp [LoadCombine] Simplify code with a brace init. NFC. 2016-08-06 12:11:11 +00:00
LoopDataPrefetch.cpp [PM] Port LoopDataPrefetch to new pass manager 2016-08-13 04:11:27 +00:00
LoopDeletion.cpp Consistently use LoopAnalysisManager 2016-08-09 00:28:52 +00:00
LoopDistribute.cpp [LoopInfo] Add verification by recomputation. 2016-08-31 19:26:19 +00:00
LoopIdiomRecognize.cpp Target independent codesize heuristics for Loop Idiom Recognition 2016-08-11 18:28:33 +00:00
LoopInstSimplify.cpp Consistently use LoopAnalysisManager 2016-08-09 00:28:52 +00:00
LoopInterchange.cpp Use range algorithms instead of unpacking begin/end 2016-08-11 21:15:00 +00:00
LoopLoadElimination.cpp Use the range variant of transform instead of unpacking begin/end 2016-08-12 04:32:42 +00:00
LoopRerollPass.cpp ADT: Give ilist<T>::reverse_iterator a handle to the current node 2016-08-30 00:13:12 +00:00
LoopRotation.cpp Replace "fallthrough" comments with LLVM_FALLTHROUGH 2016-08-17 05:10:15 +00:00
LoopSimplifyCFG.cpp Consistently use LoopAnalysisManager 2016-08-09 00:28:52 +00:00
LoopStrengthReduce.cpp [LoopStrenghtReduce] Refactoring and addition of a new target cost function. 2016-08-17 13:24:19 +00:00
LoopUnrollPass.cpp [LoopUnroll] Use OptimizationRemarkEmitter directly not via the analysis pass 2016-08-26 15:58:34 +00:00
LoopUnswitch.cpp Cleanup : Use metadata preserving API for branch creation 2016-09-03 22:26:11 +00:00
LoopVersioningLICM.cpp Rename LoopAccessAnalysis to LoopAccessLegacyAnalysis /NFC 2016-07-08 20:55:26 +00:00
LowerAtomic.cpp [PM] Remove support for omitting the AnalysisManager argument to new 2016-06-17 00:11:01 +00:00
LowerExpectIntrinsic.cpp [Profile] handle select instruction in 'expect' lowering 2016-09-02 22:03:40 +00:00
LowerGuardIntrinsic.cpp [PM] Port LowerGuardIntrinsic to the new PM. 2016-07-28 22:08:41 +00:00
MemCpyOptimizer.cpp [MemCpy] Add comments for r279769 2016-08-25 21:03:46 +00:00
MergedLoadStoreMotion.cpp Consistently use FunctionAnalysisManager 2016-08-09 00:28:15 +00:00
NaryReassociate.cpp Convert some depth first traversals to depth_first 2016-08-19 22:06:23 +00:00
PartiallyInlineLibCalls.cpp Consistently use FunctionAnalysisManager 2016-08-09 00:28:15 +00:00
PlaceSafepoints.cpp Apply clang-tidy's modernize-loop-convert to most of lib/Transforms. 2016-06-26 12:28:59 +00:00
Reassociate.cpp [Reassociate] Add additional debug output. NFC. 2016-08-30 13:58:35 +00:00
Reg2Mem.cpp Apply clang-tidy's modernize-loop-convert to most of lib/Transforms. 2016-06-26 12:28:59 +00:00
RewriteStatepointsForGC.cpp [statepoints][experimental] Add support for live-in semantics of values in deopt bundles 2016-08-31 15:12:17 +00:00
SCCP.cpp [SCCP] Don't delete side-effecting instructions 2016-08-24 18:10:21 +00:00
SROA.cpp [SROA] Remove incorrect assertion 2016-08-22 18:49:42 +00:00
Scalar.cpp [EarlyCSE] Change C API pass interface for EarlyCSE w/ MemorySSA 2016-09-01 15:07:46 +00:00
Scalarizer.cpp Scalarizer: Support scalarizing intrinsics 2016-07-25 20:02:54 +00:00
SeparateConstOffsetFromGEP.cpp Partially revert 279331, as we modify this instruction in the loop 2016-08-19 22:18:38 +00:00
SimplifyCFGPass.cpp Consistently use FunctionAnalysisManager 2016-08-09 00:28:15 +00:00
Sink.cpp Consistently use FunctionAnalysisManager 2016-08-09 00:28:15 +00:00
SpeculativeExecution.cpp [PM] Port SpeculativeExecution to the new PM 2016-08-01 21:48:33 +00:00
StraightLineStrengthReduce.cpp Convert some depth first traversals to depth_first 2016-08-19 22:06:23 +00:00
StructurizeCFG.cpp Use the range variant of find instead of unpacking begin/end 2016-08-11 22:21:41 +00:00
TailRecursionElimination.cpp Use the range variant of find/find_if instead of unpacking begin/end 2016-08-12 03:55:06 +00:00