forked from OSchip/llvm-project
'OK_NonUniformConstValue' to identify operands which are constants but not constant splats.

The cost model now allows returning 'OK_NonUniformConstValue' for non-splat operands that are instances of ConstantVector or ConstantDataVector. With this change, targets are now able to compute different costs for instructions with non-uniform constant operands. For example, on X86 the cost of a vector shift may vary depending on whether the second operand is a uniform or non-uniform constant.

This patch applies the following changes:
- The cost model computation now takes into account non-uniform constants;
- The cost of vector shift instructions has been improved in the X86TargetTransformInfo analysis pass;
- BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish between non-uniform and uniform constant operands.

Added a new test to verify that the output of opt '-cost-model -analyze' is valid in the following configurations: SSE2, SSE4.1, AVX, AVX2.

llvm-svn: 201272
||
|---|---|---|
| IPA | ||
| AliasAnalysis.cpp | ||
| AliasAnalysisCounter.cpp | ||
| AliasAnalysisEvaluator.cpp | ||
| AliasDebugger.cpp | ||
| AliasSetTracker.cpp | ||
| Analysis.cpp | ||
| BasicAliasAnalysis.cpp | ||
| BlockFrequencyInfo.cpp | ||
| BranchProbabilityInfo.cpp | ||
| CFG.cpp | ||
| CFGPrinter.cpp | ||
| CMakeLists.txt | ||
| CaptureTracking.cpp | ||
| CodeMetrics.cpp | ||
| ConstantFolding.cpp | ||
| CostModel.cpp | ||
| Delinearization.cpp | ||
| DependenceAnalysis.cpp | ||
| DomPrinter.cpp | ||
| DominanceFrontier.cpp | ||
| IVUsers.cpp | ||
| InstCount.cpp | ||
| InstructionSimplify.cpp | ||
| Interval.cpp | ||
| IntervalPartition.cpp | ||
| LLVMBuild.txt | ||
| LazyCallGraph.cpp | ||
| LazyValueInfo.cpp | ||
| LibCallAliasAnalysis.cpp | ||
| LibCallSemantics.cpp | ||
| Lint.cpp | ||
| Loads.cpp | ||
| LoopInfo.cpp | ||
| LoopPass.cpp | ||
| Makefile | ||
| MemDepPrinter.cpp | ||
| MemoryBuiltins.cpp | ||
| MemoryDependenceAnalysis.cpp | ||
| ModuleDebugInfoPrinter.cpp | ||
| NoAliasAnalysis.cpp | ||
| PHITransAddr.cpp | ||
| PostDominators.cpp | ||
| PtrUseVisitor.cpp | ||
| README.txt | ||
| RegionInfo.cpp | ||
| RegionPass.cpp | ||
| RegionPrinter.cpp | ||
| ScalarEvolution.cpp | ||
| ScalarEvolutionAliasAnalysis.cpp | ||
| ScalarEvolutionExpander.cpp | ||
| ScalarEvolutionNormalization.cpp | ||
| SparsePropagation.cpp | ||
| TargetTransformInfo.cpp | ||
| Trace.cpp | ||
| TypeBasedAliasAnalysis.cpp | ||
| ValueTracking.cpp | ||
README.txt
Analysis Opportunities:
//===---------------------------------------------------------------------===//
In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the
ScalarEvolution expression for %r is this:
{1,+,3,+,2}<loop>
Outside the loop, this could be evaluated simply as (%n * %n); however,
ScalarEvolution currently evaluates it as
(-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n))
In addition to being much more complicated, it involves i65 arithmetic,
which is very inefficient when expanded into code.
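The equivalence claimed above can be checked numerically. The following is a minimal Python sketch, not LLVM code: the helper names (`chrec_at`, `expanded_exit_value`, `simple_exit_value`) are hypothetical, and it assumes the exit value corresponds to evaluating the recurrence at iteration %n - 1, which is inferred from the shape of the expanded expression rather than stated in this note. It models i64 and i65 arithmetic with bit masks.

```python
from math import comb

MASK64 = (1 << 64) - 1  # models wrap-around i64 arithmetic
MASK65 = (1 << 65) - 1  # models wrap-around i65 arithmetic


def chrec_at(coeffs, i):
    """Evaluate the add recurrence {c0,+,c1,+,...} at iteration i, wrapped to i64.

    Uses the binomial closed form: sum of c_k * C(i, k).
    """
    return sum(c * comb(i, k) for k, c in enumerate(coeffs)) & MASK64


def expanded_exit_value(n):
    """Mirror the expression ScalarEvolution currently produces, step by step."""
    n &= MASK64
    a = (n - 2) & MASK64            # (-2 + %n) in i64
    b = (n - 1) & MASK64            # (-1 + %n) in i64
    product = (a * b) & MASK65      # zext both to i65, multiply in i65
    half = (product >> 1) & MASK64  # /u 2, then trunc i65 -> i64
    return (-2 + 2 * half + 3 * n) & MASK64


def simple_exit_value(n):
    """The proposed simpler form: (%n * %n) in i64."""
    return (n * n) & MASK64


# Spot-check small values and values near the i64 boundary.
for n in (1, 2, 5, 1000, (1 << 63) + 7, MASK64):
    assert expanded_exit_value(n) == simple_exit_value(n)
    assert chrec_at([1, 3, 2], n - 1) == simple_exit_value(n)
```

The i65 intermediate appears because the product of the two i64 operands is halved before truncation, so one extra bit is needed to keep the division exact. The simpler form needs no widening at all.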
//===---------------------------------------------------------------------===//
In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll,
ScalarEvolution is forming this expression:
((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32)))
This could be folded to
(-1 * (trunc i64 undef to i32))
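The fold is valid because truncation commutes with negation modulo 2^32, so the first two terms cancel. A minimal Python sketch of that check follows. The function names are hypothetical, and `undef` is modelled as an ordinary integer argument `u`, which is a simplifying assumption: in LLVM IR, undef may take a different value at each use.

```python
MASK32 = (1 << 32) - 1  # models wrap-around i32 arithmetic
MASK64 = (1 << 64) - 1  # models wrap-around i64 arithmetic


def trunc32(x):
    """Model 'trunc i64 x to i32' by keeping the low 32 bits."""
    return x & MASK32


def unfolded(arg5, u):
    """The expression ScalarEvolution currently forms."""
    neg_arg5 = (-1 * arg5) & MASK64  # (-1 * %arg5) in i64
    return (trunc32(neg_arg5) + trunc32(arg5) + (-1 * trunc32(u))) & MASK32


def folded(u):
    """The proposed folded form: (-1 * (trunc i64 u to i32))."""
    return (-1 * trunc32(u)) & MASK32


# The two forms agree for every sampled %arg5 and stand-in value of undef.
for arg5 in (0, 1, 12345, 1 << 32, (1 << 63) + 9, MASK64):
    for u in (0, 7, 1 << 40, MASK64):
        assert unfolded(arg5, u) == folded(u)
```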
//===---------------------------------------------------------------------===//